RocketMQ Optimizations from the Source Code (2): File Warm-up

Series:

RocketMQ Optimizations from the Source Code (1): Memory Pre-mapping
RocketMQ Optimizations from the Source Code (2): File Warm-up

References:
Analysis of zero-copy in RocketMQ, Kafka, and Netty - Zhihu (zhihu.com)
RocketMQ Storage: Mapped File Warm-up [Source Notes] - Cloud+ Community - Tencent Cloud (tencent.com)

Contents

Source Code Analysis
1. What does mlock() memory locking do?
2. Why does MappedByteBuffer write a 0 byte every 4 KB?

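With the file warm-up optimization enabled, RocketMQ writes placeholder data into a file right after memory-mapping it, so that the file's contents are loaded into the page cache ahead of time. Later message writes and reads then hit the page cache directly and avoid repeated page faults.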

Source Code Analysis

The last function covered in the previous article on memory pre-mapping, org/apache/rocketmq/store/AllocateMappedFileService#mmapOperation, warms up the file right after creating a new mappedFile:

// pre write mappedFile
if (mappedFile.getFileSize() >= this.messageStore.getMessageStoreConfig()
    .getMappedFileSizeCommitLog()
    &&
    this.messageStore.getMessageStoreConfig().isWarmMapedFileEnable()) {
    mappedFile.warmMappedFile(this.messageStore.getMessageStoreConfig().getFlushDiskType(),
                              this.messageStore.getMessageStoreConfig().getFlushLeastPagesWhenWarmMapedFile());
}

@ImportantField
private FlushDiskType flushDiskType = FlushDiskType.ASYNC_FLUSH;

// Flush page size when the disk in warming state
private int flushLeastPagesWhenWarmMapedFile = 1024 / 4 * 16;

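Defaults (both fields are defined in MessageStoreConfig):

The flush policy is asynchronous (ASYNC_FLUSH).
During warm-up, a flush is forced each time 4096 pages (1024 / 4 * 16) have been written since the last flush. As the code below shows, this only takes effect under the synchronous flush policy.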
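
The arithmetic behind this default can be checked with a few lines of Java. This is only a sketch: the class and method names are made up for illustration and are not part of RocketMQ. Only the expression 1024 / 4 * 16 and the 4 KB page size come from the source above.

```java
// Illustrative only: WarmupFlushDefaults is a hypothetical helper, not a RocketMQ class.
final class WarmupFlushDefaults {
    // 4 KB, the value the article gives for MappedFile.OS_PAGE_SIZE
    static final int OS_PAGE_SIZE = 4 * 1024;

    // Same expression as the flushLeastPagesWhenWarmMapedFile initializer.
    static int defaultFlushLeastPages() {
        return 1024 / 4 * 16;
    }

    // Number of bytes written between two forced flushes for a given page count.
    static long bytesPerForcedFlush(int pages) {
        return (long) pages * OS_PAGE_SIZE;
    }

    public static void main(String[] args) {
        int pages = defaultFlushLeastPages();
        long megabytes = bytesPerForcedFlush(pages) / (1024 * 1024);
        System.out.println(pages + " pages = " + megabytes + " MB");
    }
}
```

So the default corresponds to one forced flush per 16 MB of warmed data.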

This calls warmMappedFile:

org/apache/rocketmq/store/MappedFile#warmMappedFile

public void warmMappedFile(FlushDiskType type, int pages) {
    long beginTime = System.currentTimeMillis();
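    // 1. Create a new byte buffer that shares its content with mappedByteBuffer.
    // The slice starts at mappedByteBuffer's current position; changes made through
    // the slice are visible in mappedByteBuffer.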
    ByteBuffer byteBuffer = this.mappedByteBuffer.slice();
    int flush = 0;
    long time = System.currentTimeMillis();
    // OS_PAGE_SIZE is 4 KB
    for (int i = 0, j = 0; i < this.fileSize; i += MappedFile.OS_PAGE_SIZE, j++) {
        // 2. Write a 0 byte every 4 KB, i.e. touch each page once
        byteBuffer.put(i, (byte) 0);
        
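        // 3. Under the synchronous flush policy, force a flush to disk each time
        // `pages` pages (4096 by default) have been written since the last flush.
        // 4096 * 4 KB = 16 MB, so a flush is forced for every 16 MB of warmed data.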
        // force flush when flush disk type is sync
        if (type == FlushDiskType.SYNC_FLUSH) {
            if ((i / OS_PAGE_SIZE) - (flush / OS_PAGE_SIZE) >= pages) {
                flush = i;
                mappedByteBuffer.force();
            }
        }

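        // 4. Every 1000 pages (loop iterations, not bytes), call Thread.sleep(0)
        // so the thread gives up the CPU and other threads get a chance to run;
        // the upstream "prevent gc" comment suggests this also keeps the long loop from holding up GC.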
        // prevent gc
        if (j % 1000 == 0) {
            log.info("j={}, costTime={}", j, System.currentTimeMillis() - time);
            time = System.currentTimeMillis();
            try {
                Thread.sleep(0);
            } catch (InterruptedException e) {
                log.error("Interrupted", e);
            }
        }
    }
	
    // 5. Under the synchronous flush policy, flush whatever has not yet reached disk
    // force flush when prepare load finished
    if (type == FlushDiskType.SYNC_FLUSH) {
        log.info("mapped file warm-up done, force to disk, mappedFile={}, costTime={}",
            this.getFileName(), System.currentTimeMillis() - beginTime);
        mappedByteBuffer.force();
    }
    log.info("mapped file warm-up done. mappedFile={}, costTime={}", this.getFileName(),
        System.currentTimeMillis() - beginTime);
	
    // 6. Lock the mapped memory
    this.mlock();
}
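
To get a feel for how often each branch of this loop fires, the control flow can be replayed without touching any file. The following is a sketch under stated assumptions: WarmupLoopCount and simulate are hypothetical names, and the loop body only counts events instead of calling put, force, or Thread.sleep. The loop bounds and conditions mirror the method above.

```java
// Illustrative only: replays the warmMappedFile loop conditions and counts events.
final class WarmupLoopCount {
    static final int OS_PAGE_SIZE = 4 * 1024;

    /** Returns {pagesTouched, forcedFlushesInsideLoop, sleepCalls}. */
    static int[] simulate(int fileSize, int pages, boolean syncFlush) {
        int touched = 0;
        int forces = 0;
        int sleeps = 0;
        int flush = 0;
        for (int i = 0, j = 0; i < fileSize; i += OS_PAGE_SIZE, j++) {
            touched++; // stands in for byteBuffer.put(i, (byte) 0)
            if (syncFlush && (i / OS_PAGE_SIZE) - (flush / OS_PAGE_SIZE) >= pages) {
                flush = i;
                forces++; // stands in for mappedByteBuffer.force()
            }
            if (j % 1000 == 0) {
                sleeps++; // stands in for Thread.sleep(0)
            }
        }
        return new int[] {touched, forces, sleeps};
    }

    public static void main(String[] args) {
        int oneGb = 1024 * 1024 * 1024;
        int[] r = simulate(oneGb, 4096, true);
        System.out.println("pages=" + r[0] + ", forces=" + r[1] + ", sleeps=" + r[2]);
    }
}
```

For a 1 GB file with the default of 4096 pages under SYNC_FLUSH, this counts 262144 page touches, 63 forced flushes inside the loop (the final force() after the loop is not included), and 263 calls to Thread.sleep(0).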

This raises a couple of questions:

Why does MappedByteBuffer write a 0 byte every 4 KB?
What does mlock() memory locking do?

1. What does mlock() memory locking do?

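mlock() locks part or all of a process's address space into physical memory, preventing it from being swapped out to swap space.

For a high-throughput distributed message queue like RocketMQ, low-latency message reads and writes are the goal. The data in use should therefore stay in physical memory rather than be swapped out, which keeps read and write access fast.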

public void mlock() {
    final long beginTime = System.currentTimeMillis();
    // 1. Get the virtual memory address of the mapping
    final long address = ((DirectBuffer) (this.mappedByteBuffer)).address();
    Pointer pointer = new Pointer(address);
    {
        // 2. Lock the memory
        int ret = LibC.INSTANCE.mlock(pointer, new NativeLong(this.fileSize));
        log.info("mlock {} {} {} ret = {} time consuming = {}", address, this.fileName, this.fileSize, ret, System.currentTimeMillis() - beginTime);
    }

    {
        // 3. Advise the kernel on how the memory will be used, here with MADV_WILLNEED
        int ret = LibC.INSTANCE.madvise(pointer, new NativeLong(this.fileSize), LibC.MADV_WILLNEED);
        log.info("madvise {} {} {} ret = {} time consuming = {}", address, this.fileName, this.fileSize, ret, System.currentTimeMillis() - beginTime);
    }
}

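LibC.INSTANCE.mlock: locks the given memory region so that the operating system does not move it to swap space.
LibC.INSTANCE.madvise: gives the kernel advice about expected I/O on an address range. The kernel may act on the advice and read ahead. MADV_WILLNEED, for example, says the range is expected to be accessed soon, suggesting that the OS preload as much of the mapped file's data into memory as it can. This reduces page faults and so contributes to the memory warm-up.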

2. Why does MappedByteBuffer write a 0 byte every 4 KB?

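After mmap is called, the OS has only set up the mapping between a range of virtual addresses and the file. No physical pages are backing that range yet, and none of the MappedFile's data has actually been loaded into memory.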

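When the program then accesses the data, the OS checks whether the corresponding page is already in memory and raises a page fault if it is not. Loading a 1 GB CommitLog into physical memory this way can take up to 262,144 page faults (1 GB / 4 KB, since a standard page on x86 Linux is 4 KB).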

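That is why a dummy value (byte 0) is written into every memory page. In the warmMappedFile source above, MappedByteBuffer writes a 0 byte every 4 KB, and 4 KB is exactly one page, so every page of the file is touched once and the whole MappedFile is brought into memory. That is the file warm-up.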
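
The page-touching idea can be tried outside RocketMQ with plain java.nio. The sketch below is not RocketMQ code: WarmupDemo, warm, and warmTempFile are hypothetical names, the file is a throwaway temp file, and the 4 KB page size is assumed rather than queried from the OS. It maps a file and writes one 0 byte per page, returning the number of pages touched.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative only: a standalone version of the "one write per page" warm-up.
final class WarmupDemo {
    static final int OS_PAGE_SIZE = 4 * 1024; // assumed page size

    // Writes a 0 byte at the start of every page and returns the number of pages touched.
    static int warm(MappedByteBuffer buffer) {
        int touched = 0;
        for (int i = 0; i < buffer.capacity(); i += OS_PAGE_SIZE) {
            buffer.put(i, (byte) 0); // absolute put: the first access to the page triggers the fault
            touched++;
        }
        return touched;
    }

    // Maps a temp file of the given size read-write and warms it.
    static int warmTempFile(int fileSize) {
        try {
            Path path = Files.createTempFile("warmup-demo", ".bin");
            path.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(path,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Mapping READ_WRITE beyond the current end of file grows the file to fileSize.
                MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, fileSize);
                return warm(buffer);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("pages touched: " + warmTempFile(16 * 1024 * 1024));
    }
}
```

A 16 MB file yields 4096 touched pages. Unlike RocketMQ's version, this sketch does not force, sleep, or mlock; it only shows the per-page write.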

Original article: http://outofmemory.cn/langs/799477.html