The mmapbench from viktorleis

Atomically read `/proc`

Reading from the /proc pseudo-filesystem with a buffered reader such as std::fstream, or reading a file in several partial reads, can cause race conditions. The kernel may update the contents of a /proc file after only part of it has been read, so values parsed from different parts of the file may belong to different snapshots. This results in incorrect readings of the respective counters (e.g. TLB shootdowns).

To my understanding, the only sound approach is to atomically read the entire /proc file once, then parse it for the needed counter(s).

According to the UnixWare 7 documentation:

Although process state and consequently the contents of /proc files can change from instant to instant, a single read(2) of a /proc file is guaranteed to return a ``sane'' representation of state, that is, the read will be an atomic snapshot of the state of the process. No such guarantee applies to successive reads applied to a /proc file for a running process.
I assume this applies to Linux, too.

The following code snippet roughly demonstrates how this can be done:

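#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <fcntl.h>
#include <unistd.h>

int fd_proc_interrupts = -1;

void open_interrupts()
{
    // O_DIRECT is meant to avoid kernel-side buffering, but I am uncertain whether this has an effect on pseudo-fs...
    // open(2) documents EINVAL for filesystems without O_DIRECT support; if that happens, drop the flag.
    fd_proc_interrupts = open("/proc/interrupts", O_RDONLY|O_DIRECT);
    if (fd_proc_interrupts == -1) { // some form of error handling
        const auto errsv = errno; // save errno before another call can overwrite it
        std::cerr << "Failed to open '/proc/interrupts': " << std::strerror(errsv) << std::endl;
        std::exit(EXIT_FAILURE);
    }
}

const char * read_interrupts()
{
    static char buf[20000]; // user-space I/O buffer
    // *atomically* read entire file with a single pread(); make sure `buf` is sufficiently large
    const ssize_t n = pread(fd_proc_interrupts, buf, sizeof(buf) - 1, /* offset= */ 0);
    if (n < 0) {
        const auto errsv = errno;
        std::cerr << "Failed to read '/proc/interrupts': " << std::strerror(errsv) << std::endl;
        std::exit(EXIT_FAILURE);
    }
    buf[n] = '\0'; // NUL-terminate so that the buffer can be parsed as a C string
    return buf;
}

/** Call this with the pointer returned by `read_interrupts()`. */
uint64_t get_TLB_shootdowns(const char *str); /* TODO: parse and return count */

// finally, close `fd_proc_interrupts`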
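
The parsing step left as a TODO above could look roughly like the following. This is only a sketch: `parse_TLB_shootdowns` is a hypothetical name, and it assumes that the line of interest starts with the label `TLB:`, followed by one counter per CPU and a trailing textual description, as on typical x86 Linux systems.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper (not part of mmapbench): sums the per-CPU counters on the
// "TLB:" line of a /proc/interrupts snapshot held in a NUL-terminated buffer.
uint64_t parse_TLB_shootdowns(const char *str)
{
    std::istringstream in(str);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string token;
        if (!(fields >> token) || token != "TLB:")
            continue; // not the line we are looking for
        uint64_t total = 0;
        while (fields >> token) {
            if (token.find_first_not_of("0123456789") != std::string::npos)
                break; // reached the trailing description, e.g. "TLB shootdowns"
            total += std::stoull(token);
        }
        return total;
    }
    return 0; // no "TLB:" line found in the snapshot
}
```

Because the function only works on the in-memory snapshot, it never touches the file descriptor again and cannot observe a later state of the counters.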

The recommended changes should be applied to

Accumulating I/O Bytes gathered from `/proc/diskstats` may account for unused devices

The function readIObytes() searches for all lines containing the string "nvme" and accumulates the values of field 6, i.e. sectors read.

I think this can cause incorrect measurements when more than one NVMe device is used by the system. While the benchmark is running on one drive, the OS or user processes may perform I/O on another NVMe drive. This way of accumulating the sectors read of all lines containing "nvme" will falsely add the I/O of other devices to the overall count.

To fix this, I think the sanest approach is to check whether the given file path refers to a regular file or to a block device. In the former case, we should identify the block device that the file is stored on. In the latter case, the path directly identifies the block device. After identifying the block device that we are reading from, we can read only its line from /proc/diskstats.
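
A minimal sketch of this idea, assuming Linux with glibc: `DeviceId`, `device_of` and `sectors_read` are hypothetical names, not functions from mmapbench. The device is identified by its major and minor numbers, obtained via stat(2), and matched against the first two fields of each line in /proc/diskstats.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <sys/stat.h>
#include <sys/sysmacros.h>

// Hypothetical type: major/minor numbers identifying a block device.
struct DeviceId { unsigned major_no; unsigned minor_no; };

// Resolve `path` to the block device it names (block device node) or resides on
// (regular file). Returns false for other file types or if stat() fails.
bool device_of(const char *path, DeviceId &out)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return false;
    if (S_ISBLK(st.st_mode)) {        // path is the block device itself
        out = DeviceId{major(st.st_rdev), minor(st.st_rdev)};
        return true;
    }
    if (S_ISREG(st.st_mode)) {        // device of the containing filesystem
        out = DeviceId{major(st.st_dev), minor(st.st_dev)};
        return true;
    }
    return false;
}

// Scan diskstats-formatted text for the line of `dev` and return field 6 (sectors read).
bool sectors_read(std::istream &diskstats, DeviceId dev, uint64_t &sectors)
{
    std::string line;
    while (std::getline(diskstats, line)) {
        std::istringstream fields(line);
        unsigned maj, mnr;
        std::string name;
        uint64_t reads_completed, reads_merged, sectors_field;
        if (!(fields >> maj >> mnr >> name >> reads_completed >> reads_merged >> sectors_field))
            continue; // skip malformed lines
        if (maj == dev.major_no && mnr == dev.minor_no) {
            sectors = sectors_field;
            return true;
        }
    }
    return false; // device not listed
}
```

Two caveats apply. For a regular file on a partition, `st_dev` names the partition rather than the whole disk, so the matching line is the partition's. Some filesystems (e.g. stacked or multi-device ones) report a device number that does not appear in /proc/diskstats at all, in which case `sectors_read` returns false. As with /proc/interrupts, the contents of /proc/diskstats should be read in one go before being handed to the parser.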
