Code Monkey home page Code Monkey logo

Comments (3)

fvalasiad avatar fvalasiad commented on May 29, 2024

This resolves $1.

Now why are we still getting segfaults here and there? Well turns out we still stumble upon scenarios where
hash != NULL
finfo[i].hash == NULL
finfo[i].was_hash_printed == 1

(potentially should rename was_hash_printed to was_hash_computed)

Uncertain how this happens. Only folders should be getting a NULL hash. Will look into it.

from build-recorder.

fvalasiad avatar fvalasiad commented on May 29, 2024

@zvr Now that I've come to understand sync(2) and syncfs(2) for that matter, do you assume that the scenario mentioned above is only to be expected if and only if there is a sync(2) or syncfs(2) in there?

Like this:

open f1 = "/some/file" for writing
//write
syncfs(f1)
open f2 = "/some/file" for reading
close f1(write)
close f2(read)

And if yes, do you propose that we should trace sync(2) calls as well and store the hash of the file they sync upon meeting? That's quite easy in terms of syncfs(2) but if we were to talk about sync(2) we would need to know which files have pending file operations to avoid re-hashing everything, which will be a performance hit.

from build-recorder.

fvalasiad avatar fvalasiad commented on May 29, 2024

@zvr Okay story time!

The Bug

Remember this issue? The bug $1 i supposedly fixed? Well turns out it was also the root of bug $2(who would have thought)!

Here is the code for handle open:

static void
handle_open(PROCESS_INFO *pi, int fd, int dirfd, void *path, int purpose)
{
    path = get_str_from_process(pi->pid, path);
    char *abspath = absolutepath(pi->pid, dirfd, path);

    if (abspath == NULL)
	error(EXIT_FAILURE, errno, "on handle_open absolutepath");

    char *hash = NULL;

    FILE_INFO *f = NULL;

    if ((purpose & O_ACCMODE) == O_RDONLY) {
	hash = get_file_hash(abspath);
	f = find_finfo(abspath, hash);
    }

    if (!f) {
	f = next_finfo();
	finfo_new(f, path, abspath, hash);
    } else {
	free(path);
	free(abspath);
	free(hash);
    }
    *finfo_at(pi, fd) = f - finfo;

    record_fileuse(pi->outname, f->outname, purpose);
    if (!f->was_hash_printed && (purpose & O_ACCMODE) == O_RDONLY) {
	f->was_hash_printed = 1;
	record_hash(f->outname, hash);
    }
}

And also find_finfo

FILE_INFO *
find_finfo(char *abspath, char *hash)
{
    int i = numfinfo;

    while (i >= 0 && !(!strcmp(abspath, finfo[i].abspath)
		       && (!(finfo[i].was_hash_printed)
			   || (hash == NULL && finfo[i].hash == NULL)
			   || !strcmp(hash, finfo[i].hash)))) {
	--i;
    }

    if (i < 0) {
	return NULL;
    }

    return finfo + i;
}

To find out the bug we simply have to follow the path of handle_open(3) given that f1 is now opened for reading.

We get that find_finfo(3) thanks to the !(finfo[i].was_hash_printed) line actually returns the half-completed file used for writing. Here:

if ((purpose & O_ACCMODE) == O_RDONLY) {
	hash = get_file_hash(abspath);
	f = find_finfo(abspath, hash);
}

Some code later we reach here:

if (!f->was_hash_printed && (purpose & O_ACCMODE) == O_RDONLY) {
	f->was_hash_printed = 1;
	record_hash(f->outname, hash);
}

f, which was initially a file opened for writing but not yet close(2)d. Actually has its was_hash_printed field set to 1. As a result f is now an entry with its hash field equal to NULL and its was_hash_printed set to 1. Recipe for disaster if f was to be re-used by a future read.

Solution

Given our discussion about how to handle the sync(2) family of system calls, the way i handled bug $1 was wrong.

The way it should be handled is that if an entry has its was_hash_printed field set to 0(that is, on writes), it should be skipped altogether from the re-usage find_finfo(3) call.

from build-recorder.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.