
Comments (12)

zvr commented on May 29, 2024

Yes, f0 will most probably be one of the previously opened files (although it might not) and f1 will be a new entry. You can reuse the hash value, if you have one in f0.

from build-recorder.

fvalasiad commented on May 29, 2024

@zvr So are you suggesting that we have a global map of file_info entities and then each process_info will just hold pointers to the values of that map?

With each file's path as the key?

fvalasiad commented on May 29, 2024

In that sense rename(2) will perform a removal of the rename-from file and re-insertion of the same file with rename-to as the key this time.

RDF format could also be changed to just:

fN b:rename <new_name>

where fN is the global identifier for the given file.

Now this actually makes me wonder why we didn't do this in the current scenario as well. We wouldn't have needed to mess with unnamed nodes and other weird stuff. Given a file f1 it would go like this:

f1 a file
f1 b:name <name>
...
f1 b:rename <new_name>

The only obstacle was that we would lose all info on f1 from before b:rename happens. But this is something you are trying to tackle with your solution.

To be honest, this model was what I was thinking of from the get-go; it seems simpler, helps us find duplicates, etc.
If I were to do this project from scratch, I'd start with a global container holding all the files found, without duplicates, and the per-process trick would just be a means of identifying which processes used a file. I didn't think we would end up with each process owning its corresponding file usages.

zvr commented on May 29, 2024

There is a need for unnamed nodes because you want to record more than two pieces of information: process p renames file f1 to file f2.

zvr commented on May 29, 2024

As for the implementation, yes, the file info table should be used across all processes.
The key to determine whether something is already there (and therefore a previous b:file entry can be reused) should be a match of both filename/path and hash.
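
A minimal sketch of that lookup rule, written in Python for brevity. `FileTable`, `lookup_or_add`, and the `fN` identifier scheme are hypothetical names chosen for illustration, not the project's actual code; the only thing taken from the discussion is that an entry is reused when both path and hash match.

```python
from typing import Dict, Tuple


class FileTable:
    """Hypothetical table of file entries shared by all traced processes."""

    def __init__(self) -> None:
        # (path, hash) -> identifier such as "f0"
        self._ids: Dict[Tuple[str, str], str] = {}

    def lookup_or_add(self, path: str, digest: str) -> Tuple[str, bool]:
        """Return (identifier, created); reuse only if path AND hash match."""
        key = (path, digest)
        if key in self._ids:
            return self._ids[key], False
        ident = f"f{len(self._ids)}"
        self._ids[key] = ident
        return ident, True


table = FileTable()
first = table.lookup_or_add("/src/a.c", "abc123")    # new entry
again = table.lookup_or_add("/src/a.c", "abc123")    # same path and hash: reused
changed = table.lookup_or_add("/src/a.c", "def456")  # same path, new hash: new entry
```

Keying on the pair means the same path can legitimately map to several entries over the course of a build, one per distinct content hash.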

fvalasiad commented on May 29, 2024

@zvr Why create a new file entry for the renamed file? It's the same file after all, wouldn't it make sense to just rename the same file entry?

For your given example, yes, there are two pieces of information: first the former path and then the new path. But given that the former path was already recorded in a previous entry, we could just record the file's new name.

For a file f1 we could have the following information:

f1 b:name <path>
f1 b:hash <hash>
f1 b:name <new_path>

These three statements contain the information we want to record without introducing new types. This could even be extended to multiple renames. Now I understand that we picked RDF because of the convenience that order doesn't matter, but is there an actual reason why this is bad?

zvr commented on May 29, 2024

There are objects ("file") and actions ("rename"), and we want to record both.

As you note, RDF facts do not express a sequence of time-indexed events, so this is definitely not the correct way to record this. Your example above merely states that the object f1 has two name values.

And you're still missing recording the fact about which process was the one that performed the rename action.

fvalasiad commented on May 29, 2024

@zvr Alright, so do you propose that we go with the former fact solution:

p0 b:rename _:rename0
_:rename0 b:rename-from f0
_:rename0 b:rename-to f1

where f0 will be fetched from the global table of previously opened files and f1 will be created as a new file on the spot?
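
As a rough illustration, the three statements above could be produced by a small helper like the one below. The function name `rename_triples` and its parameters are hypothetical; only the predicate names and the `_:renameN` blank-node shape come from this thread.

```python
from typing import List


def rename_triples(process: str, index: int, src: str, dst: str) -> List[str]:
    """Build the statements for one rename, linked through a blank node."""
    node = f"_:rename{index}"  # unnamed node tying process, source and target together
    return [
        f"{process} b:rename {node}",
        f"{node} b:rename-from {src}",
        f"{node} b:rename-to {dst}",
    ]


lines = rename_triples("p0", 0, "f0", "f1")
print("\n".join(lines))
```

The blank node is what lets a single rename carry all three participants (the process, the old file, and the new file) without relying on statement order.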

fvalasiad commented on May 29, 2024

@zvr So basically:

If the <rename-from> path exists as an open file f0, use it; if not, create a new entry and use that instead.
<rename-to> always creates a new entry, since it's undefined for an entry to already exist.
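
The two rules above can be sketched roughly as follows, in Python for brevity. `resolve_rename`, the `by_path` dictionary, and the `next_id` counter are hypothetical. Dropping the old path from the index after the rename is also an assumption made here, not something settled in this thread.

```python
from typing import Dict, Tuple


def resolve_rename(
    by_path: Dict[str, str], next_id: int, old_path: str, new_path: str
) -> Tuple[str, str, int]:
    """Return (rename-from id, rename-to id, updated next_id)."""
    src = by_path.get(old_path)
    if src is None:
        # rename-from was never seen before: create an entry on the spot
        src = f"f{next_id}"
        next_id += 1
    # Assumption: the old path no longer names a file once it has been renamed.
    by_path.pop(old_path, None)
    # rename-to always gets a fresh entry
    dst = f"f{next_id}"
    next_id += 1
    by_path[new_path] = dst
    return src, dst, next_id


index = {"/tmp/a.o": "f0"}
known = resolve_rename(index, 1, "/tmp/a.o", "/out/a.o")
unseen = resolve_rename(index, known[2], "/tmp/b.o", "/out/b.o")
```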

Some of the issues I can come up with:

  1. We need to store full absolute paths in FILE_INFO in order to test if a path has a file entry associated with it. Once again #74 handles that.

  2. With the current rename implementation two file entries can share the same path. Searching for a file entry won't be as easy as a simple linear search.

For example:

f0 b:name <path>
<rename f0>...
f1 b:name <another_path>
<rename f1 to <another_path>>

Now I understand why we wanna create new entries for each renamed file, but we also need to think of a way to tackle this.

zvr commented on May 29, 2024

As I wrote before:

The key to determine whether something is already there (and therefore a previous b:file entry can be reused) should be a match of both filename/path and hash.

fvalasiad commented on May 29, 2024

@zvr I am almost done implementing it but I've faced an issue.

Say we track an open(2) that is about to open file f1.c. Until now we've been computing its hash at the corresponding close(2) syscall. The problem is that we now need the hash at open(2) because, as you said, we need to test both name and hash to tell whether a file has been recorded before.

Now about the possible scenarios:

  1. The file has previously been opened for reading, we open it also for reading, it's correct to hash it at open(2).
  2. The file has previously been opened for reading, we open it for writing, it's incorrect to hash it at open(2).
  3. The file was previously opened for writing, we open it for reading, it's correct to hash it at open(2).
  4. The file was previously opened for writing, we open it also for writing, it's incorrect to hash it at open(2).
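
In all four scenarios the outcome depends only on how the file is being opened now, not on how it was opened before. A minimal sketch of that check, using Python's `os` flag constants as stand-ins for the traced open(2) flags; `hash_valid_at_open` is a hypothetical name, and treating read-write opens like write opens is an assumption.

```python
import os

# Access-mode bits of the open(2) flags
_ACCESS_MASK = os.O_RDONLY | os.O_WRONLY | os.O_RDWR


def hash_valid_at_open(flags: int) -> bool:
    """True if a hash taken at open(2) still describes the file at close(2)."""
    return (flags & _ACCESS_MASK) == os.O_RDONLY


read_only = hash_valid_at_open(os.O_RDONLY)                # scenarios 1 and 3
write_only = hash_valid_at_open(os.O_WRONLY | os.O_CREAT)  # scenarios 2 and 4
read_write = hash_valid_at_open(os.O_RDWR)                 # assumed to behave like a write
```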

Generally speaking, how do we deal with writes? If a file f0 gets reused for writing (that is, its hash changes by the end of its usage), is it now considered a new file?

fvalasiad commented on May 29, 2024

I think I've figured out the answer on my own.

If a file that has previously been opened is now opened for writing, it's gonna be a new file.

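That is because of RDF: facts are not time-indexed, so a single entry cannot record one hash before the write and another after it.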

The case where it's opened for the first time remains the same.
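
One way to read that conclusion, as a hedged sketch: `classify_open` and its return labels are hypothetical, and the read-only branch assumes the path-plus-hash match rule quoted earlier in the thread.

```python
from typing import Dict, Optional


def classify_open(
    known: Dict[str, str], path: str, writing: bool, digest_at_open: Optional[str]
) -> str:
    """Decide how an open(2) maps onto file entries (known: path -> last hash)."""
    if path not in known:
        return "first-open"  # unchanged behaviour for a file seen for the first time
    if writing:
        return "new-entry"   # previously seen file opened for writing: new file
    if known[path] == digest_at_open:
        return "reuse"       # same path and same hash: reuse the earlier entry
    return "new-entry"       # same path but the content changed in between


known = {"/src/a.c": "abc123"}
results = [
    classify_open(known, "/src/b.c", False, None),
    classify_open(known, "/src/a.c", True, None),
    classify_open(known, "/src/a.c", False, "abc123"),
    classify_open(known, "/src/a.c", False, "def456"),
]
```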
