Until now, the system records files that have been open(2)


Recording files about build-recorder HOT 12 CLOSED

zvr commented on May 29, 2024 1


from build-recorder.

Comments (12)

zvr commented on May 29, 2024 1

Yes, f0 will most probably be one of the previously opened files (although it might not) and f1 will be a new entry. You can reuse the hash value, if you have one in f0.


fvalasiad commented on May 29, 2024

@zvr So are you suggesting that we have a global map of file_info entities and then each process_info will just hold pointers to the values of that map?

With each file's path as the key?
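A minimal sketch of the layout being asked about, written in Python only for brevity. The names `FileInfo`, `ProcessInfo`, `files_by_path` and `record_use` are hypothetical stand-ins for the file_info and process_info entities mentioned above, and the path is used as the key exactly as the question proposes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FileInfo:
    path: str
    digest: Optional[str] = None  # filled in once the content has been hashed


@dataclass
class ProcessInfo:
    pid: int
    files: List[FileInfo] = field(default_factory=list)  # references, not copies


files_by_path: Dict[str, FileInfo] = {}  # the single global map


def record_use(proc: ProcessInfo, path: str) -> FileInfo:
    """Attach the shared entry for `path` to `proc`, creating it if needed."""
    info = files_by_path.setdefault(path, FileInfo(path))
    proc.files.append(info)
    return info


p1, p2 = ProcessInfo(100), ProcessInfo(101)
a = record_use(p1, "/src/f1.c")
b = record_use(p2, "/src/f1.c")
```

Both processes end up pointing at the same object, so the file is stored once and each process only records that it used it.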


fvalasiad commented on May 29, 2024

In that sense rename(2) will perform a removal of the rename-from file and re-insertion of the same file with rename-to as the key this time.

RDF format could also be changed to just:

fN b:rename <new_name>

where fN is the global identifier for the given file.

Now this actually makes me wonder why we didn't do this in the current scenario as well. We didn't need to mess with unnamed nodes and other weird stuff. Given a file f1 it would go like this:

f1 a file
f1 b:name <name>
...
f1 b:rename <new_name>

The only obstacle was that we would lose all info on f1 from before b:rename happens. But this is something you are trying to tackle with your solution.

To be honest, this model is what I was thinking of from the get-go. It seems simpler, helps us find duplicates, etc.
If I were to do this project from scratch, I'd start with a global container holding all the files found, without duplicates. The per-process trick would then just be a means of identifying which processes used a file. I didn't think we would end up with each process owning its corresponding file usages.


zvr commented on May 29, 2024

There is a need for unnamed nodes because you want to record more than two pieces of information: process p renames file f1 to file f2.


zvr commented on May 29, 2024

As for the implementation, yes, the file info table should be used across all processes.
The key to determine whether something is already there (and therefore a previous b:file entry can be reused) should be a match of both filename/path and hash.
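A minimal sketch of such a shared table, in Python only for brevity. `FileTable`, `lookup_or_add`, the example paths and the hash strings are all hypothetical; the only thing taken from the comment above is that an entry is reused when both path and hash match.

```python
from typing import Dict, Tuple


class FileTable:
    """Global table of file entries shared by all traced processes."""

    def __init__(self) -> None:
        # (absolute path, content hash) -> identifier such as "f0"
        self._entries: Dict[Tuple[str, str], str] = {}

    def lookup_or_add(self, path: str, digest: str) -> Tuple[str, bool]:
        """Return (identifier, created) for the given path and hash."""
        key = (path, digest)
        if key in self._entries:
            return self._entries[key], False
        ident = f"f{len(self._entries)}"
        self._entries[key] = ident
        return ident, True


table = FileTable()
first = table.lookup_or_add("/src/f1.c", "abc123")
again = table.lookup_or_add("/src/f1.c", "abc123")    # same path, same hash
changed = table.lookup_or_add("/src/f1.c", "def456")  # same path, new hash
```

With this key, the same path can legitimately map to several entries over the course of a build, one per distinct content.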


fvalasiad commented on May 29, 2024

@zvr Why create a new file entry for the renamed file? It's the same file after all, wouldn't it make sense to just rename the same file entry?

For your given example, yes, there are two pieces of information: first the former path and then the new path. But since the former path was already recorded in a previous entry, we could just go ahead and record only the file's new name.

For a file f1 we could have the following information:

f1 b:name <path>
f1 b:hash <hash>
f1 b:name <new_path>

These three statements contain the information we want to record without introducing new types. This could even be expanded to multiple renames. Now I understand that we picked RDF because of the convenience that order doesn't matter, but is there an actual reason why this is bad?


zvr commented on May 29, 2024

There are objects ("file") and actions ("rename"), and we want to record both.

As you note, RDF facts do not express a sequence of time-indexed events, so this is definitely not the correct way to record this. Your example above merely states that the object f1 has two name values.

And you're still missing recording the fact about which process was the one that performed the rename action.
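To see why the order is lost, the statements can be modelled as a plain set of (subject, predicate, object) triples. This is only an illustration using Python tuples rather than a real RDF library, and the paths and hash value are made up.

```python
# Two recordings of the "same" facts, written in opposite order.
before_then_after = {
    ("f1", "b:name", "/tmp/old.o"),
    ("f1", "b:hash", "abc123"),
    ("f1", "b:name", "/tmp/new.o"),
}
after_then_before = {
    ("f1", "b:name", "/tmp/new.o"),
    ("f1", "b:hash", "abc123"),
    ("f1", "b:name", "/tmp/old.o"),
}

# As sets of triples the two graphs are identical, so a reader cannot
# tell which name came first or that a rename happened at all.
same_graph = before_then_after == after_then_before
names = sorted(o for s, p, o in before_then_after if p == "b:name")
```

Nothing in either set says which process touched f1, which is the second gap pointed out above.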


fvalasiad commented on May 29, 2024

@zvr Alright, so do you propose that we go with the former fact solution:

p0 b:rename _:rename0
_:rename0 b:rename-from f0
_:rename0 b:rename-to f1

where f0 will be fetched from the global table of previously opened files and f1 will be created as a new file on the spot?
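The three statements can be generated mechanically. A minimal sketch, assuming a hypothetical helper `rename_triples` and a simple per-run counter for blank-node labels; only the predicate names come from this thread.

```python
from itertools import count
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]


def rename_triples(proc: str, src: str, dst: str,
                   counter: Iterator[int]) -> List[Triple]:
    """Describe 'process proc renamed file src to file dst'."""
    node = f"_:rename{next(counter)}"  # blank node tying the three facts together
    return [
        (proc, "b:rename", node),
        (node, "b:rename-from", src),
        (node, "b:rename-to", dst),
    ]


ids = count()
triples = rename_triples("p0", "f0", "f1", ids)
lines = [" ".join(t) for t in triples]
```

The blank node is what lets a single rename carry three pieces of information (who, from, to) without depending on statement order.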


fvalasiad commented on May 29, 2024

@zvr So basically:

If the <rename-from> path exists as an open file f0, use it; if not, create a new entry and use that instead.
<rename-to> always creates a new entry, since it's undefined for an entry to already exist.

Some of the issues I can come up with:

We need to store full absolute paths in FILE_INFO in order to test whether a path has a file entry associated with it. Once again, #74 handles that.
With the current rename implementation, two file entries can share the same path. Searching for a file entry won't be as easy as a simple linear search.

For example:

f0 b:name <path>
<rename f0>...
f1 b:name <another_path>
<rename f1 to <another_path>>

Now I understand why we want to create new entries for each renamed file, but we also need to think of a way to tackle this.


zvr commented on May 29, 2024

As I wrote before:

The key to determine whether something is already there (and therefore a previous b:file entry can be reused) should be a match of both filename/path and hash.


fvalasiad commented on May 29, 2024

@zvr I am almost done implementing it but I've faced an issue.

Say we track an open(2) and it's about to open file f1.c. Until now we've been computing its hash at the corresponding close(2) syscall. The problem is that we now need the hash at open(2) because, as you said, we need to test both name and hash to tell whether a file has been recorded before.

Now about the possible scenarios:

The file has previously been opened for reading, we open it also for reading, it's correct to hash it at open(2).
The file has previously been opened for reading, we open it for writing, it's incorrect to hash it at open(2).
The file was previously opened for writing, we open it for reading, it's correct to hash it at open(2).
The file was previously opened for writing, we open it also for writing, it's incorrect to hash it at open(2).

Generally speaking, how do we deal with writes? If a file f0 gets reused for writing (that is, its hash changes by the end of its usage), is it now considered a new file?


fvalasiad commented on May 29, 2024

I think I've figured out the answer on my own.

If a file that has previously been opened is now opened for writing, it's gonna be a new file.

That is because of RDF.

The case where it's opened for the first time remains the same.
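One way to read that conclusion, sketched in Python under stated assumptions: file content is passed in as bytes instead of being read from disk, SHA-1 is an arbitrary choice of hash, and `on_open`, `on_close`, `known` and `new_entry` are hypothetical names. Reads are hashed at open(2) and matched on (path, hash); writes always get a fresh entry whose hash is computed at close(2).

```python
import hashlib
from typing import Dict, Optional, Tuple

# (path, hash) -> identifier; only entries whose hash is known are indexed
known: Dict[Tuple[str, str], str] = {}
next_id = 0


def new_entry() -> str:
    global next_id
    ident = f"f{next_id}"
    next_id += 1
    return ident


def on_open(path: str, content: bytes, writing: bool) -> Tuple[str, Optional[str]]:
    """Return (identifier, hash or None if hashing is deferred to close)."""
    if writing:
        # Content is about to change: always a new entry, hash it at close.
        return new_entry(), None
    digest = hashlib.sha1(content).hexdigest()
    key = (path, digest)
    if key not in known:
        known[key] = new_entry()
    return known[key], digest


def on_close(path: str, ident: str, content: bytes) -> str:
    """Hash a written file once writing is finished and index it."""
    digest = hashlib.sha1(content).hexdigest()
    known[(path, digest)] = ident
    return digest


r1, h1 = on_open("/src/f1.c", b"int x;", writing=False)
r2, h2 = on_open("/src/f1.c", b"int x;", writing=False)  # reused entry
w1, h3 = on_open("/src/f1.c", b"int x;", writing=True)   # new entry
on_close("/src/f1.c", w1, b"int y;")
r3, _ = on_open("/src/f1.c", b"int y;", writing=False)   # finds the written file
```

A later read of the rewritten file resolves to the entry created by the write, because its (path, hash) pair was indexed at close.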

