Comments (12)
Yes, `f0` will most probably be one of the previously opened files (although it might not) and `f1` will be a new entry. You can reuse the hash value, if you have one in `f0`.
from build-recorder.
@zvr So are you suggesting that we have a global map of `file_info` entities and then each `process_info` will just hold pointers to the values of that map? With each file's path as the key?
from build-recorder.
In that sense `rename(2)` will perform a removal of the `rename-from` file and a re-insertion of the same file with `rename-to` as the key this time.
RDF format could also be changed to just:
'fN b:rename `
where `fN` is the global identifier for the given file.
Now this actually makes me wonder why we didn't do this in the current scenario as well. We didn't need to mess with unnamed nodes and other weird stuff. Given a file `f1`, it would go like this:
f1 a file
f1 b:name <name>
...
f1 b:rename <new_name>
The only obstacle was the fact that we would lose all info on `f1` before `b:rename` happens. But this is something you are trying to tackle with your solution.
To be honest, this model was what I was thinking of from the get-go; it seems simpler, helps us find duplicates, etc.
If I were to do this project from scratch, I'd start with a global container holding all the files found, without duplicates, and the per-process trick we added would just be a means of identifying which processes used a file. I didn't think we would end up with each process owning its corresponding file usages.
from build-recorder.
There is a need for unnamed nodes because you want to record more than two pieces of information: process `p` renames file `f1` to file `f2`.
from build-recorder.
As for the implementation, yes, the file info table should be used across all processes.
The key to determine whether something is already there (and therefore a previous `b:file` entry can be reused) should be a match of both filename/path and hash.
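A minimal sketch of such a shared table, keyed on both path and hash. The names `file_entry`, `file_find` and `file_intern`, the fixed-size array and the string hashes are assumptions made for illustration; they are not taken from the build-recorder sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the project's file info record. */
struct file_entry {
    const char *path;   /* absolute path */
    const char *hash;   /* hex digest of the contents */
};

#define MAX_FILES 64

/* One table shared by all traced processes. */
struct file_entry file_table[MAX_FILES];
size_t file_count = 0;

/* Return the index of the entry matching BOTH path and hash, or -1. */
int
file_find(const char *path, const char *hash)
{
    assert(path != NULL && hash != NULL);
    for (size_t i = 0; i < file_count; i++) {
        if (strcmp(file_table[i].path, path) == 0
            && strcmp(file_table[i].hash, hash) == 0)
            return (int) i;
    }
    return -1;
}

/* Reuse a matching entry if there is one, otherwise append a new one.
   Returns the entry index, or -1 if the table is full. */
int
file_intern(const char *path, const char *hash)
{
    int idx = file_find(path, hash);

    if (idx >= 0)
        return idx;
    if (file_count == MAX_FILES)
        return -1;
    file_table[file_count].path = path;
    file_table[file_count].hash = hash;
    return (int) file_count++;
}
```

With this key, the same path with a different hash gets its own entry, and so does the same content under a different path.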
from build-recorder.
@zvr Why create a new file entry for the renamed file? It's the same file after all; wouldn't it make sense to just rename the same file entry?
For your given example, yes, there are two pieces of information: first the former path and then the new path. But given that the former path was already recorded in a previous entry, we could just go ahead and only record the file's new name.
For a file f1 we could have the following information:
f1 b:name <path>
f1 b:hash <hash>
f1 b:name <new_path>
These three statements contain the information we want to record without introducing new types. This could even be expanded to multiple renames. Now I understand that we picked RDF because of the convenience that order doesn't matter, but is there an actual reason why this is bad?
from build-recorder.
There are objects ("file") and actions ("rename"), and we want to record both.
As you note, RDF facts do not express a sequence of time-indexed events, so this is definitely not the correct way to record this. Your example above merely states that the object `f1` has two `name` values.
And you're still not recording which process was the one that performed the rename action.
from build-recorder.
@zvr Alright, so do you propose that we go with the former fact solution:
p0 b:rename _:rename0
_:rename0 b:rename-from f0
_:rename0 b:rename-to f1
where `f0` will be fetched from the global table of previously opened files and `f1` will be created as a new file on the spot?
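As a rough illustration of how those three facts hang together, the sketch below formats them from numeric identifiers. The function `format_rename` and its parameters are hypothetical, and the output mirrors the notation used in this comment rather than the tool's actual output format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write the three facts describing one rename into buf.
   The blank node _:renameN ties the process to both files.
   Returns what snprintf returns: the length that was (or would have
   been) written, not counting the terminating NUL. */
int
format_rename(char *buf, size_t size,
              int proc_id, int rename_id, int from_id, int to_id)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size,
                    "p%d b:rename _:rename%d\n"
                    "_:rename%d b:rename-from f%d\n"
                    "_:rename%d b:rename-to f%d\n",
                    proc_id, rename_id,
                    rename_id, from_id,
                    rename_id, to_id);
}
```

The blank node is what lets a single rename carry three pieces of information (process, old file, new file) that one subject-predicate-object fact cannot hold on its own.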
from build-recorder.
@zvr So basically:
if the `<rename-from>` path exists as an open file `f0`, use it; if not, create a new entry and use that instead.
`<rename-to>` always creates a new entry, since it's undefined for an entry to already exist.
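A small sketch of that rule, under stated assumptions: entries are looked up by path only, the newest entry wins when several share a path, and the names `file_new`, `file_lookup` and `record_rename` are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FILES 64

/* Hypothetical, simplified file table: one path per entry. */
const char *file_paths[MAX_FILES];
size_t nfiles = 0;

/* Append a new entry; returns its index, or -1 if the table is full. */
int
file_new(const char *path)
{
    assert(path != NULL);
    if (nfiles == MAX_FILES)
        return -1;
    file_paths[nfiles] = path;
    return (int) nfiles++;
}

/* Return the most recent entry with this path, or -1 if there is none.
   Picking the newest one is an assumption of this sketch. */
int
file_lookup(const char *path)
{
    for (size_t i = nfiles; i-- > 0;) {
        if (strcmp(file_paths[i], path) == 0)
            return (int) i;
    }
    return -1;
}

struct rename_ids {
    int from;   /* existing or freshly created entry for the old path */
    int to;     /* always a freshly created entry for the new path */
};

struct rename_ids
record_rename(const char *oldpath, const char *newpath)
{
    struct rename_ids r;

    r.from = file_lookup(oldpath);
    if (r.from < 0)
        r.from = file_new(oldpath);
    r.to = file_new(newpath);
    return r;
}
```

Note that `record_rename` never reuses an entry for the new path, so two entries can end up with the same path, which is the second issue raised in this comment.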
Some of the issues I can come up with:
- We need to store full absolute paths in `FILE_INFO` in order to test if a path has a file entry associated with it. Once again, #74 handles that.
- With the current `rename` implementation two file entries can share the same `path`. Searching for a file entry won't be as easy as a simple linear search.
For example:
f0 b:name <path>
<rename f0>...
f1 b:name <another_path>
<rename f1 to <another_path>>
Now I understand why we want to create new entries for each renamed file, but we also need to think of a way to tackle this.
from build-recorder.
As I wrote before:
The key to determine whether something is already there (and therefore a previous `b:file` entry can be reused) should be a match of both filename/path and hash.
from build-recorder.
@zvr I am almost done implementing it but I've faced an issue.
Say we track an `open(2)` and it's about to open file `f1.c`. Until now we've been computing its `hash` at its corresponding `close(2)` syscall. The problem with this is that we now need the hash at `open(2)` because, as you said, we need to test for both `name` and `hash` to tell if a file has been recorded before.
Now about the possible scenarios:
- The file has previously been opened for reading, we open it also for reading: it's correct to hash it at `open(2)`.
- The file has previously been opened for reading, we open it for writing: it's incorrect to hash it at `open(2)`.
- The file was previously opened for writing, we open it for reading: it's correct to hash it at `open(2)`.
- The file was previously opened for writing, we open it also for writing: it's incorrect to hash it at `open(2)`.
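In all four scenarios the answer depends only on the access mode of the new `open(2)`, not on how the file was opened before. A minimal sketch of that check, assuming the tracer has the `flags` argument of the call at hand (the helper name `hash_valid_at_open` is made up here):

```c
#include <fcntl.h>
#include <stdbool.h>

/* True when a hash taken at open(2) still describes the file at
   close(2), i.e. when the file is opened read-only.
   O_WRONLY and O_RDWR may both change the contents. */
bool
hash_valid_at_open(int flags)
{
    return (flags & O_ACCMODE) == O_RDONLY;
}
```

Masking with `O_ACCMODE` matters because `O_RDONLY` is typically 0, so testing `flags & O_RDONLY` would never be true.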
Generally speaking, how do we deal with writes? If a file `f0` gets reused for writing (that is, its hash changes by the end of its usage), is it now considered a new file?
from build-recorder.
I think I've figured out the answer on my own.
If a file that has previously been opened is now opened for writing, it's gonna be a new file.
That is because of RDF.
The case where it's opened for the first time remains the same.
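Put as code, the decision could look roughly like this. It is only a sketch: `recorded_before` stands for "an entry with a matching path and hash already exists", the names are hypothetical, and treating a first-time open as "create an entry" is my reading of the last sentence.

```c
#include <fcntl.h>
#include <stdbool.h>

enum entry_action {
    ENTRY_REUSE,    /* point at the existing file entry */
    ENTRY_CREATE    /* record a new file entry */
};

/* Decide what to do with the file table when an open(2) is traced. */
enum entry_action
entry_action_for_open(bool recorded_before, int flags)
{
    bool writes = (flags & O_ACCMODE) != O_RDONLY;

    /* First sighting, or any open for writing: a new entry. */
    if (!recorded_before || writes)
        return ENTRY_CREATE;
    return ENTRY_REUSE;
}
```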
from build-recorder.