
Comments (39)

cboxdoerfer avatar cboxdoerfer commented on July 19, 2024 10

Here's a short update for the progress of adding monitoring support:

This might not sound or look like much, but it was quite some work to get here. So here's the first video demonstrating how FSearch updates the search results as files are being removed from the terminal:

Screencast.from.2023-02-28.18-42-59.webm

In order for that to work the database was rewritten completely in the last couple of weeks and now I'm step by step porting the code from the monitor prototypes to FSearch.

I hope to get the first alpha versions with full monitoring support out by the end of next month. However, note that those will likely not be usable as a daily driver, e.g. some features might still be missing or the database format on disk will likely change a couple of times.

from fsearch.

cboxdoerfer avatar cboxdoerfer commented on July 19, 2024 5

Yes, I know, this is a really important feature to me too. But inotify support (the technology to automatically update the database) is planned for version 0.3, which will be released in a couple of months, depending on how long it takes me to release 0.2.

Edit: Of course help is always welcome :)

from fsearch.

cboxdoerfer avatar cboxdoerfer commented on July 19, 2024 5

Can't wait to see fsearch become one more of such "diagnostic" tools.

Yeah, me too.

So I just finished the fanotify backend. It was a bit more complicated than anticipated. The fanotify documentation isn't the best, but fortunately there were some great demo implementations out there and with some additional trial and error it's now working quite well.

I think I also found a solution to insert/update/delete database entries much more efficiently. I expect that this will boost the performance from 1,000 updates/sec to at least 100,000 updates/sec.

Currently the bottleneck is the data structure being used for the database: a couple of large arrays (one for each sort type). This means that each time you add or delete a file, FSearch needs to memmove a huge block of memory (up to 8 MB for every million db entries) to the left or right by a few bytes, which is really inefficient. My idea now is to simply split those arrays into much smaller ones (something like 32k elements per array). This way search performance will still be great and memmoves will become much faster, because much less memory needs to be moved around. I'm really curious how this turns out and I hope to get this done by the end of the week.
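To make the chunking idea concrete, here is a minimal Python sketch of a sorted sequence split into small chunks. It is only an illustration under assumptions: the class name `ChunkedSortedArray`, the split-in-half policy and the use of Python lists are hypothetical and not taken from FSearch's C code. Only the chunk size of roughly 32k elements comes from the post.

```python
import bisect

# "something like 32k elements per array" (from the post); everything else
# in this sketch is a hypothetical illustration, not FSearch's actual code.
CHUNK_CAPACITY = 32 * 1024


class ChunkedSortedArray:
    """Sorted sequence stored as a list of small sorted chunks."""

    def __init__(self, capacity=CHUNK_CAPACITY):
        self.capacity = capacity
        self.chunks = [[]]

    def _find_chunk(self, value):
        # Binary search for the first chunk whose last element is >= value.
        # Falls back to the last chunk if value is larger than everything.
        lo, hi = 0, len(self.chunks) - 1
        while lo < hi:
            mid = (lo + hi) // 2
            if self.chunks[mid] and self.chunks[mid][-1] < value:
                lo = mid + 1
            else:
                hi = mid
        return lo

    def insert(self, value):
        i = self._find_chunk(value)
        chunk = self.chunks[i]
        # Only this one chunk is shifted, so at most `capacity` elements move.
        bisect.insort(chunk, value)
        if len(chunk) > self.capacity:
            half = len(chunk) // 2
            self.chunks[i:i + 1] = [chunk[:half], chunk[half:]]

    def remove(self, value):
        i = self._find_chunk(value)
        chunk = self.chunks[i]
        pos = bisect.bisect_left(chunk, value)
        if pos == len(chunk) or chunk[pos] != value:
            return False
        del chunk[pos]
        # Drop empty chunks, but always keep at least one chunk around.
        if not chunk and len(self.chunks) > 1:
            del self.chunks[i]
        return True

    def __iter__(self):
        for chunk in self.chunks:
            yield from chunk

    def __len__(self):
        return sum(len(chunk) for chunk in self.chunks)
```

Lookups stay logarithmic (one binary search over the chunks, one inside the chunk), while an insert or delete only shifts elements within a single chunk instead of the whole array.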

from fsearch.

bozicschucky avatar bozicschucky commented on July 19, 2024 4

In the next build, please add a feature that lets the application update the database automatically, so that new files get added by themselves. Right now, when I download even a small document, FSearch can't find it until I manually update the database myself.

But this is a good project.
I think I'm gonna go back to studying C and contribute. Regards bro.

from fsearch.

cboxdoerfer avatar cboxdoerfer commented on July 19, 2024 3

I'm not sure if I understand you correctly, but FSearch does index all entries in a database. At launch this database is loaded again.

However, FSearch doesn't automatically detect changes made to the file system and update its index then. This is on the roadmap (it's called inotify support), but it'll never work as smoothly as Everything on Windows, because the Linux kernel isn't particularly good at reporting filesystem changes.

from fsearch.

mailinglists35 avatar mailinglists35 commented on July 19, 2024 3

dear author,
please give yourself time to make inotify your number 1 priority for this project.
without inotify, this app is totally useless as I already can query the mlocate via cli and mlocate is already up to date via automated cron jobs.
thanks.

from fsearch.

shao113 avatar shao113 commented on July 19, 2024 3

The Linux kernel is really the limiting factor here, and there's currently nothing that I can do to bring the smooth and fast experience Everything offers on Windows. inotify is just much slower, it requires much more memory, it's less reliable and harder to use.

Not sure if this would help, but have you seen these recent changes to fanotify?
torvalds/linux@235328d

from fsearch.

cboxdoerfer avatar cboxdoerfer commented on July 19, 2024 3

Next update: I've been running FSearch now for a few hours while it's monitoring my home folder with 1.2 million entries and it works surprisingly well. No obvious memory leaks, no crashes, no excessive CPU usage most of the time, ...

It's also really interesting to see how many files constantly get changed when you perform certain actions on your system. This is now super easy to spot when you sort by date modified.

Next up I'm going to:

  • add the fanotify backend. This won't be much work, because fanotify events can be easily translated to inotify events (and vice versa).
  • optimize the performance: I've noticed that there are some applications which often create and immediately delete folder structures with thousands of entries. Ironically, on Windows those same applications don't do that. At the moment the FSearch database can only process around 1000 creations/deletions per second on my system, so this must become way faster.

from fsearch.

cboxdoerfer avatar cboxdoerfer commented on July 19, 2024 2

@spsf64, yes, that's a good idea. Since Ctrl+R is already used (enable regex mode), I'll probably use Ctrl+Shift+R instead.

In the future I'm adding the ability to configure shortcuts for all actions anyway, then users can choose whatever key combinations they happen to like.

from fsearch.

cboxdoerfer avatar cboxdoerfer commented on July 19, 2024 2

@spsf64, done. f2f7a7c

from fsearch.

robert1826 avatar robert1826 commented on July 19, 2024 2

Isn't it possible to at least build a script that just updates the database, which I can run regularly via cron (like the one with angrysearch)?

from fsearch.

phil294 avatar phil294 commented on July 19, 2024 2

The Linux kernel is really the limiting factor here, and there's currently nothing that I can do to bring the smooth and fast experience Everything offers on Windows. inotify is just much slower, it requires much more memory, it's less reliable and harder to use.

Not sure if this would help, but have you seen these recent changes to fanotify?
torvalds/linux@235328d

this looks promising! And now @cboxdoerfer is tagged too :P

from fsearch.

cboxdoerfer avatar cboxdoerfer commented on July 19, 2024 2

great news, 60x faster will still make a huge difference :)

Yes, the app feels much more responsive now when lots of stuff happens on the system.

I've also found and fixed another performance bottleneck. When a folder is renamed, FSearch needs to find all its sub-directories and sub-files in the database too (because their sort order changes, as they now have a different path). This took about 0.1 seconds up until now with a database of one million entries. When lots of folders get renamed in a short period of time this can quickly add up and make the database busy for a while, so this needed to be improved. Fortunately the fix I came up with wasn't difficult and worked really well; it's now 10x faster (100ms -> 10ms).
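The post doesn't say what the fix was. As a rough illustration of why this lookup can be cheap, here is one common approach, sketched in Python under the assumption that an index sorted by full path exists: all descendants of a folder share the prefix `folder + "/"`, so they form one contiguous range that two binary searches can locate. The function names and the plain list of path strings are hypothetical.

```python
import bisect


def descendant_range(sorted_paths, folder):
    """Return (start, end) so that sorted_paths[start:end] are all descendants."""
    prefix = folder.rstrip("/") + "/"
    start = bisect.bisect_left(sorted_paths, prefix)
    # "0" is the character right after "/" in ASCII, so replacing the trailing
    # "/" with "0" gives the smallest string above every "prefix..." string.
    end = bisect.bisect_left(sorted_paths, prefix[:-1] + "0")
    return start, end


def rename_folder(sorted_paths, old, new):
    """Rename a folder and all its descendants in place; return how many moved."""
    old, new = old.rstrip("/"), new.rstrip("/")
    start, end = descendant_range(sorted_paths, old)
    moved = [new + path[len(old):] for path in sorted_paths[start:end]]
    del sorted_paths[start:end]
    # The folder entry itself is not necessarily adjacent to its descendants
    # (e.g. "/a/b c" sorts between "/a/b" and "/a/b/x"), so look it up separately.
    i = bisect.bisect_left(sorted_paths, old)
    if i < len(sorted_paths) and sorted_paths[i] == old:
        del sorted_paths[i]
        moved.append(new)
    for path in moved:
        bisect.insort(sorted_paths, path)
    return len(moved)
```

This only covers a single path-sorted list. An index with several sort orders, like the one described earlier in the thread, would still need its other arrays updated.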

All of those performance issues really made me appreciate again how snappy Everything on Windows is. Its developers really put a lot of thought into its design.

Next I'll be working on the preferences dialog, so you can actually enable/disable file system monitoring for individual folders from the GUI.

from fsearch.

cboxdoerfer avatar cboxdoerfer commented on July 19, 2024 1

I know that the kernel is capable of file system notifications, I've used inotify extensively already. But inotify has lots of limitations, that's the problem. And most solutions (GFileMonitor, FS Event (libuv), ...) are just a nicer frontend for inotify.

Like I said in my other post: The Linux kernel is really the limiting factor here, and there's currently nothing that I can do to bring the smooth and fast experience Everything offers on Windows. inotify is just much slower, it requires much more memory, it's less reliable and harder to use. If you are interested you can read about some of that in the inotify documentation: http://man7.org/linux/man-pages/man7/inotify.7.html#NOTES

But slow and memory-hungry notifications are better than none, so of course I'm going to add that one way or another. Just don't expect FSearch to be able to monitor the whole file system (/), because that's going to be really slow, and most certainly the kernel won't even allow it since it would hit the limit of available inotify watches per user.
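For a rough sense of scale: inotify watches are not recursive, so every directory needs its own watch, and on Linux the per-user limit is exposed at `/proc/sys/fs/inotify/max_user_watches`. The small Python sketch below compares the two numbers; the helper names are made up for illustration and have nothing to do with FSearch itself.

```python
import os
from pathlib import Path

# Standard Linux location of the per-user inotify watch limit.
MAX_USER_WATCHES = Path("/proc/sys/fs/inotify/max_user_watches")


def read_max_user_watches(path=MAX_USER_WATCHES):
    """Return the watch limit as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def watches_needed(root):
    """Count directories under root (including root): one watch per directory."""
    return sum(1 for _ in os.walk(root))


if __name__ == "__main__":
    limit = read_max_user_watches()
    needed = watches_needed(os.path.expanduser("~"))
    print(f"directories to watch: {needed}, per-user limit: {limit}")
```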

from fsearch.

spsf64 avatar spsf64 commented on July 19, 2024 1

@cboxdoerfer
A bit off topic, but how about an accelerator/shortcut to update database?
Like F5 or Ctrl+R?
Maybe also add a message in the background where it says "Press Ctrl+F and start typing" like:
"Press Ctrl+F and start typing or Ctrl+R to update database"

from fsearch.

spsf64 avatar spsf64 commented on July 19, 2024 1

@cboxdoerfer
Wow, this one was fast! Just built the new package (using archlinux / aur) and it works perfect.
Thank you!

from fsearch.

cboxdoerfer avatar cboxdoerfer commented on July 19, 2024 1

@spsf64, no problem ;)

from fsearch.

kupiqu avatar kupiqu commented on July 19, 2024 1

I think the idea of the script to use with cron is very cool, indeed!

I am currently using it in angrysearch, so the database is automatically updated every 6 hours. I think it may be a good compromise.

from fsearch.

robert1826 avatar robert1826 commented on July 19, 2024 1

I've also come across fswatch, which allows recursive monitoring of directories. Just wanted to let you know.

from fsearch.

dlong500 avatar dlong500 commented on July 19, 2024 1

@cboxdoerfer Have you looked into the eBPF capabilities yet? This does sound promising.

from fsearch.

cboxdoerfer avatar cboxdoerfer commented on July 19, 2024 1

Thanks, I'll keep the selection then for now. If this turns out to be controversial I can still add a config option for it.

from fsearch.

cboxdoerfer avatar cboxdoerfer commented on July 19, 2024 1

I expect that this will boost the performance from 1,000 updates/sec to at least 100,000 updates/sec.

Just finished the first prototype of the new data structure and it seems I was a bit too optimistic here. However, the performance still made a significant jump: it went up from 1,000 updates/sec to around 60,000 updates/sec (so roughly 60 times faster), and fortunately memmove is no longer the bottleneck.

There's still more room for improvements (for example by using multiple threads to apply database updates), but for now I think the performance is fine.

from fsearch.

robert1826 avatar robert1826 commented on July 19, 2024

I'm here on Ubuntu 16.10 and there is a piece of software called "gamin". Its description says: "File and directory monitoring system. Gamin is a file and directory monitoring system which allows applications to detect when a file or a directory has been added, removed or modified by somebody else."
I don't know if you can use this, but I think it's promising and easy too, as you wouldn't have to implement everything from scratch. Also, I think there are other alternatives as well.

from fsearch.

cboxdoerfer avatar cboxdoerfer commented on July 19, 2024

@robert1826, thx, I'll have a look at that. But first impression isn't that good, because Gamin seems to be pretty much dead - there have just been 5 commits in the past 8 years. However, chances are that's because Gamin is feature complete and rock solid. Only way to find out is by trying it.

from fsearch.

robert1826 avatar robert1826 commented on July 19, 2024

@cboxdoerfer ok, but the point is that the Linux kernel actually supports notifying applications about file system changes, and that technology is called inotify. Maybe gamin is the best choice here, but I'm sure there are other alternatives that use inotify. I'll keep searching and will notify you if I find one.

from fsearch.

robert1826 avatar robert1826 commented on July 19, 2024

Hi @cboxdoerfer, I was wondering about a way to implement an incremental database update, at least for now, until someone can figure out how to make a 'proper' folder monitor. The idea: we crawl the folders while keeping the time at which the previous database was built, and compare that time with the modification time of each target folder. If the folder is older than our database, we skip it; otherwise we recursively crawl that directory, or resume with whatever way you are doing it now. Hope this idea helps, or at least inspires someone else to help. Thanks again for this awesome piece of software.
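A minimal Python sketch of that idea, with one important caveat: a directory's modification time only changes when an entry directly inside it is added, removed or renamed. It does not bubble up to parent folders, and it does not change when a file's contents are edited. So whole subtrees can't safely be skipped based on the top folder's timestamp alone; the sketch below still visits every directory and only reports the ones whose own listing changed. The function name and the `last_build_time` parameter are hypothetical.

```python
import os


def directories_to_rescan(root, last_build_time):
    """Return directories whose own mtime is newer than the last database build.

    last_build_time is a Unix timestamp (seconds) of the previous build.
    """
    stale = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        try:
            if os.stat(dirpath).st_mtime > last_build_time:
                stale.append(dirpath)
        except OSError:
            # The directory vanished or is unreadable; skip it in this sketch.
            continue
    return stale
```

Only the returned directories would need their entries re-read. Size or timestamp changes of files inside untouched directories would go unnoticed with this scheme, which is part of why it is a stopgap rather than real monitoring.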

from fsearch.

endolith avatar endolith commented on July 19, 2024

Yeah this really doesn't work like Everything if you have to spend several minutes updating the database before each search. :/

from fsearch.

danielkrajnik avatar danielkrajnik commented on July 19, 2024

Would eBPF for monitoring/tracing file system changes be worth considering? Example projects that seem to use it to monitor file system changes:

from fsearch.

cboxdoerfer avatar cboxdoerfer commented on July 19, 2024

@danielkrajnik, thanks I've not heard of that before. I'll have a look at it.

from fsearch.

danielkrajnik avatar danielkrajnik commented on July 19, 2024

Thanks, I hope that it could be faster than fanotify and substitute what USN Journal provides on NTFS. Here is another interesting project from this area: https://github.com/kanurag94/filemonitor

from fsearch.

cboxdoerfer avatar cboxdoerfer commented on July 19, 2024

@dlong500 yes, I experimented a bit with it. It's incredibly powerful and flexible, but it's also more complex to implement and at least in my demo had a performance overhead compared to fanotify and inotify (but this might be fixable).

So for the next 0.3 release I decided to use fanotify as the default backend (which works really well in my testing) and inotify as a fallback. An eBPF backend, if it turns out to be an improvement compared to the others, can then be added later. This way I'm not unnecessarily delaying the release of 0.3 any further.

from fsearch.

cboxdoerfer avatar cboxdoerfer commented on July 19, 2024

I'm currently adding the file move/rename handling and ran into the following question: What's supposed to happen with the selection when a file gets renamed? Should the file (with the new name) keep the selection state it previously had or should it automatically become un-selected?

from fsearch.

danielkrajnik avatar danielkrajnik commented on July 19, 2024

Thanks for asking, I'd keep the previous selection (common operation for me would be renaming a file and then copying/moving it to somewhere else).

from fsearch.

cboxdoerfer avatar cboxdoerfer commented on July 19, 2024

So it turned out that remembering file selection for moved/renamed files is a bit more difficult than anticipated and I've put it on hold for the moment.

The problem is that it is quite difficult to detect true move or rename events with inotify. The general idea of inotify is that whenever you rename or move a file inotify creates two events for you: IN_MOVED_FROM and IN_MOVED_TO.

The first minor problem is that there can be other events in between those two. The fix for that is quite simple: remember all IN_MOVED_FROM events until their matching IN_MOVED_TO event happens.

But the big problem is that inotify doesn't always create proper pairs. Sometimes you only get an IN_MOVED_FROM and never the corresponding IN_MOVED_TO event, and vice versa. This happens when files move between un-watched and watched directories. E.g. when you're monitoring /home/user/Downloads and you move one of its files to the un-monitored trash directory, then you'll only get an IN_MOVED_FROM event and never an IN_MOVED_TO event.

There are two ways how this can be fixed, as far as I know:

  1. Assume that if there's still no matching IN_MOVED_TO event after some time, there won't ever be one. We then interpret the former IN_MOVED_FROM event as a "moved out of our monitored directory" event and simply remove the file from our index. The longer you wait, the more reliable this approach gets, but your index and search results also remain inconsistent with the file system for longer. There's probably some good middle ground for that, but it remains guesswork.

  2. The most reliable fix I can think of is to simply treat every IN_MOVED_FROM event immediately as a delete event and an IN_MOVED_TO event as a created event. This just works, since there's no guess work necessary for how long to wait for the next event, but it comes at the cost of being a bit more resource intensive and it's not possible to remember the selection for actually moved/renamed files.

So currently I'm favoring and using the second approach, simply because it's reliable and simplifies the code. But I'll revisit the first approach again. If anyone knows of an alternative solution, let me know.
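Both approaches can be sketched in a few lines of Python. inotify links the two halves of a move with a shared `cookie` value, which is what the pairing below relies on. Everything else is an assumption made for illustration: the `Event` class, the action tuples, the 0.5 second default timeout and the explicit `now` argument (used instead of a real clock so the logic is easy to test). This is not FSearch's implementation.

```python
from dataclasses import dataclass

# Event bits as defined in <sys/inotify.h>.
IN_MOVED_FROM = 0x40
IN_MOVED_TO = 0x80


@dataclass
class Event:
    mask: int
    cookie: int
    path: str


class MovePairer:
    """Approach 1: hold IN_MOVED_FROM events until a match arrives or time runs out."""

    def __init__(self, timeout=0.5):  # the timeout value is a guess
        self.timeout = timeout
        self.pending = {}  # cookie -> (old_path, time the event was seen)

    def feed(self, event, now):
        if event.mask & IN_MOVED_FROM:
            self.pending[event.cookie] = (event.path, now)
            return []
        if event.mask & IN_MOVED_TO:
            match = self.pending.pop(event.cookie, None)
            if match is not None:
                return [("rename", match[0], event.path)]
            # No matching FROM: the file came from an un-watched directory.
            return [("create", event.path)]
        return []

    def expire(self, now):
        # Unmatched FROM events older than the timeout count as "moved out".
        expired = [cookie for cookie, (_path, seen) in self.pending.items()
                   if now - seen >= self.timeout]
        return [("delete", self.pending.pop(cookie)[0]) for cookie in expired]


def split_moves(event):
    """Approach 2: treat every move half as an independent delete or create."""
    if event.mask & IN_MOVED_FROM:
        return [("delete", event.path)]
    if event.mask & IN_MOVED_TO:
        return [("create", event.path)]
    return []
```

The trade-off from the post shows up directly: `MovePairer` can emit a real `rename` action (so the selection could be carried over), but it needs `expire` to be called periodically and leaves the index stale until then. `split_moves` needs no state at all.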

from fsearch.

danielkrajnik avatar danielkrajnik commented on July 19, 2024

Great news, thanks for your hard work. I always find it interesting to see how much code runs on a seemingly idle system. Can't wait to see fsearch become one more of such "diagnostic" tools.

from fsearch.

danielkrajnik avatar danielkrajnik commented on July 19, 2024

great news, 60x faster will still make a huge difference :)

from fsearch.

cboxdoerfer avatar cboxdoerfer commented on July 19, 2024

Can anyone think of a good use case for allowing the same directory to be included in the database multiple times (with slightly different settings)? For example a situation like this:
Screenshot from 2023-03-27 16-19-12

Currently it's still possible, but since it would simplify a few things in the code and it seems pretty pointless, I'm thinking about removing that option.

from fsearch.

rankaiyx avatar rankaiyx commented on July 19, 2024

Look forward to this feature! You're a hero. Is version 0.3 ready?

from fsearch.

cboxdoerfer avatar cboxdoerfer commented on July 19, 2024

Look forward to this feature! You're a hero. Is version 0.3 ready?

No, unfortunately not yet. I've been quite busy recently (new job etc.). But now that things are slowly going back to normal, I'll be able to spend more time on FSearch again.

from fsearch.
