
fthevenet commented on June 13, 2024

You raised some interesting points that are very relevant to a discussion about what approach to log content tokenization we should adopt.

So I took the liberty of pasting your last comment there, and I propose that we continue discussing that topic over there if you are interested, rather than in this issue, where the finer points might get lost once the issue is closed.

from binjr.

fthevenet commented on June 13, 2024

Hi,

Thanks a lot for your feedback, it is very much appreciated.

With regard to the data you're trying to visualize using the CSV adapter with the following format:

Feb 19 12:34:12 17.19
Feb 19 12:34:13 17.19
Feb 19 12:34:14 17.19
Feb 19 12:34:15 17.12

You will not be able to parse it properly as CSV because it does not have a single character that can unambiguously be declared as the delimiter. You could set the space character as the delimiter, but then you'd need to use quotes to mark the timestamp boundaries, like so:

"Feb 19 12:34:12" 17.19
"Feb 19 12:34:13" 17.19
"Feb 19 12:34:14" 17.19
"Feb 19 12:34:15" 17.12

Alternatively, use any other character to separate the timestamp from the other field.
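To illustrate why the quotes matter, here is a minimal Java sketch (not binjr's CSV adapter) that splits a line on spaces while treating a quoted run as a single field, then parses the timestamp with java.time. The `QuotedSpaceCsv` class and its `split` method are hypothetical, and since these timestamps carry no year, the sketch assumes 2023.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class QuotedSpaceCsv {
    // "Feb 19 12:34:12" has no year, so one is defaulted (assumption: 2023).
    static final DateTimeFormatter TS = new DateTimeFormatterBuilder()
            .appendPattern("MMM d HH:mm:ss")
            .parseDefaulting(ChronoField.YEAR, 2023)
            .toFormatter(Locale.ENGLISH);

    // Splits on spaces, except inside double quotes, which group several words into one field.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;              // quotes delimit a field; they are not kept
            } else if (c == ' ' && !inQuotes) {
                if (current.length() > 0) {        // end of the current field
                    fields.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        List<String> fields = split("\"Feb 19 12:34:12\" 17.19");
        LocalDateTime timestamp = LocalDateTime.parse(fields.get(0), TS);
        double value = Double.parseDouble(fields.get(1));
        System.out.println(timestamp + " -> " + value); // 2023-02-19T12:34:12 -> 17.19
    }
}
```

Without the quotes, the same split yields four fields (Feb, 19, 12:34:12, 17.19), and nothing tells the parser which of them belong to the timestamp.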


In Log format:
I pick the folder, it shows the files, but my file is greyed out for some reason
I rename the file to mega.log, it's still greyed out

I am not sure what you mean here, could you please give more details, and/or add a screenshot?


When looking for a test data source just to try out the navigation features in binjr, the quickest way is to download and install the Demo Adapter; it contains some ready-to-use sample data sources:
https://binjr.eu/documentation/getting-started/#trying-it-out

Hope this helps.

PenelopeFudd commented on June 13, 2024

Here's a screenshot showing the Log format file picker:
[screenshot of the Log format file picker]
For some reason, files are greyed-out, directories are not. It doesn't allow the user to select a file.
I've made sure the permissions are such that the files are readable, and I've tried it in my home directory where everything is owned by me.

I'm able to type in the path on the previous dialog box and read files, so I'm not completely blocked, but it's odd:
[screenshot of the path entry dialog]
Note: I've changed the extensions box from *.log to just *, but it makes no difference.

PenelopeFudd commented on June 13, 2024

Update: I've just updated my laptop to Fedora 37, and now when I type in a path name and click OK, it appears to load, but there's no log stream to load into a new worksheet....?

[screenshot]

Here's the console log:

$ binjr
(Start: 2023-02-22 07:41:57)
[2023-02-22 07:41:58.951] [INFO ] [JavaFX Application Thread] [eu.binjr.core.Binjr] Starting...
 ╭─╮   ╭─╮       ╭─╮
 │ ╰──╮╰─┤╭────╮ ╰─┤╭──╮
 │ ╭╮ ││ ││ ╭╮ │ │ ││ ╭╯
 │ ╰╯ ││ ││ ││ │ │ ││ │
 ╰────╯└─┘└─┘└─┘╭╯ │└─┘  v3.11.0
                ╰──╯
[2023-02-22 07:41:58.966] [INFO ] [JavaFX Application Thread] [eu.binjr.core.Binjr] Version=3.11.0 (build #20230201.3)
[2023-02-22 07:41:58.966] [INFO ] [JavaFX Application Thread] [eu.binjr.core.Binjr] Java Version=19.0.2
[2023-02-22 07:41:58.966] [INFO ] [JavaFX Application Thread] [eu.binjr.core.Binjr] JavaFX Version=19.0.2.1
[2023-02-22 07:41:58.967] [INFO ] [JavaFX Application Thread] [eu.binjr.core.Binjr] Java Vendor=Eclipse Adoptium
[2023-02-22 07:41:58.967] [INFO ] [JavaFX Application Thread] [eu.binjr.core.Binjr] Java VM name=OpenJDK 64-Bit Server VM (19.0.2+7)
[2023-02-22 07:41:58.967] [INFO ] [JavaFX Application Thread] [eu.binjr.core.Binjr] Java Home=/opt/binjr/lib/runtime
[2023-02-22 07:41:58.967] [INFO ] [JavaFX Application Thread] [eu.binjr.core.Binjr] Operating System=Linux (6.1.12-200.fc37.x86_64)
[2023-02-22 07:41:58.968] [INFO ] [JavaFX Application Thread] [eu.binjr.core.Binjr] System Architecture=amd64
[2023-02-22 07:41:58.968] [INFO ] [JavaFX Application Thread] [eu.binjr.core.Binjr] JVM Heap Stats=Max: 4096MB | Committed: 508MB | Used: 28MB
[2023-02-22 07:41:58.968] [INFO ] [JavaFX Application Thread] [eu.binjr.core.Binjr] Garbage Collectors=Shenandoah Pauses, Shenandoah Cycles
[2023-02-22 07:41:59.456] [WARN ] [JavaFX Application Thread] [eu.binjr.common.plugins.ServiceLoaderHelper] External location /opt/binjr/plugins does not exist.
[2023-02-22 07:41:59.570] [INFO ] [JavaFX Application Thread] [org.reflections.Reflections] Reflections took 99 ms to scan 6 urls, producing 115 keys and 350 values 
[2023-02-22 07:41:59.626] [INFO ] [JavaFX Application Thread] [org.reflections.Reflections] Reflections took 44 ms to scan 6 urls, producing 115 keys and 350 values 
[2023-02-22 07:42:09.837] [ERROR] [ForkJoinPool-1-worker-1] [stderr] Feb 22, 2023 7:42:09 A.M. org.apache.lucene.store.MMapDirectory lookupProvider
WARNING: You are running with Java 19. To make full use of MMapDirectory, please pass '--enable-preview' to the Java command line.

Arkanosis commented on June 13, 2024

Hi @PenelopeFudd

For some reason, files are greyed-out, directories are not.
It doesn't allow the user to select a file.

When you click on “Browse”, you should be presented with two choices: “Zip file” or “Folder”. If you chose “Folder”, what you get is a directory picker, not a file picker. I assume that's what happens here, which explains why you can't select files. I'm surprised though that the files are visible at all (even if grayed out) — maybe there's some GTK option which causes that?

Anyway, the expected workflow here is that the user selects either a directory or a zip file, but not directly a log file. Only after that, the user can drag-and-drop individual files from that directory or zip file. I guess the UI could be improved, but I'm not sure how…

PenelopeFudd commented on June 13, 2024

Oh, I see; I didn't realize that it was browsing for directories, I just thought that I didn't want to examine a zip file, and I thought the only other option was just an interesting wording of "open individual log file".

Maybe add an option to load just one file, or put some text in the dialog box or a tooltip explaining that individual files can't be opened, only directories. I'm not clear why that is so, but that's the way it's designed. 😄

PenelopeFudd commented on June 13, 2024

Ok, I've opened my /var/log directory, and now I can load those files that are formatted in syslog (Mon day hh:mm:ss) format, but there are a lot of files that have different timestamp formats that can't be read.
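One generic way to cope with mixed timestamp formats is to try a list of candidate patterns in order until one parses. The Java sketch below is only an illustration: the `MultiFormatTimestamps` class is hypothetical, the pattern list is an assumption (it borrows the Slack-style timestamp quoted just below), and it says nothing about how binjr itself detects timestamps.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoField;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

class MultiFormatTimestamps {
    // Candidate formats, tried in order (illustrative assumptions, not binjr's built-in list).
    static final List<DateTimeFormatter> FORMATS = List.of(
            // syslog style, e.g. "Feb 20 19:37:56"; the year is missing, so 2023 is assumed
            new DateTimeFormatterBuilder().appendPattern("MMM d HH:mm:ss")
                    .parseDefaulting(ChronoField.YEAR, 2023).toFormatter(Locale.ENGLISH),
            // ISO-like, e.g. "2023-02-20 19:37:56"
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss", Locale.ENGLISH),
            // Slack style, e.g. "02/20/23, 19:37:56:307" (milliseconds after the last colon)
            DateTimeFormatter.ofPattern("MM/dd/yy, HH:mm:ss:SSS", Locale.ENGLISH));

    // Returns the first successful parse, or empty if no candidate format fits.
    static Optional<LocalDateTime> parse(String text) {
        for (DateTimeFormatter format : FORMATS) {
            try {
                return Optional.of(LocalDateTime.parse(text, format));
            } catch (DateTimeParseException e) {
                // not this format; try the next one
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("02/20/23, 19:37:56:307")); // Optional[2023-02-20T19:37:56.307]
    }
}
```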

Also, I searched the main (/var/log/messages) logfile for "warn", and found:

Feb 20 19:37:56 thinkpad slack[4207]: [02/20/23, 19:37:56:307] warn: (TA5S2KDB4) Notification (message) suppressed because:
Feb 20 19:37:56 thinkpad slack[4207]: [02/20/23, 19:37:56:668] warn: (TA5S2KDB4) Notification (message) suppressed because:

I don't see a way to view the next line(s), which is leaving me in suspense! :-/

PenelopeFudd commented on June 13, 2024

Ok, searching for the timestamp "Feb 20 19:37:" showed the surrounding lines. Oddly, searching for "Feb 20 19:3" didn't match, so I guess you're using Lucene.

fthevenet commented on June 13, 2024

The top search bar is really a filter bar: it basically filters out from the main view everything that doesn't match the query.
This bar is toggled by the button with the funnel icon, or by pressing Ctrl+k:
[screenshot of the filter bar and its funnel-icon toggle button]

If you need to find and highlight all occurrences of a pattern in that view (while keeping the rest visible), then you'll want to use the "Find" function (the magnifying glass button, or Ctrl+f):
[screenshot of the Find bar]

You can use both features in conjunction with one another.

Please note that there are some important differences in the goals and implementations of these two features. The filter is, as you guessed, implemented using a Lucene index, and the query needs to obey Lucene's query syntax. It applies to all log files in the current tab in their entirety, can filter gigabytes of logs in seconds, and works alongside the other filtering widgets, like the timestamp histogram and the facets for severity and file paths. When too many lines remain after filtering, the view is paginated so that it can be displayed without using too much memory.
The find feature is a lot more straightforward: it simply highlights occurrences of a given pattern (which can be a regex), but its scope is limited to the currently displayed page (10,000 lines by default).
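The difference in scope between the two features can be modelled in a few lines of Java. This is a simplified sketch, not binjr's code: a plain `Predicate` stands in for the Lucene query, and the `FilterVsFind` class, its methods, and the `pageSize` parameter are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FilterVsFind {
    // Filter: evaluated against every line of every file in the tab; non-matching lines are dropped.
    static List<String> filter(List<String> allLines, Predicate<String> query) {
        List<String> kept = new ArrayList<>();
        for (String line : allLines) {
            if (query.test(line)) kept.add(line);
        }
        return kept;
    }

    // Pagination: only one page of the filtered result is handed to the view at a time.
    static List<String> page(List<String> lines, int pageIndex, int pageSize) {
        int from = Math.min(pageIndex * pageSize, lines.size());
        int to = Math.min(from + pageSize, lines.size());
        return lines.subList(from, to);
    }

    // Find: collects the positions to highlight, but only on the page currently displayed.
    // Each hit is {lineIndexOnPage, start, end}.
    static List<int[]> find(List<String> currentPage, Pattern pattern) {
        List<int[]> hits = new ArrayList<>();
        for (int i = 0; i < currentPage.size(); i++) {
            Matcher m = pattern.matcher(currentPage.get(i));
            while (m.find()) hits.add(new int[] {i, m.start(), m.end()});
        }
        return hits;
    }
}
```

In this model, a Find on page 0 never reports matches that the filter placed on page 1, which mirrors the page-limited scope described above.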

fthevenet commented on June 13, 2024

Oh, I see; I didn't realize that it was browsing for directories, I just thought that I didn't want to examine a zip file, and I thought the only other option was just an interesting wording of "open individual log file".

Maybe add an option to load just one file, or put some text in the dialog box or a tooltip explaining that individual files can't be opened, only directories. I'm not clear why that is so, but that's the way it's designed. 😄

Yes, I can understand this might not be obvious at first glance.
The reason the UI is designed this way is that we mostly use binjr to look into many files at the same time, often recovered from remote machines as zip archives, which can then be visualized all at once in the same tab, or separately.

At some point I started working on a single-file selection mode for the dialog, but I got sidetracked and never finished it. I do agree that offering that option would help new users, as it is the least surprising behavior.

PenelopeFudd commented on June 13, 2024

I'm a hardcore grep user, and lean heavily on its "show n lines before/after" feature. :-)
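For reference, grep's -B/-A (--before-context/--after-context) behaviour boils down to marking a window around each matching line and merging overlapping windows. A minimal Java sketch of that idea, with a hypothetical `GrepContext` class that returns the indices of the lines to print:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class GrepContext {
    // Like `grep -B before -A after`: every matching line plus its surrounding context,
    // with overlapping windows merged and the original line order preserved.
    static List<Integer> withContext(List<String> lines, Pattern pattern, int before, int after) {
        boolean[] keep = new boolean[lines.size()];
        for (int i = 0; i < lines.size(); i++) {
            if (pattern.matcher(lines.get(i)).find()) {
                int from = Math.max(0, i - before);
                int to = Math.min(lines.size() - 1, i + after);
                for (int j = from; j <= to; j++) keep[j] = true;
            }
        }
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < keep.length; i++) {
            if (keep[i]) result.add(i);
        }
        return result;
    }
}
```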

Last night I upgraded my system, and noticed plocate, a newer, faster version of mlocate, which was a newer, faster version of locate, which in turn was a newer, faster replacement for find. :-) It'd be interesting to harness (harvest? steal?) the algorithm it's using. I'm not saying that Lucene is bad, but when searching for a substring, it's not the right tool.

Alternatively, there's the kind of indexing and compression done by specialized tools like CLP, although if you go too far you end up with bloated try-to-please-everyone tools, <names withheld to protect the guilty!> :-)

I haven't read much of the manual yet; is there a way to use the logs to annotate time series graphs, assuming there aren't too many log entries?

fthevenet commented on June 13, 2024

Last night I upgraded my system, and noticed plocate, a newer, faster version of mlocate, which was a newer, faster version of locate, which in turn was a newer, faster replacement for find. :-) It'd be interesting to harness (harvest? steal?) the algorithm it's using. I'm not saying that Lucene is bad, but when searching for a substring, it's not the right tool.

It's not really Lucene per se that makes it hard to look for exact substrings, but rather the way field tokenization is generally configured. For instance, to match everything that contains "Feb 20 19:3", you could use its regex syntax, something like /.*"Feb 20 19:3".*/. Except that in our case, that's not going to give you anything, because each line is currently tokenized in such a way that "Feb", "20", "19", etc. end up in separate tokens, and none of these will individually satisfy the expression.

Changing the default tokenization to satisfy that particular need would be detrimental to the more common case of looking for a single word. But that gave me an idea: we could add an extra field that indexes the whole line as a single token, to run regex queries against (or any other sub-query that would benefit from it).
Say we call that new field "regex"; then typing regex:/.*"Feb 20 19:3".*/ should have the expected effect.
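A rough way to see why the tokenized field fails while a whole-line field would work: a Lucene regexp query has to match an entire indexed term, so `.*"Feb 20 19:3".*` needs a single term that contains the whole string. The Java sketch below simulates this with java.util.regex. The tokenizer is a crude stand-in that splits on non-alphanumeric characters (an assumption; Lucene's StandardTokenizer differs in details, and case normalization is ignored here), and `TokenVsWholeLine` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class TokenVsWholeLine {
    // Crude word tokenizer: anything that is not a letter or a digit separates tokens.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        for (String t : line.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // A document matches a regexp query if the pattern matches at least one whole term.
    static boolean matchesAnyTerm(List<String> terms, Pattern pattern) {
        for (String term : terms) {
            if (pattern.matcher(term).matches()) return true; // matches() = whole term, like Lucene
        }
        return false;
    }

    public static void main(String[] args) {
        String line = "Feb 20 19:37:56 thinkpad slack[4207]: warn";
        Pattern p = Pattern.compile(".*" + Pattern.quote("Feb 20 19:3") + ".*");
        // Tokenized field: terms are Feb, 20, 19, 37, 56, ...; none contains "Feb 20 19:3".
        System.out.println(matchesAnyTerm(tokenize(line), p)); // false
        // Hypothetical "regex" field: the whole line is indexed as one single term.
        System.out.println(matchesAnyTerm(List.of(line), p));  // true
    }
}
```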

@Arkanosis, what do you think? If we can figure out how to write a prefix handler in Lucene, we could even drop the ceremonial /.* and .*/

PenelopeFudd commented on June 13, 2024

I'd suspect that putting the whole line into a token would slow things down a lot. Lucene would index the regex field, which on the plus side would make searching for a prefix of the field fast, but on the minus side, the index would be of no use for regular expression searching.

It'd be better to break the search string (possibly regular expressions, if they aren't too complicated) into tokens, find matching records, and then search for the literal string or regex in the matched lines. If some joker like me has entered a partial token, maybe ignore tokens that don't match anything (e.g. 3) or match too many things (e.g. Feb). Actually, this sorta-kinda sounds like what happens already, doesn't it?
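The "find candidates via the index, then verify" approach described above can be sketched in Java as follows. Everything here is hypothetical (the `CandidateThenVerify` class, the `maxPostings` threshold, the tokenizing regex), and it uses a plain in-memory inverted index rather than Lucene. One design choice differs slightly from the proposal: rather than ignoring tokens that match nothing, it only looks up query tokens enclosed by separators on both sides, since the first and last tokens may be fragments (like the "3" in "19:3") that could wrongly exclude real matches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CandidateThenVerify {
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");
    private final List<String> lines = new ArrayList<>();
    private final Map<String, Set<Integer>> postings = new HashMap<>(); // token -> line ids

    void add(String line) {
        int id = lines.size();
        lines.add(line);
        Matcher m = WORD.matcher(line);
        while (m.find()) postings.computeIfAbsent(m.group(), k -> new TreeSet<>()).add(id);
    }

    // Query tokens bounded by a separator on both sides: these must be whole tokens in any
    // matching line. The first and last tokens may be partial, so they are skipped.
    static List<String> wholeTokens(String query) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD.matcher(query);
        while (m.find()) {
            if (m.start() > 0 && m.end() < query.length()) out.add(m.group());
        }
        return out;
    }

    // Narrow down candidates with the index, then verify the literal substring in each one.
    List<String> search(String literal, int maxPostings) {
        Set<Integer> candidates = null; // null = no usable token, fall back to a full scan
        for (String token : wholeTokens(literal)) {
            Set<Integer> ids = postings.getOrDefault(token, Set.of());
            if (ids.size() > maxPostings) continue; // too common to narrow things down
            if (candidates == null) candidates = new TreeSet<>(ids);
            else candidates.retainAll(ids);
        }
        List<String> hits = new ArrayList<>();
        if (candidates == null) {
            for (String line : lines) if (line.contains(literal)) hits.add(line);
        } else {
            for (int id : candidates) if (lines.get(id).contains(literal)) hits.add(lines.get(id));
        }
        return hits;
    }
}
```

Because every candidate is checked with String.contains, skipping a token only costs speed, never correctness.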

plocate, the newest member of the locate family, breaks input lines into three-character 'trigrams' and indexes those. It supports regular expressions too, but I'm not sure how it makes that so fast. At one point, I wanted to index an arbitrary file, so I took the updatedb command and replaced find / with cat $inputfile, and voilà, I could use locate on that file. Of course, that was back when updatedb was a shell script. 😄 🧓🏻
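A toy trigram index in the spirit of plocate (not its actual implementation, which compresses its posting lists and is far more elaborate) shows why a substring like "Feb 20 19:3" is no problem for this approach: every three-character window of the query must also appear in a matching line, so intersecting the trigrams' posting lists yields candidates, which are then verified. The `TrigramIndex` class is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TrigramIndex {
    private final List<String> lines = new ArrayList<>();
    private final Map<String, Set<Integer>> postings = new HashMap<>(); // trigram -> line ids

    // All distinct three-character windows of s.
    static Set<String> trigrams(String s) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + 3 <= s.length(); i++) out.add(s.substring(i, i + 3));
        return out;
    }

    void add(String line) {
        int id = lines.size();
        lines.add(line);
        for (String g : trigrams(line)) postings.computeIfAbsent(g, k -> new TreeSet<>()).add(id);
    }

    List<String> search(String literal) {
        Set<Integer> candidates = null;
        for (String g : trigrams(literal)) {
            Set<Integer> ids = postings.getOrDefault(g, Set.of());
            if (candidates == null) candidates = new TreeSet<>(ids);
            else candidates.retainAll(ids);
            if (candidates.isEmpty()) return List.of(); // some trigram appears nowhere
        }
        List<String> hits = new ArrayList<>();
        if (candidates == null) { // query shorter than 3 characters: no trigram, scan everything
            for (String line : lines) if (line.contains(literal)) hits.add(line);
        } else { // trigrams can co-occur by chance, so verify each candidate
            for (int id : candidates) if (lines.get(id).contains(literal)) hits.add(lines.get(id));
        }
        return hits;
    }
}
```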
