
mgrep's Introduction

Notes

Like many of us, I regularly need to extract multi-line blocks of text from, say, log files. Depending on the structure of the block, a simple egrep "regexp" -Bn -An may be sufficient, or not. When it comes to matching more complex regular expressions over several lines of text, all the traditional Unix tools fail (well, I fail miserably) to offer a simple solution.

The idea here is to provide a simple tool for matching complex regular expressions over multiple lines of text.

CLI usage ideas:

mgrep -b "Warning" -e "ignored"

Will match a block of text where the first line contains the word "Warning" and the last line contains the word "ignored". The idea here is that such expressions will be auto-anchored like this:

mgrep -b "^.*Warning.*$" -e "^.*ignored.*$"
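The begin/end matching described above can be sketched in a few lines of Python. This is only an illustration of the idea, not mgrep's implementation: the function name `find_blocks` is hypothetical, and the sketch assumes that blocks do not overlap, that each block ends at the closest matching end line, and that the begin and end lines are distinct. Searching for a pattern anywhere in a single line is equivalent to the auto-anchored form `^.*pattern.*$`.

```python
import re
from typing import Iterator, List


def find_blocks(lines: List[str], begin: str, end: str) -> Iterator[List[str]]:
    """Yield each block running from a `begin` line to the next `end` line."""
    begin_re, end_re = re.compile(begin), re.compile(end)
    i = 0
    while i < len(lines):
        if begin_re.search(lines[i]):
            # Scan forward for the closest line matching the end pattern.
            for j in range(i + 1, len(lines)):
                if end_re.search(lines[j]):
                    yield lines[i:j + 1]
                    i = j  # resume scanning after this block
                    break
        i += 1
```

Calling `list(find_blocks(log_lines, "Warning", "ignored"))` on a list of log lines would return every such block as a list of lines.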

One could give a block body constraint:

mgrep -b "Warning" -i "Unknow (media|mime) type" -e "ignored"

Will match the given text block only if one of its inner lines matches "^.*Unknow (media|mime) type.*$"

Parts of the auto-anchoring can be disabled by anchoring manually:
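As a rough sketch of the body constraint, candidate blocks can be filtered after they have been extracted. The helper name `with_inner_match` is hypothetical, and the sketch assumes that "inner lines" means every line except the first and the last one of the block.

```python
import re
from typing import Iterable, List


def with_inner_match(blocks: Iterable[List[str]], inner: str) -> List[List[str]]:
    """Keep only the blocks in which at least one inner line matches `inner`."""
    inner_re = re.compile(inner)
    # block[1:-1] drops the begin and end lines, leaving the block body.
    return [
        block for block in blocks
        if any(inner_re.search(line) for line in block[1:-1])
    ]
```

A two-line block has no inner lines, so under this assumption it can never satisfy a body constraint.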

mgrep -b "^Warning" -e "ignored\.$"

Will be auto-anchored like this:

mgrep -b "^Warning.*$" -e "^.*ignored\.$"
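The auto-anchoring rule can be written down as a small function. This is a minimal sketch under stated assumptions: the name `auto_anchor` is hypothetical, a trailing `$` counts as an anchor only when it is not escaped by a backslash, and top-level alternations are not split into separately anchored branches.

```python
def ends_with_unescaped_dollar(pattern: str) -> bool:
    """Return True if the pattern ends with a `$` that is a real anchor."""
    if not pattern.endswith("$"):
        return False
    body = pattern[:-1]
    # An odd number of backslashes right before `$` means it is a literal dollar.
    backslashes = len(body) - len(body.rstrip("\\"))
    return backslashes % 2 == 0


def auto_anchor(pattern: str) -> str:
    """Add `^.*` and `.*$` around a pattern unless it is already anchored."""
    prefix = "" if pattern.startswith("^") else "^.*"
    suffix = "" if ends_with_unescaped_dollar(pattern) else ".*$"
    return prefix + pattern + suffix
```

With this sketch, `auto_anchor("Warning")` gives `^.*Warning.*$`, while `auto_anchor("^Warning")` only receives the trailing `.*$`.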

Consecutive lines can be matched by providing the same switch several times:

mgrep -b "Warning" -b "^File" -e "ignored"

Will match a text block where the first line matches "^.*Warning.*$", the second matches "^File.*$" and the last matches "^.*ignored.*$"
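Repeating the switch amounts to checking a run of consecutive lines against a list of patterns. The helper below is a hypothetical sketch of that check only (finding the end line is left out); it assumes the patterns are applied in the order the switches were given.

```python
import re
from typing import List


def heads_match(lines: List[str], start: int, patterns: List[str]) -> bool:
    """Check that lines[start], lines[start + 1], ... match `patterns` in order."""
    if start + len(patterns) > len(lines):
        return False  # not enough lines left for every pattern
    return all(
        re.search(pattern, lines[start + offset]) is not None
        for offset, pattern in enumerate(patterns)
    )
```

For the command above, the candidate block starts would be the indices `i` for which `heads_match(lines, i, ["Warning", "^File"])` is true.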

The last idea is to provide a simple scripting language:

Warning
^File
*
+MIME type
*(^Unknow media)|(Bad file type)
!Configuring NAT
+
ignored\.$

Here, ?, + and * at the start of a line have the same meaning as in regular expressions, but they apply to whole lines rather than to characters. A leading ! matches a line that does not match the pattern.

So, the script above matches a text block which follows these rules:

The first line matches

^.*Warning.*$

The second

^File.*$

Followed by zero or more lines

Followed by one or more lines matching

^.*MIME type.*$

Followed by zero or more lines matching

(^Unknow media.*$)|(^.*Bad file type.*$)

Followed by any line not matching

^.*Configuring NAT.*$

Followed by one or more lines

Finally, the last line in the pattern should match

^.*ignored\.$
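The rules above can be turned into a small line-level matcher. The following Python sketch is one possible reading of the scripting idea, not a definitive implementation. All names (`parse_script`, `match_at`, `find_script_blocks`) are hypothetical, and it assumes that quantifiers are lazy (the shortest block wins), that a bare quantifier matches any line, that `!` consumes exactly one non-matching line, and that blocks do not overlap.

```python
import re
from typing import List, Optional, Tuple

# A rule is (prefix, compiled pattern); a pattern of None means "any line".
Rule = Tuple[str, Optional[re.Pattern]]

# Minimum and maximum number of lines each prefix may consume (None = unbounded).
BOUNDS = {"": (1, 1), "!": (1, 1), "?": (0, 1), "+": (1, None), "*": (0, None)}


def parse_script(script: str) -> List[Rule]:
    """Split each non-empty script line into its optional prefix and its pattern."""
    rules: List[Rule] = []
    for raw in script.splitlines():
        if not raw:
            continue
        prefix = ""
        if raw[0] in "?+*!":
            prefix, raw = raw[0], raw[1:]
        rules.append((prefix, re.compile(raw) if raw else None))
    return rules


def line_ok(rule: Rule, line: str) -> bool:
    """Test one line against one rule; `!` inverts the result."""
    prefix, pattern = rule
    hit = True if pattern is None else pattern.search(line) is not None
    return not hit if prefix == "!" else hit


def match_at(rules: List[Rule], lines: List[str], i: int, r: int = 0) -> Optional[int]:
    """Return the index just past a block starting at lines[i], or None."""
    if r == len(rules):
        return i
    low, high = BOUNDS[rules[r][0]]
    # Count how many consecutive lines this rule could consume.
    n = 0
    while i + n < len(lines) and (high is None or n < high) and line_ok(rules[r], lines[i + n]):
        n += 1
    # Lazy backtracking: try the smallest allowed count first.
    for count in range(low, n + 1):
        end = match_at(rules, lines, i + count, r + 1)
        if end is not None:
            return end
    return None


def find_script_blocks(script: str, lines: List[str]) -> List[List[str]]:
    """Return every non-overlapping block of `lines` matched by `script`."""
    rules = parse_script(script)
    blocks, i = [], 0
    while i < len(lines):
        end = match_at(rules, lines, i)
        if end is not None and end > i:
            blocks.append(lines[i:end])
            i = end
        else:
            i += 1
    return blocks
```

Searching each line with `pattern.search` plays the role of auto-anchoring here, so the script lines can be used as written. The recursion depth is bounded by the number of rules in the script, not by the size of the input.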

That's it, let's get back to work!
