grok

A simple library to parse grok patterns with Go.

Installation

Make sure you have a working Go environment.

go get github.com/vjeantet/grok

Use in your project

import "github.com/vjeantet/grok"

Usage

Available patterns and custom ones

By default, this grok package contains only the patterns found in the patterns/grok-patterns file.

To add a custom pattern, call grok.AddPattern(nameOfPattern, pattern); see the example folder for a usage example. You can also load custom patterns from a file or folder with grok.AddPatternsFromPath(path), or through the PatternsDir configuration option.

Parse all or only named captures

g, _ := grok.New()
values, _  := g.Parse("%{COMMONAPACHELOG}", `127.0.0.1 - - [23/Apr/2014:22:58:32 +0200] "GET /index.php HTTP/1.1" 404 207`)

g, _ = grok.NewWithConfig(&grok.Config{NamedCapturesOnly: true})
values2, _ := g.Parse("%{COMMONAPACHELOG}", `127.0.0.1 - - [23/Apr/2014:22:58:32 +0200] "GET /index.php HTTP/1.1" 404 207`)

values is a map containing all captured groups; values2 contains only the named captures.

Examples

package main

import (
	"fmt"

	"github.com/vjeantet/grok"
)

func main() {
	g, _ := grok.New()
	values, _ := g.Parse("%{COMMONAPACHELOG}", `127.0.0.1 - - [23/Apr/2014:22:58:32 +0200] "GET /index.php HTTP/1.1" 404 207`)

	for k, v := range values {
		fmt.Printf("%+15s: %s\n", k, v)
	}
}

output:

       response: 404
          bytes: 207
       HOSTNAME: 127.0.0.1
       USERNAME: -
       MONTHDAY: 23
        request: /index.php
      BASE10NUM: 207
           IPV6:
           auth: -
      timestamp: 23/Apr/2014:22:58:32 +0200
           verb: GET
    httpversion: 1.1
           TIME: 22:58:32
           HOUR: 22
COMMONAPACHELOG: 127.0.0.1 - - [23/Apr/2014:22:58:32 +0200] "GET /index.php HTTP/1.1" 404 207
       clientip: 127.0.0.1
             IP:
          ident: -
          MONTH: Apr
           YEAR: 2014
         SECOND: 32
            INT: +0200
           IPV4:
         MINUTE: 58
     rawrequest:

Example 2

package main

import (
  "fmt"

  "github.com/vjeantet/grok"
)

func main() {
  g, _ := grok.NewWithConfig(&grok.Config{NamedCapturesOnly: true})
  values, _ := g.Parse("%{COMMONAPACHELOG}", `127.0.0.1 - - [23/Apr/2014:22:58:32 +0200] "GET /index.php HTTP/1.1" 404 207`)

  for k, v := range values {
    fmt.Printf("%+15s: %s\n", k, v)
  }
}

output:

      timestamp: 23/Apr/2014:22:58:32 +0200
           verb: GET
     rawrequest:
          bytes: 207
           auth: -
        request: /index.php
    httpversion: 1.1
       response: 404
COMMONAPACHELOG: 127.0.0.1 - - [23/Apr/2014:22:58:32 +0200] "GET /index.php HTTP/1.1" 404 207
       clientip: 127.0.0.1
          ident: -

grok's People

Contributors

aantono, arnecls, fxnn, jamesofnet, lentregu, palmerabollo, paulstuart, prep, tengattack, vjeantet

grok's Issues

Replace compiled-in default patterns with only the "base" ones

Hello,

Currently, calling grok.New() performs 13905 allocations and takes about 11 ms on my MacBook Pro.

If I replace the current patterns.go with one containing only the expressions found in the base file, grok.New() performs 5502 allocations and takes 2.4 ms on the same machine.

What do you think about making patterns.go lighter?

field names with dots

It seems that the following regex is used to detect named references:

namedReference = regexp.MustCompile(`%{(\w+(?::\w+(?::\w+)?)?)}`)

This prevents field names from containing special characters such as dots and dashes. A pattern that uses them is not detected and does not match, and no error is reported.

The dot is especially important for compatibility with frameworks like Logstash, whose grok patterns rely heavily on dots within names to add structure.

Example:

  • pattern: %{DATA:var.with.dots}
  • pattern definitions/regexes: map[string]string{"DATA": "dat[0-9]"}
  • input: "dat123"
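The effect can be reproduced with the standard library alone. The sketch below compiles the expression quoted above next to a relaxed variant; relaxedRef is a hypothetical illustration, not the library's actual fix. Note that Go capture-group names may only contain letters, digits and underscores, so a dotted field name could not be used directly as a group name even once the reference is recognised.

```go
package main

import (
	"fmt"
	"regexp"
)

// strictRef is the expression quoted in the issue: every component must be \w+.
var strictRef = regexp.MustCompile(`%{(\w+(?::\w+(?::\w+)?)?)}`)

// relaxedRef is a hypothetical variant that also accepts '.' and '-' in the
// field-name component. It is an illustration, not the library's regex.
var relaxedRef = regexp.MustCompile(`%{(\w+(?::[\w.-]+(?::\w+)?)?)}`)

func main() {
	pattern := "%{DATA:var.with.dots}"

	// The strict expression does not recognise the reference at all.
	fmt.Println(strictRef.MatchString(pattern)) // false

	// The relaxed expression captures the full "NAME:field" pair.
	fmt.Println(relaxedRef.FindStringSubmatch(pattern)[1]) // DATA:var.with.dots
}
```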

When using a custom pattern with dot in name, parse intermittently fails with "no pattern found"

When using the new support for . in named captures, Parse sometimes fails with the error "no pattern found".

The number of iterations needed to trigger the error varies, but it usually fails within 2-5 cycles.

Here is the test code I am using:

package main

import (
	"fmt"

	"github.com/vjeantet/grok"
)

func main() {
	for i := 0; i < 100; i++ {
		g, err := grok.New()
		if err != nil {
			fmt.Printf("new grok error: %v\n", err)
			return
		}

		g.AddPatternsFromMap(map[string]string{
			"CUSTOM": "%{SYSLOGTIMESTAMP:foo.bar}",
		})

		_, err = g.Parse("%{CUSTOM}", "Apr 10 05:11:57")
		if err != nil {
			fmt.Printf("parse error: %v\n", err)
			return
		}

		fmt.Printf("passed: %d\n", i)
	}
}

Output:

$ go run main.go
passed: 0
passed: 1
passed: 2
parse error: no pattern found for %{CUSTOM}

I believe this may be due to an issue with sortGraph, causing an ignored error to be returned here from addPattern.

Issue using grok %{IPORHOST} pattern

When I compile a pattern using grok's IPORHOST pattern, I do not get the expected results when parsing an IP address. This appears to be because the grok pattern does not get properly parsed down to a valid regex pattern.

I have reproduced this issue with the following unit test:

func TestParsePatternWithIPORHOST(t *testing.T) {
	g, _ := NewWithConfig(&Config{NamedCapturesOnly: true})
	patterns := map[string]string{
		"PATTERN_WITH_IP": `%{IPORHOST:client_ip} %{NOTSPACE:foo}`,
	}
	err := g.AddPatternsFromMap(patterns)
	if err != nil {
		t.Fatalf("AddPatternsFromMap should not return error: %s", err)
	}

	matches, err := g.Parse("%{PATTERN_WITH_IP}", `2001:0db8:85a3:0000:0000:8a2e:0370:7334 bar`)
	if err != nil {
		t.Fatalf("Unexpected error parsing pattern: %s", err)
	}

	expectedMatches := map[string]string{
		"client_ip": "2001:0db8:85a3:0000:0000:8a2e:0370:7334",
		"foo":       "bar",
	}

	for k, expected := range expectedMatches {
		actual, ok := matches[k]
		if !ok {
			t.Errorf("Expected match key [%s] but didn't find", k)
			continue
		}
		if actual != expected {
			t.Errorf("Expected match key [%s] to be [%s], got [%s]", k, expected, actual)
			continue
		}
	}
}

which fails with the following message (the pattern does not match the full IPv6 address):

	grok_test.go:125: Expected match key [client_ip] to be [2001:0db8:85a3:0000:0000:8a2e:0370:7334], got [7334]
FAIL
FAIL	github.com/vjeantet/grok	0.007s

You can verify that the pattern is supposed to match the entire IPv6 address here: http://grokdebug.herokuapp.com/, by pasting the log-line into the input:

2001:0db8:85a3:0000:0000:8a2e:0370:7334 bar

and the pattern:

%{IPORHOST:client_ip} %{NOTSPACE:foo}
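One possible reading of the [7334] result is that the IP alternative never matched and, because the expression is unanchored, a hostname-style alternative matched the last colon-separated group instead. The sketch below shows that mechanism with the standard library only. hostLike is a simplified, hypothetical stand-in and not the regex the library generates for IPORHOST.

```go
package main

import (
	"fmt"
	"regexp"
)

// hostLike is a simplified, hypothetical stand-in for a hostname-style
// alternative followed by NOTSPACE. It is not the library's generated regex.
const hostLike = `(?P<client_ip>[0-9A-Za-z][0-9A-Za-z-]{0,62}) (?P<foo>\S+)`

var (
	// unanchored may start matching anywhere in the line.
	unanchored = regexp.MustCompile(hostLike)
	// anchored must consume the whole line, so a partial match is rejected.
	anchored = regexp.MustCompile(`^` + hostLike + `$`)
)

func main() {
	line := "2001:0db8:85a3:0000:0000:8a2e:0370:7334 bar"

	// Every earlier group is followed by ':', so the first position where
	// "<host> <rest>" fits is the final group.
	fmt.Println(unanchored.FindStringSubmatch(line)[1]) // 7334

	// With anchors the mismatch becomes visible instead of silently
	// producing a truncated value.
	fmt.Println(anchored.MatchString(line)) // false
}
```

Anchoring the expression in a test like the one above is one way to turn a silent partial match into an explicit failure.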

return only named captures when parsing

Following up on @prep's pull request #4:
Add a way to return only named captures when using Parse or MultiParse, to behave a bit more like the original grok tool.

@fxnn's proposal: add a new func to keep the code readable

grok.ParseOnlyNamedCaptures(string)

when parsing a text like

127.0.0.1 - - [23/Apr/2014:22:58:32 +0200] "GET /index.php HTTP/1.1" 404 207

with

%{COMMONAPACHELOG:mylog}

it should return

{
  "mylog": [
    [
      "127.0.0.1 - - [23/Apr/2014:22:58:32 +0200] \"GET /index.php HTTP/1.1\" 404 207"
    ]
  ],
  "clientip": [
    [
      "127.0.0.1"
    ]
  ],
  "ident": [
    [
      "-"
    ]
  ],
  "auth": [
    [
      "-"
    ]
  ],
  "timestamp": [
    [
      "23/Apr/2014:22:58:32 +0200"
    ]
  ],
  "verb": [
    [
      "GET"
    ]
  ],
  "request": [
    [
      "/index.php"
    ]
  ],
  "httpversion": [
    [
      "1.1"
    ]
  ],
  "rawrequest": [
    [
      null
    ]
  ],
  "response": [
    [
      "404"
    ]
  ],
  "bytes": [
    [
      "207"
    ]
  ]
}

Match fails when the field name contains '-'

I tried to use the junos patterns, but the logs failed to parse.
The cause is a field name containing '-', such as %{IP:src-ip}.
It works when I change it to %{IP:src_ip}.

Fields with brackets

Hey guys.

It would appear this library doesn't support field names with brackets ( ) in them. Specifically, I'm matching IIS logs whose field names use brackets.

Is there a way of working around this or could this library be updated to support the use of brackets?

Cheers

Pete

When I add a custom pattern which starts with "(?<", I run into "error parsing regexp: invalid or unsupported Perl syntax: `(?<`"

I searched the Go docs and found the relevant description (posted as an image in the original issue).
So I changed my custom pattern to use "(?P<":

(?P<loglevel>[A-Z])%{NUMBER:logdate} %{TIME:logtime}%{SPACE}%{NUMBER:threadid} (?P<file_source>%{WORD}\.%{WORD}):(?P<file_line>%{NUMBER})\]%{SPACE}GetBucket\[token:%{NOTSPACE:token}, path:%{URIPATHPARAM:request}\]

But I notice that the default patterns do not need to start with "(?P<"; they start with "(?<":

CLOUDFRONT_ACCESS_LOG (?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}\t%{TIME})\t%{WORD:x_edge_location}\t(?:%{NUMBER:sc_bytes:int}|-)\t%{IPORHOST:clientip}\t%{WORD:cs_method}\t%{HOSTNAME:cs_host}\t%{NOTSPACE:cs_uri_stem}\t%{NUMBER:sc_status:int}\t%{GREEDYDATA:referrer}\t%{GREEDYDATA:agent}\t%{GREEDYDATA:cs_uri_query}\t%{GREEDYDATA:cookies}\t%{WORD:x_edge_result_type}\t%{NOTSPACE:x_edge_request_id}\t%{HOSTNAME:x_host_header}\t%{URIPROTO:cs_protocol}\t%{INT:cs_bytes:int}\t%{GREEDYDATA:time_taken:float}\t%{GREEDYDATA:x_forwarded_for}\t%{GREEDYDATA:ssl_protocol}\t%{GREEDYDATA:ssl_cipher}\t%{GREEDYDATA:x_edge_response_result_type}

multiline support

I want to parse multiline text with grok:

g, _ := NewWithConfig(&Config{NamedCapturesOnly: true})

text := "2017-09-20T13:42:38.349+8:00 error login [http-nio-8081-exec-6] [LogServiceImpl] <2eaad87b6c3443fa9ca0e4f3e5402b2d,0_1_2> [0x00105091] - this is a log195\n com.logstore.exception.ProgramException: java.text.ParseException: Unparseable date: \"2017-08-01T20:15:12.342+07:00\"\n"

captures, _ := g.Parse("(?m)%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{NOTSPACE:module} \\[%{NOTSPACE:threadNo}\\] \\[%{NOTSPACE:methodName}\\] <%{NOTSPACE:traceId},%{NOTSPACE:rpcId}> \\[%{NOTSPACE:errorCode}\\] - %{GREEDYDATA:message}", text);

for k, v := range captures {
	fmt.Println(k, "=", v)
}

==========================output================
level = error
module = login
threadNo = http-nio-8081-exec-6
methodName = LogServiceImpl
rpcId = 0_1_2
errorCode = 0x00105091
timestamp = 2017-09-20T13:42:38.349+8:00
traceId = 2eaad87b6c3443fa9ca0e4f3e5402b2d
message = this is a log195

The message field parsed by grok is not correct: it contains only the first line. Please help me, thanks!
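A likely explanation is that GREEDYDATA expands to .* and, in Go's regexp syntax, the (?m) flag only changes how ^ and $ behave; the dot still stops at a newline unless the (?s) flag is set. The sketch below shows the difference with the standard library only. Whether a (?s) prefix passes through Parse unchanged is an assumption that would need checking against the library.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// (?m) only affects ^ and $; '.' still does not match '\n'.
	multiLine = regexp.MustCompile(`(?m)- (?P<message>.*)`)
	// (?s) lets '.' match '\n', so the capture can span several lines.
	dotAll = regexp.MustCompile(`(?s)- (?P<message>.*)`)
)

func main() {
	text := "[0x00105091] - this is a log195\n com.logstore.exception.ProgramException: boom\n"

	// Stops at the end of the first line.
	fmt.Printf("%q\n", multiLine.FindStringSubmatch(text)[1])

	// Runs to the end of the input, newlines included.
	fmt.Printf("%q\n", dotAll.FindStringSubmatch(text)[1])
}
```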

AddPatternsFromMap panics if any referenced pattern does not exist

Hello.

Code (reproduce):

g := grok.New()
if err := g.AddPatternsFromMap(map[string]string{"SOME": `%{NOT_EXIST}`}); err != nil {
    fmt.Println(err)
}

Trace:

panic: runtime error: index out of range

goroutine 1 [running]:
github.com/gemsi/grok.sortGraph.func1(0x562070, 0x4)
    /home/username/go/src/github.com/gemsi/grok/graph.go:46 +0x58d
github.com/gemsi/grok.sortGraph(0xc82018b050, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /home/username/go/src/github.com/gemsi/grok/graph.go:52 +0x31d
github.com/gemsi/grok.(*Grok).AddPatternsFromMap(0xc82000e300, 0xc82018af30, 0x0, 0x0)
    /home/username/go/src/github.com/gemsi/grok/grok.go:97 +0x391
main.main()
    /home/username/grok.go:17 +0xe1
exit status 2

But this works fine

g := grok.New()
if err := g.AddPattern("SOME", `%{NOT_EXIST}`); err != nil {
    fmt.Println(err)
}
// Output
// no pattern found for %{NOT_EXIST}
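For comparison, here is a minimal sketch of a dependency sort that reports a missing reference as an error instead of indexing out of range. It is not the library's sortGraph: resolveOrder and ref are hypothetical names, and the map passed in is treated as the complete set of known patterns.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// ref extracts the pattern name from references such as %{NAME} or %{NAME:field}.
var ref = regexp.MustCompile(`%{(\w+)`)

// resolveOrder is a hypothetical helper. It returns the pattern names in an
// order where each pattern comes after the patterns it references, or an
// error for an unknown reference or a cycle.
func resolveOrder(patterns map[string]string) ([]string, error) {
	const (
		visiting = 1
		done     = 2
	)
	state := make(map[string]int, len(patterns))
	order := make([]string, 0, len(patterns))

	var visit func(name string) error
	visit = func(name string) error {
		switch state[name] {
		case done:
			return nil
		case visiting:
			return fmt.Errorf("cyclic reference involving %%{%s}", name)
		}
		body, ok := patterns[name]
		if !ok {
			return fmt.Errorf("no pattern found for %%{%s}", name)
		}
		state[name] = visiting
		for _, m := range ref.FindAllStringSubmatch(body, -1) {
			if err := visit(m[1]); err != nil {
				return err
			}
		}
		state[name] = done
		order = append(order, name)
		return nil
	}

	// Sort the keys so the result does not depend on map iteration order.
	names := make([]string, 0, len(patterns))
	for name := range patterns {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		if err := visit(name); err != nil {
			return nil, err
		}
	}
	return order, nil
}

func main() {
	_, err := resolveOrder(map[string]string{"SOME": `%{NOT_EXIST}`})
	fmt.Println(err) // no pattern found for %{NOT_EXIST}

	order, _ := resolveOrder(map[string]string{"LINE": `%{WORD:a} %{WORD:b}`, "WORD": `\w+`})
	fmt.Println(order) // [WORD LINE]
}
```

Sorting the keys before visiting them keeps the outcome the same from run to run, which also makes intermittent failures easier to rule out.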

Export the Grok.compile and Grok.compiledparse ?

Hello,

Wouldn't it be cool to export Grok.Compile() and Grok.CompiledParse()?
It is usually considered good practice to compile regexes in advance. Or maybe there is another way to do it and I didn't read the docs properly.

PS: If you agree, I can submit a merge request, since I have already made the modifications locally.

Thanks !
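The compile-once idea can be sketched with the standard library alone. In the example, apacheTail is a hand-written stand-in for an already expanded grok pattern, and namedCaptures is a hypothetical helper; neither is part of the library's API.

```go
package main

import (
	"fmt"
	"regexp"
)

// apacheTail is a hand-written stand-in for an already expanded grok pattern.
// It is compiled once at start-up and reused for every line.
var apacheTail = regexp.MustCompile(`" (?P<response>\d{3}) (?P<bytes>\d+)$`)

// namedCaptures applies a precompiled regexp and returns its named groups,
// or nil when the text does not match.
func namedCaptures(re *regexp.Regexp, text string) map[string]string {
	match := re.FindStringSubmatch(text)
	if match == nil {
		return nil
	}
	out := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if name != "" {
			out[name] = match[i]
		}
	}
	return out
}

func main() {
	lines := []string{
		`127.0.0.1 - - [23/Apr/2014:22:58:32 +0200] "GET /index.php HTTP/1.1" 404 207`,
		`127.0.0.1 - - [23/Apr/2014:22:58:33 +0200] "GET / HTTP/1.1" 200 512`,
	}
	// No compilation happens inside the loop.
	for _, line := range lines {
		fmt.Println(namedCaptures(apacheTail, line))
	}
}
```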

Custom Pattern Grok Parse not working

Grok.Parse("%{GREEDYDATA:var1}: %{GREEDYDATA:var2} (?<var3>FOUND)", `/opt/facs/casrepos/fa/common.jar: 44228-9915812-0 FOUND`)

The above code returns an empty map.

Grok.Parse() nested fields support

Grok.Parse() should allow nested fields:

Grok.Parse("%{GREEDYDATA:[field1][nestedField1]}: %{GREEDYDATA:[field2][nestedField2]} (?<[field3][nestedField3]>CUSTOM)", `log_source`)

Add Type to semantic

It would be great to use this tool to pipe lines into it and output the captured fields as json or any other standard representation. Something like this:

cat file.txt | grok -e "%{COMMONAPACHELOG}"

Output:

{"clientip":"127.0.0.1", "response": 404, "bytes": 207, ...}
