
logstash-filter-kv's Introduction

Logstash Plugin


This is a plugin for Logstash.

It is fully free and fully open source. The license is Apache 2.0, meaning you are pretty much free to use it however you want in whatever way.

Documentation

Logstash provides infrastructure to automatically generate documentation for this plugin. We use the asciidoc format to write documentation, so any comments in the source code will first be converted into asciidoc and then into html. All plugin documentation is placed under one central location.

Need Help?

Need help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/logstash discussion forum.

Developing

1. Plugin Development and Testing

Code

  • To get started, you'll need JRuby with the Bundler gem installed.

  • Create a new plugin or clone an existing one from the GitHub logstash-plugins organization. We also provide example plugins.

  • Install dependencies

bundle install

Test

  • Update your dependencies
bundle install
  • Run tests
bundle exec rspec

2. Running your unpublished Plugin in Logstash

2.1 Run in a local Logstash clone

  • Edit Logstash Gemfile and add the local plugin path, for example:
gem "logstash-filter-awesome", :path => "/your/local/logstash-filter-awesome"
  • Install plugin
# Logstash 2.3 and higher
bin/logstash-plugin install --no-verify

# Prior to Logstash 2.3
bin/plugin install --no-verify
  • Run Logstash with your plugin
bin/logstash -e 'filter {awesome {}}'

At this point any modifications to the plugin code will be applied to this local Logstash setup. After modifying the plugin, simply rerun Logstash.

2.2 Run in an installed Logstash

You can use the same method as in 2.1 to run your plugin in an installed Logstash by editing its Gemfile and pointing the :path to your local plugin development directory, or you can build the gem and install it using:

  • Build your plugin gem
gem build logstash-filter-awesome.gemspec
  • Install the plugin from the Logstash home
# Logstash 2.3 and higher
bin/logstash-plugin install --no-verify

# Prior to Logstash 2.3
bin/plugin install --no-verify
  • Start Logstash and proceed to test the plugin

Contributing

All contributions are welcome: ideas, patches, documentation, bug reports, complaints, and even something you drew up on a napkin.

Programming is not a required skill. Whatever you've seen about open source and maintainers or community members saying "send patches or die" - you will not see that here.

It is more important to the community that you are able to contribute.

For more information about contributing, see the CONTRIBUTING file.

logstash-filter-kv's People

Contributors

7lima, andsel, colinsurprenant, electrical, fbaligand, jakelandis, jordansissel, jsvd, karenzone, kares, magnusbaeck, mashhurs, ph, robbavey, suyograo, talevy, torse, untergeek, urielha, whyscream, wiibaa, yaauie, ycombinator


logstash-filter-kv's Issues

Regex is slow for text parsing job

Hi guys, recently I've been working on improving our team's Logstash performance. After some profiling, I found that the kv plugin costs a lot of CPU time. So I read the source code and realized that the kv plugin uses regex to do its job. I understand that regex makes it easy to support advanced features, such as using a regex pattern as the field splitter, but for a very simple use case, e.g. kv {source => 'some_source'}, regex adds too much overhead.

Most regex engines use backtracking (an NFA) rather than a DFA, which causes bad performance on some patterns. In the kv plugin's case, '([^']+)'|\\(([^\\)]+)\\)|\\[([^\\]]+)\\]|<([^>]+)>|((?:\\\\ |[^ ])+)) causes the engine to scan the input over and over again on input like " name= (x " repeated many times, while this can easily be avoided by parsing the text manually.

So I think maybe we can use a manual parsing algorithm when no advanced features are involved. I actually developed an algorithm that performs the same job as kv {source => 'message'} and tested it against our production input. To process 800K messages, the kv plugin took 19s while my algorithm took only 4s. For the input mentioned above, " name=(x " repeated 10240 times, kv took 7934ms while my algorithm took only 12ms.
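A minimal sketch of the kind of manual parser being described (hypothetical; not the reporter's actual algorithm): a single linear pass using String#index, with no regex and therefore no backtracking.

# Hypothetical linear-time scanner for the simple case kv { source => 'message' }:
# space-separated pairs, '=' as the value split. Each token is examined once.
def simple_kv_scan(text)
  result = {}
  text.split(' ').each do |token|
    eq = token.index('=')
    next if eq.nil? || eq.zero?          # no key: skip the token
    value = token[(eq + 1)..-1]
    result[token[0...eq]] = value unless value.empty?
  end
  result
end

simple_kv_scan('name=alice role=admin')   # => {"name"=>"alice", "role"=>"admin"}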

Dependencies Issues while running the Specs

I cloned the repository to my local machine; with JRuby installed, bundle install was successful. But when I run any of the specs, I get dependency issues:

RuntimeError:
you might need to reinstall the gem which depends on the missing jar or in case there is Jars.lock then resolve the jars with lock_jars command

no such file to load -- org/slf4j/slf4j-api/1.7.21/slf4j-api-1.7.21 (LoadError)

I tried a few other repositories and see the same error message. Could you please let me know how to resolve the issue?

JRuby Version - jruby 9.2.0.0 (2.5.0) 2018-05-24 81156a8 Java HotSpot(TM) 64-Bit Server VM 25.111-b14 on 1.8.0_111-b14 +jit [mswin32-x86_64]

Ruby Version - ruby 2.3.3p222 (2016-11-21 revision 56859) [i386-mingw32]

kv filter incorrectly parses messages that contain keys with no value

In elastic/logstash#9786, @eatroshkin reports that logstash-filter-kv v4.1.1 was not correctly parsing messages that had valueless keys:

Starting from version 5.6.9, Logstash incorrectly parses messages which contain keys with no value.
This can be reproduced with the config:

input { stdin { } }
filter {
      kv {
        source => "message"
        field_split => "\t"
        value_split => "="
        include_brackets => true
      }
}
output {
  stdout { codec => rubydebug }
}

Result of parsing the tab-separated message a=11 b= c=33 d=44:

{
    "a" => "11",
    "b" => "c=33",
    "d" => "44",
    "message" => "a=11\tb=\tc=33\td=44"
}

Logstash version 5.6.8 and before correctly parses the same message:

{
    "a" => "11",
    "c" => "33",
    "d" => "44",
    "message" => "a=11\tb=\tc=33\td=44"
}

OS: Ubuntu trusty,
Logstash installed from this repository deb https://artifacts.elastic.co/packages/5.x/apt stable main

I have not yet validated whether the recently-released v4.1.2, which had a fix related to over-greedy captures, also resolves this issue.

Fix recursive issue

When using the same separator for a recursive kv filter, the output generated is wrong. See: https://discuss.elastic.co/t/kv-plugin-recursive-bug/52452.

Easy repro steps:

filter { 
  kv { 
    source => "message" 
    recursive => true 
    field_split => "," 
    value_split => "=" 
   } 
} 

Use the following input: a=1,b=2,c=[d=3,e=4,f=[g=5,h=6]] .

You will see that the result is generated incorrectly.
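A hedged sketch of a bracket-depth-aware field split (an illustration of one possible approach, not the plugin's implementation): the separator should only count when we are not inside brackets, so nested groups survive intact for the recursive pass.

# Hypothetical depth-aware splitter: ',' is a boundary only at bracket depth 0,
# so c keeps its whole nested [d=3,e=4,f=[g=5,h=6]] value.
def depth_aware_split(text, sep = ',')
  parts, depth, buf = [], 0, ''
  text.each_char do |ch|
    depth += 1 if ch == '['
    depth -= 1 if ch == ']'
    if ch == sep && depth.zero?
      parts << buf
      buf = ''
    else
      buf << ch
    end
  end
  parts << buf
end

depth_aware_split('a=1,b=2,c=[d=3,e=4,f=[g=5,h=6]]')
# => ["a=1", "b=2", "c=[d=3,e=4,f=[g=5,h=6]]"]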

kv filter causes Logstash to crash

  • Version: Logstash and Elastic Versions: 5.4.0,
    filebeat version: filebeat version 5.4.0 (amd64), libbeat 5.4.0
  • Operating System: Mac OS Yosemite
  • Config File
filter {
  kv { 
    trim_key => "<>\[\],`\."
    remove_field => ["\\%{some_field}", "{%{some_field}"]
    include_brackets => false
  }
}
  • Sample Data:

time=2017-05-24T18:23:02.619100+00:00 severity=INFO pid=7078 method=GET path=/projects/20432/report format=html controller=reports action=show status=200 duration=2304.09 view=2172.47 db=89.51 time=2017-05-24 18:23:00 +0000 ip=216.16.231.94 host=demo.central.miovision.com params={"download_token"=>"1495650173", "report"=>{"format"=>"travel_time", "study_observation_set_ids"=>["", "57", "56", "55", ""], "bin_size"=>"3600"}, "project_id"=>"20432"}

  • Steps to Reproduce: I just upgraded our ELK stack to 5.4, and we got a timeout error when we tried to publish events

2017/05/24 15:20:00.468812 sync.go:85: ERR Failed to publish events caused by: read tcp 10.0.3.150:56617->167.114.251.27:19130: i/o timeout
2017/05/24 15:20:00.468844 single.go:91: INFO Error publishing events (retrying): read tcp 10.0.3.150:56617->167.114.251.27:19130: i/o timeout
2017/05/24 15:20:17.127813 metrics.go:39: INFO Non-zero metrics in the last 30s: libbeat.logstash.publish.read_errors=1 libbeat.logstash.publish.write_errors=1 libbeat.logstash.published_but_not_acked_events=4

Looks like there is something wrong with the kv filter and when we looked at the error log from logstash, we got this:

[[main]>worker15] ERROR logstash.pipeline - Exception in pipelineworker, the pipeline stopped processing new events, please check your filter configuration and restart Logstash. {"exception"=>"-1", "backtrace"=>["java.util.ArrayList.elementData(ArrayList.java:418)", "java.util.ArrayList.remove(ArrayList.java:495)", "org.logstash.FieldReference.parse(FieldReference.java:37)", "org.logstash.PathCache.cache(PathCache.java:37)", "org.logstash.PathCache.isTimestamp(PathCache.java:30)", "org.logstash.ext.JrubyEventExtLibrary$RubyEvent.ruby_set_field(JrubyEventExtLibrary.java:122)", "org.logstash.ext.JrubyEventExtLibrary$RubyEvent$INVOKER$i$2$0$ruby_set_field.call(JrubyEventExtLibrary$RubyEvent$INVOKER$i$2$0$ruby_set_field.gen)", "org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:202)", "rubyjit.LogStash::Filters::KV$$filter_ed7fe6468ef7edb99f1600863745e9b6ad8a191d1173230247.block_1$RUBY$file(/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-kv-4.0.0/lib/logstash/filters/kv.rb:326)", "rubyjit$LogStash::Filters::KV$$filter_ed7fe6468ef7edb99f1600863745e9b6ad8a191d1173230247$block_1$RUBY$file.call(rubyjit$LogStash::Filters::KV$$filter_ed7fe6468ef7edb99f1600863745e9b6ad8a191d1173230247$block_1$RUBY$file)", "org.jruby.runtime.CompiledBlock19.yield(CompiledBlock19.java:135)", "org.jruby.runtime.Block.yield(Block.java:142)", "org.jruby.RubyHash$13.visit(RubyHash.java:1355)", "org.jruby.RubyHash.visitLimited(RubyHash.java:648)", "org.jruby.RubyHash.visitAll(RubyHash.java:634)", "org.jruby.RubyHash.iteratorVisitAll(RubyHash.java:1306)", "org.jruby.RubyHash.each_pairCommon(RubyHash.java:1351)", "org.jruby.RubyHash.each19(RubyHash.java:1342)", "org.jruby.RubyHash$INVOKER$i$0$0$each19.call(RubyHash$INVOKER$i$0$0$each19.gen)", "org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:143)", "org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:154)", "rubyjit.LogStash::Filters::KV$$filter_ed7fe6468ef7edb99f1600863745e9b6ad8a191d1173230247.file(/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-kv-4.0.0/lib/logstash/filters/kv.rb:326)", "rubyjit.LogStash::Filters::KV$$filter_ed7fe6468ef7edb99f1600863745e9b6ad8a191d1173230247.file(/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-kv-4.0.0/lib/logstash/filters/kv.rb)", "org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:201)", "org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:177)", "org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:183)", "rubyjit.LogStash::Filters::Base$$do_filter_8e8403dcfdf01a35ffca12ed35ec4e79455489071173230247.file(/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:145)", "rubyjit.LogStash::Filters::Base$$do_filter_8e8403dcfdf01a35ffca12ed35ec4e79455489071173230247.file(/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb)", "org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:201)", "org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:177)", "org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:188)", "rubyjit.LogStash::Filters::Base$$multi_filter_6c0dea8219a042f89f5a5b41e60697f8088a7c451173230247.block_0$RUBY$file(/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:164)", 
"rubyjit$LogStash::Filters::Base$$multi_filter_6c0dea8219a042f89f5a5b41e60697f8088a7c451173230247$block_0$RUBY$file.call(rubyjit$LogStash::Filters::Base$$multi_filter_6c0dea8219a042f89f5a5b41e60697f8088a7c451173230247$block_0$RUBY$file)", "org.jruby.runtime.CompiledBlock19.yield(CompiledBlock19.java:135)", "org.jruby.runtime.Block.yield(Block.java:142)", "org.jruby.RubyArray.eachCommon(RubyArray.java:1606)", "org.jruby.RubyArray.each(RubyArray.java:1613)", "org.jruby.RubyArray$INVOKER$i$0$0$each.call(RubyArray$INVOKER$i$0$0$each.gen)", "org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:143)", "org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:154)", "rubyjit.LogStash::Filters::Base$$multi_filter_6c0dea8219a042f89f5a5b41e60697f8088a7c451173230247.file(/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:161)", "rubyjit.LogStash::Filters::Base$$multi_filter_6c0dea8219a042f89f5a5b41e60697f8088a7c451173230247.file(/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb)", "org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:181)", "org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)", "rubyjit.LogStash::FilterDelegator$$multi_filter_43640ebf68de601b56cb618392ab9de0b4f8c58a1173230247.file(/usr/share/logstash/logstash-core/lib/logstash/filter_delegator.rb:43)", "rubyjit.LogStash::FilterDelegator$$multi_filter_43640ebf68de601b56cb618392ab9de0b4f8c58a1173230247.file(/usr/share/logstash/logstash-core/lib/logstash/filter_delegator.rb)", "org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:181)", "org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)", "org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)", "org.jruby.ast.DAsgnNode.interpret(DAsgnNode.java:110)", "org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)", "org.jruby.ast.BlockNode.interpret(BlockNode.java:71)", "org.jruby.evaluator.ASTInterpreter.INTERPRET_BLOCK(ASTInterpreter.java:112)", "org.jruby.runtime.Interpreted19Block.evalBlockBody(Interpreted19Block.java:206)", "org.jruby.runtime.Interpreted19Block.yield(Interpreted19Block.java:194)", "org.jruby.runtime.Interpreted19Block.call(Interpreted19Block.java:125)", "org.jruby.runtime.Block.call(Block.java:101)", "org.jruby.RubyProc.call(RubyProc.java:300)", "org.jruby.internal.runtime.methods.ProcMethod.call(ProcMethod.java:64)", "org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:210)", "org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:206)", "org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)", "rubyjit.LogStash::Pipeline$$filter_batch_2b9b0fb1ba7f58d36d479ab8384717f4c3fd58e51173230247.block_0$RUBY$file(/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:370)", "rubyjit$LogStash::Pipeline$$filter_batch_2b9b0fb1ba7f58d36d479ab8384717f4c3fd58e51173230247$block_0$RUBY$file.call(rubyjit$LogStash::Pipeline$$filter_batch_2b9b0fb1ba7f58d36d479ab8384717f4c3fd58e51173230247$block_0$RUBY$file)", "org.jruby.runtime.CompiledBlock19.yield(CompiledBlock19.java:159)", "org.jruby.runtime.CompiledBlock19.call(CompiledBlock19.java:87)", "org.jruby.runtime.Block.call(Block.java:101)", "org.jruby.RubyProc.call(RubyProc.java:300)", "org.jruby.RubyProc.call19(RubyProc.java:281)", "org.jruby.RubyProc$INVOKER$i$0$0$call19.call(RubyProc$INVOKER$i$0$0$call19.gen)", "org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:210)", 
"org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:206)", "org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)", "rubyjit.LogStash::Util::WrappedSynchronousQueue::ReadBatch$$each_39448e3b53418e12c6a9b40cc27889a1b2905b7d1173230247.block_0$RUBY$file(/usr/share/logstash/logstash-core/lib/logstash/util/wrapped_synchronous_queue.rb:224)", "rubyjit$LogStash::Util::WrappedSynchronousQueue::ReadBatch$$each_39448e3b53418e12c6a9b40cc27889a1b2905b7d1173230247$block_0$RUBY$file.call(rubyjit$LogStash::Util::WrappedSynchronousQueue::ReadBatch$$each_39448e3b53418e12c6a9b40cc27889a1b2905b7d1173230247$block_0$RUBY$file)", "org.jruby.runtime.CompiledBlock19.yield(CompiledBlock19.java:135)", "org.jruby.runtime.Block.yield(Block.java:142)", "org.jruby.RubyHash$13.visit(RubyHash.java:1355)", "org.jruby.RubyHash.visitLimited(RubyHash.java:648)", "org.jruby.RubyHash.visitAll(RubyHash.java:634)", "org.jruby.RubyHash.iteratorVisitAll(RubyHash.java:1306)", "org.jruby.RubyHash.each_pairCommon(RubyHash.java:1351)", "org.jruby.RubyHash.each19(RubyHash.java:1342)", "org.jruby.RubyHash$INVOKER$i$0$0$each19.call(RubyHash$INVOKER$i$0$0$each19.gen)", "org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:143)", "org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:154)", "rubyjit.LogStash::Util::WrappedSynchronousQueue::ReadBatch$$each_39448e3b53418e12c6a9b40cc27889a1b2905b7d1173230247.file(/usr/share/logstash/logstash-core/lib/logstash/util/wrapped_synchronous_queue.rb:223)", "rubyjit.LogStash::Util::WrappedSynchronousQueue::ReadBatch$$each_39448e3b53418e12c6a9b40cc27889a1b2905b7d1173230247.file(/usr/share/logstash/logstash-core/lib/logstash/util/wrapped_synchronous_queue.rb)", "org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:161)", "org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:143)", "org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:154)", "rubyjit.LogStash::Pipeline$$filter_batch_2b9b0fb1ba7f58d36d479ab8384717f4c3fd58e51173230247.chained_0_rescue_1$RUBY$SYNTHETIC__file__(/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:369)", "rubyjit.LogStash::Pipeline$$filter_batch_2b9b0fb1ba7f58d36d479ab8384717f4c3fd58e51173230247.file(/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb)", "rubyjit.LogStash::Pipeline$$filter_batch_2b9b0fb1ba7f58d36d479ab8384717f4c3fd58e51173230247.file(/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb)", "org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:181)", "org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)", "org.jruby.ast.FCallOneArgNode.interpret(FCallOneArgNode.java:36)", "org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)", "org.jruby.ast.BlockNode.interpret(BlockNode.java:71)", "org.jruby.ast.WhileNode.interpret(WhileNode.java:131)", "org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)", "org.jruby.ast.BlockNode.interpret(BlockNode.java:71)", "org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)", "org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:225)", "org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:219)", "org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:202)", "org.jruby.ast.FCallTwoArgNode.interpret(FCallTwoArgNode.java:38)", "org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)", "org.jruby.ast.BlockNode.interpret(BlockNode.java:71)", 
"org.jruby.evaluator.ASTInterpreter.INTERPRET_BLOCK(ASTInterpreter.java:112)", "org.jruby.runtime.Interpreted19Block.evalBlockBody(Interpreted19Block.java:206)", "org.jruby.runtime.Interpreted19Block.yield(Interpreted19Block.java:194)", "org.jruby.runtime.Interpreted19Block.call(Interpreted19Block.java:125)", "org.jruby.runtime.Block.call(Block.java:101)", "org.jruby.RubyProc.call(RubyProc.java:300)", "org.jruby.RubyProc.call(RubyProc.java:230)", "org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:99)", "java.lang.Thread.run(Thread.java:748)"]}
Exception in thread "[main]>worker15" java.lang.ArrayIndexOutOfBoundsException: -1
at java.util.ArrayList.elementData(ArrayList.java:418)
at java.util.ArrayList.remove(ArrayList.java:495)
at org.logstash.FieldReference.parse(FieldReference.java:37)
at org.logstash.PathCache.cache(PathCache.java:37)
at org.logstash.PathCache.isTimestamp(PathCache.java:30)
at org.logstash.ext.JrubyEventExtLibrary$RubyEvent.ruby_set_field(JrubyEventExtLibrary.java:122)
at org.logstash.ext.JrubyEventExtLibrary$RubyEvent$INVOKER$i$2$0$ruby_set_field.call(JrubyEventExtLibrary$RubyEvent$INVOKER$i$2$0$ruby_set_field.gen)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:202)
at rubyjit.LogStash::Filters::KV$$filter_ed7fe6468ef7edb99f1600863745e9b6ad8a191d1173230247.block_1$RUBY$file(/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-kv-4.0.0/lib/logstash/filters/kv.rb:326)
at rubyjit$LogStash::Filters::KV$$filter_ed7fe6468ef7edb99f1600863745e9b6ad8a191d1173230247$block_1$RUBY$file.call(rubyjit$LogStash::Filters::KV$$filter_ed7fe6468ef7edb99f1600863745e9b6ad8a191d1173230247$block_1$RUBY$file)
at org.jruby.runtime.CompiledBlock19.yield(CompiledBlock19.java:135)
at org.jruby.runtime.Block.yield(Block.java:142)
at org.jruby.RubyHash$13.visit(RubyHash.java:1355)
at org.jruby.RubyHash.visitLimited(RubyHash.java:648)
at org.jruby.RubyHash.visitAll(RubyHash.java:634)
at org.jruby.RubyHash.iteratorVisitAll(RubyHash.java:1306)
at org.jruby.RubyHash.each_pairCommon(RubyHash.java:1351)
at org.jruby.RubyHash.each19(RubyHash.java:1342)
at org.jruby.RubyHash$INVOKER$i$0$0$each19.call(RubyHash$INVOKER$i$0$0$each19.gen)
at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:143)
at org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:154)
at rubyjit.LogStash::Filters::KV$$filter_ed7fe6468ef7edb99f1600863745e9b6ad8a191d1173230247.file(/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-kv-4.0.0/lib/logstash/filters/kv.rb:326)
at rubyjit.LogStash::Filters::KV$$filter_ed7fe6468ef7edb99f1600863745e9b6ad8a191d1173230247.file(/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-kv-4.0.0/lib/logstash/filters/kv.rb)
at org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:201)
at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:177)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:183)
at rubyjit.LogStash::Filters::Base$$do_filter_8e8403dcfdf01a35ffca12ed35ec4e79455489071173230247.file(/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:145)
at rubyjit.LogStash::Filters::Base$$do_filter_8e8403dcfdf01a35ffca12ed35ec4e79455489071173230247.file(/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb)
at org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:201)
at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:177)
at org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:188)
at rubyjit.LogStash::Filters::Base$$multi_filter_6c0dea8219a042f89f5a5b41e60697f8088a7c451173230247.block_0$RUBY$file(/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:164)
at rubyjit$LogStash::Filters::Base$$multi_filter_6c0dea8219a042f89f5a5b41e60697f8088a7c451173230247$block_0$RUBY$file.call(rubyjit$LogStash::Filters::Base$$multi_filter_6c0dea8219a042f89f5a5b41e60697f8088a7c451173230247$block_0$RUBY$file)
at org.jruby.runtime.CompiledBlock19.yield(CompiledBlock19.java:135)
at org.jruby.runtime.Block.yield(Block.java:142)
at org.jruby.RubyArray.eachCommon(RubyArray.java:1606)
at org.jruby.RubyArray.each(RubyArray.java:1613)
at org.jruby.RubyArray$INVOKER$i$0$0$each.call(RubyArray$INVOKER$i$0$0$each.gen)
at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:143)
at org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:154)
at rubyjit.LogStash::Filters::Base$$multi_filter_6c0dea8219a042f89f5a5b41e60697f8088a7c451173230247.file(/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:161)
at rubyjit.LogStash::Filters::Base$$multi_filter_6c0dea8219a042f89f5a5b41e60697f8088a7c451173230247.file(/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb)
at org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:181)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)
at rubyjit.LogStash::FilterDelegator$$multi_filter_43640ebf68de601b56cb618392ab9de0b4f8c58a1173230247.file(/usr/share/logstash/logstash-core/lib/logstash/filter_delegator.rb:43)
at rubyjit.LogStash::FilterDelegator$$multi_filter_43640ebf68de601b56cb618392ab9de0b4f8c58a1173230247.file(/usr/share/logstash/logstash-core/lib/logstash/filter_delegator.rb)
at org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:181)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)
at org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
at org.jruby.ast.DAsgnNode.interpret(DAsgnNode.java:110)
at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
at org.jruby.evaluator.ASTInterpreter.INTERPRET_BLOCK(ASTInterpreter.java:112)
at org.jruby.runtime.Interpreted19Block.evalBlockBody(Interpreted19Block.java:206)
at org.jruby.runtime.Interpreted19Block.yield(Interpreted19Block.java:194)
at org.jruby.runtime.Interpreted19Block.call(Interpreted19Block.java:125)
at org.jruby.runtime.Block.call(Block.java:101)
at org.jruby.RubyProc.call(RubyProc.java:300)
at org.jruby.internal.runtime.methods.ProcMethod.call(ProcMethod.java:64)
at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:210)
at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:206)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)
at rubyjit.LogStash::Pipeline$$filter_batch_2b9b0fb1ba7f58d36d479ab8384717f4c3fd58e51173230247.block_0$RUBY$file(/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:370)
at rubyjit$LogStash::Pipeline$$filter_batch_2b9b0fb1ba7f58d36d479ab8384717f4c3fd58e51173230247$block_0$RUBY$file.call(rubyjit$LogStash::Pipeline$$filter_batch_2b9b0fb1ba7f58d36d479ab8384717f4c3fd58e51173230247$block_0$RUBY$file)
at org.jruby.runtime.CompiledBlock19.yield(CompiledBlock19.java:159)
at org.jruby.runtime.CompiledBlock19.call(CompiledBlock19.java:87)
at org.jruby.runtime.Block.call(Block.java:101)
at org.jruby.RubyProc.call(RubyProc.java:300)
at org.jruby.RubyProc.call19(RubyProc.java:281)
at org.jruby.RubyProc$INVOKER$i$0$0$call19.call(RubyProc$INVOKER$i$0$0$call19.gen)
at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:210)
at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:206)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)
at rubyjit.LogStash::Util::WrappedSynchronousQueue::ReadBatch$$each_39448e3b53418e12c6a9b40cc27889a1b2905b7d1173230247.block_0$RUBY$file(/usr/share/logstash/logstash-core/lib/logstash/util/wrapped_synchronous_queue.rb:224)
at rubyjit$LogStash::Util::WrappedSynchronousQueue::ReadBatch$$each_39448e3b53418e12c6a9b40cc27889a1b2905b7d1173230247$block_0$RUBY$file.call(rubyjit$LogStash::Util::WrappedSynchronousQueue::ReadBatch$$each_39448e3b53418e12c6a9b40cc27889a1b2905b7d1173230247$block_0$RUBY$file)
at org.jruby.runtime.CompiledBlock19.yield(CompiledBlock19.java:135)
at org.jruby.runtime.Block.yield(Block.java:142)
at org.jruby.RubyHash$13.visit(RubyHash.java:1355)
at org.jruby.RubyHash.visitLimited(RubyHash.java:648)
at org.jruby.RubyHash.visitAll(RubyHash.java:634)
at org.jruby.RubyHash.iteratorVisitAll(RubyHash.java:1306)
at org.jruby.RubyHash.each_pairCommon(RubyHash.java:1351)
at org.jruby.RubyHash.each19(RubyHash.java:1342)
at org.jruby.RubyHash$INVOKER$i$0$0$each19.call(RubyHash$INVOKER$i$0$0$each19.gen)
at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:143)
at org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:154)
at rubyjit.LogStash::Util::WrappedSynchronousQueue::ReadBatch$$each_39448e3b53418e12c6a9b40cc27889a1b2905b7d1173230247.file(/usr/share/logstash/logstash-core/lib/logstash/util/wrapped_synchronous_queue.rb:223)
at rubyjit.LogStash::Util::WrappedSynchronousQueue::ReadBatch$$each_39448e3b53418e12c6a9b40cc27889a1b2905b7d1173230247.file(/usr/share/logstash/logstash-core/lib/logstash/util/wrapped_synchronous_queue.rb)
at org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:161)
at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:143)
at org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:154)
at rubyjit.LogStash::Pipeline$$filter_batch_2b9b0fb1ba7f58d36d479ab8384717f4c3fd58e51173230247.chained_0_rescue_1$RUBY$SYNTHETIC__file__(/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:369)
at rubyjit.LogStash::Pipeline$$filter_batch_2b9b0fb1ba7f58d36d479ab8384717f4c3fd58e51173230247.file(/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb)
at rubyjit.LogStash::Pipeline$$filter_batch_2b9b0fb1ba7f58d36d479ab8384717f4c3fd58e51173230247.file(/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb)
at org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:181)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)
at org.jruby.ast.FCallOneArgNode.interpret(FCallOneArgNode.java:36)
at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
at org.jruby.ast.WhileNode.interpret(WhileNode.java:131)
at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
at org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
at org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:225)
at org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:219)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:202)
at org.jruby.ast.FCallTwoArgNode.interpret(FCallTwoArgNode.java:38)
at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
at org.jruby.evaluator.ASTInterpreter.INTERPRET_BLOCK(ASTInterpreter.java:112)
at org.jruby.runtime.Interpreted19Block.evalBlockBody(Interpreted19Block.java:206)
at org.jruby.runtime.Interpreted19Block.yield(Interpreted19Block.java:194)
at org.jruby.runtime.Interpreted19Block.call(Interpreted19Block.java:125)
at org.jruby.runtime.Block.call(Block.java:101)
at org.jruby.RubyProc.call(RubyProc.java:300)
at org.jruby.RubyProc.call(RubyProc.java:230)
at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:99)
at java.lang.Thread.run(Thread.java:748)

This has never happened with our old version of ELK. Any idea what's wrong with our kv filter? Or is this just a bug in 5.4?

Documentation for https://github.com/logstash-plugins/logstash-filter-kv/pull/31 not showing

#31 is a wonderful feature addition to kv, but the documentation at https://www.elastic.co/guide/en/logstash/current/plugins-filters-kv.html does not seem to contain documentation for transform_key or transform_value.

They seem to be properly documented here: https://github.com/logstash-plugins/logstash-filter-kv/edit/master/lib/logstash/filters/kv.rb, which is where the docs should be sourced from.

Perhaps this is less of a bug and more a request to rebuild the docs.

Wrong parsing of input with quotes in value

  • Version: 4.0.0
  • Operating System: rancher-os 0.9.0
  • Config File:
input {
  beats {
    port => 5044
  }
  gelf {
    type => "docker-container"
  }
}

filter {
  if [type] == "syslog" {
    grok {
      match => { 
          "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:message}" 
      }
      overwrite => ["message"]
    }
    date {
      match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  } else if [type] == "docker" or [type] == "system-docker" {
    kv {
    }

    date {
      match => ["time", "ISO8601"]
    }
    mutate {
      rename => ["level", "log_level"]
      rename => ["msg", "message"]
      remove_field => ["time"]
    }
  }
}
output {
      elasticsearch {
        hosts => ["elasticsearch.service.x:9200"]
        document_type => "%{[@metadata][type]}"
      }
      stdout { codec => rubydebug }
}
  • Sample Data:
time="2017-04-26T13:56:50.500679286Z" level=error msg="Failed to log msg \"[I] 2017-04-26T13:56:50Z Post http://cdcdd70ec58e:9092/write?consistency=&db=telegraf&precision=ns&rp=autogen: dial tcp: lookup cdcdd70ec58e on 169.254.169.250:53: no such host service=subscriber\" for logger gelf: gelf: cannot send GELF message: write udp 10.0.30.31:38072->10.0.30.31:12201: write: connection refused" 
  • Example of output:
{
                                         "offset" => 5621064,
                                     "input_type" => "log",
                                      "log_level" => "error",
                                         "source" => "/var/log/docker.log",
    "http://system-node-1:9092/write?consistency" => "&db=telegraf&precision=ns&rp=autogen:",
                                        "message" => "Failed to log msg \\",
                                           "type" => "docker",
                                           "tags" => [
        [0] "beats_input_codec_plain_applied"
    ],
                                     "@timestamp" => 2017-04-26T14:05:57.004Z,
                                        "service" => "subscriber\\\"",
                                       "@version" => "1",
                                           "beat" => {
        "hostname" => "system-node-1",
            "name" => "system-node-1",
         "version" => "5.2.1"
    },
                                           "host" => "system-node-1"
}

kv should always overwrite the target field

(This issue was originally filed by @avleen at elastic/logstash#1315)


In the event that kv has nothing to do, it doesn't write the target field.
If your source and target are the same, this can be a problem: you expect the target to always be a hash, but when kv took no action, it's still a string.

This in turn breaks inserting things into ES if ES is expecting an object.
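A hedged sketch of a normalization step that would give the always-a-hash behavior being requested (hypothetical; a plain Hash stands in for the Logstash event here):

# Hypothetical post-kv normalization: if kv had nothing to parse and left the
# shared source/target field as a string, replace it with an empty hash so an
# Elasticsearch object mapping is never violated.
def normalize_kv_target(event, target)
  event[target] = {} unless event[target].is_a?(Hash)
  event
end

event = { 'data' => 'no pairs here' }   # kv took no action on this field
normalize_kv_target(event, 'data')      # => {"data"=>{}}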

Pipeline crash

When running the KV filter against some particularly nasty data, we are able to crash the pipeline:

Exception in thread "Ruby-0-Thread-13@[main]>worker5: /usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:384" java.lang.ArrayIndexOutOfBoundsException: -1
	at java.util.ArrayList.elementData(ArrayList.java:422)
	at java.util.ArrayList.remove(ArrayList.java:499)
	at org.logstash.FieldReference.parse(FieldReference.java:167)
	at org.logstash.FieldReference.parseToCache(FieldReference.java:142)
	at org.logstash.FieldReference.from(FieldReference.java:74)
	at org.logstash.ConvertedMap.convertKey(ConvertedMap.java:101)
	at org.logstash.ConvertedMap.access$000(ConvertedMap.java:23)
	at org.logstash.ConvertedMap$1.visit(ConvertedMap.java:34)
	at org.logstash.ConvertedMap$1.visit(ConvertedMap.java:28)
	at org.jruby.RubyHash.visitLimited(RubyHash.java:662)
	at org.jruby.RubyHash.visitAll(RubyHash.java:647)
	at org.logstash.ConvertedMap.newFromRubyHash(ConvertedMap.java:68)
	at org.logstash.ConvertedMap.newFromRubyHash(ConvertedMap.java:63)
	at org.logstash.Valuefier.lambda$initConverters$11(Valuefier.java:142)
	at org.logstash.Valuefier.convert(Valuefier.java:73)
	at org.logstash.ext.JrubyEventExtLibrary$RubyEvent.ruby_set_field(JrubyEventExtLibrary.java:95)
	at org.logstash.ext.JrubyEventExtLibrary$RubyEvent$INVOKER$i$2$0$ruby_set_field.call(JrubyEventExtLibrary$RubyEvent$INVOKER$i$2$0$ruby_set_field.gen)
	at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:358)
	at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:195)
	at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:323)
	at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:73)
	at org.jruby.ir.interpreter.InterpreterEngine.interpret(InterpreterEngine.java:83)
	at org.jruby.internal.runtime.methods.MixedModeIRMethod.INTERPRET_METHOD(MixedModeIRMethod.java:179)
	at org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:165)
	at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:348)
	at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:173)
	at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:177)
	at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:332)
	at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:73)
	at org.jruby.ir.interpreter.InterpreterEngine.interpret(InterpreterEngine.java:83)
	at org.jruby.internal.runtime.methods.MixedModeIRMethod.INTERPRET_METHOD(MixedModeIRMethod.java:179)
	at org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:165)
	at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:348)
	at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:173)
	at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:177)
	at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:332)
	at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:73)
	at org.jruby.ir.interpreter.Interpreter.INTERPRET_BLOCK(Interpreter.java:132)
	at org.jruby.runtime.MixedModeIRBlockBody.commonYieldPath(MixedModeIRBlockBody.java:148)
	at org.jruby.runtime.IRBlockBody.doYield(IRBlockBody.java:186)
	at org.jruby.runtime.BlockBody.yield(BlockBody.java:116)
	at org.jruby.runtime.Block.yield(Block.java:165)
	at org.jruby.RubyArray.each(RubyArray.java:1734)
	at org.jruby.RubyArray$INVOKER$i$0$0$each.call(RubyArray$INVOKER$i$0$0$each.gen)
	at org.jruby.internal.runtime.methods.JavaMethod$JavaMethodZeroBlock.call(JavaMethod.java:498)
	at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:77)
	at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:83)
	at org.jruby.ir.instructions.CallBase.interpret(CallBase.java:428)
	at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:355)
	at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:73)
	at org.jruby.ir.interpreter.InterpreterEngine.interpret(InterpreterEngine.java:83)
	at org.jruby.internal.runtime.methods.MixedModeIRMethod.INTERPRET_METHOD(MixedModeIRMethod.java:179)
	at org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:165)
	at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:200)
	at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:338)
	at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:163)
	at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:314)
	at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:73)
	at org.jruby.ir.interpreter.InterpreterEngine.interpret(InterpreterEngine.java:83)
	at org.jruby.internal.runtime.methods.MixedModeIRMethod.INTERPRET_METHOD(MixedModeIRMethod.java:179)
	at org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:165)
	at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:200)
	at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:338)
	at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:163)
	at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:314)
	at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:73)
	at org.jruby.ir.interpreter.Interpreter.INTERPRET_BLOCK(Interpreter.java:132)
	at org.jruby.runtime.MixedModeIRBlockBody.commonYieldPath(MixedModeIRBlockBody.java:148)
	at org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:73)
	at org.jruby.runtime.Block.call(Block.java:124)
	at org.jruby.RubyProc.call(RubyProc.java:289)
	at org.jruby.internal.runtime.methods.ProcMethod.call(ProcMethod.java:63)
	at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:204)
	at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:200)
	at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:338)
	at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:163)
	at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:314)
	at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:73)
	at org.jruby.ir.interpreter.InterpreterEngine.interpret(InterpreterEngine.java:83)
	at org.jruby.internal.runtime.methods.MixedModeIRMethod.INTERPRET_METHOD(MixedModeIRMethod.java:179)
	at org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:165)
	at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:200)
	at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:338)
	at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:163)
	at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:314)
	at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:73)
	at org.jruby.ir.interpreter.InterpreterEngine.interpret(InterpreterEngine.java:89)
	at org.jruby.internal.runtime.methods.MixedModeIRMethod.INTERPRET_METHOD(MixedModeIRMethod.java:214)
	at org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:200)
	at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:208)
	at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:193)
	at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:323)
	at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:73)
	at org.jruby.ir.interpreter.Interpreter.INTERPRET_BLOCK(Interpreter.java:132)
	at org.jruby.runtime.MixedModeIRBlockBody.commonYieldPath(MixedModeIRBlockBody.java:148)
	at org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:73)
	at org.jruby.runtime.Block.call(Block.java:124)
	at org.jruby.RubyProc.call(RubyProc.java:289)
	at org.jruby.RubyProc.call(RubyProc.java:246)
	at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:104)
	at java.lang.Thread.run(Thread.java:748)
  • Version: logstash 6.2.2, kv 4.1.0
  • Operating System: Linux 4.13.0-36-generic #40-Ubuntu SMP Fri Feb 16 20:07:48 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Config File (if you have sensitive info, please remove it):
filter {
  base64 {
    action => "decode"
    field => "message"
  }
  kv {
    recursive => "false"
    source => "message"
    target => "kv"
    transform_key => "lowercase"
    value_split => "="
    field_split => "&?"
    trim_key => " \n\r\t"
  }
}
output { stdout {codec => rubydebug} }
  • Sample Data: The data below is base64 encoded the config above uses the base64 plugin to decode it.
    REI+Sy5paSlGIjUrISdqNW5yWWEmZVdeRltXbS1sTzJbO1U8PyxfIk9MRT9qRFZGQDJRKFlHZ0w2T09gZWVNQ1JjOmN9MHZjPih6JU5WeCUxZHJCNXQ2ODs9dF1RbkUyQH4nMTY0SWd+IEl5XiQ4TmA4K1gncz5uYUM8YGJsQDxtbyVAeVtzckEhQzQyX3Q9T1ZSdSZxfSkqMFZXaE43TjtvQyIoM3VDIHVDIGIzLSsqcWt9NTp3JTdeYHEvWXU4QVh4TmdeZV5ifCQlImx+MjBMe3JLe2R5VWYgQDI2LT5TLU9OUSR3fidjPktoPnEwQDd9JTVGI3l7T3c4OU1yK2d+UjQgU01AelkqYncncnxCQT1jPmk2PEJ1T3BjfjZmOD17NFxLJDhOJCNQI2h1WFtKX1NbOlhjQkcyTU11aVFvLCxgTT8vUXNOa1thSDFVNTZXJkJkJGdDKkxAXEdDVDggJnJhKClOIGQzaSA7YD95PVlBYSE8cFd4M188dFBOTit6O0tMMVZHQEAuOzlKbSx9PSh3Z1prQWRmLUkoXy1lQ0goL1tnfXdnYSctUDliNVQjbF1ZOX0/QiE4SmQraXJKWDJaKm4gX31zMntJfiBTJVIkVnNwK25EJ3JOYSYmQSRFZ35dfTleRmR9KS4zIzh1QlxUIDgmQ0x8WklrNndcViNjaCZ4JEhSUWRkLG93K2Jee2MmYz9+PGNrNW9IJGlYVkF7bm1iZE9cfEgyb3gpJD9MRUc+MXJ0IjZ7IUh8WidAJ0ZtVTNsLyNxenMzeGtpMU5cVTF4fCs/amUxKXd8fG4xJS9+PkYlYk5hb2svUGk0c2w+VjcjMSNfIlBIVz00IC1YdFVyfmhzdjlwS2J1NS85eEEoWTUyKWVOQEgoNmBoSnU1JE1xTTgrdkZZN2MuYilqTkxITHZCUiM9aEtGIG83L2A0clhzOXteekxDTH44OjFpU01KfmNzOXY+MC83OiFcRj8kIzFsWzVyZV5hdUZAImBhcDh+cWQwIFJHMmhffDs9fUBdR0w3V15JOkMzbEc7VSd5MjhTTjtOJW0hUXFEPExcRHJJVDs/ZGlcfXh1OS5HYUAhZ0JfSGtLIEAieC1YMzZ0TEhgQWFhVXBCdyR9U0VIN3NtPDQqeHE4aCZjemUqfWsuS1I0PVthbzQ+LVJjM2NuS1xXMl1NREo0Uzg1PkQwVTloWlIpaF5ec35hUnl5KFIpfj4zNWtnNEczdF1DXWlAI28kbHRuTicjOlkgXmY2OzYkYF5baXw6Lj9vR0BiY1hqRlRkI25HPj1nMnA+NHhDVkJlTj5uLDZeTkFVYzpSakZPOlk4QFdZQFghT2NTPT9FUzhDRzFTIz4wZEY/PDR8TzM0dW8wLX5RIll1SDxkVmxhVzJwbTgjN2x6PD5NVT5Cc2F8VypAaTxPRF8rPltrW3IlN0JbVWwqXl5Ae2xzL3VRfWJ+T1U1cVF8fHkiI0deX3xNPFlBdFZBY0U6dVs7fWE4c1BQT05gV3s3I1hKQil2VWcscVc/NCJdXFBeLUNbejhFUiZnZSZbPUJ1PHFcczw+JEJ6NFR8RGNSZGNVQn5oLiNlTk8hR2twSGNQXiZ4bVJMXEghPUErIzZwfWx9N0ZqZj1ILEpwe35IWS0vRCs/M1cxPFpzSzVuT0w5Im4oTXNkMj8kPH4vOXc1T1gqckteRUtfWC1eL0wqN2pVKXEkRzRUMHR+IzU4KXkgVCRgMEheLzlRRWxUXGAmPDZAKXcuRX1uMUxTPTNoQkJmPHFCUFgxVkBRSntTflVAJ2ViIDV+WFNfVmQoQG8gQFBFLXNRUSQsPSJ9aHcwRGloWCo7MCFoX1B5UCtGaTkpJ0siSTxgKXhgTCJvdTg5aihNLis2dWxpSjRYIUNUPUJsSjpJaWhfVCo5IlJnZjR1Q3p8fUhzLl4pdF4qVXAzdm53SlJ1Q0BZeE5pYUU3IVcnT2hccVZlLVhUMkdAeSlhe09DTF0wbDxTQkNHZysoVl9tI0sse3VsYEdwQnwzMz5cWyx1fSFAOC87aS9EW0tSXTxRZDhIVG47YDM+SCVNb017NyFqPS1QRys4O3lrN0EqSCBBSyAgICAgICAgICAgICAgICAgICAgICAgIA==
  • Steps to Reproduce:
  1. Install the base64 plugin
  2. Run logstash with above config
  3. Send provided base64 blob to logstash.
  4. Enjoy your stacktrace.

KV whitespace trim should only do beginning/end of value

(This issue was originally filed by @markwalkom at elastic/logstash#2459)


If I have a dataset like this;

"sentence": "the quick brown fox, jumped over the duck", "author": "Mark Walkom"

And this Logstash config, which removes quotes and also trims whitespace;

input {
  stdin {}
}

filter {
  kv {
    trim => "\"\ "
    trimkey => "\"\ \(\)"
    field_split => ","
    value_split => ":"
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

The output I get is mashed together sentence and author values;

 echo ""sentence": "the quick brown fox jumped over the duck", "author": "Mark Walkom""|bin/logstash agent -f conf/kvbug.conf
Using milestone 2 filter plugin 'kv'. This plugin should be stable, but if you see strange behavior, please let us know! For more information on plugin milestones, see http://logstash.net/docs/1.4.2/plugin-milestones {:level=>:warn}
{
       "message" => "sentence: the quick brown fox jumped over the duck, author: Mark Walkom",
      "@version" => "1",
    "@timestamp" => "2015-01-28T02:57:09.654Z",
          "host" => "bender.local",
      "sentence" => "thequickbrownfoxjumpedovertheduck",
        "author" => "MarkWalkom"
}

I realise that the docs explicitly say "A string of characters to trim from the value", which implies it'll trim these chars from the entire string irrespective of location, but given that trim in the other filters only touches the start and end of a field, it'd be great if it did the same for the kv filter.
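A hedged sketch of end-anchored trimming (an assumption about the desired behavior, not the filter's code): remove runs of the trim characters only at the start and end of the value, leaving interior occurrences such as the spaces in a sentence untouched.

# Hypothetical anchored trim: \A and \z restrict the character class to the
# ends of the string, so interior spaces and quotes survive.
def trim_value(value, chars = '" ')
  cls = Regexp.escape(chars)
  value.gsub(/\A[#{cls}]+|[#{cls}]+\z/, '')
end

trim_value('"the quick brown fox, jumped over the duck"')
# => "the quick brown fox, jumped over the duck"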

kv filter: support escaped quotes

Migrated from JIRA: https://logstash.jira.com/browse/LOGSTASH-2272
which was replicated at elastic/logstash#1605

With the following config:

input { stdin { } }
filter { kv { } }
output {
  stdout {
    codec => 'json_lines'
  }
}

The following message:

foo="bar \"baz\""

Should create the following output:

{"message":"foo=\"bar \\\"baz\\\"\"","@version":"1","@timestamp":"1969-01-01T01:01:01.000Z","host":"host.example.net","foo":"bar \"baz\""}

But instead creates the following output:

{"message":"foo=\"bar \\\"baz\\\"\"","@version":"1","@timestamp":"1969-01-01T01:01:01.000Z","host":"host.example.net","foo":"bar \\"}

ArgumentError: invalid byte sequence

i am getting the following error in the log then logstash fails to accept new messages

{:timestamp=>"2014-08-26T07:58:32.480000-0700", :message=>"Exception in filterworker", "exception"=>#<ArgumentError: invalid byte sequence in UTF-8>, "backtrace"=>["org/jruby/RubyString.java:5127:in scan'", "/opt/logstash/lib/logstash/filters/kv.rb:209:inparse'", "/opt/logstash/lib/logstash/filters/kv.rb:175:in filter'", "(eval):245:ininitialize'", "org/jruby/RubyProc.java:271:in call'", "/opt/logstash/lib/logstash/pipeline.rb:262:infilter'", "/opt/logstash/lib/logstash/pipeline.rb:203:in filterworker'", "/opt/logstash/lib/logstash/pipeline.rb:143:instart_filters'"], :level=>:error}

Moved from https://logstash.jira.com/browse/LOGSTASH-2284
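A hedged sketch of one way to avoid this error before the filter runs (an assumption, not a shipped fix): replace invalid byte sequences so that String#scan operates on valid UTF-8.

# Hypothetical pre-sanitization: String#scrub (Ruby >= 2.1) replaces invalid
# byte sequences, so a later String#scan cannot raise
# ArgumentError: invalid byte sequence in UTF-8.
raw = "key=val\xFFue"         # contains an invalid UTF-8 byte
raw.valid_encoding?           # => false
clean = raw.scrub('?')        # => "key=val?ue"
clean.scan(/(\w+)=(\w+)/)     # => [["key", "val"]]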

New issue in logstash-filter-kv 4.1.1

I have the issue below, which seems to have been introduced by a change in logstash-filter-kv 4.1.1:
kv {
field_split => "\|"
}
The following multi-line log message:
type=[emc]|cause=[key is good]|flag=[P]|msg=[asdf
ffff=
ddd=]
is then converted to the following JSON:
"cause": "key is good",
"msg":"[asdf"
"ffff=":"ddd=]"

The expected result is:
"cause": "key is good",
"msg":"[asdf\nffff=\nddd=]"

How to handle '=' in values: splitting on | works, but kv splits on every '=' instead of only the first

I'm parsing custom logs. Here's a snippet:
18.7.2012 9:05:57\t|C3|date=18.07.2012 09:05:57|acronym=BS|... |firstsignUpDate=30.07.2007|bibl001c=m|biblUDK675s=(038)33=111=163.6|....

My second filter applied is kv (the first one is just a mutate/gsub to change C3 => cir=C3):

kv {
	field_split => "|"
}

This works fine until a field contains multiple '=' characters, for example biblUDK675s=(038)33=111=163.6.
I thought that after splitting on '|', only the part before the first = would be taken as the key.
Is there any option to tell kv that biblUDK675s is the key and (038)33=111=163.6 is the value?

LS is currently on 5.5.1 version and running on Windows server 2012 R2 x64

I'm writing here because I didn't get any reply on the Logstash Discuss forums, and this issue is a top priority for me.
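A hedged illustration of the split-on-first-'=' behavior being asked about (plain Ruby, not an existing kv option): String#split takes a limit, so each field can be split on the first '=' only.

# Hypothetical two-stage parse: split fields on '|', then split each field on
# the FIRST '=' only, so '=' characters inside the value are preserved.
line  = 'biblUDK675s=(038)33=111=163.6|bibl001c=m'
pairs = line.split('|').map { |field| field.split('=', 2) }
# => [["biblUDK675s", "(038)33=111=163.6"], ["bibl001c", "m"]]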

kv on Hashes

It appears that this filter is written for the case where a transformation needs to be done on text or on an array. What if we already have a Hash and wish to flatten (merge) it with the event? I don't see any built-in way without resorting to using Ruby. Would it be acceptable to add support for processing an existing Hash, or have I missed an easier way of merging a field containing a Hash with an event?
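A hedged sketch of the merge being described, with a plain Hash standing in for the Logstash event (hypothetical; the real event API differs):

# Hypothetical flatten/merge: lift the keys of an existing Hash field onto the
# top level of the event, then drop the original field.
event = { 'message' => 'ok', 'params' => { 'user' => 'alice', 'role' => 'admin' } }
event.merge!(event.delete('params')) if event['params'].is_a?(Hash)
event  # => {"message"=>"ok", "user"=>"alice", "role"=>"admin"}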

Unconfirmed: Downgraded from 4.2.0 to 4.1.2 and everything started working again, LS 6.4

From a SFDC ticket.

We upgraded logstash from 6.3 to 6.4 and now one of my pipelines isn't working correctly. I suspect it has something to do with the logstash-filter-kv and have tried downgrading that but it still doesn't work correctly.

I asked for clarification on "isn't working correctly" but got this answer and a closed ticket.

I downgraded logstash-filter-kv from 4.2.0 to 4.1.2 and everything started working again.

Not splitting on all instances of field_split_pattern

  • Version: Logstash 6.2.4 w/include KV filter
  • Operating System: MacOS
  • Config File
input {
  file {
    path => "/PATH/TO/KVTEST.txt"
    sincedb_path => "/dev/null"
    start_position => "beginning"
    ignore_older => 0
  }
}
filter {
  kv {
    field_split_pattern => "!!!!!"
  }
}
output {
  stdout {
    codec => rubydebug {}
  }
}
  • Sample Data:
svc=http!!!!!req="POST /digest.html HTTP/1.1" 9!!!!!msg="Javascript pattern detected! applet matches #1 in a1"
  • Result:
{
    "svc" => "http",
    "req" => "POST /digest.html HTTP/1.1\" 9!!!!!msg=\"Javascript pattern detected! applet matches #1 in a1",
}
  • Expected:
{
    "svc" => "http",
    "req" => "POST /digest.html HTTP/1.1\" 9",
    "msg" => "Javascript pattern detected! applet matches #1 in a1"
}

The result is the same with a simple single-character delimiter and field_split.
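For reference, a hedged sketch of the expected tokenization on the sample line (a plain-Ruby illustration, not the plugin's internals): a regex split on the five-character delimiter yields all three fields.

# Hypothetical split on the literal '!!!!!' delimiter; single '!' characters
# inside the quoted values are not boundaries.
line = 'svc=http!!!!!req="POST /digest.html HTTP/1.1" 9!!!!!msg="Javascript pattern detected! applet matches #1 in a1"'
line.split(/!{5}/)
# => ["svc=http",
#     "req=\"POST /digest.html HTTP/1.1\" 9",
#     "msg=\"Javascript pattern detected! applet matches #1 in a1\""]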

Parsing data with empty values

It seems like the kv filter doesn't correctly parse empty values, such as key= (rather than key=value). I am using Logstash 2.3.4 and version 2.1.0 of logstash-filter-kv. Here is an example which best illustrates the problem:

echo 'foo= bar=baz' | /opt/logstash/bin/logstash -e 'input { stdin {} } filter { kv { source => "message" } } output { stdout { codec => rubydebug {} } }'
Settings: Default pipeline workers: 2
Pipeline main started
{
       "message" => "foo= bar=baz",
      "@version" => "1",
    "@timestamp" => "2016-08-30T23:56:57.137Z",
          "host" => "ip-10-254-2-234",
           "foo" => "bar=baz"
}

In this example, I would've expected {"foo" => "", "bar" => "baz"} but instead got {"foo" => "bar=baz"}.
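A hedged sketch of a value pattern that permits emptiness (illustrative only): using * instead of + for the value lets foo= yield an empty string instead of swallowing the next pair.

# Hypothetical scan where the value may be empty: [^ ]* (zero or more) keeps
# 'foo=' from consuming 'bar=baz'.
'foo= bar=baz'.scan(/([^= ]+)=([^ ]*)/)
# => [["foo", ""], ["bar", "baz"]]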

New regex functionality problem

As recommended by @colinsurprenant, this is a new issue. The problem appears with the new regex functionality from #55.

I'm using LS 6.2.3. Here are my steps to reproduce.

My kv filter:

	kv {
		field_split_pattern => "\|"
		include_brackets => false
		value_split_pattern => "="
	}

Input:

1.2.2018 6:54:17	|C3|date=01.02.2018 06:54:17|acronym=ACRONYM|user=user|type=11|rptPackageStatus=0|transactionHostDepartment=01|membIdentificNumb=0000000|patronId=111|inventoryNo=000019088|cobissId=0000000|note=f|patronCategory=002|lastVisitDate=30.01.2018|schoolType=0|schoolName=00000|schoolDept=4.c|libraryCode=00000|libraryDept=|firstsignUpDate=16.01.2015|patronOccupation=|readingRoom=|bibl001c=m|biblUDK675s=61|biblLanguage101a=slv|biblType001b=a|biblTargetAudienceCode100e=a|parentDepartment=Sth|holdStatus=c|materialType=01|loanDate=01.02.2018|returnDate=15.02.2018|visitValid=0|visitTypeValid=0|

Output to console:

{
              "parentDepartment" => "Sth",
                "patronCategory" => "002",
                    "schoolName" => "00000",
                           "cir" => "C3",
                   "libraryDept" => "|firstsignUpDate=16.01.2015",
              "biblLanguage101a" => "slv",
                          "date" => 2018-02-01T05:54:17.000Z,
                    "holdStatus" => "c",
                   "inventoryNo" => "000019088",
             "membIdentificNumb" => "0000000",
                          "beat" => {
        "hostname" => "C3RAZVOJ",
         "version" => "6.2.3",
            "name" => "C3RAZVOJ"
    },
                    "prospector" => {
        "type" => "log"
    },
              "rptPackageStatus" => 0,
                    "schoolDept" => "4.c",
                          "note" => "f",
                   "libraryCode" => "00000",
              "patronOccupation" => "|readingRoom=",
                      "bibl001c" => "m",
                        "offset" => 2441,
                   "biblUDK675s" => "61",
                    "visitValid" => "0",
                      "@version" => "1",
                  "materialType" => "01",
                          "tags" => [
        [0] "beats_input_codec_plain_applied"
    ],
                          "type" => 11,
                "visitTypeValid" => "0",
    "biblTargetAudienceCode100e" => "a",
                 "lastVisitDate" => 2018-01-29T23:00:00.000Z,
                    "returnDate" => 2018-02-14T23:00:00.000Z,
                       "country" => "si",
                          "host" => "C3RAZVOJ",
                      "loanDate" => 2018-01-31T23:00:00.000Z,
     "transactionHostDepartment" => "01",
                          "user" => "user",
                      "patronId" => "111",
                       "acronym" => "ACRONYM",
                       "message" => "1.2.2018 6:54:17\t|cir=C3|date=01.02.2018 06:54:17|acronym=ACRONYM|user=user|type=1
1|rptPackageStatus=0|transactionHostDepartment=01|membIdentificNumb=0000000|patronId=111|inventoryNo=000019088|cobissId=
0000000|note=f|patronCategory=002|lastVisitDate=30.01.2018|schoolType=0|schoolName=00000|schoolDept=4.c|libraryCode=0000
0|libraryDept=|firstsignUpDate=16.01.2015|patronOccupation=|readingRoom=|bibl001c=m|biblUDK675s=61|biblLanguage101a=slv|
biblType001b=a|biblTargetAudienceCode100e=a|parentDepartment=Sth|holdStatus=c|materialType=01|loanDate=01.02.2018|return
Date=15.02.2018|visitValid=0|visitTypeValid=0|",
                      "cobissId" => "0000000",
                    "@timestamp" => 2018-03-21T13:43:35.905Z,
                        "source" => "g:\\elasticStack\\data\\test.log",
                    "schoolType" => "0",
                  "biblType001b" => "a"
}

If I'm splitting on |, how can | end up in the values? The split parts should be split again into key/value pairs.

Add an option to lowercase all created fields

The idea is to add an option to make the key names lowercase before creating fields.

An example of when this would be useful is when using kv to parse URL parameters

....&foo=bar&ip=127.0.0.1

could be accepted by the web application the exact same way if it was

...&FoO=bar&IP=127.0.0.1

but kv would create 4 distinct fields.

Field values can be lowercased with mutate, but there is no way to lowercase field names generated during parsing without knowing what they will be beforehand.
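A hedged plain-Ruby illustration of the requested key transformation (for the record, later kv versions expose a transform_key => "lowercase" option, as seen in a config earlier on this page):

# Hypothetical key normalization: downcase each parsed key before it becomes an
# event field, so FoO=bar and foo=bar land in the same field.
pairs = 'FoO=bar&IP=127.0.0.1'.split('&').map { |f| f.split('=', 2) }
pairs.map { |k, v| [k.downcase, v] }.to_h
# => {"foo"=>"bar", "ip"=>"127.0.0.1"}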

trim_value only trimming single character at each end

As of 5045f43, not only is trim_value (formerly trim) trimming only at the ends, it is also trimming only a single character at each end. Ironically, the one use mentioned in the documentation (postfix) tends to break with this.

Sample:

D30EF77C: to=<[email protected]>, orig_to=<[email protected]>, relay=mail.example.com[private/dovecot-lmtp], delay=2.2, delays=1.9/0.01/0.01/0.21, dsn=2.0.0, status=sent (250 2.0.0 <[email protected]> A9wMBo9DwVqWBQAA3ZeTBg Saved)

After using https://github.com/whyscream/postfix-grok-patterns with this, at some point it filters "to=[email protected]," through kv, as such:

kv {
    source       => "postfix_keyvalue_data"
    trim_value   => "<>,"
    prefix       => "postfix_"
    remove_field => [ "postfix_keyvalue_data" ]
}

The result is: [email protected]>
Note the trailing >.

I would expect all matching leading and trailing characters to be trimmed. If this is the intended behavior, perhaps the documentation could be clarified.
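Until the trimming repeats until exhaustion, a mutate gsub after the kv filter can strip runs of those characters from both ends. A workaround sketch against the prefixed fields the config above produces:

mutate {
  gsub => [
    "postfix_to",      "^[<>,]+|[<>,]+$", "",
    "postfix_orig_to", "^[<>,]+|[<>,]+$", ""
  ]
}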

  • Version: docker.elastic.co/logstash/logstash-oss:6.2.3
    That is: logstash 6.2.3 with logstash-filter-kv (4.1.0)
  • Operating System: Docker (on Debian Linux Stretch)

KV filter multi-character field/value splitters

For the KV filter the value_split/field_split options are characters that "...form a regex character class". This means they are only single characters.

Please turn this into a full regex rather than a character class: this would allow multi-character delimiters, avoid having to reserve two of the 128 ASCII characters for them, and make the KV filter a lot less fragile.

People can still use character classes inside the regex if they really do have multiple delimiter characters for field and value splitting.

Add `field_split_char` and `value_split_char` that properly do escaping.

Injecting verbatim input from the field_split and value_split directives into character classes is prone to user error: it requires users to know which characters are meaningful in a regexp context and to escape them properly in their pipeline configs.

When config.support_escapes is enabled, users need to double-escape, which makes things extra tricky.

This ticket is to add two new directives (field_split_char and value_split_char), which will properly escape inputs before generating the character classes.
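The escaping itself is cheap. A sketch of the idea in plain Ruby (not the plugin's actual code), where Regexp.escape neutralizes any regex metacharacters in the user-supplied splitter before it is embedded in a character class:

field_split_char = "|"                                       # user input; "|" is a regex metacharacter
field_split_regex = /[#{Regexp.escape(field_split_char)}]/   # escaped before building the class
"a=1|b=2".split(field_split_regex)                           # => ["a=1", "b=2"]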

Ignore keys with a particular value

Add a feature to ignore keys whose value is empty or matches a given regex.

I could use the prune plugin, but I don't know in advance which keys this plugin will generate.
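As a workaround until such a feature exists, a ruby filter after kv can drop empty-valued keys without knowing their names. A minimal sketch:

ruby {
  code => '
    event.to_hash.each do |k, v|
      event.remove(k) if v.is_a?(String) && v.empty?
    end
  '
}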

Allow spaces in value while field splitting on space

I have kv pairs like:
rt=Mar 25 2016 15:15:46 GMT src=10.251.1.198 cs3Label=Virtual System cs3=vsys1

... where the values themselves may have spaces in them, even though a space is also what splits the fields. The desired fields would be:

rt="Mar 25 2016 15:15:46 GMT"
src="10.251.1.198" 
cs3Label="Virtual System"
cs3="vsys1"

(The result would be similar to the cef codec.)
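Newer plugin versions expose field_split_pattern, which can express this with a lookahead: split on a space only when it is followed by something that looks like a new key. A sketch, assuming keys are runs of word characters (and config.support_escapes is off, so the backslash passes through literally):

kv {
  source              => "message"
  field_split_pattern => " (?=\w+=)"   # a space counts as a splitter only before the next key=
}

With the sample above this yields rt="Mar 25 2016 15:15:46 GMT", src="10.251.1.198", cs3Label="Virtual System" and cs3="vsys1".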

KV Filter Generates Pathological Regex for Simple Config

The following config

   kv {
        source => "PayloadParams"
        value_split => "="
        allow_duplicate_values => false
        target => "[powershell][param]"
        include_keys => [ "name", "value" ]
      }

results in

{:regex=>"/(?-mix:((?:\\\\.|[^= ])+))(?-mix:(?-mix: *)(?-mix:[=])(?-mix: *))(?-mix:(?-mix:(?-mix:\")(?-mix:((?:\\\\.|[^\"])+))?(?-mix:\"))|(?-mix:(?-mix:')(?-mix:((?:\\\\.|[^'])+))?(?-mix:'))|(?-mix:(?-mix:\\()(?-mix:((?:\\\\.|[^\\)])+))?(?-mix:\\)))|(?-mix:(?-mix:\\[)(?-mix:((?:\\\\.|[^\\]])+))?(?-mix:\\]))|(?-mix:(?-mix:<)(?-mix:((?:\\\\.|[^>])+))?(?-mix:>))|(?-mix:((?:\\\\.|[^ ])+)))?(?-mix:(?-mix:[ ])|(?-mix:$))/"}

being generated.

This regex appears to explode at runtime (as in locking up in a hot loop for minutes on a string of length O(1k)) on longer strings because of its overlapping sections. I think that when the split pattern is not a regex but a constant string like =, there should be a way of not introducing overlapping matching sections here and of getting linear performance.

kv split data within value

kv parses values within double quotes (and probably also within single quotes) just fine. However, it parses the value itself as key-value pairs if the value contains equals signs. This can be reproduced with an empty kv filter such as:

filter { kv { } }

With the following item, the value in cfgattr gets split further and we end up with weird keys in Elasticsearch:

date=2015-05-18 time=14:13:48 devname=FG140XXXX devid=FG140XXXXT logid=0100044547 type=event subtype=system level=information vd="VirtualDomain" logdesc="Configure object attribute" user="user1" ui="jsconsole" action=Edit cfgtid=1790967809 cfgpath="system.wccp" cfgobj="101" cfgattr="password[ENC K7taRRarXYdpNvARTqktIeNcecPbJB6gsRQPLKjjftFAj81qnhoGStE4PKI9PGjYodn/Z/f26bcGG0FDpsq4scGzMONwrNuV973xkizVF/YawO8kDmdAlCJeFVHJG99J1gwVhxqjz6cmWSF5aI6FcgAfyk4gjh4yJe0p/oWks3bXxCT2Q/6juahXAIqBtIY9ZJCMJw==->ENC K7taRTt0SmYF1SbAdZes1UJbKzwzFyD/0nrlQ6JHH0Dir6kdtDCtdrT5f9/GfwxmkmAS7hNS+Tidmrrczxf1FNdedgIQIt6gVx+C1J63RWtOp+D68aDScOgBXkO05An3o8EGo4+GyYIr1yUtG1QEGYlbJ7ZhTkYNmpxCi55PdHFZeGROMjvhfB7cSwsGjHpskFk4ug==]" msg="Edit system.wccp 101"

kv support for foo=bar(baz)

Dear KV filter support,

Can you please add support for the following string in a kv filter
foo=bar(baz)
as => foo=bar
newvariable=baz

or anything appropriate to parse out (baz) into a new variable for use.

best regards,
amit
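Until kv supports this, a grok filter can pull the parenthesized part of the request above into its own field. A sketch with illustrative field names:

grok {
  match => { "message" => "foo=%{DATA:foo}\(%{GREEDYDATA:newvariable}\)" }
}

For "foo=bar(baz)" this yields foo="bar" and newvariable="baz".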

Add possibility not only to return a key-value array, but an array of objects with each key and its value as values

I am currently parsing url-queries that result into the known key-value-objects:

Parsed: ?ab=hello&cd=world&ef=123

{
   "ab": "hello",
   "cd": "world",
   "ef": "123"
}

As an additional feature, I suggest providing a flag which lets kv return an array of objects like:

[
  {
    "key": "ab",
    "value": "hello"
  },
  {
    "key": "cd",
    "value": "world"
  },
  {
    "key": "ef",
    "value": "123"
  },
]

That way it would be possible to write elasticsearch queries for nested types.
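Until such a flag exists, a ruby filter can reshape the kv output into that form. A sketch, assuming kv wrote its pairs into a target called query (all field names here are illustrative):

ruby {
  code => '
    params = event.get("query")
    if params.is_a?(Hash)
      # turn {"ab" => "hello", ...} into [{"key" => "ab", "value" => "hello"}, ...]
      event.set("query_list", params.map { |k, v| { "key" => k, "value" => v } })
    end
  '
}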

field_split_pattern does not support matching literal backslashes

GIVEN:

  • an input string with literal backslashes as a part of the field separator (e.g., "foo=bar\r\nbaz=bingo")
  • a pipeline configuration that attempts to split using properly-escaped backslashes:
    filter {
      kv {
        field_split_pattern => "\\r\\n"
      }
    }
    

EXPECT:

  • field to be split on exact match of the pattern

ACTUAL:

  • field is not split

CAUSE:

Values containing XML documents break unexpectedly, producing invalid keys

With the introduction of 73af431 a message like:

foo=bar|time=zero|payload=<?xml version=\"1.0\" encoding=\"UTF-8\"?> <S:Envelope xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">

is broken into KV pairs like

 foo="bar"
time="zero"
payload ="<?xml version=\"1.0\" encoding=\"UTF-8\"?> "
<S:Envelopexmlns:xs=...

Obviously <S:Envelopexmlns:xs is not expected as a key and should be part of the payload, but the new regexp feature introduced in that commit treats it as a new pair to split.

kv filter: case insensitive keys

Would be nice if kv filter include_keys / exclude_keys could support an option to treat keys as case insensitive. It would make it easy to use kv to parse http headers.

KV filter splitting on a field_split value within data

(This issue was originally filed by @markwalkom at elastic/logstash#2458)


If I have a KV document similar to this;

"sentence": "the quick brown fox, jumped over the duck", "author": "Mark Walkom"

And a LS config like;

input {
  stdin {}
}

filter {
  kv {
    trim => "\""
    trimkey => "\"\ \(\)"
    field_split => ","
    value_split => ":"
    }
}

output {
  stdout {
    codec => rubydebug
  }
}

The sentence field is split at the first comma, which divides the string value and loses data in the output:

$ echo ""sentence": "the quick brown fox, jumped over the duck", "author": "Mark Walkom""|bin/logstash agent -f kvbug.conf
Using milestone 2 filter plugin 'kv'. This plugin should be stable, but if you see strange behavior, please let us know! For more information on plugin milestones, see http://logstash.net/docs/1.4.2/plugin-milestones {:level=>:warn}
{
       "message" => "sentence: the quick brown fox, jumped over the duck, author: Mark Walkom",
      "@version" => "1",
    "@timestamp" => "2015-01-28T02:47:22.726Z",
          "host" => "bender.local",
      "sentence" => " the quick brown fox",
        "author" => " Mark Walkom"
}

Ideally it shouldn't do this and my sentence field would be complete.
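Since this particular sample is really JSON minus the surrounding braces, one workaround is to skip kv entirely, restore the braces, and let the json filter handle the quoting. A sketch:

mutate {
  replace => { "message" => "{%{message}}" }   # re-wrap the line in braces
}
json {
  source => "message"
}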

All jobs failling

https://travis-ci.org/logstash-plugins/logstash-filter-kv/builds/236500782

Failures:
  1) LogStash::Filters::KV keys without values (reported in #22) key and splitters with no value should ignore the incomplete key/value pairs
     Failure/Error:
       expect(event.to_hash.keys.sort).to eq(
         ["@timestamp", "@version", "AccountStatus", "IsSuccess", "message", "tags"])
     
       expected: ["@timestamp", "@version", "AccountStatus", "IsSuccess", "message", "tags"]
            got: ["@timestamp", "@version", "AccountStatus", "IsSuccess", "message"]
     
       (compared using ==)
     # ./spec/filters/kv_spec.rb:688:in `(root)'
     # /home/travis/.rvm/gems/jruby-1.7.25/gems/rspec-wait-0.0.9/lib/rspec/wait.rb:46:in `(root)'
  2) LogStash::Filters::KV remove_char_key/remove_char_value options : remove all characters in keys/values whatever their position key and value with leading, trailing and middle spaces should remove all spaces
     Failure/Error:
       expect(event.to_hash.keys.sort).to eq(
         ["@timestamp", "@version", "key1", "key2withspaces", "message", "tags"])
     
       expected: ["@timestamp", "@version", "key1", "key2withspaces", "message", "tags"]
            got: ["@timestamp", "@version", "key1", "key2withspaces", "message"]
     
       (compared using ==)
     # ./spec/filters/kv_spec.rb:748:in `(root)'
     # /home/travis/.rvm/gems/jruby-1.7.25/gems/rspec-wait-0.0.9/lib/rspec/wait.rb:46:in `(root)'
  3) LogStash::Filters::KV trim_key/trim_value options : trim only leading and trailing spaces in keys/values (reported in #10) key and value with leading, trailing and middle spaces should trim only leading and trailing spaces
     Failure/Error:
       expect(event.to_hash.keys.sort).to eq(
         ["@timestamp", "@version", "key1", "key2 with spaces", "message", "tags"])
     
       expected: ["@timestamp", "@version", "key1", "key2 with spaces", "message", "tags"]
            got: ["@timestamp", "@version", "key1", "key2 with spaces", "message"]
     
       (compared using ==)
     # ./spec/filters/kv_spec.rb:718:in `(root)'
     # /home/travis/.rvm/gems/jruby-1.7.25/gems/rspec-wait-0.0.9/lib/rspec/wait.rb:46:in `(root)'
Finished in 0.741 seconds (files took 4.63 seconds to load)
43 examples, 3 failures
Failed examples:
rspec ./spec/filters/kv_spec.rb:684 # LogStash::Filters::KV keys without values (reported in #22) key and splitters with no value should ignore the incomplete key/value pairs
rspec ./spec/filters/kv_spec.rb:744 # LogStash::Filters::KV remove_char_key/remove_char_value options : remove all characters in keys/values whatever their position key and value with leading, trailing and middle spaces should remove all spaces
rspec ./spec/filters/kv_spec.rb:714 # LogStash::Filters::KV trim_key/trim_value options : trim only leading and trailing spaces in keys/values (reported in #10) key and value with leading, trailing and middle spaces should trim only leading and trailing spaces
Randomized with seed 9365
The command "ci/build.sh" exited with 1.

kv filter gets stuck on single message, 100% CPU

I have a pipeline with a filebeat input (/var/log/messages on my lab syslog server), and the pipeline had a kv filter running against every message. Many of the messages have K=V readings and metrics from various systems in my lab, so it was useful. But every now and then logstash would jump to 100% CPU. top -H helped me pinpoint the pipeline, and within the pipeline I ruled things out one by one until I narrowed it down to kv. Later I noticed that when it's happening, if I systemctl restart logstash.service, logstash-plain.log records a series of very detailed errors about the kv thread that was stuck [2].

The syslog message that it appears to be chewing on (the line in the remote filebeat server's syslog file that immediately followed the last one received from the pipeline) is ugly cruft and must contain some character or sequence that triggers the kv filter into the locked state. I have captured the log message verbatim in a gist [3].

OS: CentOS 7.5.1804 VM, 4-CPU, 16G memory
Occurred in both of the following versions of logstash:
logstash-6.7.1-1.noarch
logstash-6.7.2-1.noarch

[2] Errors from logstash-plain.log showing the kv thread stuck. This correlates with the 100% CPU consumption and I don't otherwise see these errors in the log.

[2019-07-25T03:27:12,434][WARN ][org.logstash.execution.ShutdownWatcherExt] {"inflight_count"=>0, "stalling_threads_info"=>{"other"=>[{"thread_id"=>223, "name"=>"[syslogs]<beats", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/logstash-input-beats-5.1.8-java/lib/logstash/inputs/beats.rb:212:in `run'"}], ["LogStash::Filters::KV", {"prefix"=>"kv_", "source"=>"message", "id"=>"be9b7b970d5afb75eb7c87a8a87e0b13280bc676e4259cf68c0a86cd6e7c940f"}]=>[{"thread_id"=>222, "name"=>"[syslogs]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/logstash-filter-kv-4.3.0/lib/logstash/filters/kv.rb:555:in `scan'"}]}}
[2019-07-25T03:27:17,459][WARN ][org.logstash.execution.ShutdownWatcherExt] {"inflight_count"=>0, "stalling_threads_info"=>{"other"=>[{"thread_id"=>223, "name"=>"[syslogs]<beats", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/logstash-input-beats-5.1.8-java/lib/logstash/inputs/beats.rb:212:in `run'"}], ["LogStash::Filters::KV", {"prefix"=>"kv_", "source"=>"message", "id"=>"be9b7b970d5afb75eb7c87a8a87e0b13280bc676e4259cf68c0a86cd6e7c940f"}]=>[{"thread_id"=>222, "name"=>"[syslogs]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/logstash-filter-kv-4.3.0/lib/logstash/filters/kv.rb:555:in `scan'"}]}}
[2019-07-25T03:27:22,485][WARN ][org.logstash.execution.ShutdownWatcherExt] {"inflight_count"=>0, "stalling_threads_info"=>{"other"=>[{"thread_id"=>223, "name"=>"[syslogs]<beats", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/logstash-input-beats-5.1.8-java/lib/logstash/inputs/beats.rb:212:in `run'"}], ["LogStash::Filters::KV", {"prefix"=>"kv_", "source"=>"message", "id"=>"be9b7b970d5afb75eb7c87a8a87e0b13280bc676e4259cf68c0a86cd6e7c940f"}]=>[{"thread_id"=>222, "name"=>"[syslogs]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/logstash-filter-kv-4.3.0/lib/logstash/filters/kv.rb:555:in `scan'"}]}}

[3] https://gist.github.com/regulatre/704713c1b2730ec877d33a06c89355ba

Add 'coerce' property to cast decimals to integers

Some logs, like Tor logs, have a timestamp with a decimal part, e.g. "timestamp": "1547308761.076351". A feature is needed to cast it to an integer so that Elasticsearch accepts it as a valid timestamp.
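Until a coerce option exists, a mutate gsub can truncate the decimal part before a date filter consumes the field. A minimal workaround sketch for the Tor-style field above:

mutate {
  gsub => [ "timestamp", "\.\d+$", "" ]   # "1547308761.076351" -> "1547308761"
}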

Not all fields are being split - change in behavior from 4.1.1 to 4.2.1

  • Version: Logstash 6.4.2 with plugin: logstash-filter-kv (4.2.1)
  • Operating System: MacOSX/Linux
  • Config File (if you have sensitive info, please remove it):
input {
   stdin {}
}
filter {
    kv {
        field_split => "|"
        trim_value => "[\\]"
    }
}
output {
    stdout { codec => "rubydebug" }
}
  • Sample Data:
    <14>datetime=2018-10-18T12:59:10-0400|aField=valueofacode|xyz=valueofacc|IP=192.168.1.100|MyType=\MyType Value\|Error=\Error Value\|RetCode=123|Dir=Northwest|headerFrom=|[email protected]|[email protected]|Act=Hello|RejInfo=\rej infor value\|TlsVer=TLSv1.2|Cphr=THIS_IS_MY_CPHR_256

  • Steps to Reproduce: cat the sample data to the pipeline and observe several fields are not split with the "|" separator.

EXAMPLES:
Test input data:
<14>datetime=2018-10-18T12:59:10-0400|aField=valueofacode|xyz=valueofacc|IP=192.168.1.100|MyType=\MyType Value|Error=\Error Value|RetCode=123|Dir=Northwest|headerFrom=|Sender=[email protected]|Rcpt=[email protected]|Act=Hello|RejInfo=\rej infor value|TlsVer=TLSv1.2|Cphr=THIS_IS_MY_CPHR_256

Logstash conf file:

input {
  stdin {}
}
filter {
  kv {
    field_split => "|"
    trim_value => "[\]"
  }
}
output {
  stdout { codec => "rubydebug" }
}

Logstash 6.2.4 with plugin logstash-filter-kv (4.1.1), 6.2.4 output:

{
        "@version" => "1",
            "host" => "Bobs-MacBook-Pro-2.local",
    "<14>datetime" => "2018-10-18T12:59:10-0400",
          "aField" => "valueofacode",
             "xyz" => "valueofacc",
              "IP" => "192.168.1.100",
           "Error" => "Error Value",
         "RetCode" => "123",
             "Act" => "Hello",
             "Dir" => "Northwest",
            "Rcpt" => "[email protected]",
         "RejInfo" => "rej infor value",
          "Sender" => "[email protected]",
          "MyType" => "MyType Value",
          "TlsVer" => "TLSv1.2",
      "@timestamp" => 2018-10-19T00:31:54.401Z,
            "Cphr" => "THIS_IS_MY_CPHR_256",
         "message" => "<14>datetime=2018-10-18T12:59:10-0400|aField=valueofacode|xyz=valueofacc|IP=192.168.1.100|MyType=\\MyType Value\\|Error=\\Error Value\\|RetCode=123|Dir=Northwest|headerFrom=|[email protected]|[email protected]|Act=Hello|RejInfo=\\rej infor value\\|TlsVer=TLSv1.2|Cphr=THIS_IS_MY_CPHR_256"

}

Logstash 6.4.2 with plugin logstash-filter-kv (4.2.1), 6.4.2 output:

{
          "Sender" => "[email protected]",
            "Cphr" => "THIS_IS_MY_CPHR_256",
             "xyz" => "valueofacc",
            "host" => "Bobs-MacBook-Pro-2.local",
        "@version" => "1",
    "<14>datetime" => "2018-10-18T12:59:10-0400",
            "Rcpt" => "[email protected]",
              "IP" => "192.168.1.100",
      "@timestamp" => 2018-10-19T00:31:00.902Z,
          "MyType" => "MyType Value\\|Error=\\Error Value\\|RetCode=123",
             "Dir" => "Northwest",
          "aField" => "valueofacode",
         "message" => "<14>datetime=2018-10-18T12:59:10-0400|aField=valueofacode|xyz=valueofacc|IP=192.168.1.100|MyType=\\MyType Value\\|Error=\\Error Value\\|RetCode=123|Dir=Northwest|headerFrom=|[email protected]|[email protected]|Act=Hello|RejInfo=\\rej infor value\\|TlsVer=TLSv1.2|Cphr=THIS_IS_MY_CPHR_256",
             "Act" => "Hello",
         "RejInfo" => "rej infor value\\|TlsVer=TLSv1.2"
}

Add the possibility to create nested fields

It would be nice to create nested fields.

        kv {
          source       => "[postfix][keyvalue_data]"
          trim_value   => "<>,"
          prefix       => "[postfix]["
          suffix        => "]"
          remove_field => [ "[postfix][keyvalue_data]" ]
        }

Maybe a new option "suffix" can handle this case in combination with prefix?

Incorrect handling of value containing quotes

I am having problems with the kv filter parsing a value which contains double quotes. I have been able to reproduce my issue with the following RSpec test:

  describe "test quotes" do
    config <<-CONFIG
      filter {
        kv {
          field_split => ", "
          value_split => ": "
        }
      }
    CONFIG

    sample 'referrer: "https://www.google.com.au/search?hl=en&q="foobar" +41 2016 .ch&num=100&start=0"' do
      insist { subject.get("referrer") } == 'https://www.google.com.au/search?hl=en&q="foobar" +41 2016 .ch&num=100&start=0"'
    end
  end

Is there a way to fix this via configuration, or is it a bug?

Allow user to specify fields to overwrite for KV

The KV plugin will happily overwrite pre-existing fields in the event stream when splitting, including fields that have special meaning such as "type." Proposed feature set would include three options for kv:

  1. Allow all overwrites (current behavior)
  2. Disallow generated fields to overwrite existing fields.
  3. Allow user to specify an array of fields which kv generator can overwrite, still allowing new fields to be introduced.

From https://logstash.jira.com/browse/LOGSTASH-2236

Add 'tag_on_failure' property

It would be great if this filter could optionally add a custom tag upon failure, such as the grok or date filter. This would be extremely useful in a multi-tenancy environment where the Logstash instance is processing many different kv filter instances. It would allow you to isolate which kv filter failed, and also the specific tenant.
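Usage might mirror grok's option of the same name. A sketch of the proposed directive (not implemented at the time of this issue; the tag name is illustrative):

kv {
  source         => "message"
  tag_on_failure => [ "_kvparsefailure" ]   # proposed; mirrors grok's _grokparsefailure
}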

performance regression since 4.1.2

There is a performance regression starting at version 4.1.2.

Using the config and sample data below I am getting these throughput numbers (run on a 2.9 GHz Core i7 MBP). The tests were run on LS 6.3.2.

Version   EPS
4.0.3     41k
4.1.2     27k
4.2.0     27k

How to reproduce:

  • sample.conf
input { stdin { codec => json_lines }}

filter {
  kv {
    field_split => "\t"
    include_brackets => false
    source => "message"
  }
}

output { stdout { codec => dots }}
  • sample.txt
{"message":"host_name=Member1\tsys_name=AA-BB-CCCC\tdevTimeFormat=MMM dd yyyy HH:mm:ss Z\tdevTime=Aug 16 2018 09:10:20 +0300\tpolicy=Default_https-proxy-00\tdisp=Allow\tin_if=000-AAA_Internal\tout_if=111-BBB_External\tgeo_dst=USA\tip_len=314\tip_TTL=64\tproto=tcp\tsrc=1.2.3.4\tsrcPort=12345\tsrcPostNAT=1.2.3.4\tdst=1.2.3.4\tdstPort=123\ttcp_offset=5\ttcp_flag=A\ttcp_seq=123456789\ttcp_window=1281\tapp=HTTP Protocol over TLS SSL\tapp_cat=Network protocols\tapp_behavior=Access\tmsg=Application identified"}
  • Command
$ yes `cat sample.txt` | bin/logstash -f sample.conf

I used the tool in https://github.com/elastic/logstash-benchmark-tools/tree/master/pq_blog to measure EPS.

Alternatively the pv command can be used to measure EPS from the dots codec output with:

$ yes `cat sample.txt` | bin/logstash -f sample.conf | pv -bart > /dev/null

Statistical error

Why do the sample data and the statistical data differ by up to 5%?

  • Version:5.1.1
  • Operating System:rhel 6.2
  • Steps to Reproduce
    ELK      DB       ELK-DB (error)
    1,644 1643 1
    837 836 1
    554 554 0
    463 463 0
    487 487 0
    678 678 0
    2,073 2073 0
    5,983 6021 -38
    19,491 19387 104
    22,766 22600 166
    20,634 20562 72
    18,600 18484 116
    14,386 14631 -245
    15,243 15244 -1
    19,783 19964 -181
    20,082 20006 76
    20,737 20517 220
    17,906 17934 -28
    12,164 11928 236
    9,004 8924 80
    8,096 8091 5
    7,257 7205 52
    5,490 5489 1
    3,278 3253 25

Total
247,636 246974 662

Restore removed settings as deprecated

#39 renamed some settings and did not keep the old settings as deprecated. Mistakes happen ;)

We need to do the following:

  • Restore the old settings and mark them 'deprecated' with a reference to use the new settings
  • Make the old settings keep the same behavior they had before.

Ship a patch release (v4.0.1?) with this fix.
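In the plugin config DSL this is roughly a one-line change per restored setting; a sketch (deprecation messages are illustrative):

# restore the removed names, marked deprecated in favor of the new ones
config :trim,    :validate => :string, :deprecated => "use trim_value instead"
config :trimkey, :validate => :string, :deprecated => "use trim_key instead"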

KV filter dropping existing fields in target object v2.0.2 -> v2.0.3

I didn't confirm this, but my suspicion is that the changes in the following commit overwrite any fields in the target object for the KV filter: refactor field references to not rely on in-place mutability. @colinsurprenant @ph

In my particular case, I add a few fields to an object in a grok filter, then I run the KV filter targeting that object to add the KV fields to it. In version 2.0.2, the plugin does not drop the existing fields; in version 2.0.3, the plugin drops any fields in the target object before adding the KV pairs as fields.

  • Version: Logstash 2.2.1 and above, logstash-filter-kv 2.0.3 and above
  • Operating System: Oracle Enterprise Linux 7.2
  • Config File: See below
  • Sample Data: Apply to any log data; these filters just add the required example data
  • Steps to Reproduce:
  1. Create a JSON object with some fields and values in a grok pattern or using mutate i.e.
mutate {
  add_field => [
    "[object][field1]", "value1",
    "[object][field2]", "value2"
  ]
}
  2. Now use that [object] as a target for the KV filter:
mutate {
  add_field => {
    "[kv-string]" => "&Packet-Type=Access-Accept&Session-Timeout=2573737&qnsService=STAFF_USER&Class=qnsService=STAFF_USER&Framed-IP-Address=172.18.100.01"
  } 
}
kv {
  source => "[kv-string]"
  target => "object"
  field_split => "\&"
  value_split => "="
}

What happens is the new key values exist in the [object], but existing fields [field1] and [field2] in the object are dropped.
