
Comments (12)

peter-slovak commented on May 28, 2024

I can also confirm that avoiding re-emission to the current label fixes the problem. The use case here is Docker's partial_message flag: concatenating logs that were split up by Docker's standard buffering.

If anyone is interested, this is my current configuration snippet, working well with docker-ce 18.06.1, fluentd 1.7.0, and fluent-plugin-concat 2.4.0:

<source>
  @type forward
  # Some Docker logs can be split to 16 KB chunks, concat them first
  @label @CONCAT_PARTIAL
  port 24224
  bind 0.0.0.0
</source>

<label @CONCAT_PARTIAL>
  <filter **>
    @type concat
    key log
    stream_identity_key container_id
    partial_key partial_message
    partial_value true
    keep_partial_key false
    separator ""
    use_first_timestamp true
    flush_interval 5
    timeout_label @START
  </filter>

  # Send all logs to @START label, whether the concat attempt timed out or succeeded
  <match **>
    @type relabel
    @label @START
  </match>
</label>

<label @START>
# the rest of log processing pipeline
</label>


okkez commented on May 28, 2024

Sorry for the delay. I missed the issue.

Try the following diff:

diff --git a/fluentd_docker/config/output.conf b/fluentd_docker/config/output.conf
index 9029af3..531748f 100644
--- a/fluentd_docker/config/output.conf
+++ b/fluentd_docker/config/output.conf
@@ -1,10 +1,9 @@
-<label @SPLUNK>
- # extract index fields and sourcetype for container logs
+<label @TRANSFORM1>
+  # extract index fields and sourcetype for container logs
   <filter tail.containers.**>
     @type jq_transformer
     jq '.record | .container_id = (.source | capture("(?<container_id>[0-9a-f]{12})") | .container_id)'
   </filter>
-
   <filter tail.containers.**>
     @type concat
     key log
@@ -13,14 +12,26 @@
     multiline_start_regexp /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/
     flush_interval 5s
   </filter>
+  <match>
+    @type relabel
+    @label @SPLUNK
+  </match>
+</label>
 
- # = filters for docker config =
+<label @TRANSFORM2>
+  # = filters for docker config =
   <filter tail.config.**>
     @type jq_transformer
     jq ".record.sourcetype = \"docker:config\" | .record"
   </filter>
-    
- # = output =
+  <match>
+    @type relabel
+    @label @SPLUNK
+  </match>
+</label>
+
+<label @SPLUNK>
+  # = output =
   <match **>
     @type splunk_hec
     protocol https
diff --git a/fluentd_docker/config/source.containers.conf b/fluentd_docker/config/source.containers.conf
index 636c6ac..af4a05d 100644
--- a/fluentd_docker/config/source.containers.conf
+++ b/fluentd_docker/config/source.containers.conf
@@ -18,7 +18,7 @@
 <source>
   @id containers.log
   @type tail
-  @label @SPLUNK
+  @label @TRANSFORM1
   tag tail.containers.*
   path /var/lib/docker/containers/*/*-json.log
   pos_file /var/log/splunk-fluentd-containers.log.pos
diff --git a/fluentd_docker/config/source.docker.conf b/fluentd_docker/config/source.docker.conf
index e1c5246..43f43fe 100644
--- a/fluentd_docker/config/source.docker.conf
+++ b/fluentd_docker/config/source.docker.conf
@@ -2,7 +2,7 @@
 <source>
   @id config.v2 
   @type tail
-  @label @SPLUNK
+  @label @TRANSFORM2
   tag tail.config.*
   path /var/lib/docker/containers/*/config.v2.json
   pos_file /var/log/splunk-fluentd-docker-config.log.pos

In the original configuration, the concat plugin is configured with timeout_label @SPLUNK, which re-emits timed-out events back into <label @SPLUNK>.
When that happens, the events re-emitted via the timeout and the events arriving via the normal flow conflict, causing the deadlock reported in this issue.

In the diff above, I set a proper label per <source> so that only the appropriate filters are applied to each stream.
The concat filter should be applied only once.
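
For reference, the resulting pattern looks roughly like this (a minimal sketch with placeholder label names, not copied from the diff): run the concat filter in its own label and relabel events downstream, so events re-emitted on timeout never pass through the concat filter again.

<label @CONCAT>
  <filter tail.containers.**>
    @type concat
    key log
    stream_identity_key stream
    multiline_start_regexp /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/
    flush_interval 5s
    # Timed-out events are re-emitted into @OUTPUT, which contains no concat
    # filter, so they cannot re-enter the filter that is flushing them.
    timeout_label @OUTPUT
  </filter>

  # Successfully concatenated events also move on to @OUTPUT
  <match **>
    @type relabel
    @label @OUTPUT
  </match>
</label>

<label @OUTPUT>
  # downstream filters and the splunk_hec output go here
</label>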


okkez commented on May 28, 2024

Which versions of Fluentd and Ruby are you using?
Do you see the same errors without k8s?


matthewmodestino commented on May 28, 2024

Hey Okkez!

Dockerfile for the images used is here:
https://github.com/splunk/fluent-plugin-splunk-hec/blob/master/docker/Dockerfile

We use Ruby 2.5.1-slim, concat 2.3.0, and fluentd 1.4.0 in our latest build. I don't believe we saw this with Ruby 2.5.1-slim, fluentd 1.2.x, and concat 2.2.2, which we used in the previous version of our images.

Thanks for having a look!


matthewmodestino commented on May 28, 2024

Is there anything further I can collect to help with this?


okkez commented on May 28, 2024

@matthewmodestino Thanks.
I need a reproducible scenario to debug this issue.
I don't have any ideas yet on how to resolve it...


matthewmodestino commented on May 28, 2024

Have you had the chance to test the plugin against fluentd 1.4?

What is your preferred testing method?

If I provide steps using our docker container, is that ok? Or do you prefer testing some other way?


okkez commented on May 28, 2024

I have tested this plugin using a very simple configuration and logs with Fluentd 1.4.x.

I prefer a bare-metal environment for debugging, but reproducible steps using Docker containers are also OK.


matthewmodestino commented on May 28, 2024

Hi @okkez,

Apologies for the delay.

I have pushed test configs to my repo that I hope will make it easy for you to take a look at this.

I used Docker for Mac, but these configs should run in any Docker environment. You can clone the repo:

git clone https://github.com/matthewmodestino/container_workshop.git

Or just grab the splunk_docker and fluentd_docker folders from the repo.

https://github.com/matthewmodestino/container_workshop/tree/master/splunk_docker
https://github.com/matthewmodestino/container_workshop/tree/master/fluentd_docker

Once you have pulled down the folders (the following assumes you keep the folder naming), from within the splunk_docker folder, run:

docker-compose -f 1so_hec.yaml up -d

Give the Splunk container a minute or two to start up. This is just to ensure fluentd has a valid backend to send to.

From within the fluentd_docker folder, deploy the fluentd container:

docker-compose -f fluent_hec.yaml up -d

The fluentd_docker folder has a config directory that contains the fluentd configuration files and is mounted into the container when it runs. We can update configs here, then delete and redeploy the container for easy testing.

The output.conf file contains the following concat plugin config, which will be applied to any docker json driver logs we pick up:

<filter tail.containers.**>
  @type concat
  key log
  timeout_label @SPLUNK
  stream_identity_key stream
  multiline_start_regexp /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/
  flush_interval 5s
</filter>

You should now have the following 2 containers running:

docker ps -a
CONTAINER ID        IMAGE                      COMMAND                  CREATED             STATUS                             PORTS                                                                                                                      NAMES
dca738a36569        splunk/fluentd-hec:1.1.0   "/fluentd/entrypoint…"   4 seconds ago       Up 3 seconds                                                                                                                                                  fluent-hec
36ce93169d2d        splunk/splunk:latest       "/sbin/entrypoint.sh…"   26 seconds ago      Up 24 seconds (health: starting)   4001/tcp, 8065/tcp, 8191/tcp, 9887/tcp, 9997/tcp, 0.0.0.0:9999->8000/tcp, 0.0.0.0:9088->8088/tcp, 0.0.0.0:9089->8089/tcp   so1

Now check the logs of the fluentd container:

docker logs -f dca738a36569

At the very top of the log, we dump the running configs and versions:

<system>
    log_level info
    root_dir "/tmp/fluentd"
  </system>
  <source>
    @id containers.log
    @type tail
    @label @SPLUNK
    tag "tail.containers.*"
    path "/var/lib/docker/containers/*/*-json.log"
    pos_file "/var/log/splunk-fluentd-containers.log.pos"
    path_key "source"
    read_from_head true
    <parse>
      @type "json"
      time_key "time"
      time_type string
      time_format "%Y-%m-%dT%H:%M:%S.%NZ"
      localtime false
    </parse>
  </source>
  <source>
    @id config.v2
    @type tail
    @label @SPLUNK
    tag "tail.config.*"
    path "/var/lib/docker/containers/*/config.v2.json"
    pos_file "/var/log/splunk-fluentd-docker-config.log.pos"
    path_key "source"
    read_from_head true
    <parse>
      @type "json"
    </parse>
  </source>
  <source>
    @id fluentd-monitor-agent
    @type monitor_agent
    @label @SPLUNK
    tag "monitor_agent"
  </source>
  <label @SPLUNK>
    <filter tail.containers.**>
      @type jq_transformer
      jq ".record | .container_id = (.source | capture(\"(?<container_id>[0-9a-f]{12})\") | .container_id)"
    </filter>
    <filter tail.containers.**>
      @type concat
      key "log"
      timeout_label "@SPLUNK"
      stream_identity_key "stream"
      multiline_start_regexp "/^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}/"
      flush_interval 5s
    </filter>
    <filter tail.config.**>
      @type jq_transformer
      jq ".record.sourcetype = \"docker:config\" | .record"
    </filter>
    <match **>
      @type splunk_hec
      protocol https
      hec_host "localhost"
      hec_port 9088
      hec_token "00000000-0000-0000-0000-000000000000"
      host "docker"
      source_key "source"
      sourcetype_key "sourcetype"
      insecure_ssl true
      <fields>
        container_id 
      </fields>
      <buffer>
        @type "memory"
        chunk_limit_records 10000
        chunk_limit_size 200m
        flush_interval 5s
        flush_thread_count 1
        overflow_action block
        retry_max_times 3
        total_limit_size 600m
      </buffer>
      <format monitor_agent>
        @type "json"
      </format>
      <format tail.config.**>
        @type "json"
      </format>
      <format>
        @type "single_value"
        message_key "log"
        add_newline false
      </format>
    </match>
  </label>
</ROOT>
2019-05-21 17:16:35 +0000 [info]: starting fluentd-1.4.0 pid=6 ruby="2.5.1"
2019-05-21 17:16:35 +0000 [info]: spawn command to main:  cmdline=["/usr/local/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/local/bundle/bin/fluentd", "-c", "/fluentd/etc/fluent.conf", "--under-supervisor"]
2019-05-21 17:16:35 +0000 [info]: gem 'fluent-plugin-concat' version '2.3.0'
2019-05-21 17:16:35 +0000 [info]: gem 'fluent-plugin-jq' version '0.5.1'
2019-05-21 17:16:35 +0000 [info]: gem 'fluent-plugin-prometheus' version '1.3.0'
2019-05-21 17:16:35 +0000 [info]: gem 'fluent-plugin-splunk-hec' version '1.1.0'
2019-05-21 17:16:35 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.2'
2019-05-21 17:16:35 +0000 [info]: gem 'fluentd' version '1.4.0'
2019-05-21 17:16:35 +0000 [info]: adding filter in @SPLUNK pattern="tail.containers.**" type="jq_transformer"
2019-05-21 17:16:35 +0000 [info]: adding filter in @SPLUNK pattern="tail.containers.**" type="concat"
2019-05-21 17:16:35 +0000 [info]: adding filter in @SPLUNK pattern="tail.config.**" type="jq_transformer"
2019-05-21 17:16:35 +0000 [info]: adding match in @SPLUNK pattern="**" type="splunk_hec"
2019-05-21 17:16:35 +0000 [info]: adding source type="tail"
2019-05-21 17:16:35 +0000 [warn]: #0 [containers.log] 'encoding' and 'from_encoding' are same encoding. No effect
2019-05-21 17:16:35 +0000 [info]: adding source type="tail"
2019-05-21 17:16:35 +0000 [warn]: #0 [config.v2] 'encoding' and 'from_encoding' are same encoding. No effect
2019-05-21 17:16:35 +0000 [info]: adding source type="monitor_agent"
2019-05-21 17:16:35 +0000 [info]: #0 starting fluentd worker pid=14 ppid=6 worker=0

You should then see that the logs picked up from the fluentd container itself start triggering the concat rule and spitting out errors:

2019-05-21 17:18:53 +0000 [info]: #0 Timeout flush: tail.containers.var.lib.docker.containers.dca738a36569ceb04fa6c21bc0cf39a4fbd95ab661e140cedc4c5b423f8cddc0.dca738a36569ceb04fa6c21bc0cf39a4fbd95ab661e140cedc4c5b423f8cddc0-json.log:stdout
2019-05-21 17:18:58 +0000 [warn]: #0 dump an error event: error_class=ThreadError error="deadlock; recursive locking" location="/usr/local/bundle/gems/fluent-plugin-concat-2.3.0/lib/fluent/plugin/filter_concat.rb:144:in `synchronize'" tag="tail.containers.var.lib.docker.containers.dca738a36569ceb04fa6c21bc0cf39a4fbd95ab661e140cedc4c5b423f8cddc0.dca738a36569ceb04fa6c21bc0cf39a4fbd95ab661e140cedc4c5b423f8cddc0-json.log" time=2019-05-21 17:18:58.562890167 +0000 record={"log"=>"2019-05-21 17:18:53 +0000 [info]: #0 Timeout flush: tail.containers.var.lib.docker.containers.dca738a36569ceb04fa6c21bc0cf39a4fbd95ab661e140cedc4c5b423f8cddc0.dca738a36569ceb04fa6c21bc0cf39a4fbd95ab661e140cedc4c5b423f8cddc0-json.log:stdout\n", "stream"=>"stdout", "source"=>"/var/lib/docker/containers/dca738a36569ceb04fa6c21bc0cf39a4fbd95ab661e140cedc4c5b423f8cddc0/dca738a36569ceb04fa6c21bc0cf39a4fbd95ab661e140cedc4c5b423f8cddc0-json.log", "container_id"=>"dca738a36569"}

Let me know if there is something specific you would like me to check in the fluentd container itself.

To clean up these resources:

From within your fluentd_docker folder:

docker-compose -f fluent_hec.yaml down

Then, from within your splunk_docker folder:

docker-compose -f 1so_hec.yaml down

Clean up docker volumes:

docker volume prune


matthewmodestino commented on May 28, 2024

Thanks so much for checking. I will review and advise.


bamb00 commented on May 28, 2024

@peter-slovak I have a log (size 59k) that gets split by Docker into four separate logs, and the partial pieces are written alongside other regular logs. How does "partial_key log" or "partial_key partial_message" determine where the partial log ends, so the partial pieces can be appended together?

  <filter **>
    @type concat
    key log
    partial_key partial_message
    partial_value true
    keep_partial_key false
    separator ""
    use_first_timestamp true
    flush_interval 5
    timeout_label @START
  </filter>

Thanks.


peter-slovak commented on May 28, 2024

@bamb00 As I understand it, Docker flags partial logs in its internal structure and then exposes that flag to the particular logging driver/plugin. In the case of fluentd, the flag makes it into the partial_message log attribute.

Since these logs are line-oriented, as soon as we encounter an \n in the log, we know it's the last chunk. See moby/moby#34620 (comment) for a more detailed explanation by people who know a lot more about this :)
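
To make that concrete, here is a hypothetical illustration (the field values are invented, not taken from this thread) of how one long line may arrive after being split by the Docker fluentd log driver: the chunks carrying partial_message="true" are buffered by the concat filter, and the final chunk, which ends with the newline and carries no partial flag, closes and flushes the group.

{"log"=>"first 16 KB of the line ...", "partial_message"=>"true", "container_id"=>"abc123def456"}
{"log"=>"next 16 KB of the line ...", "partial_message"=>"true", "container_id"=>"abc123def456"}
{"log"=>"tail of the line\n", "container_id"=>"abc123def456"}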

