Comments (27)

pkoretic avatar pkoretic commented on June 19, 2024 1

@jsvd 3.1.0 has issues with both ab and siege.
ab will stall after the first few requests, and siege will use up all the ports and fail. There are no issues like that with 3.0.10, any idea?

pkoretic avatar pkoretic commented on June 19, 2024 1

I guess 16 seconds is not enough. Try it with --time=2M against both the new plugin and the old one; the new one will fail. The transaction rate is also lower in my testing, keepalive or not:

Cannot assign requested address

and a bunch of connections are left in the TIME_WAIT state.
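
For reference, this kind of ephemeral port exhaustion on the load-generating box can be confirmed with something along these lines (a sketch, assuming a Linux host with iproute2):

# count sockets stuck in TIME_WAIT while the benchmark runs
ss -tn state time-wait | wc -l

# show the ephemeral port range the client has to work with
sysctl net.ipv4.ip_local_port_range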

pkoretic avatar pkoretic commented on June 19, 2024 1

Just some information, as I have been investigating this plugin for a number of weeks now and the performance is lower than expected, even with the new 3.1.0 version. I'm not sure what machine you have, but even your results are not exceptional.
With just this logstash http input and an elastic output, no filters, I was not able to get more than 10k transactions per second while playing with memory, batch size and worker count on my machine (i7 7700, 8GB RAM, Intel SSD), and logstash was using all the CPU it could get.

Replacing the http input with the generator input I was able to get 2-3x more documents, and by benchmarking elastic with rally I was seeing the same: elastic can process far more documents than this plugin can provide. Testing the same with the TCP input confirmed it again.

In the end we continued by writing a custom Node.js HTTP (and later HTTP/2) service that uses the elastic bulk API, and throughput went up 3-4x on the same system with the identical data set, while also doing some light processing on the fly (parsing, modification, stringification).

I looked into implementing HTTP/2 support for this plugin as netty does support it but this is beyond my scope and expertise and I'm not sure how much of a boost it would provide here.
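
For reference, the side-by-side comparison described above can be reproduced with two minimal pipelines roughly like this (a sketch; ports, hosts and thread counts are illustrative, not the exact configs we used):

# pipeline under test: http input straight into elasticsearch, no filters
input { http { host => "0.0.0.0" port => 8080 threads => 8 } }
output { elasticsearch { hosts => ["localhost:9200"] index => "bench-http" } }

# baseline: generator input into the same output, to see what the rest of the pipeline can do
input { generator { threads => 4 } }
output { elasticsearch { hosts => ["localhost:9200"] index => "bench-generator" } }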

pkoretic avatar pkoretic commented on June 19, 2024 1

Sorry if I haven't explained this properly but we are not trying to do anything special.
We did test with batches of events in one request, which, as expected, performs much better; again, this http input is very slow, so multiple events per request confirm and work around that, but it also pushes batching logic onto all clients.

We were trying to use the http input plugin for the obvious reason that we can reuse our logstash setup, with the http input being just one input provider while keeping all the filter and index logic that logstash provides. The http input is, as expected, slower, but it should not be this much slower, which we easily confirmed by comparing the generator, tcp and http inputs with stdout and elastic outputs, and then by writing a Node.js service that mimics this plugin's behaviour.

On an m5.2xlarge AWS instance we had trouble getting over 30k e/s with this plugin. We use and develop a lot of HTTP services, so we easily noticed that this one performs poorly. I would have to recheck, but I also remember Graylog with its HTTP GELF input, which is somewhat equivalent to this, performing as expected.

Another issue was that this plugin returns busy if you use a number of clients higher than the number of threads (try with siege -c 10), which is not a proper reactor pattern; that should be solved by using netty in 3.1.0.

jsvd avatar jsvd commented on June 19, 2024

Can you provide your configuration?

I've done a basic test to see the speed of 3.0.8 in logstash 6.1:

/tmp/logstash-6.1.0 % bin/logstash-plugin list --verbose http
logstash-input-http (3.0.8)

On the producer side:

/tmp/logstash-6.1.0 % bin/logstash -e "input { generator { threads => 4 } } output { http { url => 'http://localhost:8080' http_method => post } }" --path.data data2

On the consumer side running a couple of minutes:

/tmp/logstash-6.1.0 % bin/logstash -e "input { http {port => 8080} }" | pv -War > /dev/null
[326KiB/s] [ 204KiB/s]

204k events per second on a local machine.

Ccxlp avatar Ccxlp commented on June 19, 2024

@jsvd I'm sorry, I might not have described it clearly. I set up an ELK cluster with X-Pack installed to monitor some key metrics. When I found that data was coming in very slowly, I tried a separate test of logstash-input-http with this configuration file:
input {
  http {
    host => "0.0.0.0"
    port => 10001
    additional_codecs => { "application/json" => "json" }
    threads => 10
  }
}
output {
  stdout {
    codec => dots
  }
}
In addition, the Java HttpClient that sends the HTTP requests is running on Windows. When I noticed the slow speed, I tried sending the same HTTP requests to a local Tomcat server instead, which handled a satisfactory 500 events/s. To test the throughput of logstash-input-http, I also tried sending HTTP with curl at the same time:
function fun() {
  for (( i = 0; i < 300000; i++ )); do
    line="http://172.16.0.50:10001/ear/erererer"
    isok=$(curl -s -X POST --data '{"clientHost":"172.16.0.50","requestTime":"11/Aug/2017:17:21:05 +0800","method":"post","page":"/openlayers3/index3","protocol":"HTTP/1.1","stateCode":"200","bytes":"3245"}' -w '%{http_code}' "$line" -o /dev/null)
  done
}
for (( j = 0; j < 5; j++ )); do
  fun > /dev/null &
done
Throughput is 1000 events/s. This speed is not satisfactory, but it is also possible that curl itself limits the sending speed. So then, inspired by you, I also tried generator + http output to produce data and http input + stdout { codec => dots } to consume it; the result is far from what you reported.
Thank you very much for your reply. I look forward to hearing from you!

Ccxlp avatar Ccxlp commented on June 19, 2024

@jsvd I downloaded logstash 6.1.0 and copied your example for testing (base configuration, but with X-Pack installed):
On the producer side:
bin/logstash -e "input { generator { threads => 4 } } output { http { url => 'http://localhost:8080' http_method => post } }" --path.data data2
On the consumer side running a couple of minutes:
bin/logstash -e "input { http {port => 8080} } " > /dev/null
Viewing throughput through Kibana: 100 events/s.
At the same time, some HTTP connection timeout errors appear in the logstash log:
[2017-12-19T17:02:53,307][ERROR][logstash.outputs.http ] [HTTP Output Failure] Could not fetch URL {:url=>"http://localhost:8080", :method=>:post, :body=>"{"@timestamp":"2017-12-19T09:02:22.640Z","message":"Hello world!","@Version":"1","host":"site50","sequence":3707}", :headers=>{"Content-Type"=>"application/json"}, :message=>"Read timed out", :class=>"Manticore::SocketTimeout", :backtrace=>nil, :will_retry=>true}
I really don't know where this gap comes from. It's frustrating.

jsvd avatar jsvd commented on June 19, 2024

This is really weird indeed. So that we can test exactly the same thing, can you install/use the pv tool like I did in my test?

producer:

bin/logstash -e "input { generator { threads => 4 } } output { http { url => 'http://127.0.0.1:8080' http_method => post } }" --path.data data2

consumer:

bin/logstash -e "input { http {port => 8080} }" | pv -War > /dev/null

Ccxlp avatar Ccxlp commented on June 19, 2024

@jsvd Using the pv tool (my version is probably different from yours; my pv doesn't have the 'a' parameter):
producer:
bin/logstash -e "input { generator { threads => 4 } } output { http { url => 'http://127.0.0.1:8080' http_method => post } }" --path.data data2
consumer:
bin/logstash -e "input { http {port => 8080} }" | pv -Wr > /dev/null
Stable data:[64.8KB/s]

Ccxlp avatar Ccxlp commented on June 19, 2024

@jsvd @ph @colinsurprenant @elasticdog @karmi During the test, I also found several phenomena:

  1. I used version 5.6 before. When I tried to run the above example in 5.6, logstash started and generated data for about two seconds before an OOM was reported:
    Error: Your application used more memory than the safety cap of 4G.
    Specify -J-Xmx####m to increase it (#### = cap size in MB).
    Specify -w for full OutOfMemoryError stack trace
  2. Increasing the number of threads of logstash-input-http increases the processing speed proportionally (see the config sketch at the end of this comment).

After many tries, I believe the reason for the slow speed is in logstash-input-http itself: if I increase the Java client's send rate (to about 500 events/s) while the http input uses the default configuration, it starts returning 429, and when the input data is produced by the generator, SocketTimeout keeps being thrown.

I am very sorry for using the @-mentions. I have been bothered by this question for many days and hope to get your help. Thanks again for the reply, jsvd!
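
For what it's worth, the thread count mentioned in point 2 is just the plugin's threads setting; a minimal sketch of the consumer side with a higher value (the numbers are illustrative):

input {
  http {
    host => "0.0.0.0"
    port => 10001
    additional_codecs => { "application/json" => "json" }
    threads => 32    # raised from the default; higher values increased throughput in my tests
  }
}
output {
  stdout { codec => dots }
}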

jsvd avatar jsvd commented on June 19, 2024

Please don't ping people directly, especially since I'm already working on this with you.

After some testing I'm also seeing very low throughput; my numbers before were misleading because I was not measuring the dots codec output. I am investigating this and will report back when I know more.

jsvd avatar jsvd commented on June 19, 2024

@Ccxlp can you try starting the http input receiver with an explicit IP for the host setting and check the speed?
So instead of:
bin/logstash -e "input { http {port => 8080 } } output { stdout { codec => dots } }" | pv -Wr > /dev/null

Try:
bin/logstash -e "input { http {host => '127.0.0.1' port => 8080 } } output { stdout { codec => dots } }" | pv -Wr > /dev/null

Ccxlp avatar Ccxlp commented on June 19, 2024

@jsvd I just gave it a try:
producer:
bin/logstash -e "input { generator { threads => 4 } } output { http { url => 'http://127.0.0.1:8080' http_method => post } }" --path.data data2
consumer:
bin/logstash -e "input { http {host => '127.0.0.1' port => 8080 } } output { stdout { codec => dots } }" | pv -Wr > /dev/null
[100B/s],
What's the situation with you?

jsvd avatar jsvd commented on June 19, 2024

I'm seeing slow speeds as well when testing with logstash as an event producer. However, using a tool like ab or siege I see normal speeds:

siege --content-type "application/json" 'http://127.0.0.1:8080 POST {"ids": ["1","2","3"]}'          
** SIEGE 4.0.4
** Preparing 25 concurrent users for battle.
The server is now under siege...^C
Lifting the server siege...
Transactions:		       89082 hits
Availability:		      100.00 %
Elapsed time:		        8.42 secs
Data transferred:	        0.17 MB
Response time:		        0.00 secs
Transaction rate:	    10579.81 trans/sec
Throughput:		        0.02 MB/sec
Concurrency:		        3.78
Successful transactions:       89082
Failed transactions:	           0
Longest transaction:	        0.02
Shortest transaction:	        0.00
ab -p data.json -n 100000 -T application/json -m POST -k "http://127.0.0.1:8080/"  
This is ApacheBench, Version 2.3 <$Revision: 1796539 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests


Server Software:        
Server Hostname:        127.0.0.1
Server Port:            8080

Document Path:          /
Document Length:        2 bytes

Concurrency Level:      1
Time taken for tests:   11.720 seconds
Complete requests:      100000
Failed requests:        0
Keep-Alive requests:    100000
Total transferred:      9000000 bytes
Total body sent:        17400000
HTML transferred:       200000 bytes
Requests per second:    8532.58 [#/sec] (mean)
Time per request:       0.117 [ms] (mean)
Time per request:       0.117 [ms] (mean, across all concurrent requests)
Transfer rate:          749.93 [Kbytes/sec] received
                        1449.87 kb/s sent
                        2199.80 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     0    0   0.1      0       8
Waiting:        0    0   0.1      0       8
Total:          0    0   0.1      0       8

Percentage of the requests served within a certain time (ms)
  50%      0
  66%      0
  75%      0
  80%      0
  90%      0
  95%      0
  98%      0
  99%      0
 100%      8 (longest request)

siege test: 10579.81 trans/sec
ab test: 8532.58 [#/sec] (mean)

So around 8.5-10k eps

Ccxlp avatar Ccxlp commented on June 19, 2024

@jsvd Thank you for your answer. However, between logstash-input-http and the Java HttpClient the interaction is so slow that processing a single message takes about 200ms, and by capturing the traffic with Fiddler you can see that logstash-input-http is slow to respond. What do you suggest?
Fiddler's captured HTTP timings:
GotRequestHeaders: 20:44:57.472
ClientDoneRequest: 20:44:57.472
Determine Gateway: 0ms
DNS Lookup: 0ms
TCP/IP Connect: 0ms
HTTPS Handshake: 0ms
ServerConnected: 20:44:56.272
FiddlerBeginRequest: 20:44:57.472
ServerGotRequest: 20:44:57.472
ServerBeginResponse: 20:44:57.475
GotResponseHeaders: 20:44:57.475
ServerDoneResponse: 20:44:57.675
ClientBeginResponse: 20:44:57.675
ClientDoneResponse: 20:44:57.675
Overall Elapsed: 00:00:00.2030116

Ccxlp avatar Ccxlp commented on June 19, 2024

@jsvd
Did you do anything else when setting up your logstash system?
I used another virtual machine, downloaded version 6.1.0, installed JDK 1.8.0_74 (JAVA_HOME is not configured, but JAVACMD is specified in the logstash.lib.sh file), kept the default configuration, and repeated your procedure with the ab tool, but my speed is still 25 B/s.
Am I missing something?

varun0002 avatar varun0002 commented on June 19, 2024

@Ccxlp jsvd: Were you able to find any solution for this?

troymevosus avatar troymevosus commented on June 19, 2024

I'm also experiencing this issue. Any updates?

jsvd avatar jsvd commented on June 19, 2024

Can you try version 3.1.0 which replaces puma with netty?

jsvd avatar jsvd commented on June 19, 2024

Interesting, this seems to be an issue with how ab does HTTP/1.0.
What is the problem with siege?

jsvd avatar jsvd commented on June 19, 2024

I tried siege locally and it worked:

% siege  http://localhost:8888
** SIEGE 4.0.4
** Preparing 10 concurrent users for battle.
The server is now under siege...^C
Lifting the server siege...
Transactions:		       73917 hits
Availability:		      100.00 %
Elapsed time:		       16.64 secs
Data transferred:	        0.14 MB
Response time:		        0.00 secs
Transaction rate:	     4442.13 trans/sec
Throughput:		        0.01 MB/sec
Concurrency:		        9.86
Successful transactions:       73917
Failed transactions:	           0
Longest transaction:	        0.27
Shortest transaction:	        0.00

jsvd avatar jsvd commented on June 19, 2024

my results with 3.1.0:

% siege  http://localhost:8888 --time=2M
** SIEGE 4.0.4
** Preparing 10 concurrent users for battle.
The server is now under siege...
Lifting the server siege...
Transactions:		     1297582 hits
Availability:		      100.00 %
Elapsed time:		      119.91 secs
Data transferred:	        2.47 MB
Response time:		        0.00 secs
Transaction rate:	    10821.30 trans/sec
Throughput:		        0.02 MB/sec
Concurrency:		        9.69
Successful transactions:     1297582
Failed transactions:	           0
Longest transaction:	        0.19
Shortest transaction:	        0.00

and with version 3.0.10:

siege  http://localhost:8889 --time=2M
** SIEGE 4.0.4
** Preparing 10 concurrent users for battle.
The server is now under siege...
Lifting the server siege...
Transactions:		     1054378 hits
Availability:		      100.00 %
Elapsed time:		      119.70 secs
Data transferred:	        2.04 MB
Response time:		        0.00 secs
Transaction rate:	     8808.50 trans/sec
Throughput:		        0.02 MB/sec
Concurrency:		        4.76
Successful transactions:     1053448
Failed transactions:	           0
Longest transaction:	        0.28
Shortest transaction:	        0.00

jsvd avatar jsvd commented on June 19, 2024

This is strange indeed: curl works correctly, siege works too, but ab fails:

% curl --http1.0 -v localhost:8888/ 
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8888 (#0)
> GET / HTTP/1.0
> Host: localhost:8888
> User-Agent: curl/7.54.0
> Accept: */*
> 
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Content-Length: 2
< Content-Type: text/plain
< 
* Closing connection 0
ok
% ab -n 100 -c 1 -s 5 http://127.0.0.1:8888/
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)...apr_pollset_poll: The timeout specified has expired (70007)

% siege  http://localhost:8888 --time=2M
** SIEGE 4.0.4
** Preparing 10 concurrent users for battle.
The server is now under siege...
Lifting the server siege...
Transactions:		     1297582 hits
Availability:		      100.00 %
Elapsed time:		      119.91 secs
Data transferred:	        2.47 MB
Response time:		        0.00 secs
Transaction rate:	    10821.30 trans/sec
Throughput:		        0.02 MB/sec
Concurrency:		        9.69
Successful transactions:     1297582
Failed transactions:	           0
Longest transaction:	        0.19
Shortest transaction:	        0.00

jsvd avatar jsvd commented on June 19, 2024

thanks for the feedback @pkoretic

Replacing the http input with the generator input I was able to get 2-3x more documents

This is certainly expected: the generator performs no networking calls and no waiting; it's just a tight loop allocating logstash events and pushing them to the queue.

by benchmarking elastic with rally I was seeing the same: elastic can process far more documents than this plugin can provide.

I'm not sure I understand the comparison here. The http input doesn't provide events, it just accepts http requests and pushes them to the logstash queue to be processed by filters+outputs.

In the end we continued by writing a custom Node.js HTTP (and later HTTP/2) service that uses the elastic bulk API, and throughput went up 3-4x on the same system with the identical data set, while also doing some light processing on the fly (parsing, modification, stringification).

Again, I don't think I understand the comparison; if you're writing data directly to elasticsearch then the http input is not necessary.

Is your architecture http endpoint -> processing -> elasticsearch? For that you have ingest nodes in elasticsearch.

Also, the benchmarks you're performing for this plugin are based on each request carrying a single event.
You can send a JSON document that is an array of objects, and the http input will produce one event per object.
You can see the performance boost of bulk vs single item by running two logstash instances:

The one receiving data and showing event count per second:

bin/logstash -e "input { http {} } output { stdout { codec => dots } }" | pv -War > /dev/null

Another instance generating events as fast as possible and pushing entire batches of events in a single request:

bin/logstash -e "input { generator {} } output { http { url => 'http://localhost:8080' format => json_batch http_method => post } }"

On my 13" MBP from 2018 I see about 75-85k e/s.
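
For completeness, the same array-of-objects behaviour can be exercised by hand; a rough sketch with curl against the receiving instance above (the field names are just examples):

curl -s -X POST http://localhost:8080 \
  -H 'Content-Type: application/json' \
  -d '[{"message":"event 1"},{"message":"event 2"},{"message":"event 3"}]'

Each object in the array becomes a separate event on the receiving side.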

jsvd avatar jsvd commented on June 19, 2024

Another issue was that this plugin returns busy if you use a number of clients higher than the number of threads (try with siege -c 10), which is not a proper reactor pattern; that should be solved by using netty in 3.1.0.

The previous (< 3.1.0) implementation supports more clients than it has threads, but it responds immediately with 429 if those threads are busy. The netty implementation adds a queue to buffer bursts of traffic and micro delays.

but it should not be this much slower, which we easily confirmed by comparing the generator, tcp and http inputs with stdout and elastic outputs, and then by writing a Node.js service that mimics this plugin's behaviour.

Building a dedicated service that performs the same task as the http input will always be much faster, as logstash is a general-purpose event pipeline with a lot of built-in flexibility at the cost of performance.

Can you talk a bit more about the load you're putting on the http input? Is it high volume per connection, or many connections, or both?

As for this plugin being slow: after the netty rewrite and PR #85 get merged, we'll need Java-based codecs to get faster. That said, the plugin is at this point very light in what it does; it's a standard netty HTTP processing pipeline: requests are read from the socket, transformed into HttpRequest objects, the content is decoded using the Ruby codecs (the costliest part) and then pushed to the logstash queue.

Another option here is to support HTTP pipelining; we can use elastic/elasticsearch#8299 as inspiration.

pkoretic avatar pkoretic commented on June 19, 2024

It's mostly a lot of connections, but there are spikes when there is also a big volume of events per connection; an event is generally under 100 characters.
We have tried the plain codec with a filter to extract data, and that gave some improvement but not as much as we had hoped; I tried msgpack as well, but that was abandoned for other reasons. I haven't tried the plain codec with 3.1.0 though.

In the end I personally think HTTP/2 would also be a good direction, given it has proper multiplexing support and binary data transfer, and it would replace the pipelining and compressed-request efforts for HTTP/1.x.
It should be noticeably faster and lighter both for data ingestion and processing speed; in our service the difference between HTTP/1.x and HTTP/2 is more than visible.
But I do understand this could be a big task to implement, given elastic itself doesn't support it yet even though it also uses netty, so I guess there is a reason for that: elastic/elasticsearch#10981.
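
As a quick aside, whether an endpoint actually negotiates HTTP/2 over cleartext can be checked with something along these lines (the URL is illustrative):

# attempts an h2c upgrade and prints the HTTP version actually used (2 if the upgrade succeeded, 1.1 otherwise)
curl --http2 -s -o /dev/null -w '%{http_version}\n' http://localhost:8080/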
Thanks for all the work on the plugin!

jsvd avatar jsvd commented on June 19, 2024

Thank you all for the great feedback. Since this issue has now diverged from the initial topic and the internal implementation has changed I'm going to close this. If you see a performance regression or any odd behaviour please open a new issue.
