Comments (15)

rwynn avatar rwynn commented on August 17, 2024

The settings in your TOML file look correct, assuming you want to copy over the GridFS files from the test2 database. monstache uses the oplog to determine which changes to propagate. When replay is true, as you have it, monstache replays all the operations in that collection (normally database local, collection oplog.rs). Can you please connect to mongo and check whether you have any documents in the oplog collection? Otherwise, report whether you have the local database and what collections it contains.
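
For example, you can check from the mongo shell with something like this (commands assume a standard replica set setup):

use local
show collections
db.oplog.rs.count()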

When you set up a replica set you usually get an oplog (so you shouldn't need to force it with the -master option). But if the files were saved to mongo before the oplog was created, then the oplog would be empty and monstache would have nothing to propagate. In that case you might want to do a mongodump and then a mongorestore while monstache is running. The mongorestore would populate the oplog, and monstache should pick up the changes and send them to elasticsearch.
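
For example, assuming the test2 database and default connection settings (the restore path is whatever directory mongodump produced):

mongodump --db test2
mongorestore --db test2 dump/test2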

rwynn avatar rwynn commented on August 17, 2024

To your first question: monstache does not need to be running while the data is imported. However, you do need to make sure that the oplog gets populated during the import AND that the oplog is big enough to hold all the documents from your import (it is a capped collection). If the import is done before monstache is run, then monstache must be run with the replay option as you have it; otherwise it will just start tailing the oplog for new changes.
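
For reference, a minimal TOML sketch for a one-time replay might look like the following (URLs are placeholders and the option names are assumed from the monstache docs; adjust to your environment):

mongo-url = "mongodb://localhost:27017"
elasticsearch-url = "http://localhost:9200"
replay = true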

rwynn avatar rwynn commented on August 17, 2024

Actually, I just noticed that your elasticsearch URL might be wrong. You will want the URL to point to the REST API, which normally listens on port 9200.

amjustin13 avatar amjustin13 commented on August 17, 2024

Hi @rwynn

Thank you for the quick response. I am still not getting anything on ES after trying what you mentioned.

First, I checked the oplog.rs collection to see how many documents it contained and it had about 36,000.

Then I did a mongodump and a mongorestore while running Monstache, but nothing happened in ES.

I am checking ES by viewing the indexes created like so:
curl http://localhost.company.com:9201/_cat/indices?v
But this returns no indices.

About the port number for ES: I have been using 9201 because for some reason I have not been able to connect using port 9200 (my guess is some proxy settings). Port 9201 works fine for me when using the curl commands to PUT, GET, etc.

Lastly, I tried deleting all of my data and re-importing it to Mongodb. I deleted the Local db in mongo as well to restart the replica set. While I was importing the data Monstache and ES were running. This did generate the monstache db in Mongo, but nothing in ES.
I am still using the same TOML file, except that I changed resume = true on line 5.

Thanks again for your help.
AJ

rwynn avatar rwynn commented on August 17, 2024

Do you see any activity in the elasticsearch log file? Usually in /var/log/elasticsearch.

Here are a couple other things to look into:

Make sure elasticsearch does not have automatic index creation turned off in its settings. Check elasticsearch.yml for a line like action.auto_create_index and make sure it is not set to false or restricted to a whitelist/blacklist.
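
For example, a permissive entry in elasticsearch.yml would look like this (or the line can simply be absent, which allows index auto-creation by default):

action.auto_create_index: true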

Try to index a test document into an index which does not yet exist, and then view it:

curl -XPUT 'http://localhost.company.com:9201/twitter/tweet/1' -d '{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}'

curl http://localhost.company.com:9201/twitter/_search?pretty

Test whether monstache is sending data by using the following Go server, which simply logs requests.

package main

import (
    "fmt"
    "net/http"
    "net/http/httputil"
)

func handler(w http.ResponseWriter, r *http.Request) {
    // Dump the full request (headers and body) and print it.
    dump, err := httputil.DumpRequest(r, true)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Printf("%q\n", dump)
}

func main() {
    // Log every request that arrives on port 8000.
    http.HandleFunc("/", handler)
    http.ListenAndServe(":8000", nil)
}

Run this program with go run http_dump.go. Then set your elasticsearch-url to http://localhost:8000 in your TOML configuration and re-run monstache.
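
For example, the temporary change in the TOML config would just be (option name as used above):

elasticsearch-url = "http://localhost:8000"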

In the console where you ran http_dump.go you should begin to see the requests that normally would be going to elasticsearch. If you see these requests and they look correct then we can narrow it down to a problem on the elasticsearch side.

amjustin13 avatar amjustin13 commented on August 17, 2024

The Elasticsearch log file does not have anything more than the initial connection text. The log file does not contain information about indexes created for the Gridfs files when I run Monstache.

I ran the code you gave me to create an index and search it and it was successful. The logfile reflects these changes.

I also ran the http_dump.go program as you advised, and it did run. For some time the output was a lot of what seemed to be junk characters. Around the time it was almost done, I was able to see output like this:
"POST /_bulk?refresh=false HTTP/1.1\r\nHost: localhost:8000\r\nTransfer-Encoding: chunked\r\nAccept: application/json\r\nAccept-Encoding: gzip\r\nUser-Agent: elasticSearch/0.0.2 (linux-amd64)\r\n\r\n3e13\r\n{\"index\":{\"_index\":\"test2\",\"_type\":\"fs.files\",\"_id\":\"579a0db0001bfb08774f0580\"}}\n{\"chunkSize\":261120,\"contentType\":\"binary/octet-stream\",\"filecontent\":\"ClBvc3Qg"

While the http_dump.go was running, this was an output in the console where I ran ES:
[2016-07-29 08:06:19,028][WARN ][monitor.jvm ] [Valinor] [gc][young][2099][6] duration [2.1s], collections [1]/[2.8s], total [2.1s]/[2.3s], memory [88.5mb]->[26.4mb]/[989.8mb], all_pools {[young] [68.3mb]->[203.7kb]/[273mb]}{[survivor] [8.2mb]->[8.5mb]/[34.1mb]}{[old] [11.9mb]->[17.7mb]/[682.6mb]}

And in the console where I ran the mongod instance there was output like this:

2016-07-29T07:55:20.282-0400 I COMMAND  [conn4] getmore local.oplog.rs query: { ts: { $gt: Timestamp 0|0 } } cursorid:28585181247 ntoreturn:0 keyUpdates:0 writeConflicts:0 numYields:6 nreturned:17 reslen:4441525 locks:{ Global: { acquireCount: { r: 14 } }, MMAPV1Journal: { acquireCount: { r: 7 } }, Database: { acquireCount: { r: 7 } }, oplog: { acquireCount: { R: 7 } } } 200ms

2016-07-29T07:57:21.108-0400 I COMMAND  [conn6] command test2.fs.chunks command: find { find: "fs.chunks", filter: { files_id: ObjectId('579a0d5f001bfb08774ef288'), n: 367 }, skip: 0, limit: 1, batchSize: 1, singleBatch: true } planSummary: IXSCAN { files_id: 1, n: 1 } keysExamined:1 docsExamined:1 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:1 reslen:261306 locks:{ Global: { acquireCount: { r: 2 } }, MMAPV1Journal: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { R: 1 } } } protocol:op_query 169ms

2016-07-29T07:57:53.854-0400 I WRITE    [conn6] update monstache.monstache query: { _id: "default" } update: { $set: { ts: Timestamp 1469713829000|53 } } keysExamined:1 docsExamined:1 nMatched:1 nModified:1 fastmod:1 keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, MMAPV1Journal: { acquireCount: { w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { W: 1 } }, Metadata: { acquireCount: { w: 1 } }, oplog: { acquireCount: { W: 1 } } } 1274ms

2016-07-29T07:57:53.854-0400 I COMMAND  [conn6] command monstache.$cmd command: update { update: "monstache", updates: [ { q: { _id: "default" }, u: { $set: { ts: Timestamp 1469713829000|53 } }, upsert: true } ], writeConcern: { getLastError: 1 }, ordered: true } keyUpdates:0 writeConflicts:0 numYields:0 reslen:115 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, MMAPV1Journal: { acquireCount: { w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { W: 1 } }, Metadata: { acquireCount: { w: 1 } }, oplog: { acquireCount: { W: 1 } } } protocol:op_query 1275ms

2016-07-29T08:39:28.353-0400 I STORAGE  [DataFileSync] flushing mmaps took 11840ms for 32 files

Would there be a problem connecting to ES with Monstache if I am behind a proxy server?

Thanks again for all of your help,
AJ

rwynn avatar rwynn commented on August 17, 2024

The request logs from http_dump look OK. I've just created a new release, v1.2 of monstache, with better error reporting. You will want to turn on the new verbose option in your TOML config file. Hopefully this will give us some idea of why the request is not reaching elasticsearch.
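
For example, in the TOML config (option name as mentioned above):

verbose = true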

You mentioned being behind a proxy server. Unfortunately, monstache does not support situations where a proxy server is surfacing elasticsearch with a custom path like http://example.com:9200/elasticsearch/. This is because the library monstache uses does not currently allow configuring the /elasticsearch part above. It simply expects host:port, and the path that follows must be the path that elasticsearch expects in its API. Having said that, it doesn't seem from your examples that this is the case; the only thing I noticed you needed to change was the port.

One thing to check might be any configuration related to request size. Since monstache uses the bulk API, multiple files may be sent at once. Also, elasticsearch requires the file contents to be base64 encoded, which makes the size even larger. The base64 encoded file contents are probably what you were referring to as the "junk characters".
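
To get a rough feel for that overhead, here is a small Go sketch (the 50 MB file size is just an illustrative assumption):

package main

import (
    "encoding/base64"
    "fmt"
)

func main() {
    // Hypothetical raw file size of 50 MB.
    rawSize := 50 * 1024 * 1024
    // EncodedLen reports how many bytes the base64 form will take (roughly 4/3 of the input).
    encodedSize := base64.StdEncoding.EncodedLen(rawSize)
    fmt.Printf("raw: %d bytes, base64: %d bytes (about %.0f%% larger)\n",
        rawSize, encodedSize, 100*float64(encodedSize-rawSize)/float64(rawSize))
}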

You will probably want to check any server between monstache and elasticsearch to make sure the allowed request body size is large enough. There are some configuration options in ES itself with regard to request body size. But something tells me it's not even reaching ES, since you aren't seeing anything in the ES logs.
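
For reference, the ES-side limit is http.max_content_length in elasticsearch.yml, which defaults to 100mb; something like the following would raise it (the value here is only an example):

http.max_content_length: 200mb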

amjustin13 avatar amjustin13 commented on August 17, 2024

Hi @rwynn thanks again for your quick response.

I was able to get some activity on ES by running this: export http_proxy=""

After setting the env variable, I ran ES and MongoDB and then Monstache and it appeared to be working correctly. It created the indexes in ES and I could see that the logs looked normal.

However... I ran into a memory problem and now I can't connect to ES. Here is what happened:

[2016-07-29 10:52:09,965][WARN ][bootstrap                ] running as ROOT user. this is a bad idea!
[2016-07-29 10:52:10,259][INFO ][node                     ] [Sunset Bain] version[2.1.2], pid[70907], build[63c285e/2016-01-27T12:57:52Z]
[2016-07-29 10:52:10,259][INFO ][node                     ] [Sunset Bain] initializing ...
[2016-07-29 10:52:10,785][INFO ][plugins                  ] [Sunset Bain] loaded [mapper-attachments], sites []
[2016-07-29 10:52:10,827][INFO ][env                      ] [Sunset Bain] using [1] data paths, mounts [[/ (/dev/sda1)]], net usable_space [130.2gb], net total_space [230.3gb], spins? [possibly], types [ext4]
[2016-07-29 10:52:12,867][INFO ][node                     ] [Sunset Bain] initialized
[2016-07-29 10:52:12,867][INFO ][node                     ] [Sunset Bain] starting ...
[2016-07-29 10:52:13,145][INFO ][transport                ] [Sunset Bain] publish_address {127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}, {[::1]:9300}
[2016-07-29 10:52:13,160][INFO ][discovery                ] [Sunset Bain] elasticsearch/OMLWdUn_QoCtZ1pnrP_ebw
[2016-07-29 10:52:16,228][INFO ][cluster.service          ] [Sunset Bain] new_master {Sunset Bain}{OMLWdUn_QoCtZ1pnrP_ebw}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2016-07-29 10:52:16,331][INFO ][http                     ] [Sunset Bain] publish_address {127.0.0.1:9201}, bound_addresses {127.0.0.1:9201}, {[::1]:9200}
[2016-07-29 10:52:16,331][INFO ][node                     ] [Sunset Bain] started
[2016-07-29 10:52:16,627][INFO ][gateway                  ] [Sunset Bain] recovered [2] indices into cluster_state
[2016-07-29 10:53:38,034][INFO ][cluster.metadata         ] [Sunset Bain] [test2] creating index, cause [api], templates [], shards [5]/[1], mappings [fs.files]
[2016-07-29 10:54:12,732][INFO ][cluster.metadata         ] [Sunset Bain] [test2] update_mapping [fs.files]
[2016-07-29 11:21:15,914][DEBUG][action.bulk              ] [Sunset Bain] [test2][3] failed to execute bulk item (index) [FAILED toString()]
MapperParsingException[failed to parse]; nested: OutOfMemoryError[Java heap space];
    at org.elasticsearch.index.mapper.DocumentParser.innerParseDocument(DocumentParser.java:159)
    at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:79)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:304)
    at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:552)
    at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:543)
    at org.elasticsearch.action.support.replication.TransportReplicationAction.prepareIndexOperationOnPrimary(TransportReplicationAction.java:1050)
    at org.elasticsearch.action.support.replication.TransportReplicationAction.executeIndexRequestOnPrimary(TransportReplicationAction.java:1067)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:338)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:131)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.performOnPrimary(TransportReplicationAction.java:579)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase$1.doRun(TransportReplicationAction.java:452)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at com.fasterxml.jackson.core.util.ByteArrayBuilder.toByteArray(ByteArrayBuilder.java:118)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._decodeBase64(UTF8StreamJsonParser.java:3516)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getBinaryValue(UTF8StreamJsonParser.java:484)
    at com.fasterxml.jackson.core.JsonParser.getBinaryValue(JsonParser.java:1225)
    at org.elasticsearch.common.xcontent.json.JsonXContentParser.binaryValue(JsonXContentParser.java:190)
    at org.elasticsearch.index.mapper.attachment.AttachmentMapper.parse(AttachmentMapper.java:441)
    at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:314)
    at org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:441)
    at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:267)
    at org.elasticsearch.index.mapper.DocumentParser.innerParseDocument(DocumentParser.java:127)
    ... 14 more
[2016-07-29 11:21:23,669][WARN ][http.netty               ] [Sunset Bain] Caught exception while handling client http traffic, closing connection [id: 0x949ef53d, /127.0.0.1:53472 => /127.0.0.1:9201]
java.lang.OutOfMemoryError: Java heap space
    at org.jboss.netty.buffer.HeapChannelBuffer.<init>(HeapChannelBuffer.java:42)
    at org.jboss.netty.buffer.BigEndianHeapChannelBuffer.<init>(BigEndianHeapChannelBuffer.java:34)
    at org.jboss.netty.buffer.ChannelBuffers.buffer(ChannelBuffers.java:134)
    at org.jboss.netty.buffer.HeapChannelBufferFactory.getBuffer(HeapChannelBufferFactory.java:68)
    at org.jboss.netty.buffer.CompositeChannelBuffer.copy(CompositeChannelBuffer.java:568)
    at org.jboss.netty.buffer.AbstractChannelBuffer.copy(AbstractChannelBuffer.java:494)
    at org.jboss.netty.handler.codec.http.HttpChunkAggregator.appendToCumulation(HttpChunkAggregator.java:208)
    at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:175)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.handler.codec.http.HttpContentDecoder.messageReceived(HttpContentDecoder.java:135)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:485)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:75)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
[2016-07-29 11:21:23,924][WARN ][http.netty               ] [Sunset Bain] Caught exception while handling client http traffic, closing connection [id: 0x949ef53d, /127.0.0.1:53472 :> /127.0.0.1:9201]

After getting this error I attempted to re-launch ES but I am getting an error like this:
[2016-08-01 10:08:20,196][INFO ][discovery.zen ] [Fagin] failed to send join request to master [{Sunset Bain}{OMLWdUn_QoCtZ1pnrP_ebw}{127.0.0.1}{127.0.0.1:9300}], reason [RemoteTransportException[[Sunset Bain][127.0.0.1:9300][internal:discovery/zen/join]]; nested: ConnectTransportException[[Fagin][127.0.0.1:9301] connect_timeout[30s]]; ]

When I try to talk to ES:
curl http://localhost.x.com:9201/_cat/indices?v

I get an error like this:
{"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":"waited for [30s]"}],"type":"master_not_discovered_exception","reason":"waited for [30s]"},"status":503}

I have searched around and it seems that it's a problem with my elasticsearch.yml or with a proxy connection.

This is everything in my elasticsearch.yml
network.host: 127.0.0.1
http.max_content_length: 1gb

Thanks again,
AJ

rwynn avatar rwynn commented on August 17, 2024

This documentation about the ES Heap might be helpful.

You mentioned previously that you have lots of files in mongodb. When you replay the oplog monstache is going to generate some sizeable bulk index requests to store those in ES. You might want to try limiting how many documents are sent in a single bulk request. You can do this now in the latest version of monstache by setting the following in your TOML config.

elasticsearch-max-docs = 2

By default max docs is set to 100, which is normally fine for indexing mongodb documents, but since your use case is files, it probably should be set much lower. Even 2 documents would translate to a sizeable bulk request since those documents would include the base64 encoded contents of 2 files.

Finally, since this is a large import you will want to greatly increase or turn OFF the refresh_interval. The refresh interval controls how often elasticsearch does the work of making newly indexed documents available for search. There is a global index.refresh_interval setting for elasticsearch.yml. Try setting this to something large, or turn it off completely (set it to -1), during the indexing. Then you can remove the line after the indexing is complete and restart ES to return it to the default of 1s.
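
For example, assuming the test2 index and the port 9201 URL from earlier, the setting can also be changed per index on the fly and restored afterwards:

curl -XPUT 'http://localhost.company.com:9201/test2/_settings' -d '{
    "index" : { "refresh_interval" : "-1" }
}'

curl -XPUT 'http://localhost.company.com:9201/test2/_settings' -d '{
    "index" : { "refresh_interval" : "1s" }
}'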

Some links:
Index Modules
Indexing Buffer
Indexing Performance

rwynn avatar rwynn commented on August 17, 2024

As far as the ES startup exception is concerned, you can check Network Settings.

rwynn avatar rwynn commented on August 17, 2024

Hi @amjustin13

Were you able to resolve the startup and memory issues?

amjustin13 avatar amjustin13 commented on August 17, 2024

Hi @rwynn

I was able to resolve the startup problem. But after taking a couple of steps to try to fix the memory problem, I am still getting the out-of-memory error.

This is what I changed in elasticsearch.yml file:

index.number_of_shards: 1
index.number_of_replicas: 0
index.refresh_interval: -1
http.max_content_length: 1gb

I also did the following: export ES_HEAP_SIZE=4g

I am able to run Monstache now after doing export http_proxy="" and I can see the activity in ES. However, after a while ES throws the out-of-memory error again.

Here is the log file that it created:

[2016-08-05 10:39:38,412][INFO ][node                     ] [Brain-Child] version[2.1.2], pid[91094], build[63c285e/2016-01-27T12:57:52Z]
[2016-08-05 10:39:38,413][INFO ][node                     ] [Brain-Child] initializing ...
[2016-08-05 10:39:38,879][INFO ][plugins                  ] [Brain-Child] loaded [mapper-attachments], sites []
[2016-08-05 10:39:38,892][INFO ][env                      ] [Brain-Child] using [1] data paths, mounts [[/ (/dev/sda1)]], net usable_space [128.8gb], net total_space [230.3gb], spins? [possibly], types [ext4]
[2016-08-05 10:39:40,436][INFO ][node                     ] [Brain-Child] initialized
[2016-08-05 10:39:40,436][INFO ][node                     ] [Brain-Child] starting ...
[2016-08-05 10:39:41,124][INFO ][transport                ] [Brain-Child] publish_address {127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}, {[::1]:9300}
[2016-08-05 10:39:41,135][INFO ][discovery                ] [Brain-Child] elasticsearch/R5uyzLW2TWCuWkOGYcd2RA
[2016-08-05 10:39:44,261][INFO ][cluster.service          ] [Brain-Child] new_master {Brain-Child}{R5uyzLW2TWCuWkOGYcd2RA}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2016-08-05 10:39:44,627][INFO ][http                     ] [Brain-Child] publish_address {127.0.0.1:9201}, bound_addresses {127.0.0.1:9201}, {[::1]:9200}
[2016-08-05 10:39:44,628][INFO ][node                     ] [Brain-Child] started
[2016-08-05 10:39:44,628][INFO ][gateway                  ] [Brain-Child] recovered [0] indices into cluster_state
[2016-08-05 10:40:53,611][INFO ][cluster.metadata         ] [Brain-Child] [test2] creating index, cause [api], templates [], shards [1]/[0], mappings [fs.files]
[2016-08-05 10:41:20,402][INFO ][monitor.jvm              ] [Brain-Child] [gc][young][99][3] duration [761ms], collections [1]/[1.2s], total [761ms]/[1.2s], memory [521.5mb]->[407mb]/[3.8gb], all_pools {[young] [419.5mb]->[193mb]/[1gb]}{[survivor] [101.9mb]->[39.8mb]/[136.5mb]}{[old] [0b]->[174.1mb]/[2.6gb]}
[2016-08-05 10:41:33,951][INFO ][cluster.metadata         ] [Brain-Child] [test2] update_mapping [fs.files]
[2016-08-05 10:41:36,272][INFO ][monitor.jvm              ] [Brain-Child] [gc][young][114][5] duration [831ms], collections [1]/[1.1s], total [831ms]/[2.6s], memory [1.6gb]->[1.1gb]/[3.8gb], all_pools {[young] [860.2mb]->[261.1mb]/[1gb]}{[survivor] [136.5mb]->[28.9mb]/[136.5mb]}{[old] [678.5mb]->[922.4mb]/[2.6gb]}
[2016-08-05 10:43:44,689][INFO ][monitor.jvm              ] [Brain-Child] [gc][young][235][24] duration [907ms], collections [1]/[1.1s], total [907ms]/[7.9s], memory [2.8gb]->[2.4gb]/[3.8gb], all_pools {[young] [851.4mb]->[260kb]/[1gb]}{[survivor] [104.8mb]->[35.4mb]/[136.5mb]}{[old] [1.9gb]->[2.3gb]/[2.6gb]}
[2016-08-05 10:53:15,445][WARN ][http.netty               ] [Brain-Child] Caught exception while handling client http traffic, closing connection [id: 0xfac7cdc4, /127.0.0.1:52440 => /127.0.0.1:9201]
org.jboss.netty.handler.codec.frame.TooLongFrameException: HTTP content length exceeded 1073741824 bytes.
    at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:169)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.handler.codec.http.HttpContentDecoder.messageReceived(HttpContentDecoder.java:135)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:485)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:75)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
[2016-08-05 10:53:15,451][WARN ][http.netty               ] [Brain-Child] Caught exception while handling client http traffic, closing connection [id: 0xfac7cdc4, /127.0.0.1:52440 :> /127.0.0.1:9201]
java.lang.IllegalStateException: null cannot be returned if no data is consumed and state didn't change.
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:503)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:485)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:75)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
[2016-08-05 10:53:54,209][WARN ][http.netty               ] [Brain-Child] Caught exception while handling client http traffic, closing connection [id: 0xf59af950, /127.0.0.1:52450 => /127.0.0.1:9201]
org.jboss.netty.handler.codec.frame.TooLongFrameException: HTTP content length exceeded 1073741824 bytes.
    at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:169)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.handler.codec.http.HttpContentDecoder.messageReceived(HttpContentDecoder.java:135)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:485)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:75)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
[2016-08-05 10:53:54,210][WARN ][http.netty               ] [Brain-Child] Caught exception while handling client http traffic, closing connection [id: 0xf59af950, /127.0.0.1:52450 :> /127.0.0.1:9201]
java.lang.IllegalStateException: null cannot be returned if no data is consumed and state didn't change.
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:503)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:485)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:75)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
[2016-08-05 10:54:21,790][WARN ][http.netty               ] [Brain-Child] Caught exception while handling client http traffic, closing connection [id: 0xd6be99ee, /127.0.0.1:52452 => /127.0.0.1:9201]
java.lang.OutOfMemoryError: Java heap space
    at org.jboss.netty.buffer.HeapChannelBuffer.<init>(HeapChannelBuffer.java:42)
    at org.jboss.netty.buffer.BigEndianHeapChannelBuffer.<init>(BigEndianHeapChannelBuffer.java:34)
    at org.jboss.netty.buffer.ChannelBuffers.buffer(ChannelBuffers.java:134)
    at org.jboss.netty.buffer.HeapChannelBufferFactory.getBuffer(HeapChannelBufferFactory.java:68)
    at org.jboss.netty.buffer.CompositeChannelBuffer.copy(CompositeChannelBuffer.java:568)
    at org.jboss.netty.buffer.AbstractChannelBuffer.copy(AbstractChannelBuffer.java:494)
    at org.jboss.netty.handler.codec.http.HttpChunkAggregator.appendToCumulation(HttpChunkAggregator.java:208)
    at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:175)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.handler.codec.http.HttpContentDecoder.messageReceived(HttpContentDecoder.java:135)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
    at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:485)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:75)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)


I'm not really sure how to determine what I need to do in ES to be able to index my data without memory problems. Right now I just have a small subset of the data in MongoDB, about 35 GB, and I will be importing much more data into MongoDB in the future. My intuition tells me that ES should be able to handle an incredible amount of data, but I have no idea how to set it up to do so.

Thanks again for your help,
AJ

rwynn avatar rwynn commented on August 17, 2024

Hi @amjustin13,

From the log it seems that at least one of your requests was greater than 1GB (1073741824 bytes). How big are the files on average that you're indexing? What types of files are these?

Did you try reducing the elasticsearch-max-docs to 1 or 2 in the monstache TOML config (requires the latest version of monstache)? You can also try reducing the gtm-channel-size to something very small like 2. This will cause back pressure and generally slow the rate at which documents are produced for indexing.

It seems that monstache is flooding ES. ES may not be able to keep up with the rate at which monstache is emitting large bulk requests, and that is probably causing a lot of GC pressure.
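
For example, a conservative combination in the TOML config might look like this (option names as mentioned above; tune from there):

elasticsearch-max-docs = 1
gtm-channel-size = 2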

amjustin13 avatar amjustin13 commented on August 17, 2024

My files are mostly just text/log files. I noticed that I had some data in Mongo that did not belong there and was causing the HTTP request-size error, so I took it out and that error is no longer appearing.

I did reduce the elasticsearch-max-docs to 2 with the latest version of Monstache installed. I also reduced gtm-channel-size to 2.

I am still getting the OutOfMemoryError: Java heap space. I am allocating 4gb of heap to ES; is this not enough? I am not sure what is really going on in the background with ES. What would I have to look into in order to have this working on my development server?

Thanks again @rwynn,

AJ

rwynn avatar rwynn commented on August 17, 2024

Hi @amjustin13,

At this point, since the issue you are facing seems to be specific to ES, it might be easier to ask some questions on the ES IRC channel and/or forums. You might want to mention that you are using the mapper-attachments plugin, the count of log files in mongodb, the average size of a log file, and your heap size configuration. Hopefully someone will be able to help you.
