elasticsearch-river-mongodb's Issues

No known previous slurping time

I tried to follow these instructions exactly:
https://gist.github.com/2029361

But when I run:
curl -XGET "http://localhost:9200/testmongo/_search?q=firstName:John"
I get:
{"error":"IndexMissingException[[testmongo] missing]","status":404}

The elasticsearch log just keeps repeating this:

java.util.NoSuchElementException
at java.util.LinkedList$ListItr.next(LinkedList.java:698)
at com.mongodb.DBCursor._next(DBCursor.java:453)
at com.mongodb.DBCursor.next(DBCursor.java:533)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.processFullCollection(MongoDBRiver.java:378)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.run(MongoDBRiver.java:353)
at java.lang.Thread.run(Thread.java:680)
[2012-05-29 02:29:43,249][INFO ][river.mongodb ] [Node1] [mongodb][mongodb] No known previous slurping time for this collection
[2012-05-29 02:29:43,252][INFO ][node ] [Node1] {0.19.3}[5532]: stopping ...
[2012-05-29 02:29:43,261][INFO ][river.mongodb ] [Node1] [mongodb][mongodb] closing mongodb stream river
[2012-05-29 02:29:43,270][WARN ][river.mongodb ] [Node1] [mongodb][mongodb] A mongoDB cursor bug ?

And the mongodb log just keeps repeating this:
Tue May 29 02:29:43 [conn3] CMD fsync: sync:1 lock:1
Tue May 29 02:29:43 [conn3] removeJournalFiles
Tue May 29 02:29:43 [fsyncjob] db is now locked for snapshotting, no writes allowed. db.fsyncUnlock() to unlock
Tue May 29 02:29:43 [fsyncjob] For more info see http://www.mongodb.org/display/DOCS/fsync+Command
Tue May 29 02:29:43 [conn3] command: unlock requested

Any ideas on what I am doing wrong?

com.mongodb.MongoException: not talking to master and retries used up

My search is not working now. I guess it's because my index was not configured for a replica set:

curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
  "type": "mongodb",
  "mongodb": {
    "db": "mongo",
    "host": "local",
    "port": "40000",
    "collection": "users"
  },
  "index": {
    "name": "api",
    "type": "users"
  }
}'

Is there any way to properly declare a replica set so that elasticsearch can find the master, like PHP does?
$m = new Mongo("mongodb://localhost:40000,localhost:41000", array("replicaSet" => true));
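For reference, later issues in this list configure the river with a "servers" array listing every replica set member. A hedged sketch reusing the values above (whether this alone resolves master discovery here is not confirmed):

curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
  "type": "mongodb",
  "mongodb": {
    "servers": [
      { "host": "localhost", "port": 40000 },
      { "host": "localhost", "port": 41000 }
    ],
    "db": "mongo",
    "collection": "users"
  },
  "index": {
    "name": "api",
    "type": "users"
  }
}'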

How to properly use this plugin?

I've tried the method from the docs:

curl -X PUT "localhost:9200/_river/mongodb/_meta" -d '{
  "type": "mongodb",
  "mongodb": {
    "db": "dbtest",
    "collection": "users",
    "index": {
      "name": "mongoindex",
      "type": "users"
    }
  }
}'

after getting the normal result:
{"ok":true,"_index":"_river","_type":"mongodb","_id":"_meta","_version":1}

I can access the result at the following URL:
http://localhost:9200/dbtest/_search
but, as far as I understand, it should be something like:
http://localhost:9200/dbtest/users/_search

Can someone explain how to properly configure this to work with several collections,

and is it possible to index only a few fields from a Mongo document rather than every field?
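For what it's worth (not confirmed by the maintainers here): the configuration format used throughout these issues takes a single "collection" per river, so indexing several collections is usually done by creating one river per collection under its own river name. A sketch with placeholder names (the "posts" collection is hypothetical):

curl -XPUT 'http://localhost:9200/_river/mongodb_users/_meta' -d '{
  "type": "mongodb",
  "mongodb": { "db": "dbtest", "collection": "users" },
  "index": { "name": "mongoindex", "type": "users" }
}'

curl -XPUT 'http://localhost:9200/_river/mongodb_posts/_meta' -d '{
  "type": "mongodb",
  "mongodb": { "db": "dbtest", "collection": "posts" },
  "index": { "name": "mongoindex", "type": "posts" }
}'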

error on IndexMissingException

Hi there, I've run into the same issue reported in the NoSuchElementException threads:
{"error":"IndexMissingException[[mongoindex] missing]","status":404}

Below is the ES log. I've tried setting the log level to debug and reinstalling the plugins. From the log, it doesn't even appear to report whether it found the MongoDB replica set or not.

[2012-06-11 16:15:14,038][INFO ][node ] [Joe Fixit] {0.19.4}[26020]: initializing ...
[2012-06-11 16:15:14,055][INFO ][plugins ] [Joe Fixit] loaded [river-mongodb, mapper-attachments], sites []
[2012-06-11 16:15:16,260][INFO ][node ] [Joe Fixit] {0.19.4}[26020]: initialized
[2012-06-11 16:15:16,261][INFO ][node ] [Joe Fixit] {0.19.4}[26020]: starting ...
[2012-06-11 16:15:16,362][INFO ][transport ] [Joe Fixit] bound_address {inet[/0.0.0.0:9300]}, publish_address {inet[/192.168.1.11:9300]}
[2012-06-11 16:15:19,571][INFO ][cluster.service ] [Joe Fixit] detected_master [Reyes, Cecelia][ecfXwyIWSSOo5T3m756Vvg][inet[/192.168.1.11:9301]], added {[Lighting Rod][mJb8jdIPQxWDpzrVK9B4ZA][inet[/192.168.1.11:9302]],[Living Eraser][qEpGWyf5S2SR4gP-jhcsUw][inet[/192.168.1.11:9303]],[Reyes, Cecelia][ecfXwyIWSSOo5T3m756Vvg][inet[/192.168.1.11:9301]],}, reason: zen-disco-receive(from master [[Reyes, Cecelia][ecfXwyIWSSOo5T3m756Vvg][inet[/192.168.1.11:9301]]])
[2012-06-11 16:15:19,638][INFO ][discovery ] [Joe Fixit] elasticsearch/RPqNAlTZRG6kGaPqRgUIdw
[2012-06-11 16:15:19,641][INFO ][http ] [Joe Fixit] bound_address {inet[/0.0.0.0:9200]}, publish_address {inet[/192.168.1.11:9200]}
[2012-06-11 16:15:19,641][INFO ][node ] [Joe Fixit] {0.19.4}[26020]: started

I also started mongod from another window using mongod --replSet foo --port 27017 --dbpath /data/r0 --oplogSize 700, but without luck. Can you please provide any insights? In addition, will an oplog be generated somewhere that I can spot it?
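For reference (a hedged note, not from the plugin docs): once the replica set has been initiated, the oplog is not a separate file but the capped collection oplog.rs in the local database, so it can be checked from the mongo shell:

mongo --port 27017
> use local
> show collections
> // oplog.rs should be listed once rs.initiate() has run; peek at the newest entry:
> db.oplog.rs.find().sort({$natural: -1}).limit(1)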

Thanks.

failed to create river [mongodb][mongodb] in log file

I am unable to properly use the mongo-river plugin with elasticsearch.

I followed the instructions on the front page after creating a replica set in Mongo called myset and running rs.initiate() in the mongo shell.

I changed the XGET call to include mongoindex instead of testmongo.

I kept getting: {"error":"IndexMissingException[[mongoindex] missing]","status":404}

I checked the myset.log file, and it contains the following:

[2012-08-02 01:52:05,434][INFO ][node ] [Decay] {0.19.8}[25663]: initializing ...
[2012-08-02 01:52:05,440][INFO ][plugins ] [Decay] loaded [], sites [river-mongodb]
[2012-08-02 01:52:06,534][INFO ][node ] [Decay] {0.19.8}[25663]: initialized
[2012-08-02 01:52:06,534][INFO ][node ] [Decay] {0.19.8}[25663]: starting ...
[2012-08-02 01:52:06,592][INFO ][transport ] [Decay] bound_address {inet[/127.0.0.1:9300]}, publish_address {inet[/127.0.0.1:9300]}
[2012-08-02 01:52:09,649][INFO ][cluster.service ] [Decay] new_master [Decay][M7x4p7G1Sr2U58D362eoyw][inet[/127.0.0.1:9300]], reason: zen-disco-join (elected_as_master)
[2012-08-02 01:52:09,700][INFO ][discovery ] [Decay] myset/M7x4p7G1Sr2U58D362eoyw
[2012-08-02 01:52:09,711][INFO ][http ] [Decay] bound_address {inet[/127.0.0.1:9200]}, publish_address {inet[/127.0.0.1:9200]}
[2012-08-02 01:52:09,712][INFO ][node ] [Decay] {0.19.8}[25663]: started
[2012-08-02 01:52:10,161][INFO ][gateway ] [Decay] recovered [2] indices into cluster_state
[2012-08-02 01:52:10,249][WARN ][river ] [Decay] failed to create river [mongodb][mongodb]
org.elasticsearch.common.settings.NoClassSettingsException: Failed to load class with value [mongodb]
at org.elasticsearch.river.RiverModule.loadTypeModule(RiverModule.java:86)
at org.elasticsearch.river.RiverModule.spawnModules(RiverModule.java:57)
at org.elasticsearch.common.inject.ModulesBuilder.add(ModulesBuilder.java:44)
at org.elasticsearch.river.RiversService.createRiver(RiversService.java:135)
at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:270)
at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:264)
at org.elasticsearch.action.support.TransportAction$ThreadedActionListener$1.run(TransportAction.java:86)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)
Caused by: java.lang.ClassNotFoundException: mongodb
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at org.elasticsearch.river.RiverModule.loadTypeModule(RiverModule.java:72)
... 9 more

It seems to load the river-mongodb plugin, but it gives me a warning that it cannot find mongodb for some reason. How do I get it to find it? I do have it working for other projects.

Gradle dependencies error

Hi, I'm stuck right at the start :/

Project with path ':elasticsearch' could not be found in root project 'elasticsearch-river-mongodb'.

Config successful, but MongoDB gave an exception

Hi all. Here is my river configuration:

curl -XPUT 'http://192.168.1.206:9200/_river/mongodb/_meta' -d '{
  "type": "mongodb",
  "mongodb": {
    "host": "192.168.1.206",
    "port": 27017,
    "db": "testes",
    "collection": "userlog"
  },
  "index": {
    "name": "userlog",
    "type": "userlog",
    "bulk_size": 1000,
    "bulk_timeout": 30
  }
}'

It ran successfully, but MongoDB threw an exception to Elasticsearch:

[2012-04-11 15:33:20,140][ERROR][river.mongodb] [Bloodhawk] [mongodb][mongodb] Mongo gave an exception
com.mongodb.MongoException: Could not lock the database for FullCollection sync
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.processFullCollection(MongoDBRiver.java:388)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.run(MongoDBRiver.java:353)
at java.lang.Thread.run(Unknown Source)
[2012-04-11 15:33:20,156][INFO][river.mongodb] [Bloodhawk] [mongodb][mongodb] No known previous slurping time for this collection

Please tell me what happened, thank you!
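Not an official answer, but since the exception is about locking the database for the full-collection sync: you can check from the mongo shell whether an fsync lock is still being held, and release a stale one with the same db.fsyncUnlock() command mentioned in the mongod log of the first issue above (a sketch):

> db.currentOp().fsyncLock   // present/truthy while the server is still fsync-locked
> db.fsyncUnlock()           // release a stale lock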

com.mongodb.MongoException: can't find a master

Elasticsearch and MongoDB run fine when they are on the same machine, but when they run on different machines I get an error.

Please help!

Java code:

public void riverMongo3() {
    Client client = EsticSearchClientFactory.getClient();
    try {
        client.prepareIndex("_river", "mongodb", "_meta")
            .setSource(
                jsonBuilder().startObject()
                    .field("type", "mongodb")
                    .startObject("mongodb")
                        .field("host", "192.168.1.133")
                        .field("port", 10000)
                        .field("db", "jua")
                        .field("collection", "blog")
                    .endObject()
                    .startObject("index")
                        .field("name", "test")
                        .field("type", "test")
                        .field("bulk_size", "1000")
                        .field("bulk_timeout", "30")
                    .endObject()
                .endObject()
            ).execute().actionGet();
    } catch (ElasticSearchException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

192.168.1.133:10000 is reachable.

error:

[2012-10-30 14:06:27,907][INFO ][cluster.metadata ] [Stygyro] [_river] update_mapping mongodb
[2012-10-30 14:06:28,034][INFO ][river.mongodb ] [Stygyro] [mongodb][mongodb] Using mongodb server(s): host [192.168.1.133], port [10000]
[2012-10-30 14:06:28,035][INFO ][river.mongodb ] [Stygyro] [mongodb][mongodb] starting mongodb stream: options: secondaryreadpreference [false], gridfs [false], filter [jua], db [test], indexing to [test]/[{}]
[2012-10-30 14:06:28,187][INFO ][index.analysis ] [Stygyro] [test] /home/www/es/elasticsearch/config/mmseg
[2012-10-30 14:06:28,192][INFO ][index.analysis ] [Stygyro] [test] /home/www/es/elasticsearch/config/mmseg
[2012-10-30 14:06:28,193][INFO ][index.analysis ] [Stygyro] [test] /home/www/es/elasticsearch/config/mmseg
[2012-10-30 14:06:28,200][INFO ][paoding-analyzer ] postPropertiesLoaded init
[2012-10-30 14:06:28,200][INFO ][paoding-analyzer ] postPropertiesLoaded return
[2012-10-30 14:06:28,202][INFO ][index.analysis ] [Stygyro] [test] /home/www/es/elasticsearch/config/mmseg
[2012-10-30 14:06:28,216][INFO ][paoding-analyzer ] postPropertiesLoaded init
[2012-10-30 14:06:28,216][INFO ][paoding-analyzer ] postPropertiesLoaded return
[2012-10-30 14:06:28,247][INFO ][cluster.metadata ] [Stygyro] [test] creating index, cause [api], shards [5]/[1], mappings []
[2012-10-30 14:06:29,185][ERROR][river.mongodb ] [Stygyro] [mongodb][mongodb] Mongo gave an exception
com.mongodb.MongoException: can't find a master
at com.mongodb.DBTCPConnector.checkMaster(DBTCPConnector.java:437)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:208)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:313)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:298)
at com.mongodb.DB.getCollectionNames(DB.java:298)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.assignCollections(MongoDBRiver.java:509)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.run(MongoDBRiver.java:546)
at java.lang.Thread.run(Thread.java:662)
[2012-10-30 14:06:29,196][ERROR][river.mongodb ] [Stygyro] [mongodb][mongodb] Mongo gave an exception
com.mongodb.MongoException: can't find a master
at com.mongodb.DBTCPConnector.checkMaster(DBTCPConnector.java:437)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:208)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:313)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:298)
at com.mongodb.DB.getCollectionNames(DB.java:298)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.assignCollections(MongoDBRiver.java:509)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.run(MongoDBRiver.java:546)
at java.lang.Thread.run(Thread.java:662)
[2012-10-30 14:06:29,207][ERROR][river.mongodb ] [Stygyro] [mongodb][mongodb] Mongo gave an exception
com.mongodb.MongoException: can't find a master

Can the Mongo river select specific attributes for Elasticsearch to index?

Hi, we are looking for the ability to select which attributes Elasticsearch indexes.
For example, I may have millions of records like this in my Mongo collection:

{ "_id" : ObjectId("509e5cb863cade071b013552"),

"id" : "235601010750659014_6335261",

"tags" : [ "beach", "losangeles", "california" ],

"user" : { "username" : "xxxx", "website" : "", "bio" : "xxxxx", "profile_picture" : "http://some_website.com/xxx.jpg", "full_name" : "XXXXX", "id" : "1234" },

"comments" : { "count" : 12 },

"images" : { "low_resolution" : { "url" : "http://some_website.com/xxx.jpg", "width" : 306, "height" : 306 }, "thumbnail" : { "url" : "http://some_website.com/xxx.jpg", "width" : 150, "height" : 150 }, "standard_resolution" : { "url" : "http://some_website.com/xxx.jpg", "width" : 612, "height" : 612 } },
}

And suppose I just want to index document.tags, document.user.username, document.user.full_name, and document.user.bio.
Can we use the river to tell Elasticsearch to index just those attributes (even attributes nested within attributes)?
This is different from filtering records by attribute using the new "script feature".
Thank you very much.
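As far as I know this is not a river option, but one way to approximate it on the Elasticsearch side (a hedged sketch; the index/type names "mongoindex"/"media" are placeholders) is to pre-create the index with a mapping that disables indexing for the objects you do not care about, similar to the mapping example in a later issue here. "enabled": false keeps the objects in _source but skips indexing them:

# pre-create the index so the river's dynamic mapping does not index everything
curl -XPUT 'http://localhost:9200/mongoindex' -d '{
  "mappings": {
    "media": {
      "properties": {
        "comments": { "type": "object", "enabled": false },
        "images":   { "type": "object", "enabled": false }
      }
    }
  }
}'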

How about transaction

Hi,
I started to wonder how Elasticsearch handles transactions across shards. Is that possible at all?
Is Elasticsearch a transaction-less application, leaving transactions to the main CRUD database to handle?
Regards,
Janusz

NoSuchElementException and no search result.

Hi, did I forget something or do something wrong? There is no search result.
I googled "elasticsearch IndexMissingException" but could not solve it.
These are the steps (1-5) I took:
1, bin/mongod --directoryperdb --dbpath=/var/data/db --logpath=/var/data/log/mongodb.log --fork
2, bin/elasticsearch
3, curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
     "type": "mongodb",
     "mongodb": {
       "db": "testmongo",
       "collection": "person"
     },
     "index": {
       "name": "mongoindex",
       "type": "person"
     }
   }'
4, bin/mongo
use testmongo
db.person.save({firstName: "John", lastName: "Doe"})
5, curl -XGET "localhost:9200/testmongo/person/_search?q=firstName:John&pretty=true"

And I got:
{
"error" : "IndexMissingException[[testmongo] missing]",
"status" : 404
}
When I tail the elasticsearch.log, there are some exceptions:
java.util.NoSuchElementException
at java.util.LinkedList$ListItr.next(LinkedList.java:715)
at com.mongodb.DBCursor._next(DBCursor.java:453)
at com.mongodb.DBCursor.next(DBCursor.java:533)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.processFullCollection(MongoDBRiver.java:378)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.run(MongoDBRiver.java:353)
at java.lang.Thread.run(Thread.java:636)
[2012-04-06 18:19:30,636][INFO ][river.mongodb ] [Rama-Tut] [mongodb][mongodb] No known previous slurping time for this collection

My environment:
debian 6 64bit
openjdk 6 64bit
mongodb 2.0.4 Linux 64-bit
elasticsearch 0.19.2
and
plugin -install elasticsearch/elasticsearch-mapper-attachments/1.2.0
plugin -install richardwilly98/elasticsearch-river-mongodb/1.1.0

So, what did I do wrong?

mongodb to elasticsearch removal strategy

Question from Martin
Hi Richard,

I am successfully using your mongo 2 elastic river plugin to power the backend of my latest web project. Thank you for taking the time to develop a great bit of code.

I wonder if I could just pick your brains for a second?

The documents I'm pushing into elastic do need to be removed once a certain flag is set.
I want to keep the records in mongo.
I did look at your 'filter' param in the config, but you said there was a limitation where it would not delete records once they were already in elastic.

Do you have any ideas on how I could accomplish this?

Many Thanks,
Martin

Using Mapping along with MongoDB

I want to make sure that the details attribute in the type "test" of index "mongoindex" is not indexed but only stored. I tried the two commands below, but I can see Elasticsearch still analyzing it.

curl -XPUT localhost:9200/mongoindex -d '{
  "settings": { "number_of_shards": 5, "number_of_replicas": 1 },
  "mappings": {
    "test": {
      "properties": {
        "details": { "type": "string", "index": "no", "store": "yes" }
      }
    }
  }
}'

curl -XPUT localhost:9200/_river/mongodb/_meta -d '{
  "type": "mongodb",
  "mongodb": { "host": "localhost", "port": 27017, "db": "testdb", "collection": "test" },
  "index": { "name": "mongoindex", "type": "test" }
}'
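One way to see whether the mapping actually took effect (a sketch using the standard mapping API) is to read it back:

curl -XGET 'http://localhost:9200/mongoindex/test/_mapping?pretty=true'

Note also that an existing field mapping generally cannot be changed in place; if the index already existed with a dynamically mapped, analyzed "details" field, the explicit mapping above would not replace it, so the index usually has to be created with the desired mapping before any documents arrive.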

Regards
Saud Ur Rehman

Filter logic for river

Hi. I am just starting to investigate using the river concept for my MongoDB/Elasticsearch setup. I was wondering whether it would be possible to implement a filter on the river so that it only detects changes that meet certain criteria; in my case, for example, I don't want Elasticsearch to grab a record from my MongoDB until a certain field has been set. Is there a way to accomplish this with the way the river is currently implemented? Please advise, and thanks in advance.

Pattern matching in a collection name

I've found an interesting line in MongoDBRiver.getIndexFilter() with the following code:

            filter.put(OPLOG_NAMESPACE, Pattern.compile(mongoOplogNamespace));

It seems like a bug or a half-feature... :)

mongoOplogNamespace is the concatenation of a DB name and a collection name via a DOT!

    mongoOplogNamespace = mongoDb + "." + mongoCollection;

But a DOT in a regex pattern matches any character.

So potentially, it is possible to get all the data from the repository DB using a configuration like:

...
"mongodb": { 
    "db": "repo", 
    "collection": "i"
}
...

because the resulting regular expression repo.i matches the string "repository". It also matches any collection starting with 'i' in the repo DB, or a repository collection in any DB.

I ran into this problem when I duplicated collections in MongoDB and the river picked up those collections as well.
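A possible fix (a sketch against the lines quoted above, not the project's actual patch) would be to escape the namespace so the dot is matched literally:

// Quote the whole namespace so "." is a literal dot, not "any character".
filter.put(OPLOG_NAMESPACE, Pattern.compile(Pattern.quote(mongoDb + "." + mongoCollection)));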

Nothing happens when I PUT the configuration.

I put

{
    "type": "mongodb",
    "mongodb": {
        "servers": [
            { "host": "localhost", "port": "27017" }
        ],
        "credentials": [
            {
                "db": "local",
                "user": "admin",
                "password": "blabla"
            }
        ],
        "db": "app_database",
        "collection": "apps",
        "gridfs": false
    },
    "index": {
        "name": "apps"
    }
}

into http://localhost:9200/_river/mongodb/_meta and nothing happens except the document being created (which is what would happen even without the plugin). Yes, I installed the plugin with <ES_HOME>/bin/plugin -install richardwilly98/elasticsearch-river-mongodb/1.4.0, and its dependency too. When I visit http://localhost:9200/apps/_search it says:

{
"error": "IndexMissingException[[apps] missing]",
"status": 404
}

Update > http://localhost:9200/_river/mongodb/_status says:

{
"_index": "_river",
"_type": "mongodb",
"_id": "_status",
"exists": false
}

Does that mean the plugin is not installed properly?
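One quick check (the paths and log file name below are assumptions): the Elasticsearch startup log prints which plugins were loaded, and river-mongodb should appear in the loaded [...] list, e.g.:

grep "loaded \[" $ES_HOME/logs/elasticsearch.log
# expect something like:
# [plugins] [...] loaded [river-mongodb, mapper-attachments], sites []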

Sharded collections error

I'm having a problem when applying the river against a MongoDB sharded environment (it should be supported as of river version 1.6.0).

ES version: 0.20.1
River version: 1.6.0
MongoDB Server version: 2.2.0

Short version:

I'm getting this error in the ES log (it loops forever until I forcibly stop ES):

[2012-12-14 17:51:58,837][INFO ][org.elasticsearch.river.mongodb] [Elijah] [mongodb][aa-catalogs] mongoServersSettings: [{port=27017, host=mongo-flexicloud}]
[2012-12-14 17:51:58,853][INFO ][org.elasticsearch.river.mongodb] [Elijah] [mongodb][aa-catalogs] Server: mongo-flexicloud - 27017
[2012-12-14 17:51:58,853][INFO ][org.elasticsearch.river.mongodb] [Elijah] [mongodb][aa-catalogs] Using mongodb server(s): host [mongo-flexicloud], port [27017]
[2012-12-14 17:51:58,853][INFO ][org.elasticsearch.river.mongodb] [Elijah] [mongodb][aa-catalogs] starting mongodb stream. options: secondaryreadpreference [true], throttlesize [500], gridfs [false], filter [], db [AA], script [null], indexing to [aa]/[catalogs]
[2012-12-14 17:51:59,274][ERROR][org.elasticsearch.river.mongodb] [Elijah] [mongodb][aa-catalogs] Mongo gave an exception
com.mongodb.MongoException: can't use 'local' database through mongos
at com.mongodb.MongoException.parse(MongoException.java:82)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:314)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:295)
at com.mongodb.DB.getCollectionNames(DB.java:412)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.assignCollections(MongoDBRiver.java:715)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.run(MongoDBRiver.java:756)
at java.lang.Thread.run(Unknown Source)

Long Version:

I will try to provide as much information as possible.

The rig (everything is running on Windows):

The MongoDB cluster:

Mongo-1 (shard 1 - master)
Mongo-2 (shard 1 - secondary)
Mongo-3 (shard 2 - master)
Mongo-4 (shard 2 - secondary)
Mongo-5 (mongos & arbiters) (dns alias: mongo-flexicloud)

The target database is named "AA", has sharding enabled, and contains 2 collections:

Accounts collection (40000 documents) - Sharded
Catalogs collection (57 documents) - Not sharded

The ElasticSearch cluster:
Cluster name: xpto

ES-1 (master: true, data: true)
ES-2 (master: true, data: true)
ES-3 (master: true, data: true)
ES-4 (master: true, data: true)
ES-5 (Coordinator, master: true, data: false) (dns alias: flexilastic)

I set up the river successfully using the plugin install method.
I also installed the river plugin on the other nodes as well, but that should have no impact because I'm using the ES-5 node to perform the API operations.
The following is the Elasticsearch startup log from node ES-5:

[2012-12-14 17:49:55,436][INFO ][org.elasticsearch.node ] [Elijah] {0.20.1}[4468]: initializing ...
[2012-12-14 17:49:55,560][INFO ][org.elasticsearch.plugins] [Elijah] loaded [river-mongodb, mapper-attachments], sites [bigdesk, head]
[2012-12-14 17:49:59,289][INFO ][org.elasticsearch.node ] [Elijah] {0.20.1}[4468]: initialized
[2012-12-14 17:49:59,289][INFO ][org.elasticsearch.service] starting...
[2012-12-14 17:49:59,289][INFO ][org.elasticsearch.node ] [Elijah] {0.20.1}[4468]: starting ...
[2012-12-14 17:49:59,398][INFO ][org.elasticsearch.transport] [Elijah] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.100.100.109:9300]}
[2012-12-14 17:50:02,565][INFO ][org.elasticsearch.cluster.service] [Elijah] detected_master [Cosby][hvYoobDTSRWSP47m5Tq4jg][inet[/10.100.100.107:9300]]{master=true}, added {[Jackman][54c5D21oTSyzCtw4svOGCw][inet[/10.100.100.123:9300]]{master=true},[Cosby][hvYoobDTSRWSP47m5Tq4jg][inet[/10.100.100.107:9300]]{master=true},[Lucy][jACaQpOrReishEwtDBAKww][inet[/10.100.100.103:9300]]{master=true},[Belamy][QkdX_dsDRlK08a7KLt2iug][inet[/10.100.100.124:9300]]{master=true},}, reason: zen-disco-receive(from master [[Cosby][hvYoobDTSRWSP47m5Tq4jg][inet[/10.100.100.107:9300]]{master=true}])
[2012-12-14 17:50:02,612][INFO ][org.elasticsearch.discovery] [Elijah] flexilastic/wpPROrKTT7i7omDqFXH8NQ
[2012-12-14 17:50:02,612][INFO ][org.elasticsearch.http ] [Elijah] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.100.100.109:9200]}
[2012-12-14 17:50:02,612][INFO ][org.elasticsearch.node ] [Elijah] {0.20.1}[4468]: started
[2012-12-14 17:50:02,612][INFO ][org.elasticsearch.service] running...

This is the request I used to set up the river for the Catalogs collection:

PUT: http://flexilastic:9200/_river/aa-catalogs/_meta
Request BODY:

{
  "type": "mongodb",
  "mongodb": {
    "servers": [
      {
        "host": "mongo-flexicloud",
        "port": "27017"
      }
    ],
    "db": "AA",
    "collection": "Catalogs",
    "gridfs": false
  },
  "index": {
    "name": "aa",
    "type": "catalogs"
  }
}

Then I got the following error (it loops forever until I force ES to shut down):

[2012-12-14 17:51:58,837][INFO ][org.elasticsearch.river.mongodb] [Elijah] [mongodb][aa-catalogs] mongoServersSettings: [{port=27017, host=mongo-flexicloud}]
[2012-12-14 17:51:58,853][INFO ][org.elasticsearch.river.mongodb] [Elijah] [mongodb][aa-catalogs] Server: mongo-flexicloud - 27017
[2012-12-14 17:51:58,853][INFO ][org.elasticsearch.river.mongodb] [Elijah] [mongodb][aa-catalogs] Using mongodb server(s): host [mongo-flexicloud], port [27017]
[2012-12-14 17:51:58,853][INFO ][org.elasticsearch.river.mongodb] [Elijah] [mongodb][aa-catalogs] starting mongodb stream. options: secondaryreadpreference [true], throttlesize [500], gridfs [false], filter [], db [AA], script [null], indexing to [aa]/[catalogs]
[2012-12-14 17:51:59,274][ERROR][org.elasticsearch.river.mongodb] [Elijah] [mongodb][aa-catalogs] Mongo gave an exception
com.mongodb.MongoException: can't use 'local' database through mongos
at com.mongodb.MongoException.parse(MongoException.java:82)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:314)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:295)
at com.mongodb.DB.getCollectionNames(DB.java:412)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.assignCollections(MongoDBRiver.java:715)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.run(MongoDBRiver.java:756)
at java.lang.Thread.run(Unknown Source)
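A hedged note for context: the local database (and therefore the oplog the river reads) lives on the individual shard replica set members, not behind mongos, which is consistent with the error above. To verify the oplog is reachable, one can connect directly to a shard member; the host/port below are assumptions based on the rig description:

mongo mongo-1:27017/local --eval "printjson(db.oplog.rs.findOne())"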

Cannot find oplog.rs collection

Is it possible to start synchronizing data without a sharded Mongo installation?
I just want to use it for development on my local machine. There is only one Mongo instance started, without any replication.
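Sharding is not required, but as the other issues here suggest, the river reads the replica set oplog, so a standalone mongod has nothing for it to tail. A minimal single-member replica set for development (a sketch; the dbpath and set name are placeholders):

# start mongod with a replica set name (a single member is fine for development)
mongod --dbpath /data/db --replSet rs0

# then, once, initiate the set from the mongo shell
mongo --eval "rs.initiate()"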

Please, can someone help?

[2012-11-12 14:44:47,060][WARN ][bootstrap ] jvm uses the client vm, make sure to run java with the server vm for best performance by adding -server to the command line
[2012-11-12 14:44:47,087][INFO ][node ] [Stone] {0.19.9}[4480]: initializing ...
[2012-11-12 14:44:47,182][INFO ][plugins ] [Stone] loaded [river-mongodb, mapper-attachments], sites []
[2012-11-12 14:44:52,254][INFO ][node ] [Stone] {0.19.9}[4480]: initialized
[2012-11-12 14:44:52,278][INFO ][node ] [Stone] {0.19.9}[4480]: starting ...
[2012-11-12 14:44:52,514][INFO ][transport ] [Stone] bound_address {inet[/0.0.0.0:9300]}, publish_address {inet[/192.168.10.18:9300]}
[2012-11-12 14:44:55,707][INFO ][cluster.service ] [Stone] new_master [Stone][faSz-0aNQf2EBZoZ2q-yQQ][inet[/192.168.10.18:9300]], reason: zen-disco-join (elected_as_master)
[2012-11-12 14:44:55,745][INFO ][discovery ] [Stone] elasticsearch/faSz-0aNQf2EBZoZ2q-yQQ
[2012-11-12 14:44:55,808][INFO ][http ] [Stone] bound_address {inet[/0.0.0.0:9200]}, publish_address {inet[/192.168.10.18:9200]}
[2012-11-12 14:44:55,808][INFO ][node ] [Stone] {0.19.9}[4480]: started
[2012-11-12 14:44:55,840][INFO ][gateway ] [Stone] recovered [0] indices into cluster_state
[2012-11-12 14:45:49,647][WARN ][transport.netty ] [Stone] Exception caught on netty layer [[id: 0x016c14c0, /127.0.0.1:53945 => /127.0.0.1:9300]]
org.elasticsearch.common.netty.handler.codec.frame.TooLongFrameException: transport content length received [1.1gb] exceeded [918.7mb]
at org.elasticsearch.transport.netty.SizeHeaderFrameDecoder.decode(SizeHeaderFrameDecoder.java:31)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:422)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:793)
at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:94)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:390)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:261)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
[2012-11-12 14:46:38,616][WARN ][transport.netty ] [Stone] Exception caught on netty layer [[id: 0x016c14c0, /127.0.0.1:53945 :> /127.0.0.1:9300]]
org.elasticsearch.common.netty.handler.codec.frame.TooLongFrameException: transport content length received [1.1gb] exceeded [918.7mb]
at org.elasticsearch.transport.netty.SizeHeaderFrameDecoder.decode(SizeHeaderFrameDecoder.java:31)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:422)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:478)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.channelDisconnected(FrameDecoder.java:366)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:107)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:793)
at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at org.elasticsearch.common.netty.channel.Channels.fireChannelDisconnected(Channels.java:399)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:634)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:99)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:390)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:261)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
[2012-11-12 14:48:23,804][WARN ][transport.netty ] [Stone] Exception caught on netty layer [[id: 0x002f75e5, /127.0.0.1:53959 => /127.0.0.1:9300]]
org.elasticsearch.common.netty.handler.codec.frame.TooLongFrameException: transport content length received [1.1gb] exceeded [918.7mb]
at org.elasticsearch.transport.netty.SizeHeaderFrameDecoder.decode(SizeHeaderFrameDecoder.java:31)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:422)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:793)
at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:94)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:390)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:261)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
[2012-11-12 14:59:08,316][INFO ][node ] [Stone] {0.19.9}[4480]: stopping ...
[2012-11-12 14:59:08,338][WARN ][transport.netty ] [Stone] Exception caught on netty layer [[id: 0x002f75e5, /127.0.0.1:53959 :> /127.0.0.1:9300]]
org.elasticsearch.common.netty.handler.codec.frame.TooLongFrameException: transport content length received [1.1gb] exceeded [918.7mb]
at org.elasticsearch.transport.netty.SizeHeaderFrameDecoder.decode(SizeHeaderFrameDecoder.java:31)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:422)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:478)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.channelDisconnected(FrameDecoder.java:366)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:107)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:793)
at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at org.elasticsearch.common.netty.channel.Channels.fireChannelDisconnected(Channels.java:399)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:634)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:99)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:390)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:261)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
[2012-11-12 14:59:08,368][INFO ][node ] [Stone] {0.19.9}[4480]: stopped
[2012-11-12 14:59:08,371][INFO ][node ] [Stone] {0.19.9}[4480]: closing ...
[2012-11-12 14:59:08,415][INFO ][node ] [Stone] {0.19.9}[4480]: closed
[2012-11-12 16:42:00,664][WARN ][bootstrap ] jvm uses the client vm, make sure to run java with the server vm for best performance by adding -server to the command line
[2012-11-12 16:42:00,680][INFO ][node ] [Order] {0.19.9}[4684]: initializing ...
[2012-11-12 16:42:00,767][INFO ][plugins ] [Order] loaded [river-mongodb, mapper-attachments], sites []
[2012-11-12 16:42:04,099][INFO ][node ] [Order] {0.19.9}[4684]: initialized
[2012-11-12 16:42:04,100][INFO ][node ] [Order] {0.19.9}[4684]: starting ...
[2012-11-12 16:42:04,482][INFO ][transport ] [Order] bound_address {inet[/0.0.0.0:9300]}, publish_address {inet[/192.168.10.18:9300]}
[2012-11-12 16:42:07,709][INFO ][cluster.service ] [Order] new_master [Order][wixWS6fNTNWrF5hcT35TiA][inet[/192.168.10.18:9300]], reason: zen-disco-join (elected_as_master)
[2012-11-12 16:42:07,811][INFO ][discovery ] [Order] elasticsearch/wixWS6fNTNWrF5hcT35TiA
[2012-11-12 16:42:07,876][INFO ][http ] [Order] bound_address {inet[/0.0.0.0:9200]}, publish_address {inet[/192.168.10.18:9200]}
[2012-11-12 16:42:07,877][INFO ][node ] [Order] {0.19.9}[4684]: started
[2012-11-12 16:42:07,943][INFO ][gateway ] [Order] recovered [0] indices into cluster_state

Converting a Standalone to a Replica Set

Hi Richard,
The article "Convert a Standalone to a Replica Set" describes how to set up MongoDB as a replica set using the console.
1) How do I set it up for the Windows service? Right now my service is started by the Windows service manager like this:
"C:\mongodb\bin\mongod.exe" --config "c:\mongodb\mongod.cfg" --service

  1. Should I change it to:
     "C:\mongodb\bin\mongod.exe" --port 27017 --replSet rs0 --config "c:\mongodb\mongod.cfg" --service
     ... or is this a one-time initialization? (See the config-file sketch after this list.)

  2. Do I run:
     rs.initiate()
     ... just once, or every time I import data into MongoDB?
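A possible alternative (a sketch assuming the 2.x INI-style config file): put the replica set name in mongod.cfg so the Windows service command line does not need to change; rs.initiate() is then a one-time step, not something to repeat per import.

# c:\mongodb\mongod.cfg
port = 27017
replSet = rs0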

Is this the right forum for questions like this, or is there another one?
Regards,
Janusz

Unable to create indexes NoShardAvailableActionException

Hi,

While Mongo and Elasticsearch seem to be running fine, I have trouble creating new indexes.
After some trial and error I noted that the river's MongoDB status seems to be erroneous...
The URL: http://localhost:9200/_river/mongodb/_status

Returns:
{"error":"NoShardAvailableActionException[[_river][0] No shard available for [[_river][mongodb][_status]: routing [null]]]","status":500}

Does someone have a clue what could be causing this?

ElasticSearch 0.20.2
River plugin 1.6.1
MongoDB 2.2.2
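NoShardAvailableActionException on the _river index usually means none of that index's shard copies are available to serve the request; checking cluster health (a sketch using the standard API) shows whether the cluster is red or yellow and how many shards are unassigned:

curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'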

Do not use fsync/lock

It seems like your use of fsync/lock is not needed. Please remove it.

no longer works after a period of inactivity

Mongodb 2.0.4 (10gen / debian squeeze)
Elasticsearch 0.19.0 & 0.19.2 debian build
elasticsearch-river-mongodb 1.1.0 with mongodb driver 2.7.2

It works well in the evening, and the next day it no longer works... ?

no errors in logs :/

request body does not work?

Hi,
I'm trying your MongoDB river and I find that it works with a query string, but not with a request body. Any clue?

  1. with query string

curl -XGET "http://localhost:9200/mongoidx/jobs/_search?pretty=true&q=title:SA"

It'll return search results as expected.

  2. with request body
    curl -XPOST http://localhost:9200/mongoidx/jobs/_search?pretty=true -d '{
      "query" : { "term" : { "title" : "SA" }}
    }'
    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 0,
        "max_score" : null,
        "hits" : [ ]
      }
    }
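    Not an answer from the maintainers, but for comparison: the URI q= search goes through an analyzed query_string query, while term is not analyzed, so against an analyzed title field the token is stored lowercased and "SA" finds nothing. A request-body equivalent of the working URI search (a sketch):

    curl -XPOST 'http://localhost:9200/mongoidx/jobs/_search?pretty=true' -d '{
      "query" : { "query_string" : { "query" : "title:SA" } }
    }'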

    Thanks

Suggestion: Initial sync

Hey Richard,
I would like to suggest some sort of initial sync functionality (optional).

Something like when you create the river via the PUT api, some additional options regarding on how the user would like to perform the initial sync.

This would be a "one time" operation. I don't even know if it is possible...

The main issue is that not everything is on the oplog, especially for really large and stale collections...
So it would be nice to implement a set of options that would allow the user to tell the river to pull all data from mongo (much like a GetAll operation).

Of course we could discuss different strategies for pulling the data, such as:

  1. GetAll (easy, but cumbersome for large collections)
  2. via mongodump, mongoexport or bsondump
  3. Others..?

It would be nice to support different import strategies, much like plugins for this river.

Keep up the good work :)

Is river 1.5.0 working with mongodb 2.2.2/es 0.19.12?

I'm having the same problem as #37, except I'm running Mongo 2.2.2 / ES 0.19.12. Everything is set up, replica set and all. No documents seem to make it down the river from Mongo to ES.

Could you please confirm 1.5.0 should be working with 2.2.2/0.19.12?

...
[2012-12-07 22:36:27,973][INFO ][river.mongodb            ] [Astronomer] [mongodb][mongogridfs] Using mongodb server(s): host [localhost], port [27017]
[2012-12-07 22:36:27,973][INFO ][river.mongodb            ] [Astronomer] [mongodb][mongogridfs] starting mongodb stream. options: secondaryreadpreference [false], throttlesize [500], gridfs [true], filter [], db [testmongo], indexing to [testmongo]/[files]
[2012-12-07 22:36:27,974][INFO ][river.mongodb            ] [Astronomer] [mongodb][mongogridfs] Mapping: {"files":{"properties":{"content":{"type":"attachment"},"filename":{"type":"string"},"contentType":{"type":"string"},"md5":{"type":"string"},"length":{"type":"long"},"chunkSize":{"type":"long"}}}}
[2012-12-07 22:36:28,142][INFO ][river.mongodb            ] [Astronomer] [mongodb][mongogridfs] No known previous slurping time for this collection

That's the last log line. The files collection is being filled with PDFs using mongofiles, but sadly no indexing takes place. I've been banging my head against this for the last couple of hours. Please shed some light on this. Thanks a bunch.

Regards,
Roland.

Exception: java.lang.NoSuchMethodError: com.mongodb.Mongo.fsyncAndLock()

Following the wiki example, I get this exception:

[2012-03-13 12:38:10,971][INFO ][cluster.metadata ] [Metalhead] [mongoindex] creating index, cause [api], shards [5]/[1], mappings []
[2012-03-13 12:38:11,596][INFO ][river.mongodb ] [Metalhead] [mongodb][mongodb] No known previous slurping time for this collection
Exception in thread "elasticsearch[Metalhead]mongodb_river_slurper-pool-26-thread-1" java.lang.NoSuchMethodError: com.mongodb.Mongo.fsyncAndLock()Lcom/mongodb/CommandResult;
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.processFullCollection(MongoDBRiver.java:375)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.run(MongoDBRiver.java:353)
at java.lang.Thread.run(Thread.java:636)
[2012-03-13 12:38:11,845][INFO ][cluster.metadata ] [Metalhead] [_river] update_mapping [mongodb] (dynamic)
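In case it helps: a NoSuchMethodError generally means the mongo-java-driver jar actually on the classpath does not provide the method the river expects, i.e. a driver version mismatch. A quick way to see which driver jars Elasticsearch loads (paths are assumptions):

ls $ES_HOME/plugins/river-mongodb/
ls $ES_HOME/lib/ | grep -i mongo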

issues with mongo 2.2.1

Just wondering, is this known to work at all with the latest version of Mongo, 2.2.1? I've got a replica set, and Elasticsearch seems to start up fine, but when I add data nothing happens. It also doesn't seem to import any data initially. Here's my ES log:

[2012-10-30 23:32:14,131][INFO ][discovery ] [Umar] elasticsearch/435P26SvQLGKaxfCf1G_kg
[2012-10-30 23:32:14,589][INFO ][http ] [Umar] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.0.19:9200]}
[2012-10-30 23:32:14,659][INFO ][node ] [Umar] {0.19.11}[2009]: started
[2012-10-30 23:32:33,094][INFO ][gateway ] [Umar] recovered [3] indices into cluster_state
[2012-10-30 23:32:45,167][INFO ][river.mongodb ] [Umar] [mongodb][mongodb] Using mongodb server(s): host [localhost], port [27017]
[2012-10-30 23:32:45,235][INFO ][river.mongodb ] [Umar] [mongodb][mongodb] starting mongodb stream: options: secondaryreadpreference [false], gridfs [false], filter [testmongo], db [mongoindex], indexing to [person]/[{}]
[2012-10-30 23:32:46,104][INFO ][river.mongodb ] [Umar] [mongodb][mongogridfs] Using mongodb server(s): host [localhost], port [27017]
[2012-10-30 23:32:46,132][INFO ][river.mongodb ] [Umar] [mongodb][mongogridfs] starting mongodb stream: options: secondaryreadpreference [false], gridfs [true], filter [testmongo], db [testmongo], indexing to [files]/[{}]
[2012-10-30 23:32:46,171][INFO ][river.mongodb ] [Umar] [mongodb][mongogridfs] Mapping: {"files":{"properties":{"content":{"type":"attachment"},"filename":{"type":"string"},"contentType":{"type":"string"},"md5":{"type":"string"},"length":{"type":"long"},"chunkSize":{"type":"long"}}}}
[2012-10-30 23:32:49,754][INFO ][river.mongodb ] [Umar] [mongodb][mongogridfs] No known previous slurping time for this collection

It stops there, and even when I add new data nothing gets added to this log. ES is still running, since I can run queries. This is the result of running curl -XGET 'http://localhost:9200/testmongo/_count':

{"count":0,"_shards":{"total":5,"successful":5,"failed":0}}

Anyway, I'm not even sure if 2.2.1 is supported. If not, what is the highest version of Mongo supported?

Partial update support

@richard

It seems like partial updates, such as increments, are not supported right now?
Here are the logs:

Cannot get object id. Skip the current item: [{$set={ "userLikesCount" : 1}, _id=null}]
The oplog does not contain the _id inside "o", but there is "o2", which contains the _id of the document.
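For context (a hedged illustration of the MongoDB oplog format, not of the river's code): an update entry has op: "u", the update modifier in "o", and the target document's _id in "o2", which can be inspected directly from the shell:

> use local
> db.oplog.rs.find({op: "u"}).sort({$natural: -1}).limit(1).pretty()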

Do you think you could fix this?
Thanks

geo_point mapping

Hi,
I have a class called Asset with a member annotated like this:
@Embedded
@Indexed(IndexDirection.GEO2D)
private GpsLocation location; // Morphia annotations

The "GpsLocation" object is an instance of a simple class containing two Double latitude/longitude variables.
When I run my tool to create all the Asset objects in MongoDB from an XML file, the Asset collection is created and indexed into the Elasticsearch DB. When I execute the following MongoDB/Morphia code:
Query dbAsset = productsDAO.getDatastore().find(AssetDB.class)
        .field("location").near(latitude, longitude, 5);
return fromDbList(dbAsset.asList(), resolve);

… I am getting the correct results.
However, the MongoDB river creates an Asset whose location field mapping looks like this:
"location": {
"dynamic": "true",
"properties": {
"latitude": {
"type": "double"
},
"longitude": {
"type": "double"
}
}
}
… so location is not of type geo_point.
Is it possible to fix this in the MongoDB river, or is the information that this is a geo location already lost by the time it reaches the river plugin?
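As far as I can tell the river just forwards the document, so the type comes from Elasticsearch's dynamic mapping. One possible workaround (a sketch; the index/type names are placeholders, and note that ES geo_point expects lat/lon rather than latitude/longitude, so the field names would likely need adjusting) is to create the index with an explicit geo_point mapping before the river starts indexing:

curl -XPUT 'http://localhost:9200/mongoindex' -d '{
  "mappings": {
    "asset": {
      "properties": {
        "location": { "type": "geo_point" }
      }
    }
  }
}'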
Regards,
Janusz

failed bulk item index

I am failing on the indexing of the following object. I suspect, however, that this may be an issue with the JSON schema: I have sub-objects that are arrays where multiple values exist (such as location), but plain objects where there is only one.

[Masque] [clinicaltrialindex][4] failed to execute bulk item (index) index {[clinicaltrialindex][clinicaltrial][4ffdebc4bc313a65577ec5bf], source[{"_id":"4ffdebc4bc313a65577ec5bf","brief_summary":{"textblock":"The purpose of this study is to see if it is safe and effective to give an experimental anti-HIV drug, adefovir dipivoxil (ADV), in combination with other anti-HIV drugs (HAART) to patients who have a viral load (level of HIV in the blood) between 50 and 400 copies/ml."},"brief_title":"A Study on the Safety and Effectiveness of Adefovir Dipivoxil in Combination With Anti-HIV Therapy (HAART) in HIV-Positive Patients","condition":"HIV Infections","condition_browse":{"mesh_term":["HIV Infections","Acquired Immunodeficiency Syndrome"]},"detailed_description":{"textblock":"Patients are randomized to 1 of 2 arms in a 2:1 ratio. Approximately 260 patients receive ADV and approximately 130 patients receive placebo. Patients receive ADV or placebo in addition to L-carnitine and their current stable HAART regimen. Each patient receives blinded study medication for 48 weeks and is evaluated at Weeks 16, 24, and 48. Patients who reach the primary endpoint of virologic failure prior to Week 48 may continue blinded study medication or receive open-label ADV at the investigator's discretion. In both cases, patients continue their study visits as per the original visit schedule. Virologic failure is defined as 2 consecutive HIV-1 RNA measurements, after baseline, above 400 copies/ml (measured by the Roche Amplicor HIV-1 Monitor UltraSensitive assay) drawn at least 14 days apart. All patients who complete study visits without treatment-limiting ADV toxicity may continue open-label ADV in the Maintenance Phase at the discretion of the principal investigator."},"eligibility":{"criteria":{"textblock":"Inclusion Criteria You may be eligible for this study if you: - Are HIV-positive. - Have been on a stable HAART regimen consisting of at least 3 antiretroviral drugs for at least 16 weeks prior to study entry. - Have a CD4 count of 50 cells/mm3 or more. - Have a viral load greater than 50 and less than or equal to 400 copies/ml within 14 days prior to study entry. 
- Have had at least 1 additional viral load in the past that was less than or equal to 400 copies/ml while on your current stable HAART regimen."},"gender":"Both","minimum_age":"N/A","maximum_age":"N/A","healthy_volunteers":"No"},"enrollment":"390","firstreceived_date":"November 2, 1999","has_expanded_access":"No","id":"NCT00002426","id_info":{"org_study_id":"232K","secondary_id":"GS-97-415","nct_id":"NCT00002426"},"intervention":{"intervention_type":"Drug","intervention_name":"Adefovir dipivoxil"},"intervention_browse":{"mesh_term":["Adefovir","Adefovir dipivoxil","Reverse Transcriptase Inhibitors"]},"keyword":["HIV-1","RNA, Viral","VX 478","Reverse Transcriptase Inhibitors","Anti-HIV Agents","Viral Load"],"lastchanged_date":"June 23, 2005","location":[{"facility":{"name":"Pacific Oaks Research","address":{"city":"Beverly Hills","state":"California","zip":"90211","country":"United States"}}},{"facility":{"name":"ViRx Inc","address":{"city":"Palm Springs","state":"California","zip":"92262","country":"United States"}}},{"facility":{"name":"Ctr for AIDS Research / Education and Service (CARES)","address":{"city":"Sacramento","state":"California","zip":"95814","country":"United States"}}},{"facility":{"name":"San Francisco VA Med Ctr","address":{"city":"San Francisco","state":"California","zip":"94121","country":"United States"}}},{"facility":{"name":"Kaiser Foundation Hospital","address":{"city":"San Francisco","state":"California","zip":"94118","country":"United States"}}},{"facility":{"name":"San Francisco Gen Hosp / UCSF AIDS Program","address":{"city":"San Francisco","state":"California","zip":"94110","country":"United States"}}},{"facility":{"name":"Blick Med Associates","address":{"city":"Stamford","state":"Connecticut","zip":"06901","country":"United States"}}},{"facility":{"name":"George Washington Univ Med Ctr","address":{"city":"Washington","state":"District of Columbia","zip":"20037","country":"United States"}}},{"facility":{"name":"Georgetown Univ Med Ctr","address":{"city":"Washington","state":"District of Columbia","zip":"20007","country":"United States"}}},{"facility":{"name":"Dupont Circle Physicians Group","address":{"city":"Washington","state":"District of Columbia","zip":"200091104","country":"United States"}}},{"facility":{"name":"IDC Research Initiative","address":{"city":"Altamonte Springs","state":"Florida","zip":"32701","country":"United States"}}},{"facility":{"name":"Community AIDS Resource Inc","address":{"city":"Coral Gables","state":"Florida","zip":"33146","country":"United States"}}},{"facility":{"name":"TheraFirst Med Ctrs Inc","address":{"city":"Fort Lauderdale","state":"Florida","zip":"33308","country":"United States"}}},{"facility":{"name":"Duval County Health Department","address":{"city":"Jacksonville","state":"Florida","zip":"32206","country":"United States"}}},{"facility":{"name":"Health Positive","address":{"city":"Safety Harbor","state":"Florida","zip":"34695","country":"United States"}}},{"facility":{"name":"Center for Quality Care","address":{"city":"Tampa","state":"Florida","zip":"33609","country":"United States"}}},{"facility":{"name":"Georgia Research Associates","address":{"city":"Atlanta","state":"Georgia","zip":"30342","country":"United States"}}},{"facility":{"name":"Rush Presbyterian - Saint Luke's Med Ctr","address":{"city":"Chicago","state":"Illinois","zip":"60612","country":"United States"}}},{"facility":{"name":"Indiana Univ Infectious Disease Research 
Clinic","address":{"city":"Indianapolis","state":"Indiana","zip":"46202","country":"United States"}}},{"facility":{"name":"Johns Hopkins Univ School of Medicine","address":{"city":"Baltimore","state":"Maryland","zip":"21205","country":"United States"}}},{"facility":{"name":"Albany Med College","address":{"city":"Albany","state":"New York","zip":"12208","country":"United States"}}},{"facility":{"name":"Mount Sinai Med Ctr","address":{"city":"New York","state":"New York","zip":"10029","country":"United States"}}},{"facility":{"name":"St Luke Roosevelt Hosp","address":{"city":"New York","state":"New York","zip":"10011","country":"United States"}}},{"facility":{"name":"Bentley-Salick Med Practice","address":{"city":"New York","state":"New York","zip":"10011","country":"United States"}}},{"facility":{"name":"James Jones MD","address":{"city":"New York","state":"New York","zip":"10019","country":"United States"}}},{"facility":{"name":"Wake Forest Univ School of Medicine","address":{"city":"Winston Salem","state":"North Carolina","zip":"27157","country":"United States"}}},{"facility":{"name":"Associates of Med and Mental Health","address":{"city":"Tulsa","state":"Oklahoma","zip":"74114","country":"United States"}}},{"facility":{"name":"The Research and Education Group","address":{"city":"Portland","state":"Oregon","zip":"97210","country":"United States"}}},{"facility":{"name":"Roger Williams Med Ctr","address":{"city":"Providence","state":"Rhode Island","zip":"02908","country":"United States"}}},{"facility":{"name":"Miriam Hosp","address":{"city":"Providence","state":"Rhode Island","zip":"02906","country":"United States"}}},{"facility":{"name":"Vanderbilt Univ School of Medicine","address":{"city":"Nashville","state":"Tennessee","zip":"37212","country":"United States"}}},{"facility":{"name":"Univ of Texas Southwestern Med Ctr of Dallas","address":{"city":"Dallas","state":"Texas","zip":"75235","country":"United States"}}},{"facility":{"name":"Univ of Texas Med Branch","address":{"city":"Galveston","state":"Texas","zip":"77555","country":"United States"}}},{"facility":{"name":"Thomas Street Clinic","address":{"city":"Houston","state":"Texas","zip":"77009","country":"United States"}}},{"facility":{"name":"Univ of Utah Med School / Clinical Trials Ctr","address":{"city":"Salt Lake City","state":"Utah","zip":"84108","country":"United States"}}},{"facility":{"name":"Infectious Disease Physicians Inc","address":{"city":"Annandale","state":"Virginia","zip":"22203","country":"United States"}}},{"facility":{"name":"N Touch Research Corp","address":{"city":"Seattle","state":"Washington","zip":"98122","country":"United States"}}},{"facility":{"name":"St Paul's Hosp","address":{"city":"Vancouver","state":"British Columbia","country":"Canada"}}},{"facility":{"name":"Sunnybrook Health Science Centre","address":{"city":"Toronto","state":"Ontario","country":"Canada"}}},{"facility":{"name":"Centre hospitalier de l'Universite de Montreal (CHUM)","address":{"city":"Montreal","state":"Quebec","country":"Canada"}}},{"facility":{"name":"Hopital Edouard Herriot","address":{"city":"Lyon Cedex 03","country":"France"}}},{"facility":{"name":"Hopital Sainte-Marguerite","address":{"city":"Marseille","country":"France"}}},{"facility":{"name":"Klinikum Der Johann Wolfgang Goethe Universitat","address":{"city":"Frankfurt","country":"Germany"}}},{"facility":{"name":"Universitatskrankenhaus Eppendorf","address":{"city":"Hamburg","country":"Germany"}}},{"facility":{"name":"Klinikum der 
Ludwig-Maximilians-Universitaet","address":{"city":"Muenchen","country":"Germany"}}},{"facility":{"name":"Royal Free Hosp","address":{"city":"London","country":"United Kingdom"}}},{"facility":{"name":"King's College Hospital","address":{"city":"London","country":"United Kingdom"}}},{"facility":{"name":"Chelsea and Westminster Hosp","address":{"city":"London","country":"United Kingdom"}}},{"facility":{"name":"Senior Lecturer in GU Medicine","address":{"city":"London","country":"United Kingdom"}}}],"location_countries":{"country":["United States","Canada","France","Germany","United Kingdom"]},"official_title":"A Randomized, Double-Blind, Placebo-Controlled, Multicenter Study of the Safety and Efficacy of Adefovir Dipivoxil as Intensification Therapy in Combination With Highly Active Antiretroviral Therapy (HAART) in HIV Infected Patients With HIV-1 RNA > 50 and <= 400 Copies/Ml","overall_status":"Completed","oversight_info":{"authority":"United States: Food and Drug Administration"},"phase":"N/A","required_header":{"download_date":"Information obtained from ClinicalTrials.gov on July 10, 2012","link_text":"Link to the current ClinicalTrials.gov record.","url":"http://clinicaltrials.gov/show/NCT00002426"},"source":"NIH AIDS Clinical Trials Information Service","sponsors":{"lead_sponsor":{"agency":"Gilead Sciences","agency_class":"Industry"}},"study_design":"Endpoint Classification: Safety Study, Masking: Double-Blind, Primary Purpose: Treatment","study_type":"Interventional","verification_date":"December 1999"}]}
org.elasticsearch.index.mapper.MapperParsingException: object mapping for [clinicaltrial] tried to parse as object, but got EOF, has a concrete value been provided to it?
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:447)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:493)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:437)
at org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(InternalIndexShard.java:311)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:157)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:532)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)

data being indexed on elasticsearch does not get pushed to mongodb

I initialized the river as documented in the wiki. A replica set is configured, but it only contains a single server.

When I insert data into mongodb, it gets pushed to elasticsearch:

mongo
PRIMARY> use DBNAME
PRIMARY> entry = {    "user" : "phil",
...     "post_date" : "2009-11-15T14:12:12",
...     "message" : "trying out Elastic Search"}
{
    "user" : "phil",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elastic Search"
}

db.contacts.insert(entry)
$ curl -XGET 'http://localhost:9200/contacts/_search?pretty=true&size=5000' -d '
> { 
>     "query" : { 
>         "matchAll" : {} 
>     } 
> }'
{
  "took" : 62,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "contacts",
      "_type" : "contact",
      "_id" : "1",
      "_score" : 1.0, "_source" : {
    "user" : "phil",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elastic Search"
}
    } ]
  }

However, when I index data into elasticsearch by issuing a PUT request, the data does not show up in mongodb (the data from above was cleared before executing the following sample):

$ curl -XPUT 'http://localhost:9200/contacts/contact/1' -d '{
>     "user" : "kimchy",
>     "post_date" : "2009-11-15T14:12:12",
>     "message" : "trying out Elastic Search"
> }'
{"ok":true,"_index":"contacts","_type":"contact","_id":"1","_version":1}
$ curl -XGET 'http://localhost:9200/contacts/_search?pretty=true&size=5000' -d '
{                     
    "query" : {                         
        "matchAll" : {} 
    } 
}'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "contacts",
      "_type" : "contact",
      "_id" : "1",
      "_score" : 1.0, "_source" : {
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elastic Search"
}
    } ]
  }
}
$ mongo
PRIMARY> use DBNAME
PRIMARY> db.contacts.find()
PRIMARY> 

Does the river work both ways?

Initial import does not work

Hi,

It looks like the river initial import does not work. My setup is the following:

ES 0.19.11
MongoDB 2.2.1
River 1.5.0

After a quick look at the code, a wild guess would be that this appeared when #31 was fixed.
In fact, the method Slurper#getIndexFilter does not return null anymore when there is no input timestamp. This means that the first slurper loop won't execute processFullCollection().
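
To illustrate, the first slurper iteration presumably needs to behave roughly like the following (a minimal sketch based on the description above — getIndexFilter and processFullCollection are the real method names, everything else is an assumed placeholder, not the river's actual code):

import com.mongodb.DBObject;
import org.bson.types.BSONTimestamp;

// Hypothetical illustration only.
class SlurperSketch {
    void firstLoop(BSONTimestamp lastTimestamp) {   // null on a brand-new river
        DBObject filter = getIndexFilter(lastTimestamp);
        if (filter == null) {
            // Initial import path: no previous timestamp, dump the whole collection once.
            processFullCollection();
        } else {
            // Incremental path: tail the oplog from the stored timestamp.
            tailOplog(filter);                      // assumed helper
        }
    }

    DBObject getIndexFilter(BSONTimestamp ts) { return null; /* placeholder */ }
    void processFullCollection() { /* placeholder */ }
    void tailOplog(DBObject filter) { /* placeholder */ }
}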

Let me know if you need more information.
Cheers,
Emmanuel

Issue with river and gridFS configuration

Note: a non-gridFS mongo stream (the "people" collection from the example) works fine.

This is on Mac OS X Lion

[2012-06-08 12:07:37,352][INFO ][river.mongodb ] [Nameless One] [mongodb][mongodb] starting mongodb stream: host [localhost], port [27017], gridfs [true], filter [testmongo], db [mongoindex], indexing to [files]/[{}]
[2012-06-08 12:07:37,355][INFO ][river.mongodb ] [Nameless One] [mongodb][mongodb] Mapping: {"files":{"properties":{"content":{"type":"attachment"},"filename":{"type":"string"},"contentType":{"type":"string"},"md5":{"type":"string"},"length":{"type":"long"},"chunkSize":{"type":"long"}}}}
[2012-06-08 12:07:37,408][INFO ][river.mongodb ] [Nameless One] [mongodb][mongodb] No known previous slurping time for this collection
[2012-06-08 12:07:37,914][INFO ][river.mongodb ] [Nameless One] [mongodb][mongodb] No known previous slurping time for this collection
[2012-06-08 12:07:38,417][INFO ][river.mongodb ] [Nameless One] [mongodb][mongodb] No known previous slurping time for this collection
[2012-06-08 12:07:38,920][INFO ][river.mongodb ] [Nameless One] [mongodb][mongodb] No known previous slurping time for this collection
[2012-06-08 12:07:39,422][INFO ][river.mongodb ] [Nameless One] [mongodb][mongodb] No known previous slurping time for this collection
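
For reference, a gridFS river of this kind is typically registered with something like the following (a hedged reconstruction, not my exact command — the db, collection and index names below are guesses based on the log line above):

curl -XPUT "localhost:9200/_river/mongodb/_meta" -d '
{
  "type": "mongodb",
  "mongodb": {
    "db": "testmongo",
    "collection": "fs",
    "gridfs": true
  },
  "index": {
    "name": "mongoindex",
    "type": "files"
  }
}'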

plugins/river-mongodb contains:

elasticsearch-river-mongodb-1.3.0-SNAPSHOT.jar
mongo-java-driver-2.7.2.jar

elasticsearch-0.19.4

MongoDB 2.0.6

PRIMARY> rs.status();
{
    "set" : "foo",
    "date" : ISODate("2012-06-08T19:14:21Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 0,
            "name" : "localhost:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "optime" : {
                "t" : 1339182415000,
                "i" : 1
            },
            "optimeDate" : ISODate("2012-06-08T19:06:55Z"),
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "localhost:27018",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 2658,
            "optime" : {
                "t" : 1339182415000,
                "i" : 1
            },
            "optimeDate" : ISODate("2012-06-08T19:06:55Z"),
            "lastHeartbeat" : ISODate("2012-06-08T19:14:21Z"),
            "pingMs" : 0
        }
    ],
    "ok" : 1
}

Front Page Examples Don't Work

I'm not sure if I'm doing something obviously wrong, but I cannot for the life of me get this to work. I have installed the mapper plugin (1.2.0, but also tried 1.1.0 and 1.3.0) and 1.1.0 of this plugin, restarted elasticsearch, then followed the example on the front page and in other sections of the wiki. It doesn't seem to index anything.

Is there something I missed? I am running elasticsearch 0.19.
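
In case it helps, the river definition I am creating looks roughly like this (a sketch following the front-page example; treat the db, collection and index names as placeholders rather than my exact values):

curl -XPUT "localhost:9200/_river/mongodb/_meta" -d '
{
  "type": "mongodb",
  "mongodb": {
    "db": "testmongo",
    "collection": "person"
  },
  "index": {
    "name": "mongoindex",
    "type": "person"
  }
}'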

Thanks,
James

Slurping large collections

We have large (15 million+ documents, 30GB) collections in MongoDB.
Our servers have 16GB RAM and 8 cores, fast local storage, and 10Gb ethernet.

We are trying to use the river to auto-synchronise elasticsearch and mongodb.

When I start a river running, elasticsearch's memory use appears to climb without limit, eventually getting stuck in a garbage collection loop and failing.

Examination of the code suggests the stream between the slurper and indexer threads grows without bound because the indexer cannot keep up with the slurper (the slurp sustains about 100Mbit/s, roughly 5,000-10,000 documents per second).

Perhaps a slurp rate throttle or maximum stream queue size would allow the slurper to back off and let the indexer catch up.
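
As a rough sketch of the back-pressure idea (purely illustrative — the real river has its own stream class between the threads, and the class and method names below are assumptions):

import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical bounded hand-off between the slurper and indexer threads.
public class BoundedStream {
    private final BlockingQueue<Map<String, Object>> queue;

    public BoundedStream(int throttleSize) {
        // Capacity caps how far the slurper can run ahead of the indexer.
        this.queue = new ArrayBlockingQueue<Map<String, Object>>(throttleSize);
    }

    // Called by the slurper: blocks once the queue is full, so the slurper
    // backs off instead of growing the heap without limit.
    public void push(Map<String, Object> doc) throws InterruptedException {
        queue.put(doc);
    }

    // Called by the indexer: waits briefly for the next document.
    public Map<String, Object> pull() throws InterruptedException {
        return queue.poll(500, TimeUnit.MILLISECONDS);
    }
}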

Links in home page talking about memory issues

Hello,

On the github homepage, there is:
"For the initial implementation see tutorial"
http://www.matt-reid.co.uk/blog_post.php?id=68#&slider1=4

At the end of that page, in the comments, we can find:
"Matthew Reid · Norwich, Norfolk
Note to future readers. I have since come across memory problems with the mongodb river so have reverted back to manually re-indexing documents!"

I find it quite confusing to have that on the home page of the plugin.
If there are memory issues to know about with the plugin, could they be explained?
If there are none and this was a misuse of the plugin, could someone explain the misuse and remove that link, please?

org.elasticsearch.common.collect.computationexception

Hey guys,

I am trying to develop a native elasticsearch service for my project. I am using the Java API and have followed all the guides, but I get the exception below:

org.elasticsearch.common.collect.ComputationException: java.lang.NoClassDefFoundError: org/apache/lucene/analysis/ga/IrishAnalyzer

I have searched through all the Google results I could find, and most of them say this issue is related to a Maven dependency, but that does not mean much to me. Here is my usage:

Node node = NodeBuilder.nodeBuilder().node();
Client _nodeClient = node.client();
SearchResponse resSearch = null;
try {
    resSearch = _nodeClient.prepareSearch("videos")
            .setSearchType(SearchType.DEFAULT)
            .setQuery(QueryBuilders.queryString("q:anytext"))
            .setFrom(0).setSize(60).setExplain(true)
            .execute()
            .actionGet();
} catch (Exception e) {
    // Log and handle as appropriate for the application.
    e.printStackTrace();
}

Could you give me a hand, please? I cannot keep moving forward :(

Identify index field name freely while using mongo river

Hi Richard,

While using the mongo river, I can't choose the mapping property names freely; I have to use the collection's field names. I suggest the mongo river support this feature, so that users can name the properties freely, or even specify which fields to river and which fields to exclude.

e.g. I have a collection 'users' in mongodb with two properties, 'user_name' and 'user_age'. While rivering 'users', I have to configure my mapping as below to make the river job run successfully:

{
    "user" : {
        "properties" : {
            "user_name" : {
                "type" : "string",
                "index_analyzer" : "ansj",
                "search_analyzer" : "ansj",
                "null_value" : "NA"
            },
            "user_age" : {
                "type" : "string",
                "null_value" : "NA"
            }
        }
    }
}

I hope the mongo river can support naming properties freely; in my case, I would like to name the properties 'username' and 'age'. Or in some cases, I only want to river 'user_name', with 'user_age' excluded. A hypothetical configuration is sketched below.
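
To make the request concrete, a hypothetical river configuration could look something like this (the exclude_fields and field_mapping options do not exist in the river today; they are only meant to illustrate the idea):

curl -XPUT "localhost:9200/_river/mongodb/_meta" -d '
{
  "type": "mongodb",
  "mongodb": {
    "db": "mydb",
    "collection": "users",
    "exclude_fields": ["user_age"],
    "field_mapping": {
      "user_name": "username",
      "user_age": "age"
    }
  },
  "index": {
    "name": "users",
    "type": "user"
  }
}'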

Thanks,
Spancer

Mongodb with username, password

My mongodb has username/password protection. How can I use it in the river configuration?
I tried:
curl -XPUT "localhost:9200/_river/mongodb/_meta" -d '
{
  "type": "mongodb",
  "mongodb": {
    "db": "mydb",
    "host": "localhost",
    "collection": "mycol",
    "user": "myuser",
    "password": "mypassword"
  },
  "index": {
    "name": "myindex",
    "type": "mytype"
  }
}';

but I get an empty index, and in the log I see: [mongodb][mongodb] Invalid credential

Indexing of document with property "attachment" fails

I have a collection of activities. The activity document contains a property "attachment" which at the moment just holds a string. The river doesn't seem to like the property name "attachment". I had to change the name to make it work.
Apart from that odd behavior, the river works fine for all my collections.

what does throttlesize do?

The readme would benefit from a brief description of what the throttlesize param does.

Obviously it throttles something, but what? In what situations would you need to change it? Is giving the Java VM more memory an alternative? Are there any consequences of raising/lowering it? Does it mean that there's a delay in indexing under certain circumstances?

I've read #30 and #23 but still don't have a great understanding of it.

Failed to load class with value [mongodb]

I successfully installed elasticsearch with the river on Windows in the past, but I cannot get it to work on Debian. I installed the plugin with bin/plugin as shown in the wiki, and the plugins directory contains exactly what it should:

`-- plugins
    |-- mapper-attachments
    |   |-- elasticsearch-mapper-attachments-1.4.0.jar
    |   `-- tika-app-1.1.jar
    `-- river-mongodb
        |-- elasticsearch-river-mongodb-1.4.0-SNAPSHOT.jar
        `-- mongo-java-driver-2.8.0.jar

When I run elasticsearch it says it is loading the plugin, but http://localhost:9200/_river/mongodb/_status returns this error:

NoClassSettingsException[Failed to load class with value [mongodb]]; nested: ClassNotFoundException[mongodb];
