Comments (7)
Hi,
That should not be an issue.
Can you please provide more details of your configuration?
- river setting.
- ES / mongo log files
Thanks,
Richard.
On Tue, Nov 6, 2012 at 11:27 AM, egueidan [email protected] wrote:
Hi,
It looks like the river initial import does not work. My setup is the following:
- ES 0.19.11
- MongoDB 2.2.1
- River 1.5.0
After a quick look at the code, a wild guess would be that this appeared when #31 (https://github.com/richardwilly98/elasticsearch-river-mongodb/issues/31) was fixed. In fact, the method Slurper#getIndexFilter does not return null anymore when there is no input timestamp. This means that the first slurper loop won't execute processFullCollection(). Let me know if you need more information.
Cheers,
Emmanuel
from elasticsearch-river-mongodb.
Sure,
the river setting is trivial:
curl -XPUT "localhost:9200/_river/mongo_mydb_mycollec/_meta" -d '
{
  "type": "mongodb",
  "mongodb": {
    "db": "mydb",
    "collection": "mycollec"
  },
  "index": {
    "name": "mydb",
    "type": "mycollec"
  }
}
'
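To see whether the initial import actually produced anything, one can check the document count of the river's target index and inspect the river's own status document (a sketch; the index name "mydb", type "mycollec", and river name match the settings above, and host/port are assumed to be the defaults):

```shell
# Count the documents indexed into the river's target index/type
curl -XGET "localhost:9200/mydb/mycollec/_count?pretty"

# Inspect the river's settings document stored under the _river index
curl -XGET "localhost:9200/_river/mongo_mydb_mycollec/_meta?pretty"
```

A count of 0 right after river creation, combined with updates being picked up later, is the symptom described in this thread.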
The mongo log only shows connections opening (no errors) e.g.:
...
Tue Nov 6 17:58:33 [initandlisten] connection accepted from 127.0.0.1:54303 #232 (11 connections now open)
...
The ES log shows for each river (I create multiple ones) the following lines:
[2012-11-06 17:58:33,678][INFO ][cluster.metadata ] [Superia] [_river] update_mapping [mongo_mydb_mycollec] (dynamic)
[2012-11-06 17:58:33,699][INFO ][river.mongodb ] [Superia] [mongodb][mongo_mydb_mycollec] Using mongodb server(s): host [localhost], port [27017]
[2012-11-06 17:58:33,699][INFO ][river.mongodb ] [Superia] [mongodb][mongo_mydb_mycollec] starting mongodb stream. options: secondaryreadpreference [false], throttlesize [500], gridfs [false], filter [], db [mydb], indexing to [mydb]/[mycollec]
[2012-11-06 17:58:33,923][INFO ][cluster.metadata ] [Superia] [_river] update_mapping [mongo_mydb_mycollec] (dynamic)
[2012-11-06 17:58:33,961][INFO ][river.mongodb ] [Superia] [mongodb][mongo_mydb_mycollec] No known previous slurping time for this collection
The 'mydb' index remains empty. If I update an element in mongo, it is correctly picked up. The interesting part is that if I wipe out ES and restart the river, the updated element shows up at restart (but only that updated element). I guess this is because the update operation is still in the oplog. It is important to note that the replSet was activated on mongo after the data was inserted (I just activated it to be able to use the river from now on).
FYI, I've cloned the code and tried returning null in the getIndexFilter method when the time is null:
...
if (time == null) {
    logger.info("No known previous slurping time for this collection");
    return null;
}
...
With this modification (which I understand is not satisfactory for the purpose of filtered rivers), all my data is picked up.
Thanks,
Emmanuel
Hi,
It is a requirement to have the replica set set up before starting to import data, as the river relies on the oplog.rs collection.
None of the data imported before the replica set was created will be indexed in ES.
Thanks,
Richard.
Ok, but that used to work (tested with 1.4.0)... Also, I might be wrong, but the oplog being a capped collection, it won't contain all the operations that ever happened in mongo. This means that the first time you use the river you have to query the watched collection directly for all elements (which is what processFullCollection does, if I understand correctly). And that does not rely on the oplog.
Thanks for your help,
Emmanuel
Hi,
Just tested:
- Start mongo instance (no replica set yet)
- Create a document in mydb/mycollec
- Restart the mongo instance with --replSet
- Call rs.initiate()
- Check local/oplog.rs: there is only one record, about "initiating set"
So even if the new filter were there, the river would not find the document that was created before the replica set was initiated.
The new filter implementation also makes sure that only data related to the monitored collection is returned (as opposed to before, where everything was returned to the river).
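The test steps above can be sketched as shell/mongo commands (paths, db names, and the replica-set name rs0 are illustrative; syntax matches the 2.x-era tools used in this thread):

```shell
# Start mongod without a replica set and insert a document
mongod --dbpath /data/db --fork --logpath /tmp/mongod.log
mongo mydb --eval 'db.mycollec.insert({name: "created-before-replset"})'

# Restart mongod with --replSet and initiate the set
mongod --dbpath /data/db --shutdown
mongod --dbpath /data/db --replSet rs0 --fork --logpath /tmp/mongod.log
mongo --eval 'rs.initiate()'

# local/oplog.rs only contains the "initiating set" record; the earlier
# insert never appears there, so the river has nothing to replay
mongo local --eval 'db.oplog.rs.find().forEach(printjson)'
```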
Thanks,
Richard.
I have exactly the same problem where processFullCollection is not called the first time it's loaded from a new _river/mongodb/_meta.
I built elasticsearch and the plugins from master cloned locally.
processFullCollection is only triggered when the oplogCursor method returns null, but that's never the case for an existing collection. Is there an external way to trigger a full-collection process the first time elasticsearch is run?
I also agree with egueidan: the oplog won't give you all the records that need to be indexed. It seems that a first time run should upload ALL mongodb docs to the es indexer. Did I miss anything?
How can the search indexer be loaded from the mongodb river on initial setup? I need to implement the search on top of an existing mongodb configuration already containing 100,000 documents. How can I index these documents? Please advise.
Hi,
If a collection was created before the replica set, the river will not be able to index its documents.
The recommendation is:
- Use mongodump [1] to export the collection.
- Drop the existing collection.
- Use mongorestore [2] to reimport the collection.
[1] - http://docs.mongodb.org/manual/reference/mongodump/
[2] - http://docs.mongodb.org/manual/reference/mongorestore/#bin.mongorestore
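A sketch of that workflow for the collection from this thread (db and collection names are illustrative; the restore must run after the replica set is initiated, so that every re-insert lands in oplog.rs where the river can pick it up):

```shell
# 1. Export the collection that was created before the replica set existed
mongodump --db mydb --collection mycollec --out /tmp/dump

# 2. Drop the original collection
mongo mydb --eval 'db.mycollec.drop()'

# 3. Re-import it; with the replica set now active, each insert is
#    written to oplog.rs and becomes visible to the river
mongorestore --db mydb --collection mycollec /tmp/dump/mydb/mycollec.bson
```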
Thanks,
Richard.