iftech-engineering / mongo-es
A MongoDB to Elasticsearch connector
License: Mozilla Public License 2.0
In some scenarios there is DBRef data in the MongoDB model. How can we map that properly to Elasticsearch? This should be a fairly common case.
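mongo-es has no built-in DBRef handling, so one option (a sketch, not a feature of the library) is to flatten DBRef values into plain fields before they reach Elasticsearch. This assumes the extended-JSON shape `{ $ref, $id }` and a hypothetical `flattenDBRef` helper:

```javascript
// Hypothetical helper (not part of mongo-es): flatten a DBRef value into
// plain fields that Elasticsearch can index directly.
// Assumes the extended-JSON shape { $ref: <collection>, $id: <id> }.
function flattenDBRef(value) {
  if (value && typeof value === 'object' && value.$ref && value.$id) {
    return { collection: value.$ref, id: String(value.$id) };
  }
  return value; // non-DBRef values pass through unchanged
}
```

The referenced document itself would still need a separate lookup (or its own sync task) if its fields must be searchable.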
I was trying to sync with the Alibaba Cloud MongoDB service (replica set), and when I ran the example I got the error below:
run 2018-08-17T02:46:30.317Z
put mapping banner_v1 banner
from checkpoint books.books___banner_v1.banner CheckPoint { phase: 'tail', time: 2017-08-16T10:55:24.474Z }
(node:15569) DeprecationWarning: current URL string parser is deprecated, and will be removed in a future version. To use the new parser, pass option { useNewUrlParser: true } to MongoClient.connect.
tail books.books___banner_v1.banner from 2017-08-16T10:55:24.474Z
tail books.books___banner_v1.banner Error: should not complete
    at AnonymousObserver.tail.bufferWithTimeOrCount.subscribe [as _onCompleted] (/home/michael/another-connector/node_modules/mongo-es/dist/src/processor.js:302:33)
    at AnonymousObserver.Rx.AnonymousObserver.AnonymousObserver.completed (/home/michael/another-connector/node_modules/rx/dist/rx.js:1843:12)
    at AnonymousObserver.Rx.internals.AbstractObserver.AbstractObserver.onCompleted (/home/michael/another-connector/node_modules/rx/dist/rx.js:1782:14)
    at AnonymousObserver.tryCatcher (/home/michael/another-connector/node_modules/rx/dist/rx.js:63:31)
    at AutoDetachObserverPrototype.completed (/home/michael/another-connector/node_modules/rx/dist/rx.js:5897:56)
    at AutoDetachObserver.Rx.internals.AbstractObserver.AbstractObserver.onCompleted (/home/michael/another-connector/node_modules/rx/dist/rx.js:1782:14)
    at MergeAllObserver.completed (/home/michael/another-connector/node_modules/rx/dist/rx.js:3751:37)
    at MergeAllObserver.Rx.internals.AbstractObserver.AbstractObserver.onCompleted (/home/michael/another-connector/node_modules/rx/dist/rx.js:1782:14)
    at MergeAllObserver.tryCatcher (/home/michael/another-connector/node_modules/rx/dist/rx.js:63:31)
    at AutoDetachObserverPrototype.completed (/home/michael/another-connector/node_modules/rx/dist/rx.js:5897:56)
tailOpLog Error: should not complete
    at AnonymousObserver.tail.bufferWithTimeOrCount.subscribe [as _onCompleted] (/home/michael/another-connector/node_modules/mongo-es/dist/src/processor.js:302:33)
    at AnonymousObserver.Rx.AnonymousObserver.AnonymousObserver.completed (/home/michael/another-connector/node_modules/rx/dist/rx.js:1843:12)
    at AnonymousObserver.Rx.internals.AbstractObserver.AbstractObserver.onCompleted (/home/michael/another-connector/node_modules/rx/dist/rx.js:1782:14)
    at AnonymousObserver.tryCatcher (/home/michael/another-connector/node_modules/rx/dist/rx.js:63:31)
    at AutoDetachObserverPrototype.completed (/home/michael/another-connector/node_modules/rx/dist/rx.js:5897:56)
    at AutoDetachObserver.Rx.internals.AbstractObserver.AbstractObserver.onCompleted (/home/michael/another-connector/node_modules/rx/dist/rx.js:1782:14)
    at MergeAllObserver.completed (/home/michael/another-connector/node_modules/rx/dist/rx.js:3751:37)
    at MergeAllObserver.Rx.internals.AbstractObserver.AbstractObserver.onCompleted (/home/michael/another-connector/node_modules/rx/dist/rx.js:1782:14)
    at MergeAllObserver.tryCatcher (/home/michael/another-connector/node_modules/rx/dist/rx.js:63:31)
    at AutoDetachObserverPrototype.completed (/home/michael/another-connector/node_modules/rx/dist/rx.js:5897:56)
Here is my config.json:
{
  "controls": {
    "mongodbReadCapacity": 10000,
    "elasticsearchBulkSize": 5000,
    "elasticsearchBulkInterval": 5000,
    "indexNameSuffix": "_v1"
  },
  "mongodb": {
    "url": "mongodb://michael:[email protected]:3717,dds-xxxxxx.mongodb.rds.aliyuncs.com:3717/books?replicaSet=mgset-xxxxx",
    "options": {
      "authSource": "admin",
      "readPreference": "secondaryPreferred"
    }
  },
  "elasticsearch": {
    "options": {
      "host": "http://localhost:9200",
      "apiVersion": "6.3"
    },
    "indices": [
      {
        "index": "banner",
        "body": {
          "settings": {
            "index": {
              "number_of_shards": 3,
              "number_of_replicas": 1,
              "mapper.dynamic": false
            }
          }
        }
      }
    ]
  },
  "tasks": [
    {
      "from": {
        "phase": "tail",
        "time": "2017-08-16T10:55:24.474Z"
      },
      "extract": {
        "db": "books",
        "collection": "books",
        "projection": {
          "name": 1
        }
      },
      "transform": {
        "mapping": {
          "name": "name"
        }
      },
      "load": {
        "index": "banner",
        "type": "banner",
        "body": {
          "dynamic": false,
          "properties": {
            "name": {
              "type": "text",
              "fields": {
                "exact": {
                  "type": "keyword"
                }
              }
            }
          }
        }
      }
    }
  ]
}
https://github.com/jike-engineering/mongo-es/blob/d38596447740e6adae4f43ec2417225c08bbe5e6/src/processor.ts#L178
The source code here requires that o2 contain exactly one element, but in my mongos cluster o2 has two elements. Why is this restriction enforced?
o2: { ip: 'xx.xx.xx.xx', _id: 5c65e44c86d1f817c17bxxxx },
Can it sync in real time? I tried it today and it works well, but syncing is slower than mongo-connector, sometimes by a large margin. Could this be a configuration issue?
As the title says.
As the title says.
https://www.elastic.co/blog/index-type-parent-child-join-now-future-in-elasticsearch
For example, two tasks (user and chat) map to one _type named 'doc', and use type:1 / type:2 to distinguish between them.
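Since Elasticsearch 6 allows only one mapping type per index, one way to combine the two tasks is to load both into the same index/type with a discriminator field. A sketch of the load section (the `docType` field name is hypothetical, and each MongoDB document would need to carry such a field, since mongo-es's transform.mapping only copies existing fields):

```json
{
  "load": {
    "index": "app",
    "type": "doc",
    "body": {
      "dynamic": false,
      "properties": {
        "docType": { "type": "keyword" },
        "name": { "type": "text" }
      }
    }
  }
}
```

Queries would then filter on `docType`; for true parent-child semantics, the join field described in the linked blog post is the Elasticsearch 6 replacement for `_parent`.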
Hello, and thanks for your work. I'm currently trying this plugin and find it very useful.
One question: is there a way to replace values in the extracted data, i.e. replace values matching some condition with other values? For example, replace values in a date field that are not valid dates with null, or drop that field entirely. If this is supported, how do I use it? Thanks a lot!
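mongo-es's transform.mapping only renames fields, so this kind of value rewriting isn't built in. A sketch of a hypothetical cleanup step that does what is being asked (replace an invalid date with null, or drop the field):

```javascript
// Hypothetical cleanup step (not a mongo-es feature): if doc[field] is not
// a parseable date, either null it out or delete it entirely.
function cleanDateField(doc, field, { drop = false } = {}) {
  if (!(field in doc)) return doc;
  if (Number.isNaN(Date.parse(doc[field]))) {
    if (drop) {
      delete doc[field]; // discard the field
    } else {
      doc[field] = null; // replace the bad value with null
    }
  }
  return doc;
}
```

Such a step would have to run between extraction and loading, which currently means patching the tool or cleaning the data in MongoDB first.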
When a collection is dropped in MongoDB, mongo-es does not delete the corresponding data. How can this be solved? Any advice appreciated!
D:\mongo-es>mongo-es config.json
run 2017-06-20T02:43:49.685Z
run { Error: [mapper_parsing_exception] analyzer [ik_max_word] not found for field [property0]
at respond (C:\Users\tangniyuqi\AppData\Roaming\npm\node_modules\mongo-es\node_modules\elasticsearch\src\lib\transport.js:295:15)
at checkRespForFailure (C:\Users\tangniyuqi\AppData\Roaming\npm\node_modules\mongo-es\node_modules\elasticsearch\src\lib\transport.js:254:7)
at HttpConnector.<anonymous> (C:\Users\tangniyuqi\AppData\Roaming\npm\node_modules\mongo-es\node_modules\elasticsearch\src\lib\connectors\http.js:159:7)
at IncomingMessage.bound (C:\Users\tangniyuqi\AppData\Roaming\npm\node_modules\mongo-es\node_modules\elasticsearch\node_modules\lodash\dist\lodash.js:729:21)
at emitNone (events.js:110:20)
at IncomingMessage.emit (events.js:207:7)
at endReadableNT (_stream_readable.js:1047:12)
at _combinedTickCallback (internal/process/next_tick.js:102:11)
at process._tickCallback (internal/process/next_tick.js:161:9)
status: 400,
displayName: 'BadRequest',
message: '[mapper_parsing_exception] analyzer [ik_max_word] not found for field [property0]',
path: '/index0_v1/_mapping/type0',
query: {},
body: '{"dynamic":false,"_parent":{"type":"type1"},"properties":{"property0":{"type":"text","norms":false,"analyzer":"ik_max_word","search_analyzer":"ik_smart"},"property1":{"type":"keyword"}}}',
statusCode: 400,
response: '{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"analyzer [ik_max_word] not found for field [property0]"}],"type":"mapper_parsing_exception","reason":"analyzer [ik_max_word] not found for field [property0]"},"status":400}',
toString: [Function],
toJSON: [Function] }
I was running in scan mode, and it kept logging like this:
scan db.collection -> index.type 5000 59840e8edcbfc715cd9380b7
scan db.collection -> index.type 5000 59840e87dcbfc715cd936d2f
scan db.collection -> index.type 5000 59840e87dcbfc715cd9359a7
scan db.collection -> index.type 5000 59840e80dcbfc715cd93461f
scan db.collection -> index.type 5000 59840e80dcbfc715cd933297
But when I run in tail mode, it only logs:
tail db.collection -> index.type start from xxxx
I never see any error message, so I assume my config is correct and mongo-es is running fine. Why is no data written to Elasticsearch?
FYI, I previously used mongo-connector to import data into the same Elasticsearch index; now I'm using mongo-es to import into a different type, and the two types have a parent-child relationship.
Is there more detailed documentation, e.g. which MongoDB and Elasticsearch versions are supported?
We also need to sync MongoDB to ES; querying MongoDB directly is too slow.
Debug mode fails to run: NODE_ENV=dev mongo-es ./config.json
It runs fine outside debug mode. Is some extra setup required?
_id is not always an ObjectID; plain numbers are also used, and plain-number ids trigger errors.
Updates are also handled poorly for plain-number ids.
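For illustration (this is not mongo-es's actual code), a sketch of normalizing an _id to the string form used as the Elasticsearch document _id, whether it is an ObjectID, a plain number, or a string:

```javascript
// Hypothetical helper: normalize a MongoDB _id (ObjectID, number, or string)
// into a string suitable as an Elasticsearch document _id.
function normalizeId(id) {
  if (id === null || id === undefined) {
    throw new Error('document has no _id');
  }
  // ObjectID instances from the MongoDB driver expose toHexString()
  if (typeof id === 'object' && typeof id.toHexString === 'function') {
    return id.toHexString();
  }
  return String(id); // numbers and strings
}
```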
I have a very large, frequently updated database; a full scan sync takes more than 2 days, but MongoDB keeps the oplog for 24 hours by default. How can I avoid losing data during the scan (e.g. a document is updated right after it has been synced)? Should tail and scan run together, or is it better to run tail and scan in two separate instances? Since scan automatically switches to tail when it finishes, how should two tails be handled? (The scan completion time is unpredictable, so even with a manual shutdown there will be a period with two tails running.)

I configured everything according to the docs and it all looks normal, but there is simply no data in ES. Why?
# Start mongo-es; five records were added to the collection
➜ ~ mongo-es ./mongo_es/config.json
run 2018-08-13T10:01:11.672Z
put mapping foo anyong
tail test.anyong___foo.anyong 1 2018-08-13T10:18:31.000Z
tail test.anyong___foo.anyong 1 2018-08-13T10:24:04.000Z
tail test.anyong___foo.anyong 1 2018-08-13T10:28:15.000Z
tail test.anyong___foo.anyong 1 2018-08-13T10:29:31.000Z
tail test.anyong___foo.anyong 1 2018-08-13T10:41:42.000Z
# The mongo collection contains five records
rs0:PRIMARY> db.anyong.count()
5
So why does querying ES return a count of 0?
➜ ~ curl -XGET '192.168.0.25:9200/foo/anyong/_count?pretty&pretty'
{
"count" : 0,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
}
}
Tailing data from 3 hours back (a large volume) fails with the following error:
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
<--- Last few GCs --->
764250 ms: Mark-sweep 1274.1 (1434.2) -> 1274.0 (1434.2) MB, 2383.4 / 0 ms [allocation failure] [GC in old space requested].
766655 ms: Mark-sweep 1274.0 (1434.2) -> 1274.0 (1434.2) MB, 2404.9 / 0 ms [allocation failure] [GC in old space requested].
769073 ms: Mark-sweep 1274.0 (1434.2) -> 1274.0 (1434.2) MB, 2418.3 / 0 ms [last resort gc].
771554 ms: Mark-sweep 1274.0 (1434.2) -> 1273.9 (1434.2) MB, 2480.8 / 0 ms [last resort gc].
<--- JS stacktrace --->
==== JS stack trace =========================================
Security context: 0x19ec2fcc9fa9
1: /* anonymous */(aka /* anonymous */) [/root/.nvm/v6.0.0/lib/node_modules/mongo-es/dist/src/processor.js:6] [pc=0x62a4bb3a8d4] (this=0x19ec2fc04189 ,resolve=0x156d24938489 <JS Function CreateResolvingFunctions.value (SharedFunctionInfo 0x35c5ec268111)>)
2: arguments adaptor frame: 2->1
3: new Promise [native promise.js:53] [pc=0x62a4ad90b45] (this=0x19ec2fc041e9 <the ho...
Updates that modify individual fields work fine in my tests, but whole-document updates (replacing the full doc) do not take effect.
As the title says: by default, indexes are created for every field in the collections.
Hello,
When I run with my config.json, I always get a permission error.
I'm already using the admin account, so I don't know why.
```
run 2018-11-16T15:38:59.190Z
put mapping douban_v1 movie
from checkpoint douban.movie___douban_v1.movie CheckPoint {
phase: 'scan',
id: 000000000000000000000000,
time: 2018-11-16T15:38:59.189Z }
scan douban.movie___douban_v1.movie from 000000000000000000000000
scan douban.movie___douban_v1.movie end
tail douban.movie___douban_v1.movie from 2018-11-16T15:38:59.189Z
tail douban.movie___douban_v1.movie Error: should not complete
at AnonymousObserver.tail.bufferWithTimeOrCount.subscribe [as _onCompleted] (/usr/local/lib/node_modules/mongo-es/dist/src/processor.js:302:33)
at AnonymousObserver.Rx.AnonymousObserver.AnonymousObserver.completed (/usr/local/lib/node_modules/mongo-es/node_modules/rx/dist/rx.js:1843:12)
at AnonymousObserver.Rx.internals.AbstractObserver.AbstractObserver.onCompleted (/usr/local/lib/node_modules/mongo-es/node_modules/rx/dist/rx.js:1782:14)
at AnonymousObserver.tryCatcher (/usr/local/lib/node_modules/mongo-es/node_modules/rx/dist/rx.js:63:31)
at AutoDetachObserverPrototype.completed (/usr/local/lib/node_modules/mongo-es/node_modules/rx/dist/rx.js:5897:56)
at AutoDetachObserver.Rx.internals.AbstractObserver.AbstractObserver.onCompleted (/usr/local/lib/node_modules/mongo-es/node_modules/rx/dist/rx.js:1782:14)
at MergeAllObserver.completed (/usr/local/lib/node_modules/mongo-es/node_modules/rx/dist/rx.js:3751:37)
at MergeAllObserver.Rx.internals.AbstractObserver.AbstractObserver.onCompleted (/usr/local/lib/node_modules/mongo-es/node_modules/rx/dist/rx.js:1782:14)
at MergeAllObserver.tryCatcher (/usr/local/lib/node_modules/mongo-es/node_modules/rx/dist/rx.js:63:31)
at AutoDetachObserverPrototype.completed (/usr/local/lib/node_modules/mongo-es/node_modules/rx/dist/rx.js:5897:56)
tailOpLog Error: should not complete
at AnonymousObserver.tail.bufferWithTimeOrCount.subscribe [as _onCompleted] (/usr/local/lib/node_modules/mongo-es/dist/src/processor.js:302:33)
at AnonymousObserver.Rx.AnonymousObserver.AnonymousObserver.completed (/usr/local/lib/node_modules/mongo-es/node_modules/rx/dist/rx.js:1843:12)
at AnonymousObserver.Rx.internals.AbstractObserver.AbstractObserver.onCompleted (/usr/local/lib/node_modules/mongo-es/node_modules/rx/dist/rx.js:1782:14)
at AnonymousObserver.tryCatcher (/usr/local/lib/node_modules/mongo-es/node_modules/rx/dist/rx.js:63:31)
at AutoDetachObserverPrototype.completed (/usr/local/lib/node_modules/mongo-es/node_modules/rx/dist/rx.js:5897:56)
at AutoDetachObserver.Rx.internals.AbstractObserver.AbstractObserver.onCompleted (/usr/local/lib/node_modules/mongo-es/node_modules/rx/dist/rx.js:1782:14)
at MergeAllObserver.completed (/usr/local/lib/node_modules/mongo-es/node_modules/rx/dist/rx.js:3751:37)
at MergeAllObserver.Rx.internals.AbstractObserver.AbstractObserver.onCompleted (/usr/local/lib/node_modules/mongo-es/node_modules/rx/dist/rx.js:1782:14)
at MergeAllObserver.tryCatcher (/usr/local/lib/node_modules/mongo-es/node_modules/rx/dist/rx.js:63:31)
at AutoDetachObserverPrototype.completed (/usr/local/lib/node_modules/mongo-es/node_modules/rx/dist/rx.js:5897:56)
```
This is my error log. After searching, it seems the cause is that the oplog is not enabled on my MongoDB. I don't know JS, and this error log alone gave no hint that this was the reason; it would be great to have clearer error messages to make debugging and troubleshooting easier.
tail dna.dna_data -> dna_v1.dna_data Request Timeout after 30000ms
tail dna.dna_data -> dna_v1.dna_data Request Timeout after 30000ms
tail dna.dna_data -> dna_v1.dna_data 500
tail dna.dna_data -> dna_v1.dna_data 500
tail dna.dna_data -> dna_v1.dna_data 500
tail dna.dna_data -> dna_v1.dna_data 500
tail dna.dna_data -> dna_v1.dna_data 500
tail dna.dna_data -> dna_v1.dna_data Request Timeout after 30000ms
tail dna.dna_data -> dna_v1.dna_data Request Timeout after 30000ms
(the "Request Timeout after 30000ms" line repeats dozens more times)
@renzholy
How can this error be fixed?
I can roughly follow the English docs, but not precisely enough. Could someone write a Chinese version of the documentation?
node:internal/tls/secure-context:278
context.loadPKCS12(toBuf(pfx));
^
Error: Unable to load PFX certificate
I can't find a way to work around it.
@onesuper
@fairytail111
@waterlee23
@hamberluo
@themez
@tiant167
I have a MongoDB collection with 10 million documents. Syncing it to ES is very slow, and after a while timeouts start appearing. Are there any good ways to optimize the speed, and any suggestions for the timeout issue?
If a nested MongoDB field is not listed in the task.mapping config, it will not be synced. For example, with this document:

{"data": {"number": 1, "geo": [0, 0]}}

(and "data": "data" already present in the task.mapping section), when only data.geo is modified, the program performs ignoreUpdate. Only after manually adding "data.geo": "data.geo" to task.mapping does the change sync in real time.

When doing the ignoreUpdate check, I hope the key test could match starting from the first path element (i.e. by prefix) instead of requiring an exact match on the full key.
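The requested check could look like this sketch (a hypothetical function, not mongo-es's actual ignoreUpdate logic): an oplog $set key counts as relevant when it and a configured mapping key share a dotted-path prefix, rather than only on exact equality.

```javascript
// Sketch of prefix-based relevance: "data.geo" is relevant when "data" is
// mapped, and "data" is relevant when "data.geo" is mapped.
function isRelevantKey(updatedKey, mappedKeys) {
  return mappedKeys.some(
    (mapped) =>
      updatedKey === mapped ||
      updatedKey.startsWith(mapped + '.') ||
      mapped.startsWith(updatedKey + '.')
  );
}
```

Note the `'.'` separator in the prefix tests, which prevents false matches such as "database" against "data".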
This lib is very helpful for our current project. Just curious whether it is available for MongoDB 4+ and ES 6+.
Hi, while syncing MongoDB to Elasticsearch with this tool, I hit an exception:
POST http://localhost:9200/_bulk => socket hang up
What could be the cause?