Comments (29)
You can use workers to distribute the load. You can also use namespace-regex to filter which collections get synced. Golang plugins will also be much faster than JavaScript if you use them, and you can drop fields in the Go plugin.
from monstache.
Is there no way to ignore certain oplog entries? This collection holds about 20 million docs and is refreshed daily; no matter how the load is distributed, the writes to Elasticsearch will lag behind the oplog. I tried using a filter middleware and returning false, and it just deletes the doc. Instead of deleting, can there be an option to just ignore?
from monstache.
Yes I can add an ignore. But I would probably do this in a separate function so that it is ignored earlier in the process.
Do you adjust the refresh interval in ES when you load all these docs?
from monstache.
No as it is a continuous process throughout the day.
from monstache.
How many nodes in the ES cluster and do you point monstache to multiple nodes? Also do you override queue size for bulk thread pool? It’s only 50 by default last time I checked. Maybe 200 now in latest ES.
Changing the refresh interval even from default 1s to 10s might make a difference.
from monstache.
2 nodes for ES, monstache is pointed to one I believe. I think bulk thread pool is defaulted to 200 so I never changed it. Some of our collections need to be somewhat real time, should I still change the refresh interval in that case? Maybe I should try 2-3s?
from monstache.
Maybe hold off on changing the refresh interval if you need searches to have the very latest data. It's strange that ES would be slow to index in bulk. I'll try to push a fix to allow ignoring docs completely tomorrow.
from monstache.
Awesome ty!
from monstache.
Can you please try the latest version. Thanks.
from monstache.
Looks like it's working! Thank you!
from monstache.
Actually, I think filtered docs are still getting through. I still see the counts for the collection go up even though there are only ~50 docs with the fieldToTrack. And this is with the monstache instance that does the full sync stopped.
[[filter]]
namespace = "db.collection"
script = """
module.exports = function(doc) {
return doc.fieldToTrack ? true : false
}
"""
I then have a transform middleware too
[[script]]
namespace = "db.collection"
routing = true
script = """
module.exports = function(doc) {
doc._meta_monstache = { parent: doc.parent, routing: doc.routing, index: "index", type: "collection" };
return _.omit(doc, "omitField");
}
"""
from monstache.
I think it was not applying this filter on updates.
from monstache.
Getting this error after updating from 4.2.0
ERROR 2018/03/12 19:20:54 ReferenceError: 'doc' is not defined
from monstache.
Not sure why that would be yet. Monstache should always be passing a non-nil object to filter and map functions. It can be named anything in JS, but I use doc in the examples. I haven't noticed an area where the doc is not passed as the first argument.
from monstache.
Is it possible it was left off of the js argument list in the function def?
from monstache.
I think it would have been a type error if the argument was null or undefined and then a property accessed? But I could be wrong.
from monstache.
I tested this quickly, and I do see that you would get the following error message if the doc passed into the function by monstache is undefined:
ERROR 2018/03/12 22:48:08 TypeError: Cannot access member 'interesting' of undefined
and the following error if you do not define the doc in the function signature
[[filter]]
namespace = "test.test"
script = """
module.exports = function(/* should be "doc" here*/) {
return !!doc.interesting;
}
"""
ERROR 2018/03/12 22:51:19 ReferenceError: 'doc' is not defined
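For anyone who wants to see the difference outside of monstache, here is a minimal sketch that can be run in Node.js. It only compares the error type raised in each case. The helper name errorNameOf is made up for illustration, and the exact message text in Node.js will differ from what monstache logs.

```javascript
// Hypothetical helper (not part of monstache): calls fn with the given
// arguments and returns the name of the error it throws, or "none".
function errorNameOf(fn, args) {
  try {
    fn.apply(null, args);
    return "none";
  } catch (err) {
    return err.name;
  }
}

// Case 1: the parameter is missing from the signature, so `doc` is an
// undeclared identifier inside the function body.
var missingParam = function (/* should be "doc" here */) {
  return !!doc.interesting;
};

// Case 2: the parameter is declared, but the caller passes undefined.
var declaredParam = function (doc) {
  return !!doc.interesting;
};

var missingParamError = errorNameOf(missingParam, [{ interesting: true }]);
var undefinedDocError = errorNameOf(declaredParam, [undefined]);
var validDocError = errorNameOf(declaredParam, [{ interesting: true }]);

console.log(missingParamError); // ReferenceError
console.log(undefinedDocError); // TypeError
console.log(validDocError);     // none
```

So a ReferenceError points at the function definition itself, while a TypeError points at the value that was passed in.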
from monstache.
I haven't changed my code only updated Monstache. Maybe it has to do with the transformation middleware after being filtered?
from monstache.
Sorry I am not seeing any issues using the following and just inserting documents into the test collection.
Without seeing your latest javascript I just used what you posted 2 days ago and a fresh go get github.com/rwynn/monstache.
[[filter]]
namespace = "test.test"
script = """
module.exports = function(doc) {
return doc.fieldToTrack ? true : false
}
"""
[[script]]
namespace = "test.test"
routing = true
script = """
module.exports = function(doc) {
doc._meta_monstache = { parent: doc.parent, routing: doc.routing, index: "index", type: "collection" };
return _.omit(doc, "omitField");
}
"""
from monstache.
hmm is it possible to add more verbose logging so I can trace this error?
from monstache.
Can you run your functions in Node.js, passing test docs, and make sure they return OK? I ask only because Monstache does not include any JavaScript code itself, and thus no JavaScript that references doc.
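As a rough sketch of what such a check could look like, the filter and transform posted earlier can be pasted into a plain Node.js file and run against a few hand-made docs. The test docs are invented for illustration, and the `_` object below is a minimal stand-in that only implements the single-key omit call the transform uses. It is not the real underscore library.

```javascript
// Minimal stand-in for the `_` object the transform script expects.
// Assumption: only omit(obj, key) with a single key is needed here.
var _ = {
  omit: function (obj, key) {
    var out = {};
    Object.keys(obj).forEach(function (k) {
      if (k !== key) out[k] = obj[k];
    });
    return out;
  }
};

// Filter and transform bodies copied from the config posted in this thread.
var filter = function (doc) {
  return doc.fieldToTrack ? true : false;
};

var transform = function (doc) {
  doc._meta_monstache = { parent: doc.parent, routing: doc.routing, index: "index", type: "collection" };
  return _.omit(doc, "omitField");
};

// Invented test docs: one that should pass the filter, two that should not.
var testDocs = [
  { _id: 1, fieldToTrack: "x", parent: "p1", routing: "p1", omitField: "secret" },
  { _id: 2, parent: "p2", routing: "p2" },
  { _id: 3, fieldToTrack: "" }
];

// Apply the filter first, then the transform, and print what would be indexed.
var results = testDocs.filter(filter).map(transform);
console.log(JSON.stringify(results, null, 2));
```

If either function throws here, the problem is in the script itself rather than in how it is being called.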
from monstache.
Here's the error when it crashes:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0xc44226]
goroutine 125 [running]:
github.com/robertkrimen/otto.(*_runtime).leaveScope(...)
/home/ec2-user/go/src/github.com/robertkrimen/otto/runtime.go:83
github.com/robertkrimen/otto.Otto.Call.func1(0x0, 0xc420448f00)
/home/ec2-user/go/src/github.com/robertkrimen/otto/otto.go:550 +0x36
github.com/robertkrimen/otto.Otto.Call(0x0, 0xc420448f00, 0xde0bf0, 0xe, 0xd22b00, 0xc426cf3b90, 0xc426ced950, 0x1, 0x1, 0x0, ...)
/home/ec2-user/go/src/github.com/robertkrimen/otto/otto.go:595 +0x3ee
main.filterWithScript.func1(0xc426b50370, 0x70d600)
/home/ec2-user/go/src/github.com/rwynn/monstache/monstache.go:739 +0x174
github.com/rwynn/gtm.ChainOpFilters.func1(0xc426b50370, 0xc42290dc00)
/home/ec2-user/go/src/github.com/rwynn/gtm/gtm.go:304 +0x65
github.com/rwynn/gtm.(*Op).matchesFilter(0xc426b50370, 0xc420962240, 0xc42290d900)
/home/ec2-user/go/src/github.com/rwynn/gtm/gtm.go:473 +0x4c
github.com/rwynn/gtm.(*OpBuf).Flush(0xc420c9d1a0, 0xc420b929c0, 0xc420b5e2a0, 0xc420962240)
/home/ec2-user/go/src/github.com/rwynn/gtm/gtm.go:447 +0x1085
github.com/rwynn/gtm.FetchDocuments(0xc420b5e2a0, 0xc420f051e0, 0xdfce40, 0xc420c9d1a0, 0xc420b5e300, 0xc420962240, 0x0, 0x0)
/home/ec2-user/go/src/github.com/rwynn/gtm/gtm.go:737 +0x2b1
created by github.com/rwynn/gtm.Start
/home/ec2-user/go/src/github.com/rwynn/gtm/gtm.go:976 +0x4af
doc is definitely passed in as arg and I haven't changed anything since I last gave you my code
module.exports = function(doc) {
from monstache.
Are you able to reproduce it with a simple script? I just tried using the following config and updating ~90K documents in one update to MongoDB.
I was not able to reproduce the panic and the docs got synced.
[[filter]]
namespace = "test.test"
script = """
module.exports = function(doc) {
return true;
}
"""
[[script]]
namespace = "test.test"
script = """
module.exports = function(doc) {
return true;
}
"""
rs1:PRIMARY> db.test.update({}, {$set: {foo: 2}}, {multi:true})
WriteResult({ "nMatched" : 90000, "nUpserted" : 0, "nModified" : 89999 })
from monstache.
Have you tested inserts with filter returning false?
from monstache.
Just tried it, nothing gets synced and I don't see any errors.
[[filter]]
namespace = "test.test"
script = """
module.exports = function(doc) {
return false;
}
"""
[[script]]
namespace = "test.test"
script = """
module.exports = function(doc) {
return doc;
}
"""
rs1:PRIMARY> for (var i=0; i<2000; ++i) {db.test.insert({foo: i});}
WriteResult({ "nInserted" : 1 })
rs1:PRIMARY> db.test.count()
2000
curl localhost:9200/test.test/_count
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"test.test","index_uuid":"_na_","index":"test.test"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"test.test","index_uuid":"_na_","index":"test.test"},"status":404}
Do you get different behavior for this simple test?
from monstache.
If I set filter to return false I no longer get the doc is undefined error but it still crashes:
goroutine 120 [IO wait]:
internal/poll.runtime_pollWait(0x7fbb4753ca30, 0x72, 0x0)
/usr/local/go/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc420508518, 0x72, 0xffffffffffffff00, 0x13c9860, 0x13c48c0)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xae
internal/poll.(*pollDesc).waitRead(0xc420508518, 0xc4210c1000, 0x1000, 0x1000)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc420508500, 0xc4210c1000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/internal/poll/fd_unix.go:126 +0x18a
net.(*netFD).Read(0xc420508500, 0xc4210c1000, 0x1000, 0x1000, 0x298, 0x8, 0x0)
/usr/local/go/src/net/fd_unix.go:202 +0x52
net.(*conn).Read(0xc42000e008, 0xc4210c1000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/net/net.go:176 +0x6d
crypto/tls.(*block).readFromUntil(0xc420ee07b0, 0x7fbb474f4418, 0xc42000e008, 0x5, 0xc42000e008, 0x0)
/usr/local/go/src/crypto/tls/conn.go:488 +0x95
crypto/tls.(*Conn).readRecord(0xc420f75180, 0xe1d717, 0xc420f752a0, 0xc420508500)
/usr/local/go/src/crypto/tls/conn.go:590 +0xe0
crypto/tls.(*Conn).Read(0xc420f75180, 0xc4210268d0, 0x24, 0x24, 0x0, 0x0, 0x0)
/usr/local/go/src/crypto/tls/conn.go:1134 +0x110
github.com/globalsign/mgo.fill(0x13cfe20, 0xc420f75180, 0xc4210268d0, 0x24, 0x24, 0x0, 0x257)
/home/ec2-user/go/src/github.com/globalsign/mgo/socket.go:567 +0x53
github.com/globalsign/mgo.(*mongoSocket).readLoop(0xc420168a50)
/home/ec2-user/go/src/github.com/globalsign/mgo/socket.go:583 +0x658
created by github.com/globalsign/mgo.newSocket
/home/ec2-user/go/src/github.com/globalsign/mgo/socket.go:197 +0x23f
goroutine 198 [select]:
net/http.(*persistConn).writeLoop(0xc421130120)
/usr/local/go/src/net/http/transport.go:1759 +0x165
created by net/http.(*Transport).dialConn
/usr/local/go/src/net/http/transport.go:1187 +0xa53
from monstache.
By the way, this is still an issue. Should I create a new issue for it? Docs do still get processed prior to erroring, so maybe it's more of an optimization thing?
from monstache.
Please create a new issue, preferably with steps to reproduce (a small data set, the monstache config, and instructions). That would help because I can't recreate your full setup and I've not been able to reproduce it thus far.
from monstache.
I think I have a handle on this now. Filters may be called from multiple goroutines and need a mutex. This is related to issue #66.
from monstache.