
Comments (29)

rwynn avatar rwynn commented on July 18, 2024

You can use workers to distribute the load, and namespace-regex to filter which collections get synced. Golang plugins will also be much faster than JavaScript if you use one. You can drop fields in the Go plugin.

from monstache.

benan789 avatar benan789 commented on July 18, 2024

Is there no way to ignore certain oplog entries? This collection holds about 20 million docs and is refreshed daily; no matter how the load is distributed, the writes to Elasticsearch will lag behind the oplog. I tried using a filter middleware and returning false, and it just deletes the doc. Instead of deleting, can there be an option to just ignore?


rwynn avatar rwynn commented on July 18, 2024

Yes I can add an ignore. But I would probably do this in a separate function so that it is ignored earlier in the process.

Do you adjust the refresh interval in ES when you load all these docs?


benan789 avatar benan789 commented on July 18, 2024

No as it is a continuous process throughout the day.


rwynn avatar rwynn commented on July 18, 2024

How many nodes in the ES cluster and do you point monstache to multiple nodes? Also do you override queue size for bulk thread pool? It’s only 50 by default last time I checked. Maybe 200 now in latest ES.

Changing the refresh interval even from default 1s to 10s might make a difference.
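
For reference, a minimal sketch of what changing the refresh interval looks like as a request to the Elasticsearch index settings endpoint. The host and index name below are assumptions for illustration, and `refreshIntervalRequest` is a hypothetical helper that only builds the request without sending it.

```javascript
// Hypothetical helper: build (but do not send) the request that updates
// an index's refresh interval via PUT /<index>/_settings.
function refreshIntervalRequest(baseUrl, index, interval) {
  return {
    method: "PUT",
    url: baseUrl + "/" + encodeURIComponent(index) + "/_settings",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ index: { refresh_interval: interval } }),
  };
}

// Assumed host and index name; substitute your own.
const req = refreshIntervalRequest("http://localhost:9200", "db.collection", "10s");
console.log(req.method, req.url);
console.log(req.body);
```

On a Node.js version with a built-in `fetch`, the descriptor could be sent with `fetch(req.url, req)`. Longer intervals trade search freshness for indexing throughput, so this may not suit collections that need near-real-time search.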


benan789 avatar benan789 commented on July 18, 2024

2 nodes for ES, monstache is pointed to one I believe. I think bulk thread pool is defaulted to 200 so I never changed it. Some of our collections need to be somewhat real time, should I still change the refresh interval in that case? Maybe I should try 2-3s?


rwynn avatar rwynn commented on July 18, 2024

Maybe hold off on changing the refresh interval if you need searches to have the very latest data. It's strange that ES would be slow to index in bulk. I'll try to push a fix to allow ignoring docs completely tomorrow.


benan789 avatar benan789 commented on July 18, 2024

Awesome ty!


rwynn avatar rwynn commented on July 18, 2024

Can you please try the latest version. Thanks.


benan789 avatar benan789 commented on July 18, 2024

Looks like it's working! Thank you!


benan789 avatar benan789 commented on July 18, 2024

Actually I think filtered docs are still getting through. I still see the counts for the collection go up even though there are only ~50 docs with the fieldToTrack. And this is with the monstache instance that does the full sync closed.

[[filter]]
namespace = "db.collection"
script = """
module.exports = function(doc) {
    return doc.fieldToTrack ? true : false
}
"""

I then have a transform middleware too

[[script]]
namespace = "db.collection"
routing = true
script = """
module.exports = function(doc) {
    doc._meta_monstache = { parent: doc.parent, routing: doc.routing, index: "index", type: "collection" };
    return _.omit(doc, "omitField");
}
"""


rwynn avatar rwynn commented on July 18, 2024

I think it was not applying this filter on updates.


benan789 avatar benan789 commented on July 18, 2024

Getting this error after updating from 4.2.0

ERROR 2018/03/12 19:20:54 ReferenceError: 'doc' is not defined


rwynn avatar rwynn commented on July 18, 2024

Not sure why that would be yet. Monstache should always be passing a non-nil object to filter and map functions. It can be named anything in JS, but I use doc in the examples. I haven’t noticed an area where the doc is not passed as the first argument.


rwynn avatar rwynn commented on July 18, 2024

Is it possible it was left off of the js argument list in the function def?


rwynn avatar rwynn commented on July 18, 2024

I think it would have been a type error if the argument was null or undefined and then a property accessed? But I could be wrong.


rwynn avatar rwynn commented on July 18, 2024

I tested this quickly and I do see that you would get the following error message if the doc passed into the function by monstache is undefined

ERROR 2018/03/12 22:48:08 TypeError: Cannot access member 'interesting' of undefined

and the following error if you do not define the doc in the function signature

[[filter]] 
namespace = "test.test" 
script = """ 
module.exports = function(/* should be "doc" here*/) { 
    return !!doc.interesting; 
} 
""" 
ERROR 2018/03/12 22:51:19 ReferenceError: 'doc' is not defined
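
The two failure modes above can be told apart outside monstache as well. The following is a minimal sketch, assuming Node.js rather than the otto engine monstache embeds, so the error message wording will differ and only the error types are compared. `errorNameOf` is a hypothetical helper introduced for this illustration.

```javascript
// Hypothetical helper: call fn with the given args and report the
// name of the error it throws, or null if it returns normally.
function errorNameOf(fn, ...args) {
  try {
    fn(...args);
    return null;
  } catch (err) {
    return err.name;
  }
}

// Case 1: the parameter is declared, but the caller passes undefined.
const declaredParam = function (doc) {
  return !!doc.interesting;
};

// Case 2: the parameter is missing from the signature, so `doc` is an
// undeclared identifier inside the function body.
const missingParam = function (/* should be "doc" here */) {
  return !!doc.interesting;
};

console.log(errorNameOf(declaredParam, undefined));             // TypeError
console.log(errorNameOf(missingParam, { interesting: true }));  // ReferenceError
console.log(errorNameOf(declaredParam, { interesting: true })); // null
```

A ReferenceError therefore points at the identifier not being in scope at all, rather than at the value monstache passed in.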


benan789 avatar benan789 commented on July 18, 2024

I haven't changed my code only updated Monstache. Maybe it has to do with the transformation middleware after being filtered?


rwynn avatar rwynn commented on July 18, 2024

Sorry I am not seeing any issues using the following and just inserting documents into the test collection.
Without seeing your latest javascript I just used what you posted 2 days ago and a fresh go get github.com/rwynn/monstache.

[[filter]]
namespace = "test.test"
script = """
module.exports = function(doc) {
    return doc.fieldToTrack ? true : false
}
"""

[[script]]
namespace = "test.test"
routing = true
script = """
module.exports = function(doc) {
    doc._meta_monstache = { parent: doc.parent, routing: doc.routing, index: "index", type: "collection" };
    return _.omit(doc, "omitField");
}
"""


benan789 avatar benan789 commented on July 18, 2024

hmm is it possible to add more verbose logging so I can trace this error?


rwynn avatar rwynn commented on July 18, 2024

Can you run your functions in nodejs, passing test docs, and make sure they return OK? I ask only because Monstache does not include any Javascript code itself and thus no Javascript that references doc.
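
One way to do that check is sketched below, assuming plain Node.js. The filter and transform bodies are copied from the config posted earlier in this thread; in the monstache config each would be assigned to `module.exports`. The `_` object is a minimal stand-in for the `_.omit` call so the transform runs without extra packages (it handles only string key arguments), and the test documents are made up for illustration.

```javascript
// Minimal stand-in for _.omit, only so the transform runs in plain Node.js.
const _ = {
  omit(obj, ...keys) {
    const copy = Object.assign({}, obj);
    for (const key of keys) delete copy[key];
    return copy;
  },
};

// Filter from the [[filter]] section, as a plain function.
const filter = function (doc) {
  return doc.fieldToTrack ? true : false;
};

// Transform from the [[script]] section, as a plain function.
const transform = function (doc) {
  doc._meta_monstache = { parent: doc.parent, routing: doc.routing, index: "index", type: "collection" };
  return _.omit(doc, "omitField");
};

// Hypothetical test documents: one that should pass the filter, one that should not.
const testDocs = [
  { _id: 1, fieldToTrack: "yes", parent: "p1", routing: "r1", omitField: "secret" },
  { _id: 2, parent: "p2", routing: "r2" },
];

// Run each doc through the filter, then through the transform if it is kept.
const results = testDocs.map(function (doc) {
  const keep = filter(doc);
  return { id: doc._id, keep: keep, out: keep ? transform(doc) : null };
});

console.log(JSON.stringify(results, null, 2));
```

If both functions behave here, the scripts themselves are unlikely to be the source of the error, though this says nothing about how they behave inside monstache's own JavaScript runtime.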


benan789 avatar benan789 commented on July 18, 2024

Here's the error when it crashes:

panic: runtime error: invalid memory address or nil pointer dereference                                                            
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0xc44226]                                                            
                                                                                                                                   
goroutine 125 [running]:                                                                                                           
github.com/robertkrimen/otto.(*_runtime).leaveScope(...)                                                                           
        /home/ec2-user/go/src/github.com/robertkrimen/otto/runtime.go:83                                                           
github.com/robertkrimen/otto.Otto.Call.func1(0x0, 0xc420448f00)                                                                    
        /home/ec2-user/go/src/github.com/robertkrimen/otto/otto.go:550 +0x36                                                       
github.com/robertkrimen/otto.Otto.Call(0x0, 0xc420448f00, 0xde0bf0, 0xe, 0xd22b00, 0xc426cf3b90, 0xc426ced950, 0x1, 0x1, 0x0, ...) 
        /home/ec2-user/go/src/github.com/robertkrimen/otto/otto.go:595 +0x3ee                                                      
main.filterWithScript.func1(0xc426b50370, 0x70d600)                                                                                
        /home/ec2-user/go/src/github.com/rwynn/monstache/monstache.go:739 +0x174                                                   
github.com/rwynn/gtm.ChainOpFilters.func1(0xc426b50370, 0xc42290dc00)                                                              
        /home/ec2-user/go/src/github.com/rwynn/gtm/gtm.go:304 +0x65                                                                
github.com/rwynn/gtm.(*Op).matchesFilter(0xc426b50370, 0xc420962240, 0xc42290d900)                                                 
        /home/ec2-user/go/src/github.com/rwynn/gtm/gtm.go:473 +0x4c                                                                
github.com/rwynn/gtm.(*OpBuf).Flush(0xc420c9d1a0, 0xc420b929c0, 0xc420b5e2a0, 0xc420962240)                                        
        /home/ec2-user/go/src/github.com/rwynn/gtm/gtm.go:447 +0x1085                                                              
github.com/rwynn/gtm.FetchDocuments(0xc420b5e2a0, 0xc420f051e0, 0xdfce40, 0xc420c9d1a0, 0xc420b5e300, 0xc420962240, 0x0, 0x0)      
        /home/ec2-user/go/src/github.com/rwynn/gtm/gtm.go:737 +0x2b1                                                               
created by github.com/rwynn/gtm.Start                                                                                              
        /home/ec2-user/go/src/github.com/rwynn/gtm/gtm.go:976 +0x4af                                                               

doc is definitely passed in as arg and I haven't changed anything since I last gave you my code

module.exports = function(doc) {


rwynn avatar rwynn commented on July 18, 2024

Are you able to reproduce it with a simple script? I just tried using the following config and updating ~90K documents in one update to MongoDB.

I was not able to reproduce the panic and the docs got synced.

[[filter]]
namespace = "test.test"
script = """
module.exports = function(doc) {
    return true;
}
"""
[[script]]
namespace = "test.test"
script = """
module.exports = function(doc) {
   return true;
}
"""
rs1:PRIMARY> db.test.update({}, {$set: {foo: 2}}, {multi:true})
WriteResult({ "nMatched" : 90000, "nUpserted" : 0, "nModified" : 89999 })


benan789 avatar benan789 commented on July 18, 2024

Have you tested inserts with filter returning false?


rwynn avatar rwynn commented on July 18, 2024

Just tried it, nothing gets synced and I don't see any errors.

[[filter]]
namespace = "test.test"
script = """
module.exports = function(doc) {
    return false;
}
"""
[[script]]
namespace = "test.test"
script = """
module.exports = function(doc) {
   return doc;
}
"""

rs1:PRIMARY> for (var i=0; i<2000; ++i) {db.test.insert({foo: i});}
WriteResult({ "nInserted" : 1 })

rs1:PRIMARY> db.test.count()
2000

curl localhost:9200/test.test/_count
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"test.test","index_uuid":"_na_","index":"test.test"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"test.test","index_uuid":"_na_","index":"test.test"},"status":404}

Do you get different behavior for this simple test?


benan789 avatar benan789 commented on July 18, 2024

If I set the filter to return false I no longer get the "'doc' is not defined" error, but it still crashes:

goroutine 120 [IO wait]:
internal/poll.runtime_pollWait(0x7fbb4753ca30, 0x72, 0x0)
        /usr/local/go/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc420508518, 0x72, 0xffffffffffffff00, 0x13c9860, 0x13c48c0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xae
internal/poll.(*pollDesc).waitRead(0xc420508518, 0xc4210c1000, 0x1000, 0x1000)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc420508500, 0xc4210c1000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_unix.go:126 +0x18a
net.(*netFD).Read(0xc420508500, 0xc4210c1000, 0x1000, 0x1000, 0x298, 0x8, 0x0)
        /usr/local/go/src/net/fd_unix.go:202 +0x52
net.(*conn).Read(0xc42000e008, 0xc4210c1000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/net.go:176 +0x6d
crypto/tls.(*block).readFromUntil(0xc420ee07b0, 0x7fbb474f4418, 0xc42000e008, 0x5, 0xc42000e008, 0x0)
        /usr/local/go/src/crypto/tls/conn.go:488 +0x95
crypto/tls.(*Conn).readRecord(0xc420f75180, 0xe1d717, 0xc420f752a0, 0xc420508500)
        /usr/local/go/src/crypto/tls/conn.go:590 +0xe0
crypto/tls.(*Conn).Read(0xc420f75180, 0xc4210268d0, 0x24, 0x24, 0x0, 0x0, 0x0)
        /usr/local/go/src/crypto/tls/conn.go:1134 +0x110
github.com/globalsign/mgo.fill(0x13cfe20, 0xc420f75180, 0xc4210268d0, 0x24, 0x24, 0x0, 0x257)
        /home/ec2-user/go/src/github.com/globalsign/mgo/socket.go:567 +0x53
github.com/globalsign/mgo.(*mongoSocket).readLoop(0xc420168a50)
        /home/ec2-user/go/src/github.com/globalsign/mgo/socket.go:583 +0x658
created by github.com/globalsign/mgo.newSocket
        /home/ec2-user/go/src/github.com/globalsign/mgo/socket.go:197 +0x23f

goroutine 198 [select]:
net/http.(*persistConn).writeLoop(0xc421130120)
        /usr/local/go/src/net/http/transport.go:1759 +0x165
created by net/http.(*Transport).dialConn
        /usr/local/go/src/net/http/transport.go:1187 +0xa53


benan789 avatar benan789 commented on July 18, 2024

Btw, this is still an issue. Should I create a new issue for it? Docs do still get processed prior to erroring, so maybe it's more of an optimization thing?


rwynn avatar rwynn commented on July 18, 2024

Please create a new issue, preferably with steps to reproduce (a small data set, your monstache config, and instructions). That would help because I can’t recreate your full setup and I’ve not been able to reproduce this thus far.


rwynn avatar rwynn commented on July 18, 2024

I think I have a handle on this now. Filters may be called from multiple goroutines and need a mutex. This is related to issue #66.

