
Comments (21)

rwynn avatar rwynn commented on July 18, 2024 1

An index can contain a dot. The whole thing is the index. You can try it.

from monstache.

rwynn avatar rwynn commented on July 18, 2024 1

You probably don’t need to worry too much about that comment. Monstache has a feature that deletes the index when the collection or database is dropped. You will just need to delete the index manually instead. Dropping a collection or database is not a very common event.

ObjectIds are converted to strings before your function is called.
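As a rough sketch of what that means for routing: if the parent id arrives as a string, the mapping function can copy it into the routing property directly. The field name `parentId` and the sample ids are hypothetical, and the function is written as a plain named function so it can run on its own; in a monstache config it would be the function your script exports.

```javascript
// Hypothetical child document as the mapping function might receive it.
// Assumption: parentId was an ObjectId in MongoDB and has already been
// converted to a string by the time this function runs.
function routeChildToParent(doc) {
  // String() leaves a string unchanged; it is only a defensive fallback.
  doc._meta_monstache = { routing: String(doc.parentId) };
  return doc;
}

const child = {
  _id: "5b4f1c2e9d3e4a0001a1b2c3",
  parentId: "5b4f1c2e9d3e4a0001a1b2c4",
  text: "hello",
};
const mapped = routeChildToParent(child);
console.log(mapped._meta_monstache.routing);
```

If that holds, there should be no need to store parent ids as strings in MongoDB just for routing.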


rwynn avatar rwynn commented on July 18, 2024

Hi, I don't see a reason yet why it wouldn't already support this (with some work on your part). Monstache will not create the index mapping that contains the join field. You would need to do that yourself upfront before indexing. Then after that join field is in place you would need a Javascript or Golang plugin to alter the source document appropriately to include that join field AND ensure that routing is set such that parents and children are routed to the same shard.

See https://rwynn.github.io/monstache-site/advanced/#middleware and https://rwynn.github.io/monstache-site/advanced/#routing
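A minimal sketch of the two steps described above, assuming the index mapping already defines a join field. The field name `join_field`, the relation names `parent` and `child`, and the `parentId` property are all hypothetical and would have to match your own mapping and documents. The functions are plain JavaScript so they can be run standalone; in monstache each would be the function exported by a script.

```javascript
// Hypothetical name of the join field; it must match the field you
// created in the Elasticsearch index mapping beforehand.
const JOIN_FIELD = "join_field";

// Parent documents declare their relation name and are routed by their own id.
function mapParent(doc) {
  doc[JOIN_FIELD] = { name: "parent" };
  doc._meta_monstache = { routing: String(doc._id) };
  return doc;
}

// Child documents name their parent and use the parent's id as routing,
// so that parent and child land on the same shard.
function mapChild(doc) {
  doc[JOIN_FIELD] = { name: "child", parent: String(doc.parentId) };
  doc._meta_monstache = { routing: String(doc.parentId) };
  return doc;
}

const parent = mapParent({ _id: "p1", title: "Parent" });
const child = mapChild({ _id: "c1", parentId: "p1", body: "Child" });
console.log(parent._meta_monstache.routing, child._meta_monstache.routing);
```

Both documents end up with the same routing value, which is the property Elasticsearch needs for a parent/child join to work.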


aguyinmontreal avatar aguyinmontreal commented on July 18, 2024

Thank you! I'll give it a try!


aguyinmontreal avatar aguyinmontreal commented on July 18, 2024

Sorry to bother, but my "parent" documents and my "child" documents are in two different MongoDB collections.

Can Monstache bring them both under the same Elasticsearch namespace (i.e. both index and type) without conflict?

Thank you


rwynn avatar rwynn commented on July 18, 2024

@aguyinmontreal I've just added some docs around this at https://rwynn.github.io/monstache-site/advanced/#joins

To do what you want to accomplish you would have 2 mapping functions in your TOML config file. One for each collection. Then on the _meta_monstache attribute, in addition to a routing property set the index and type properties to what you want. You will be overriding the defaults where index is ${db}.${collection} and type is ${collection}.
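For illustration, the two functions might look roughly like this. The collection roles (`authors` as parents, `books` as children), the `authorId` field, and the target index and type names are hypothetical. Only the `routing`, `index`, and `type` properties on `_meta_monstache` come from the description above.

```javascript
// Hypothetical shared target; both collections are sent to the same
// Elasticsearch index and type instead of the per-collection defaults.
const TARGET = { index: "library", type: "entry" };

// Function for the parent collection (for example "mydb.authors").
function mapAuthor(doc) {
  doc._meta_monstache = {
    routing: String(doc._id),
    index: TARGET.index,
    type: TARGET.type,
  };
  return doc;
}

// Function for the child collection (for example "mydb.books").
function mapBook(doc) {
  doc._meta_monstache = {
    routing: String(doc.authorId), // route to the parent's shard
    index: TARGET.index,
    type: TARGET.type,
  };
  return doc;
}

const author = mapAuthor({ _id: "a1", name: "Ada" });
const book = mapBook({ _id: "b1", authorId: "a1", title: "Notes" });
console.log(author._meta_monstache, book._meta_monstache);
```

In the TOML config each function would be attached to its own collection namespace, so that documents from either collection are rewritten before indexing.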


rwynn avatar rwynn commented on July 18, 2024

This link shows more about overriding the index and type. https://rwynn.github.io/monstache-site/advanced/#indexing-metadata


aguyinmontreal avatar aguyinmontreal commented on July 18, 2024

And it will not conflict on stateful resume?


rwynn avatar rwynn commented on July 18, 2024

The only conflict I can think of when you merge 2 collections into 1 index like this is if document ids collide across those collections. MongoDB only ensures that _id is unique within a single collection. So it's possible, though unlikely if you are using auto-generated _ids, that they can collide and write to the same document in the converged target Elasticsearch index.

I don't think there is any problem with resuming related to this. What case are you thinking about? The stateful resume only saves timestamps of processed documents. When resuming it begins tailing after the last processed timestamp instead of fast-forwarding to the end of the oplog. You will notice if you don't replay, then it starts tailing from the end. If you replay it starts processing from the beginning. If you resume it starts processing from the last timestamp it has saved. But regardless of where it starts, it always evaluates these functions before indexing.
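The three starting points can be modeled with a small function. This is only an illustration of the behavior described here, not monstache's actual code: the `mode` values, the shape of `oplog`, and the fallback used when no timestamp has been saved yet are assumptions.

```javascript
// Toy model: pick where tailing starts, given a mode, the oplog bounds,
// and the last timestamp saved by a previous run (if any).
function startPosition(mode, oplog, savedTimestamp) {
  switch (mode) {
    case "replay":
      // Replay processes the oplog from the beginning.
      return oplog.first;
    case "resume":
      // Resume continues from the last saved timestamp.
      // Assumption: with nothing saved yet, fall back to the end.
      return savedTimestamp !== undefined ? savedTimestamp : oplog.last;
    default:
      // Neither replay nor resume: start tailing from the end.
      return oplog.last;
  }
}

const oplog = { first: 100, last: 900 };
console.log(startPosition("replay", oplog, 450));
console.log(startPosition("resume", oplog, 450));
console.log(startPosition("default", oplog, 450));
```

Whichever position is chosen, the mapping functions still run on every document before it is indexed, so the index, type, and routing overrides apply in all three cases.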

You will notice that when you do custom routing like this you get additional documents in MongoDB in the collection monstache.meta. These documents are there so that if you perform a delete in MongoDB the routing attribute is set correctly on the delete request to Elasticsearch. Monstache only needs to preserve the routing information for possible deletes when the default routing is overridden. The default is to route by Elasticsearch _id (which is done by Elasticsearch automatically). The _id is in the oplog so no need to save it additionally.
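To make the reasoning concrete, here is a toy model of that bookkeeping. It is not how monstache is implemented: the in-memory `Map` merely stands in for the monstache.meta collection, and the function names and request shapes are hypothetical.

```javascript
// Stand-in for the monstache.meta collection: document id -> custom routing.
const routingMeta = new Map();

// When indexing, remember the routing only if it was overridden.
function onIndex(id, customRouting) {
  if (customRouting !== undefined) {
    routingMeta.set(id, customRouting);
    return { action: "index", id, routing: customRouting };
  }
  // Default case: Elasticsearch routes by _id, nothing needs saving.
  return { action: "index", id };
}

// When deleting, the oplog only provides the id, so the saved routing
// has to be looked up to reach the right shard.
function onDelete(id) {
  const saved = routingMeta.get(id);
  routingMeta.delete(id);
  return saved !== undefined
    ? { action: "delete", id, routing: saved }
    : { action: "delete", id };
}

onIndex("c1", "p1"); // child routed to its parent's shard
onIndex("x9");       // default routing
console.log(onDelete("c1"), onDelete("x9"));
```

The delete for `c1` carries the parent's routing value, while the delete for `x9` needs none, which mirrors why extra documents only appear when the default routing is overridden.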


aguyinmontreal avatar aguyinmontreal commented on July 18, 2024

Alright!

I just wanted to double-check before I start importing all my mongo-connector stuff to Monstache!

Thank you so much for your guidance!


rwynn avatar rwynn commented on July 18, 2024

@aguyinmontreal let me know how your transition to Monstache ends up going. I'm very interested in any gaps that exist between it and other solutions.


aguyinmontreal avatar aguyinmontreal commented on July 18, 2024

@rwynn Small comment, if I may.
When reading the documentation, I noticed that you use "test.test" as the example Elasticsearch namespace.

Elasticsearch recommends that we name new types as "_doc" from now on (see here) from version 6.2.1.

So it should be "test._doc", I guess. (But does Monstache allow types starting with "_"? I know mongo-connector doesn't.)


rwynn avatar rwynn commented on July 18, 2024

You can name the index and type whatever you want them to be. I don’t think monstache imposes restrictions other than elastic/elasticsearch#6736


rwynn avatar rwynn commented on July 18, 2024

So actually, no: monstache would not allow the type to be _doc, because it used to be a rule that a type could not start with an underscore, and monstache still follows that rule.


aguyinmontreal avatar aguyinmontreal commented on July 18, 2024

Would you consider allowing it? See:

Elasticsearch 6.x:

The preferred type name is _doc, so that index APIs have the same path as they will have in 7.0: PUT {index}/_doc/{id} and POST {index}/_doc

Elasticsearch 7.x:

The type parameter in URLs are optional. For instance, indexing a document no longer requires a document type. The new index APIs are PUT {index}/_doc/{id} in case of explicit ids and POST {index}/_doc for auto-generated ids.


rwynn avatar rwynn commented on July 18, 2024

Yeah I’ll take a look at it. I may just remove the code that checks any of this and just let ES determine what is valid and not.

You can actually use _doc as the type if you set it from the javascript function. I think that is a “bug” that works in your favor here.


rwynn avatar rwynn commented on July 18, 2024

@aguyinmontreal this should be fixed now and the docs updated. Use the 4.0.0 release. Thanks and let me know if you run into any problems.


aguyinmontreal avatar aguyinmontreal commented on July 18, 2024

@rwynn Alright! I'll take a look at it!

So the default index name for Elastic 6.2+ becomes "mongodb database name . mongodb collection name"? So the full default namespace (index + type) becomes "mongodb database name . mongodb collection name . _doc"?

Are you sure two dots are allowed in an Elasticsearch namespace?


rwynn avatar rwynn commented on July 18, 2024

The ES index is ‘db.collection’, and I’ve used dots with no problem. I don’t think Elasticsearch has a namespace, just an index and a type. The type is _doc.


aguyinmontreal avatar aguyinmontreal commented on July 18, 2024

Sorry for the confusion.

For me:
1- index + type equals namespace
2- index is the part before the dot
3- type is the part after the dot


aguyinmontreal avatar aguyinmontreal commented on July 18, 2024

Hi rwynn!

I saw this note here in the docs, and I'm a little confused:

[image: screenshot of the note in the docs]

In my case, as we discussed, my parent and my child documents are not coming from the same namespace. So even if I can correctly prefix the dynamic index of either the parent or the child, the remaining one will be prefixed with the wrong namespace.

What is best to do?


Another roadblock I hit is that my parent document IDs (i.e. the routing info) are stored as ObjectId in my child documents.

Can I .toString() them somehow inside the TOML Javascript script? Or does Monstache automatically .toString() the doc._meta_monstache.routing? Or do I absolutely need to store my parent document IDs as strings in MongoDB?

Thanks!

