Comments (4)

yc-huang commented on June 16, 2024

Great idea! It isn't supported at the moment because our use case is
different: we use MongoDB to store metadata and user profile data, and we
need to both query and update it.

The mongodump file seems to be just a sequence of BSON objects, so if there
is a delimiter between rows/BSON objects, all that is needed is a BSON
SerDe (a custom split implementation might also be needed to enable
parallel processing). I'm not sure how difficult it would be to implement
this on top of the Java driver's BSON code; it needs further investigation.
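
On the delimiter question: per the BSON spec, every document starts with a little-endian int32 holding its total length (including those four bytes) and ends with a 0x00 byte, so record boundaries can be found without a separator. Below is a minimal sketch in Python, assuming the dump file is a plain concatenation of BSON documents. `iter_bson_documents` is a hypothetical helper name, not part of any driver, and it only splits records; it does not decode them.

```python
import io
import struct
from typing import BinaryIO, Iterator


def iter_bson_documents(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each raw BSON document from a stream of concatenated documents."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of input
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        # The first four bytes are the document's total size, little-endian.
        (size,) = struct.unpack("<i", header)
        if size < 5:  # smallest valid document: 4-byte length + 0x00 terminator
            raise ValueError(f"invalid BSON document size: {size}")
        body = stream.read(size - 4)
        if len(body) != size - 4 or body[-1] != 0:
            raise ValueError("truncated or malformed BSON document")
        yield header + body


# Two hand-built documents: the empty document {} and {"a": 1} (int32 value).
dump = bytes.fromhex("0500000000") + bytes.fromhex("0c0000001061000100000000")
docs = list(iter_bson_documents(io.BytesIO(dump)))
print([len(d) for d in docs])
```

A SerDe would then decode each yielded byte string. A splitter for parallel processing is harder, because a reader dropped at an arbitrary byte offset has no marker to resynchronise on and would need an index of document offsets or a scan from the start.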

As a workaround, I think you could dump the data to a CSV file using
mongoexport. If the CSV is huge, compression (snappy, lzo, bz2, gzip) might
help.

On Tue, May 8, 2012 at 7:52 AM, Alessandro D. Gagliardi wrote:

It would be really cool if Hive-mongo could read directly from MongoDB
files rather than having to go through a mongod process (this way I could
run it directly against backups without having to start mongod on them). If
this is too difficult/impossible, the next best thing would be to be able
to run it against the bson files produced by mongodump (though at that
point, I'm already halfway to exporting the data to another format anyway).



from hive-mongo.

MadDataScience commented on June 16, 2024

CSV is no good as we have shifting schemata and nested documents and all kinds of other madness that make CSV a mess. I imagine you're already aware of https://github.com/mongodb/mongo-hadoop but I thought I'd mention it just in case.


yc-huang commented on June 16, 2024

yeah, they have a wonderful shard-aware input split implementation and we'd
like to migrate Hive-mongo to use that...


yc-huang commented on June 16, 2024

Just got a message from a 10gen engineer: they have a Hive connector that currently supports static BSON files:
https://github.com/mongodb/mongo-hadoop/tree/master/hive
