Comments (4)
Great idea! It's not supported at the moment because our use case is different:
we use MongoDB to store metadata and user profile data, and we need to both
query and update it.
The mongodump file seems to be just a sequence of BSON objects, so if there
is a way to tell where each row/BSON object ends, all that is needed is a BSON
SerDe (a custom split implementation might also be needed to enable
parallel processing). I'm not sure how difficult this would be to implement on
top of the Java driver's BSON code; it still needs further investigation.
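On the question of finding where each object ends: per the BSON specification, every document starts with a little-endian int32 holding its total size (including those 4 bytes) and ends with a 0x00 byte, so a file of concatenated documents can be walked without a separate delimiter. The following is only a minimal Python sketch of that idea, not the Hive-mongo or Java driver implementation; the function name `iter_bson_documents` and the hand-built sample bytes are assumptions made for illustration.

```python
import io
import struct
from typing import BinaryIO, Iterator


def iter_bson_documents(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each raw BSON document from a stream of concatenated documents.

    Hypothetical helper: it only frames documents by their length prefix
    and does not decode the fields inside them.
    """
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of input
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        # The first 4 bytes are the total document size, little-endian int32.
        (size,) = struct.unpack("<i", header)
        if size < 5:  # smallest valid document: 4-byte size + 0x00 terminator
            raise ValueError(f"invalid BSON document size: {size}")
        body = stream.read(size - 4)
        if len(body) != size - 4 or body[-1] != 0:
            raise ValueError("truncated or malformed BSON document")
        yield header + body


# Hand-built sample input (illustrative only):
# EMPTY_DOC encodes {} and INT_DOC encodes {"a": 1} with an int32 value.
EMPTY_DOC = b"\x05\x00\x00\x00\x00"
INT_DOC = b"\x0c\x00\x00\x00\x10a\x00\x01\x00\x00\x00\x00"

documents = list(iter_bson_documents(io.BytesIO(EMPTY_DOC + INT_DOC)))
```

A record reader built this way still has to scan from a known document boundary, which is why a custom split implementation would be needed for parallel processing.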
I think you could dump to a CSV file using mongoexport as a workaround. If
the CSV is huge, compression (Snappy, LZO, bzip2, gzip) might help.
On Tue, May 8, 2012 at 7:52 AM, Alessandro D. Gagliardi wrote:
It would be really cool if Hive-mongo could read directly from MongoDB
files rather than having to go through a mongod process (this way I could
run it directly against backups without having to start mongod on them). If
this is too difficult/impossible, the next best thing would be to be able
to run it against the bson files produced by mongodump (though at that
point, I'm already halfway to exporting the data to another format anyway).
from hive-mongo.
CSV is no good as we have shifting schemata and nested documents and all kinds of other madness that make CSV a mess. I imagine you're already aware of https://github.com/mongodb/mongo-hadoop but I thought I'd mention it just in case.
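To illustrate the problem with shifting schemata and nested documents, here is a small Python sketch. The `flatten` helper and both sample documents are hypothetical, not taken from any real collection; the point is only that two documents from the same collection can flatten to different column sets, so no single CSV header fits both.

```python
from typing import Any, Dict


def flatten(doc: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted column names (hypothetical helper)."""
    flat: Dict[str, Any] = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into sub-documents, extending the dotted path.
            flat.update(flatten(value, prefix=path + "."))
        else:
            # Lists and scalars are kept as-is; a CSV cell cannot
            # represent a list without some extra encoding.
            flat[path] = value
    return flat


# Two made-up documents whose shape changed over time.
older = {"name": "ann", "address": {"city": "Oslo"}}
newer = {"name": "bob", "address": {"geo": {"lat": 1.5}}, "tags": ["x", "y"]}

columns_older = sorted(flatten(older))
columns_newer = sorted(flatten(newer))
```

Here `columns_older` and `columns_newer` share only the `name` column, and the `tags` list has no natural single-cell CSV representation.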
from hive-mongo.
Yeah, they have a wonderful shard-aware input split implementation, and we'd
like to migrate Hive-mongo to use it.
On Wednesday, May 9, 2012, Alessandro D. Gagliardi wrote:
CSV is no good as we have shifting schemata and nested documents and all
kinds of other madness that make CSV a mess. I imagine you're already aware
of https://github.com/mongodb/mongo-hadoop but I thought I'd mention it
just in case.
from hive-mongo.
Just got a message from a 10gen engineer saying that they have a Hive connector which currently supports static BSON files:
https://github.com/mongodb/mongo-hadoop/tree/master/hive
from hive-mongo.