

MongoDB Source

This source is no longer actively maintained and is only available as-is.

Segment source for MongoDB. Syncs your production MongoDB instance with the Segment Objects API.

Schema

A products collection in the test database that looks like this in your production MongoDB instance...

{
    "name": "Apple",
    "cost": 1.27,
    "translations": {
        "spanish": "manzana",
        "french" : "pomme"
    }
},
{
    "name": "Pear",
    "cost": 2.01,
    "translations": {
        "spanish": "pera"
    }
}

would be queryable in your analytics Redshift or Postgres database like this...

select * from <source-name>.test_products

Redshift

name   cost  translations_spanish  translations_french
Apple  1.27  manzana               pomme
Pear   2.01  pera                  NULL

Note that the user must explicitly define which fields they want imported from their DB ahead of time. See below for more on how that works.
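The column names in the table above join a nested key to its parent with an underscore (translations.spanish becomes translations_spanish), and a field missing from a document shows up as NULL. The sketch below illustrates that flattening on a plain map[string]interface{} document. It is an illustration under that assumption, not the source's own code; the real source only exports the fields you list in schema.json, under the destination names you give them.

```go
package main

import (
	"fmt"
	"sort"
)

// flatten walks a decoded document and joins nested keys with "_",
// e.g. {"translations": {"spanish": "pera"}} -> {"translations_spanish": "pera"}.
// Assumption: documents are plain map[string]interface{} values.
func flatten(prefix string, doc map[string]interface{}, out map[string]interface{}) {
	for k, v := range doc {
		key := k
		if prefix != "" {
			key = prefix + "_" + k
		}
		if nested, ok := v.(map[string]interface{}); ok {
			flatten(key, nested, out) // recurse into sub-documents
			continue
		}
		out[key] = v
	}
}

func main() {
	pear := map[string]interface{}{
		"name":         "Pear",
		"cost":         2.01,
		"translations": map[string]interface{}{"spanish": "pera"},
	}
	row := map[string]interface{}{}
	flatten("", pear, row)

	// Print columns in a stable order. translations_french is absent here,
	// which is why it appears as NULL for Pear in the warehouse.
	keys := make([]string, 0, len(row))
	for k := range row {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%v\n", k, row[k])
	}
}
```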

Quick Start

Docker

If you're running Docker in production, you can run the source as a Docker image:

$ docker run segment/mongodb-source <your-options>

Build and Run

Prerequisites: Go

go get github.com/segment-sources/mongodb

The first step is to initialize your schema. You can do so by running mongodb with the --init flag.

mongodb --hostname=mongo-test.ksd31bacms.us-west-2.rds.amazonaws.com --port=27017 --username=segment --password=cndgks9102baajls --database=segment --init

The init step stores the schema of the collections the source can sync in schema.json. You should then fill in which fields of each collection should be exported. If you don't want any fields from a collection, remove that collection's entry from the JSON altogether.

In the schema.json example below, our parser found the collection products in the database test.

{
    "test": {
        "products": {
        }
    }
}

Let's say a user wants to export the four fields from the example at the top of this document: name, cost, translations.spanish and translations.french. The JSON should then be:

{
    "test": {
        "products": {
            "destination_name": "my_products",
            "fields": {
                "name": {
                  "destination_name": "my_name"
                },
                "cost": null,
                "translations.spanish": {
                  "destination_name": "translations_spanish"
                },
                "translations.french": {
                  "destination_name": "translations_french"
                }
            }
        }
    }
}

name and cost are top-level fields, so their keys are simply the field names. The other two are nested fields, so they are referred to with dot syntax: translations.spanish and translations.french.
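Resolving a dot-syntax key means splitting it on "." and descending one sub-document per segment. The sketch below assumes documents are plain map[string]interface{} values; a driver that decodes into its own named map type (mgo's bson.M, for example) would need that type handled in the assertion as well. The function name lookup is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// lookup resolves a dot-separated path such as "translations.spanish" against a
// decoded document. It reports false when a segment is missing or the value at
// an intermediate segment is not a sub-document.
func lookup(doc map[string]interface{}, path string) (interface{}, bool) {
	var cur interface{} = doc
	for _, part := range strings.Split(path, ".") {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return nil, false // tried to descend into a scalar
		}
		cur, ok = m[part]
		if !ok {
			return nil, false // segment not present in this document
		}
	}
	return cur, true
}

func main() {
	pear := map[string]interface{}{
		"name":         "Pear",
		"translations": map[string]interface{}{"spanish": "pera"},
	}
	for _, path := range []string{"name", "translations.spanish", "translations.french"} {
		v, ok := lookup(pear, path)
		fmt.Printf("%s: %v (found=%v)\n", path, v, ok)
	}
}
```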

Some notes:

  • โš ๏ธ The warehouse type for a particular field is set the first time data for that field is seen. If subsequent data inserted into the warehouse has a different type than the original type seen, the field value may not cast correctly and loaded into the warehouse properly.
  • Currently the only supported MongoDB data types are string, integer, long, double, boolean, date.
  • Each object's native _id field is uploaded to Segment by default and used as that object's unique identifier, so there is no need to add it to schema.json.

Scan

To begin exporting fields out of the DB, remove the --init flag and add a --write-key value:

mongodb --hostname=mongo-test.ksd31bacms.us-west-2.rds.amazonaws.com --port=27017 --username=segment --password=cndgks9102baajls --database=segment --sslmode=prefer --write-key=ab-200-1alx91kx

Usage

Usage:
  mongodb
    [--debug]
    [--init]
    [--concurrency=<c>]
    [--write-key=<segment-write-key>]
    --hostname=<hostname>
    --port=<port>
    --username=<username>
    --password=<password>
    --database=<database>
    [-- <extra-driver-options>...]
  mongodb -h | --help
  mongodb --version

Options:
  -h --help                   Show this screen
  --version                   Show version
  --debug                     Enable debug logging
  --init                      Write the schema of syncable collections to schema.json
  --write-key=<key>           Segment source write key
  --concurrency=<c>           Number of concurrent collection scans [default: 1]
  --hostname=<hostname>       Database instance hostname
  --port=<port>               Database instance port number
  --username=<username>       Database instance username
  --password=<password>       Database instance password
  --database=<database>       Database instance name


mongodb's Issues

Specify type in schema

It would be nice for customers to be able to force a type in the schema, so that we cast all values to that type (as far as possible; null if a safe cast is impossible). This is especially useful with MongoDB, since the type of a field in the first document may not reflect its type in the rest.
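The proposal amounts to a cast function that either converts a value to the declared type or returns null. A minimal sketch of that idea follows; the declared type names ("string", "double", "long") and the particular conversions allowed are hypothetical, since the source does not implement this feature.

```go
package main

import (
	"fmt"
	"strconv"
)

// castTo converts v to the declared type, returning nil when no safe conversion
// exists. Type names and allowed conversions are illustrative assumptions.
func castTo(declared string, v interface{}) interface{} {
	switch declared {
	case "string":
		switch x := v.(type) {
		case nil:
			return nil
		case string:
			return x
		default:
			return fmt.Sprint(x) // any scalar can be rendered as text
		}
	case "double":
		switch x := v.(type) {
		case float64:
			return x
		case int:
			return float64(x)
		case int64:
			return float64(x) // may lose precision above 2^53
		case string:
			if f, err := strconv.ParseFloat(x, 64); err == nil {
				return f
			}
		}
	case "long":
		switch x := v.(type) {
		case int64:
			return x
		case int:
			return int64(x)
		case string:
			if n, err := strconv.ParseInt(x, 10, 64); err == nil {
				return n
			}
		}
	}
	return nil // unknown type or unsafe conversion
}

func main() {
	fmt.Println(castTo("double", "2.01"), castTo("long", "abc"), castTo("string", 1.27))
}
```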

case-insensitive import collision

I'm receiving the following error when running go get for this project.

src/github.com/segmentio/ecs-logs-go/logrus/formatter.go:7:2: case-insensitive import collision: "github.com/sirupsen/logrus" and "github.com/Sirupsen/logrus"

This is consistent with the issue here sirupsen/logrus#543

I have a branch ready to fix this, I will fork the project first and test on my end, but I don't have contributor access to this repo.

Connecting to database but receiving error 'no reachable servers'

I've used the MySQL segment-source before and it works great. I wanted to try it with the MongoDB instance we use for one app; however, when running init with debug on, I'm seeing:

INFO[0000] Will output schema to schema.json
INFO[0000] Will connect to database [email protected]:19906/DATABASE
DEBU[0001] Connection to database 'DATABASE' established!
ERRO[0007] no reachable servers

I've double-checked the credentials and everything works fine when connecting from the shell.

I'm guessing it's something that has changed recently that isn't reflected in the readme. Anyone got any tips on this?

Build Docker image on CI

  • make build on CI
  • go get on CI
  • COPY ./bin/... in Dockerfile

No need to install Go in container :)

Hangs on auth error

Seems like the source hangs on auth errors. It may just be mgo's crazy long timeout. We should do something about this so that the customer is notified of the failure within a reasonable time.
