Records Mover

Records mover is a command-line tool and Python library you can use to move relational data from one place to another.

Relational data here means anything roughly "rectangular" - with columns and rows. For example, it supports reading from and writing to:

  • Databases, including using native high-speed methods of import/export of bulk data. Redshift, Vertica and PostgreSQL are well-supported, with some support for BigQuery and MySQL.
  • CSV files
  • Parquet files (initial support)
  • Google Sheets
  • Pandas DataFrames
  • Records directories - a structured directory of CSV/Parquet/etc files, plus JSON metadata describing their format and origins. Records directories are especially helpful for the ever-ambiguous CSV format, where they solve the problem of 'hey, this may be a CSV - but what's the schema? What's the format of the CSV itself? How is it escaped?'
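To make the records directory idea concrete, here is a minimal sketch that writes a CSV file together with JSON files describing its format and schema, then reads it back using that metadata instead of guessing. The file names (`data.csv`, `_format.json`, `_schema.json`), the JSON keys, and both helper functions are hypothetical and chosen for illustration; they are not the actual layout records mover writes.

```python
import csv
import json
from pathlib import Path


def write_records_dir(directory, columns, rows, delimiter=","):
    """Write rows as CSV plus JSON files describing the CSV format and schema.

    Hypothetical layout for illustration only; not records mover's real format.
    `columns` is a list of (name, type) pairs; `rows` is a list of sequences.
    """
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)

    # The data itself: a plain CSV file with a header row.
    with open(out / "data.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=delimiter, quoting=csv.QUOTE_MINIMAL)
        writer.writerow([name for name, _ in columns])
        writer.writerows(rows)

    # How the CSV is encoded, so a reader does not have to guess.
    fmt = {"delimiter": delimiter, "quoting": "minimal",
           "header_row": True, "encoding": "utf-8"}
    (out / "_format.json").write_text(json.dumps(fmt, indent=2), encoding="utf-8")

    # What the columns are, independent of how they are serialized.
    schema = {"fields": [{"name": n, "type": t} for n, t in columns]}
    (out / "_schema.json").write_text(json.dumps(schema, indent=2), encoding="utf-8")
    return out


def read_records_dir(directory):
    """Read the data back using the recorded format instead of guesswork."""
    d = Path(directory)
    fmt = json.loads((d / "_format.json").read_text(encoding="utf-8"))
    schema = json.loads((d / "_schema.json").read_text(encoding="utf-8"))
    with open(d / "data.csv", newline="", encoding=fmt["encoding"]) as f:
        rows = list(csv.reader(f, delimiter=fmt["delimiter"]))
    header, body = (rows[0], rows[1:]) if fmt["header_row"] else (None, rows)
    return schema, header, body
```

Because the delimiter, quoting style, and encoding travel with the data, values containing commas or quotes round-trip without the reader having to sniff the file.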

Records mover can be extended to handle additional databases and data file types. Databases are supported by building on top of their SQLAlchemy drivers. Records mover automatically negotiates the most efficient way of moving data from a given source to a given target.
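One way to picture that negotiation is as an intersection of capabilities: the source lists the bulk formats it can export natively, the target lists the formats it can load natively, and the mover picks the best format both sides share, falling back to streaming rows through memory otherwise. The sketch below assumes that simplified model; the function names, format labels, and fallback are hypothetical and are not records mover's internal API.

```python
from typing import Optional, Sequence


def negotiate_format(source_exports: Sequence[str],
                     target_loads: Sequence[str]) -> Optional[str]:
    """Return the source's most preferred format that the target can also load.

    Hypothetical model: both lists are ordered from most to least efficient,
    and the source's preference order decides between shared formats.
    """
    loadable = set(target_loads)
    for fmt in source_exports:
        if fmt in loadable:
            return fmt
    return None  # no shared bulk format


def plan_move(source_exports: Sequence[str], target_loads: Sequence[str]) -> str:
    """Describe how the move would run under this simplified model."""
    fmt = negotiate_format(source_exports, target_loads)
    if fmt is None:
        # Fallback path: read rows into memory (e.g. a DataFrame) and insert them.
        return "stream rows via dataframe"
    return f"bulk export/load as {fmt}"
```

For example, a source that exports Parquet or CSV paired with a target that only loads CSV would move the data as CSV rather than row by row.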

CLI use example

Installing:

pip3 install 'records_mover[cli,postgres-binary,redshift-binary]'

Loading a CSV into a database:

mvrec file2table foo.csv redshiftdb1 myschema1 mytable1

Copying a table from a PostgreSQL to a Redshift database:

mvrec --help
mvrec table2table postgresdb1 myschema1 mytable1 redshiftdb2 myschema2 mytable2

Note that records mover will automatically build an appropriate CREATE TABLE statement on the target end if the table doesn't already exist.
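As a rough illustration of what building that statement involves, the sketch below infers a column type from sample Python values and renders PostgreSQL-style DDL. The type mapping and the `infer_sql_type` / `build_create_table` helpers are hypothetical simplifications, not records mover's code; a real implementation also has to deal with things like nullability, numeric precision, identifier escaping, and per-database type differences.

```python
from datetime import date, datetime


def infer_sql_type(values):
    """Pick a SQL type for a column from its non-null sample values (hypothetical mapping)."""
    present = [v for v in values if v is not None]
    if not present:
        return "VARCHAR(256)"  # nothing to go on; choose a permissive default
    # bool is a subclass of int in Python, so test for it first.
    if all(isinstance(v, bool) for v in present):
        return "BOOLEAN"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in present):
        return "BIGINT"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "DOUBLE PRECISION"
    # datetime is a subclass of date, so test for it before date.
    if all(isinstance(v, datetime) for v in present):
        return "TIMESTAMP"
    if all(isinstance(v, date) for v in present):
        return "DATE"
    # Anything else is stored as text, sized to the longest sample value.
    longest = max(len(str(v)) for v in present)
    return f"VARCHAR({max(longest, 1)})"


def build_create_table(schema, table, rows):
    """Render CREATE TABLE DDL for a list of dict rows that share the same keys."""
    columns = list(rows[0].keys())
    defs = []
    for col in columns:
        sql_type = infer_sql_type([row.get(col) for row in rows])
        defs.append(f'    "{col}" {sql_type}')
    return f'CREATE TABLE "{schema}"."{table}" (\n' + ",\n".join(defs) + "\n)"
```

Calling `build_create_table("myschema", "mytable", [{"a": 1, "b": "x"}, {"a": 2, "b": "hello"}])` yields a table with a `BIGINT` column `a` and a `VARCHAR(5)` column `b`.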

Note that the connection details for the database names here must be configured using db-facts.

For more installation notes, see INSTALL.md. To understand the security model here, see SECURITY.md.

CLI use demo (table creation and loading)

Python library use example

First, install records_mover. We'll also use Pandas, so we'll install that too, along with a driver for Postgres.

pip3 install 'records_mover[pandas,postgres-source]'

Now we can run this code:

#!/usr/bin/env python3

# Pull in the records-mover library - be sure to run the pip install above first!
from records_mover import sources, targets, move
from pandas import DataFrame
import sqlalchemy
import os

sqlalchemy_url = f"postgresql+psycopg2://username:{os.environ['DB_PASSWORD']}@hostname/database_name"
db_engine = sqlalchemy.create_engine(sqlalchemy_url)

df = DataFrame([{'a': 1}])  # one row, one column 'a'; or make your own!

source = sources.dataframe(df=df)
target = targets.table(schema_name='myschema',
                       table_name='mytable',
                       db_engine=db_engine)
results = move(source, target)
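The f-string URL above works as long as the password contains no characters that are special in URLs, such as `@`, `/`, or `:`. A small helper that percent-escapes the credentials with the standard library's `urllib.parse.quote` avoids that pitfall. The helper name `postgres_url` is hypothetical; with SQLAlchemy 1.4 or later, `sqlalchemy.engine.URL.create(...)` is another option that handles the escaping for you.

```python
from urllib.parse import quote


def postgres_url(username, password, host, database):
    """Build a PostgreSQL SQLAlchemy URL with percent-escaped credentials.

    safe="" makes quote() escape '/', '@' and ':' too, so they cannot be
    mistaken for URL separators.
    """
    user = quote(username, safe="")
    pw = quote(password, safe="")
    return f"postgresql+psycopg2://{user}:{pw}@{host}/{database}"
```

Usage: `db_engine = sqlalchemy.create_engine(postgres_url('username', os.environ['DB_PASSWORD'], 'hostname', 'database_name'))` can stand in for the two URL and engine lines in the example above.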

When moving data, the sources supported can be found here, and the targets supported can be found here.

Advanced Python library use example

Here's another example, using some additional features:

  • Loading from an existing dataframe.
  • Secrets management using db-facts, which is a way to configure credentials in YAML files or even fetch them dynamically from your secrets store.
  • Logging configuration that shows the internal processing steps (helpful for optimizing performance or debugging issues).

The code looks like this:

#!/usr/bin/env python3

# Pull in the records-mover library - be sure to run the pip install above first!
from records_mover import Session
from pandas import DataFrame

session = Session()
session.set_stream_logging()
records = session.records

db_engine = session.get_default_db_engine()

df = DataFrame([{'a': 1}])  # one row, one column 'a'; or make your own!

source = records.sources.dataframe(df=df)
target = records.targets.table(schema_name='myschema',
                               table_name='mytable',
                               db_engine=db_engine)
results = records.move(source, target)
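`session.set_stream_logging()` handles log setup for you. If you would rather fold records mover's output into your own application's logging configuration, the standard library can do something similar. This sketch assumes the library emits its messages under the `records_mover` logger hierarchy, which is an assumption rather than something documented here, and `enable_stream_logging` is a hypothetical helper name.

```python
import logging
import sys


def enable_stream_logging(logger_name="records_mover", level=logging.INFO):
    """Send log records from `logger_name` (and its child loggers) to stderr.

    Assumes the library logs under the "records_mover" logger hierarchy.
    """
    logger = logging.getLogger(logger_name)
    logger.setLevel(level)
    # Avoid stacking duplicate stderr handlers if this is called more than once.
    already = any(isinstance(h, logging.StreamHandler) and h.stream is sys.stderr
                  for h in logger.handlers)
    if not already:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(name)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

Call it once before `records.move(...)`; passing `logging.DEBUG` as the level shows more detail when tracking down performance problems.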

Python library API documentation

You can find more API documentation here. In particular, note:

Contributors

vinceatbluelabs, cwegrzyn, archetypalsxe