ferry's Introduction

ferry

What is Ferry?

Ferry is a command-line Ruby gem for Rails data migration and manipulation. It has been maintained as an open-source project, primarily by students of Carnegie Mellon's Information Systems department, since August 2014. The inspiration for Ferry came from our collective internship experiences and from the growing prevalence of big data migration and manipulation challenges that companies, universities, and other organizations face today. Many thanks to CustomInk's Technology Team and their blog articles for advice and guidance during the first semester (Fall 2014) of development.

What can I use Ferry for?

See the ferry_demo app or our GitHub pages site for further documentation on using Ferry!

Rails Migration and Manipulation use cases

  • Exporting and Importing data to various file formats (.csv, .yml, .json, .sql)
  • Migrating data to third-party services (Amazon S3, Oracle) and to different databases, and manipulating data via a custom-built DSL (see "Coming soon ..." below)

Coming soon ...

  • Configurable Migration Scripting
    • The idea behind this feature is to let developers write an executable script that holds the configuration, arguments, and tasks for whatever data migration or manipulation they want to carry out.
    • Similar to how Capistrano configures deploy.rb.
    • To do this we will be investigating custom DSLs and building our own, to provide a quick and easy solution that lets developers carry out complex migration and manipulation strategies in just an afternoon!
  • Data Visualization
    • With inspiration from d3, we hope to let developers deploy informative and visually appealing graphs and documents that can be shared over an internal network and broadcast to servers for display, either on internal office displays or at URLs ... all from a single command-line statement.
    • We will use d3 for the visualizations. We are also surveying existing solutions to this business need, successful or not, to see what we would be competing with.
    • This will either be built into our DSL or pulled out into a separate gem that does data processing and visualization on its own.

Installation

Add this line to your Rails application's Gemfile:

gem 'ferry'

And then execute:

$ bundle

Or install it yourself as:

$ gem install ferry

To view what Ferry can do for you just run:

$ ferry --help

Exporting

Ferry can export data from a database connected to a Rails app into a CSV, YAML, or JSON file. We currently support exporting only from SQLite3, MySQL2, and PostgreSQL databases.

Run ferry --to_csv [environment] [table] in your Rails directory to export to csv:

$ ferry --to_csv production users

Running the above command will export the "users" table from the database connected to the "production" environment. A csv file populated with the "users" table data will be created at /db/csv/test/users.csv (the path will be created if it does not exist, and an existing users.csv will be overwritten).

Run ferry --to_yaml [environment] [table] in your Rails directory to export to yaml:

$ ferry --to_yaml development users

Similarly, running the above command in the Rails directory will export the "users" table from the database connected to the "development" environment. A yaml file populated with the "users" table data will be created at /db/yaml/test/users.yaml (the path will be created and if there is a users.yaml it will be overwritten).

Run ferry --to_json [environment] [table] in your Rails directory to export to json:

$ ferry --to_json development users

Similarly, running the above command in the Rails directory will export the "users" table from the database connected to the "development" environment. A json file populated with the "users" table data will be created at /db/yaml/test/users.json (the path will be created and if there is a users.json it will be overwritten).
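To give a feel for what the three export formats contain, here is a minimal sketch that uses only Ruby's standard library. It is not Ferry's implementation: serialize_rows is a hypothetical helper, and the sample rows are made up for illustration.

```ruby
require "csv"
require "json"
require "yaml"

# Hypothetical helper, not part of ferry: serialize an array of row hashes
# (column name => value) into one of the three export formats.
def serialize_rows(rows, format)
  case format
  when :csv
    return "" if rows.empty?
    headers = rows.first.keys
    CSV.generate do |csv|
      csv << headers                                   # header row
      rows.each { |row| csv << row.values_at(*headers) }
    end
  when :yaml then rows.to_yaml
  when :json then JSON.pretty_generate(rows)
  else raise ArgumentError, "unsupported format: #{format}"
  end
end

# Made-up sample data standing in for the "users" table.
rows = [
  { "id" => 1, "name" => "Ada" },
  { "id" => 2, "name" => "Grace" }
]

puts serialize_rows(rows, :csv)
puts serialize_rows(rows, :yaml)
puts serialize_rows(rows, :json)
```

In all three formats the column names travel with the data, which is what makes a later import by header or key possible.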

Importing

Ferry can import a csv or json file of validated records into a table of a Rails-connected database. The csv and json files must:

  • Have headers (or keys) that match field names of the table
  • Have values that meet the table's constraints (e.g. required fields, correct data types, unique PKs)
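The first requirement can be checked before running an import. The sketch below is an illustration only, not something Ferry provides: header_mismatches is a hypothetical helper, and the list of table columns is assumed to be known already (for example from your schema).

```ruby
require "csv"

# Hypothetical pre-flight check, not part of ferry: compare the header row of
# a CSV string with the column names the target table is expected to have.
def header_mismatches(csv_text, table_columns)
  headers = CSV.parse(csv_text, headers: true).headers
  {
    unknown: headers - table_columns,   # in the file but not in the table
    missing: table_columns - headers    # in the table but not in the file
  }
end

# Made-up file content with a misspelled "email" header.
csv_text = "id,name,emial\n1,Ada,ada@example.com\n"
p header_mismatches(csv_text, %w[id name email])
```

A "missing" column is not always an error, since the table may allow NULL or supply a default, but an "unknown" header is a strong hint that the file and the table do not line up.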

Run ferry --import_csv [environment] [table] [file path] in your Rails directory to import a csv to a database table:

$ ferry --import_csv development users db/csv/import_data.csv

Running the above command will import import_data.csv into the "users" table in the "development" environment. The same goes for ferry --import_json with a json file.

Dumping and Filling .sql

Ferry can dump your current database to a .sql file, or fill it from a .sql file you would like to import, with these simple commands:

$ ferry --dump [environment]
$ ferry --fill [environment] [path/to/file.sql]

Here [environment] is the database environment (development, production, test, etc.) you are working with.

Contributing

  1. Fork it ( https://github.com/[my-github-username]/ferry/fork )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

If you wish to open a pull request or issue for anything at all please feel free to do so!

A large driving factor in the development of Ferry is contributing something meaningful to the open-source community and the developer community at large. As college students with access to an unbelievable amount of development resources for creating cool projects, we felt a need to give back. We wanted to start a project that would face unique challenges, so that others who face similar challenges could turn to us for help and guidance. We hope that Ferry continues to be a project that benefits businesses and developers while giving back to the open-source and greater developer community.

ferry's People

Contributors

anthonycorletti, loganwatanabe, profh, rojoko

ferry's Issues

ferry --help

Set up a menu of things to do for when someone runs ferry, ferry -h, or ferry --help.

Extraneous requirements

Not sure if this is avoidable, but when trying to include ferry in a project of mine, it requires me to install postgres and mysql.

I see the reason for this, but if I am only using postgres or sqlite (which is what I normally use), it is a pain to install mysql for something I won't use.

Maybe split it into separate gems, like ferry-mysql or ferry-pg.

Data Visualization

Being able to run a quick command prompt that deploys informative graphs and charts to a designated location could hold large value for ferry and the environment around developers. We will be looking into developing functionality like this with d3 visualizations.

Dumping a Remote Database

Is ferry able to dump a remote database? I have attempted this with a mysql database located on remote server, but I get this error:

ferry --dump phenotype
operating with mysql2
mysqldump: Got error: 1044: Access denied for user ''@'localhost' to database 'phenotypes' when selecting the database
"Complete!"

This is what is in my database.yml for the phenotype database:

phenotype:
  adapter: mysql2
  encoding: utf8
  reconnect: false
  database: phenotypes
  host: pawn.hss.cmu.edu
  username: phenotypes
  password: –––––––––
  socket: /tmp/mysql.sock
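The error names the user as ''@'localhost' even though the database.yml entry sets a host and username, which may mean those settings never reached mysqldump. As a minimal sketch of how a dump command could carry them, assuming the entry has already been loaded into a hash: mysqldump_args is a hypothetical helper (not Ferry's code), and the host and password below are placeholders.

```ruby
# Hypothetical helper, not ferry's code: turn one database.yml entry into an
# argument list for mysqldump, passing host and user explicitly.
def mysqldump_args(config)
  args = ["mysqldump"]
  args << "--host=#{config['host']}"         if config["host"]
  args << "--port=#{config['port']}"         if config["port"]
  args << "--user=#{config['username']}"     if config["username"]
  args << "--password=#{config['password']}" if config["password"]
  args << config.fetch("database")           # database name goes last
end

config = {
  "adapter"  => "mysql2",
  "database" => "phenotypes",
  "host"     => "db.example.com",   # placeholder host
  "username" => "phenotypes",
  "password" => "secret"            # placeholder password
}

puts mysqldump_args(config).join(" ")
```

Keeping the arguments as a list means they can be handed to system(*args, out: "dump.sql") without any shell quoting. Note that a password passed on the command line can be visible to other users on the machine, so an option file may be the safer choice.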

Unix style commands

It would be useful to allow the options to follow a "unix style" syntax vs the comma syntax defined.

For example:
ferry --import development table path/to/data.csv
vs.
ferry --import development,table,path/to/data.csv

This allows for Unix tab completion of paths, and is implemented by other gems like bropages & capistrano
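Both styles can be expressed with Ruby's standard OptionParser. The sketch below is illustrative only and is not Ferry's actual parser; the --import flag mirrors the example above, and the two function names are made up.

```ruby
require "optparse"

# Comma style: the Array acceptor makes OptionParser split "a,b,c" on commas.
def parse_comma_style(argv)
  parsed = nil
  parser = OptionParser.new
  parser.on("--import ENV,TABLE,PATH", Array) { |list| parsed = list }
  parser.parse(argv)
  parsed
end

# Unix style: the flag takes only the environment; parse returns the
# arguments that no option consumed, which become the table and the path.
def parse_unix_style(argv)
  env = nil
  parser = OptionParser.new
  parser.on("--import ENV") { |value| env = value }
  rest = parser.parse(argv)
  [env, *rest]
end

p parse_comma_style(%w[--import development,users,db/csv/users.csv])
p parse_unix_style(%w[--import development users db/csv/users.csv])
```

With the Unix style each value is its own shell word, which is what lets the shell tab-complete the file path.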

ActiveRecord::NoDatabaseError if current user does not have a postgres role

When I try to run ferry in development I end up with an ActiveRecord::NoDatabaseError (see stack trace below). I solved the issue in a fork and will create a pull request for it. It seems like the problem occurs because ferry tries to use the system user name to access the PostgreSQL server.

$ ferry --to_csv development items

operating with postgresql
/home/roland/.rvm/gems/ruby-2.1.2/gems/activerecord-4.1.6/lib/active_record/connection_adapters/postgresql_adapter.rb:898:in `rescue in connect': FATAL:  role "roland" does not exist (ActiveRecord::NoDatabaseError)
  from /home/roland/.rvm/gems/ruby-2.1.2/gems/activerecord-4.1.6/lib/active_record/connection_adapters/postgresql_adapter.rb:888:in `connect'
  from /home/roland/.rvm/gems/ruby-2.1.2/gems/activerecord-4.1.6/lib/active_record/connection_adapters/postgresql_adapter.rb:568:in `initialize'
  from /home/roland/.rvm/gems/ruby-2.1.2/gems/activerecord-4.1.6/lib/active_record/connection_adapters/postgresql_adapter.rb:41:in `new'
  from /home/roland/.rvm/gems/ruby-2.1.2/gems/activerecord-4.1.6/lib/active_record/connection_adapters/postgresql_adapter.rb:41:in `postgresql_connection'
  from /home/roland/.rvm/gems/ruby-2.1.2/gems/activerecord-4.1.6/lib/active_record/connection_adapters/abstract/connection_pool.rb:435:in `new_connection'
  from /home/roland/.rvm/gems/ruby-2.1.2/gems/activerecord-4.1.6/lib/active_record/connection_adapters/abstract/connection_pool.rb:445:in `checkout_new_connection'
  from /home/roland/.rvm/gems/ruby-2.1.2/gems/activerecord-4.1.6/lib/active_record/connection_adapters/abstract/connection_pool.rb:416:in `acquire_connection'
  from /home/roland/.rvm/gems/ruby-2.1.2/gems/activerecord-4.1.6/lib/active_record/connection_adapters/abstract/connection_pool.rb:351:in `block in checkout'
  from /home/roland/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/monitor.rb:211:in `mon_synchronize'
  from /home/roland/.rvm/gems/ruby-2.1.2/gems/activerecord-4.1.6/lib/active_record/connection_adapters/abstract/connection_pool.rb:350:in `checkout'
  from /home/roland/.rvm/gems/ruby-2.1.2/gems/activerecord-4.1.6/lib/active_record/connection_adapters/abstract/connection_pool.rb:265:in `block in connection'
  from /home/roland/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/monitor.rb:211:in `mon_synchronize'
  from /home/roland/.rvm/gems/ruby-2.1.2/gems/activerecord-4.1.6/lib/active_record/connection_adapters/abstract/connection_pool.rb:264:in `connection'
  from /home/roland/.rvm/gems/ruby-2.1.2/gems/activerecord-4.1.6/lib/active_record/connection_adapters/abstract/connection_pool.rb:541:in `retrieve_connection'
  from /home/roland/.rvm/gems/ruby-2.1.2/gems/activerecord-4.1.6/lib/active_record/connection_handling.rb:113:in `retrieve_connection'
  from /home/roland/.rvm/gems/ruby-2.1.2/gems/activerecord-4.1.6/lib/active_record/connection_handling.rb:87:in `connection'
  from /home/roland/.rvm/gems/ruby-2.1.2/gems/ferry-1.0.1/lib/ferry/exporter.rb:11:in `to_csv'
  from /home/roland/.rvm/gems/ruby-2.1.2/gems/ferry-1.0.1/bin/ferry:12:in `block (2 levels) in <top (required)>'
  from /home/roland/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/optparse.rb:1359:in `call'
  from /home/roland/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/optparse.rb:1359:in `block in parse_in_order'
  from /home/roland/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/optparse.rb:1346:in `catch'
  from /home/roland/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/optparse.rb:1346:in `parse_in_order'
  from /home/roland/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/optparse.rb:1340:in `order!'
  from /home/roland/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/optparse.rb:1432:in `permute!'
  from /home/roland/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/optparse.rb:1454:in `parse!'
  from /home/roland/.rvm/gems/ruby-2.1.2/gems/ferry-1.0.1/bin/ferry:50:in `<top (required)>'
  from /home/roland/.rvm/gems/ruby-2.1.2/bin/ferry:23:in `load'
  from /home/roland/.rvm/gems/ruby-2.1.2/bin/ferry:23:in `<main>'
  from /home/roland/.rvm/gems/ruby-2.1.2/bin/ruby_executable_hooks:15:in `eval'
  from /home/roland/.rvm/gems/ruby-2.1.2/bin/ruby_executable_hooks:15:in `<main>'

Rspec also fails with:

$ rspec spec/tests/utilities_tests.rb
Failures:

  1) utility functions #db_connect #postgresql
     Failure/Error: expect(ActiveRecord::Base.connection.adapter_name).to eql('PostgreSQL')
     ActiveRecord::NoDatabaseError:
       FATAL:  role "roland" does not exist
     # /home/roland/.rvm/gems/ruby-2.1.1/gems/activerecord-4.1.8/lib/active_record/connection_adapters/postgresql_adapter.rb:898:in `rescue in connect'
     # /home/roland/.rvm/gems/ruby-2.1.1/gems/activerecord-4.1.8/lib/active_record/connection_adapters/postgresql_adapter.rb:888:in `connect'
     # /home/roland/.rvm/gems/ruby-2.1.1/gems/activerecord-4.1.8/lib/active_record/connection_adapters/postgresql_adapter.rb:568:in `initialize'
     # /home/roland/.rvm/gems/ruby-2.1.1/gems/activerecord-4.1.8/lib/active_record/connection_adapters/postgresql_adapter.rb:41:in `new'
     # /home/roland/.rvm/gems/ruby-2.1.1/gems/activerecord-4.1.8/lib/active_record/connection_adapters/postgresql_adapter.rb:41:in `postgresql_connection'
     # /home/roland/.rvm/gems/ruby-2.1.1/gems/activerecord-4.1.8/lib/active_record/connection_adapters/abstract/connection_pool.rb:435:in `new_connection'
     # /home/roland/.rvm/gems/ruby-2.1.1/gems/activerecord-4.1.8/lib/active_record/connection_adapters/abstract/connection_pool.rb:445:in `checkout_new_connection'
     # /home/roland/.rvm/gems/ruby-2.1.1/gems/activerecord-4.1.8/lib/active_record/connection_adapters/abstract/connection_pool.rb:416:in `acquire_connection'
     # /home/roland/.rvm/gems/ruby-2.1.1/gems/activerecord-4.1.8/lib/active_record/connection_adapters/abstract/connection_pool.rb:351:in `block in checkout'
     # /home/roland/.rvm/gems/ruby-2.1.1/gems/activerecord-4.1.8/lib/active_record/connection_adapters/abstract/connection_pool.rb:350:in `checkout'
     # /home/roland/.rvm/gems/ruby-2.1.1/gems/activerecord-4.1.8/lib/active_record/connection_adapters/abstract/connection_pool.rb:265:in `block in connection'
     # /home/roland/.rvm/gems/ruby-2.1.1/gems/activerecord-4.1.8/lib/active_record/connection_adapters/abstract/connection_pool.rb:264:in `connection'
     # /home/roland/.rvm/gems/ruby-2.1.1/gems/activerecord-4.1.8/lib/active_record/connection_adapters/abstract/connection_pool.rb:541:in `retrieve_connection'
     # /home/roland/.rvm/gems/ruby-2.1.1/gems/activerecord-4.1.8/lib/active_record/connection_handling.rb:113:in `retrieve_connection'
     # /home/roland/.rvm/gems/ruby-2.1.1/gems/activerecord-4.1.8/lib/active_record/connection_handling.rb:87:in `connection'
     # ./tests/utilities_tests.rb:15:in `block (3 levels) in <top (required)>'
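The role named in both errors is the login name of the system user. When a PostgreSQL connection is opened without a user, the client library falls back to the PGUSER environment variable and then to the operating-system user name. A minimal sketch of that lookup order, where effective_pg_user is a hypothetical helper and the settings hash stands in for one database.yml entry:

```ruby
require "etc"

# Hypothetical helper, not ferry's code: report which PostgreSQL role a
# connection would end up using for the given settings.
def effective_pg_user(settings, env: ENV)
  settings["username"] ||   # explicit entry in database.yml
    env["PGUSER"] ||        # environment variable read by the client library
    Etc.getlogin            # otherwise the operating-system login name
end

# Made-up entry with no username, as in the failing setup.
settings = { "adapter" => "postgresql", "database" => "items_development" }
puts effective_pg_user(settings)
```

Setting a username in database.yml and making sure it is passed along when connecting keeps the connection from depending on whoever happens to run the command.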

SQL Dumper & Filler

These objects will allow us to dump an application's entire database to a .sql file, or fill a specified empty database from a .sql file. Some challenges we will face here are ...

  • how to test whether or not a dump was successful
  • how to test whether or not a fill was successful
  • making sure all database knowledge has been dumped / filled (functions, calculated columns, triggers, indexing patterns, etc. are carried over)
  • making sure a filled database matches the schema the application expects

reviews, revisions, repairs

  • [ ] Review the code in place
  • [ ] Make sure old activerecord::relation tasks work
  • [ ] TESTS!!!

Please comment / add more to the list

documentation

clean up readme

  • include link to git page
  • install
  • examples
    • to_csv
    • to_yaml
    • import
    • db_switch
  • running tests?

Options for Import

As mentioned when I first merged the import functionality, other importers give the option to validate the data against AR before inserting, and also automatically generating PKs for the data if they do not have any.
I think the PK generation would be simple, just need to query the db for the index of the table. But we'd need a case if the PK is not the default "id" field. Then we can assign that to the record and import it.

For the validation, I think that is best left out for now. The dbs should have the basic type validation, but beyond that it might be a big performance hit if we need to check every row of import data against a model.
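The PK generation idea could look roughly like the sketch below. It is only an illustration: assign_primary_keys is a hypothetical helper, current_max stands for the result of a query such as SELECT MAX(id) on the target table, and the pk argument covers tables whose key is not the default "id" field.

```ruby
# Hypothetical helper: fill in missing primary keys, continuing from the
# highest key already present in the table or in the import data.
def assign_primary_keys(rows, current_max:, pk: "id")
  used = rows.map { |row| row[pk] }.compact
  next_key = [current_max, *used].max
  rows.map do |row|
    next row if row[pk]          # keep rows that already carry a key
    next_key += 1
    row.merge(pk => next_key)    # returns a new hash, input is untouched
  end
end

# Made-up import rows; only the second one has a key.
rows = [
  { "name" => "Ada" },
  { "id" => 10, "name" => "Grace" },
  { "name" => "Edsger" }
]

p assign_primary_keys(rows, current_max: 7)
```

Computing keys on the client is only safe when nothing else writes to the table during the import. Where the database has an auto-increment column or a sequence, leaving the key out and letting the database assign it avoids that race.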

Customizable migrations

A customizable script that takes arguments provided by the user is, I feel, a necessary component of ferry that does not yet exist. With the current $ ferry -init command we create a base file, ferry.rake, in the tasks directory of the developer's Rails app. While this provides some value, it's not very impactful. Digging into how Capistrano lets users configure their deploys in a deploy.rb file is going to be our starting point as we look to develop our own equivalent of deploy.rb for data migrations and manipulations.
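To make the deploy.rb comparison concrete, here is one way such a script could be evaluated in plain Ruby. Everything in this sketch is hypothetical: MigrationScript, define, and step do not exist in ferry, and only the set :key, value form is borrowed from Capistrano's style.

```ruby
# Hypothetical DSL sketch: none of these names exist in ferry.
class MigrationScript
  attr_reader :settings, :steps

  def initialize
    @settings = {}
    @steps = []
  end

  # set :key, value (modelled on Capistrano's `set`)
  def set(key, value)
    @settings[key] = value
  end

  # step("name") { |settings| ... } registers a block to run later, in order.
  def step(name, &block)
    @steps << [name, block]
  end

  # Evaluate the block with a fresh script as self, so that bare `set` and
  # `step` calls inside it land on that script.
  def self.define(&definition)
    script = new
    script.instance_eval(&definition)
    script
  end

  # Run every step in registration order and collect [name, result] pairs.
  def run
    @steps.map { |name, block| [name, block.call(@settings)] }
  end
end

script = MigrationScript.define do
  set :environment, "development"
  set :table, "users"
  step("describe") { |s| "export #{s[:table]} from #{s[:environment]}" }
end

script.run.each { |name, result| puts "#{name}: #{result}" }
```

Separating definition from execution, as here, would let a command-line tool load the file first and then decide which steps to run.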

Init command for DB setup?

Connecting to ActiveRecord requires that we know the DB type and, depending on that type, the credentials that go along with it. For now, I'm passing the current type as an argument on the export call, but I think we should look into an initialize step for ferry where users specify what the current db is (and any credentials).
There may be a better way to detect it, but I can't figure out how to find it through Active Record, because I have to connect to the DB before I can ask what type it is.
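One option that avoids connecting first is to read the adapter name from the app's config/database.yml. The sketch below assumes the file is plain YAML with no ERB tags, and adapter_for is a hypothetical helper rather than existing ferry code.

```ruby
require "yaml"

# Hypothetical helper: look up the adapter name for an environment straight
# from database.yml text, without opening a database connection.
def adapter_for(yaml_text, environment)
  # aliases: true allows the "<<: *default" pattern common in Rails configs.
  config = YAML.safe_load(yaml_text, aliases: true)
  config.fetch(environment).fetch("adapter")
end

# Made-up file content in the usual Rails layout.
database_yml = <<~CONFIG
  default: &default
    adapter: postgresql
    pool: 5
  development:
    <<: *default
    database: app_development
CONFIG

puts adapter_for(database_yml, "development")
```

In a real app the text would come from File.read("config/database.yml"), and fetch raises a KeyError when the environment or the adapter key is missing.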

Exports do not work

After installing this gem, via bundle and via gem install, I got this error in both cases:

gems/ferry-1.0.0/bin/ferry:10:in 'block (2 levels) in <top (required)>': 
please enter a field for environment and table (RuntimeError)

weekend of 9/26-9/28 refactor and sql switcher

my weekend project is going to entail ...

  • refactor code up to 9.25/26
  • creating db switcher that takes a sqlite db and makes it postgresql
  • cleaning up git history
  • refactor readme to make documentation a little nicer

again - sorry for being really MIA on this. the job search has really been relentless and i've been devoting a lot of time to that in place of putting more effort into 475. hopefully this weekend will have some nice deliverables in it.

psql and mysql installation wizard for db switcher

  • asks if you want to proceed
  • takes your input and installs
    • where to pull code from to install
    • path configuration
    • checking for previously installed versions?
    • giving the user feedback during installation
    • rollback if break

Exporting calculated 'columns'

Any planned support for something like this? May not be in the scope of the gem, as it seems like this focuses on data import/export
