activerecord-import's Introduction

Activerecord-Import

Activerecord-Import is a library for bulk inserting data using ActiveRecord.

One of its major features is following activerecord associations and generating the minimal number of SQL insert statements required, avoiding the N+1 insert problem. An example probably explains it best. Say you had a schema like this:

  • Publishers have Books
  • Books have Reviews

and you wanted to bulk insert 100 new publishers with 10K books each and 3 reviews per book. This library will follow the associations down and generate only 3 SQL insert statements: one for the publishers, one for the books, and one for the reviews.

In contrast, the standard ActiveRecord save would generate 100 insert statements for the publishers, then visit each publisher and save all its books (100 * 10,000 = 1,000,000 SQL insert statements), and then save the reviews (100 * 10,000 * 3 = 3M SQL insert statements).

That would be about 4M SQL insert statements vs 3, which results in vastly improved performance. In our case, it converted an 18 hour batch process to <2 hrs.
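The arithmetic above can be checked with a few lines of plain Ruby. This is only a back-of-the-envelope sketch: the constant and method names are made up for illustration, and no database or gem is involved.

```ruby
# Counts from the example above: 100 publishers, 10,000 books per publisher,
# 3 reviews per book. All names here are illustrative, not part of the gem.
PUBLISHERS = 100
BOOKS_PER_PUBLISHER = 10_000
REVIEWS_PER_BOOK = 3

# One INSERT per record, as when every object is saved individually.
def per_record_insert_count(publishers, books_per_publisher, reviews_per_book)
  books = publishers * books_per_publisher
  reviews = books * reviews_per_book
  publishers + books + reviews
end

# One INSERT per table when each level of the association tree is bulk inserted.
def bulk_insert_count(tables)
  tables.size
end

puts per_record_insert_count(PUBLISHERS, BOOKS_PER_PUBLISHER, REVIEWS_PER_BOOK) # prints 4000100
puts bulk_insert_count(%i[publishers books reviews])                            # prints 3
```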

The gem provides the following high-level features:

  • Works with raw columns and arrays of values (fastest)
  • Works with model objects (faster)
  • Performs validations (fast)
  • Performs on duplicate key updates (requires MySQL, SQLite 3.24.0+, or Postgres 9.5+)

Examples

Introduction

This gem adds an import method (or bulk_import, for compatibility with gems like elasticsearch-model; see Conflicts With Other Gems) to ActiveRecord classes.

Without activerecord-import, you'd write something like this:

10.times do |i|
  Book.create! name: "book #{i}"
end

This would end up making 10 SQL calls. YUCK! With activerecord-import, you can instead do this:

books = []
10.times do |i|
  books << Book.new(name: "book #{i}")
end
Book.import books    # or use import!

and only have 1 SQL call. Much better!

Columns and Arrays

The import method can take an array of column names (string or symbols) and an array of arrays. Each child array represents an individual record and its list of values in the same order as the columns. This is the fastest import mechanism and also the most primitive.

columns = [ :title, :author ]
values = [ ['Book1', 'George Orwell'], ['Book2', 'Bob Jones'] ]

# Importing without model validations
Book.import columns, values, validate: false

# Import with model validations
Book.import columns, values, validate: true

# when not specified :validate defaults to true
Book.import columns, values
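To make the "one statement for many rows" idea concrete, here is a rough sketch of how a column list and an array of value arrays could be turned into a single multi-row INSERT. The helper build_multi_row_insert is hypothetical and its quoting is deliberately naive; it is not how the gem builds SQL, only an illustration of the shape of the resulting statement.

```ruby
# Illustrative only: build_multi_row_insert is a hypothetical helper, not part
# of activerecord-import. Real code should rely on the adapter's quoting.
def build_multi_row_insert(table, columns, rows)
  column_list = columns.join(", ")
  value_list = rows.map do |row|
    quoted = row.map do |value|
      next "NULL" if value.nil?
      escaped = value.to_s.gsub("'", "''") # double any single quotes
      "'#{escaped}'"
    end
    "(#{quoted.join(', ')})"
  end
  "INSERT INTO #{table} (#{column_list}) VALUES #{value_list.join(', ')}"
end

columns = [:title, :author]
values  = [['Book1', 'George Orwell'], ['Book2', 'Bob Jones']]
puts build_multi_row_insert("books", columns, values)
```

Each inner array must list its values in the same order as the columns, which is why this form is described as the most primitive.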

Hashes

The import method can take an array of hashes. The keys map to the column names in the database.

values = [{ title: 'Book1', author: 'George Orwell' }, { title: 'Book2', author: 'Bob Jones'}]

# Importing without model validations
Book.import values, validate: false

# Import with model validations
Book.import values, validate: true

# when not specified :validate defaults to true
Book.import values

Import Using Hashes and Explicit Column Names

The import method can take an array of column names and an array of hash objects. The column names are used to determine what fields of data should be imported. The following example will only import books with the title field:

books = [
  { title: "Book 1", author: "George Orwell" },
  { title: "Book 2", author: "Bob Jones" }
]
columns = [ :title ]

# without validations
Book.import columns, books, validate: false

# with validations
Book.import columns, books, validate: true

# when not specified :validate defaults to true
Book.import columns, books

# result in table books
# title  | author
#--------|--------
# Book 1 | NULL
# Book 2 | NULL

Using hashes will only work if the columns are consistent in every hash of the array. If this does not hold, an exception will be raised. There are two workarounds: use the array to instantiate an array of ActiveRecord objects and then pass that into import or divide the array into multiple ones with consistent columns and import each one separately.

See #507 for discussion.

arr = [
  { bar: 'abc' },
  { baz: 'xyz' },
  { bar: '123', baz: '456' }
]

# An exception will be raised
Foo.import arr

# better
arr.map! { |args| Foo.new(args) }
Foo.import arr

# better
arr.group_by(&:keys).each_value do |v|
 Foo.import v
end
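The second workaround relies on Enumerable#group_by. The sketch below runs on plain hashes with no database, to show which groups would be produced; group_rows_by_keys is a hypothetical wrapper name used only for this illustration.

```ruby
# Hypothetical helper: split an array of hashes into groups whose members
# share exactly the same keys, so each group could be imported on its own.
def group_rows_by_keys(rows)
  rows.group_by(&:keys)
end

arr = [
  { bar: 'abc' },
  { baz: 'xyz' },
  { bar: '123', baz: '456' },
  { bar: 'def' }
]

group_rows_by_keys(arr).each do |keys, rows|
  puts "#{keys.inspect}: #{rows.size} row(s)"
end
```

With this input there are three groups, so the loop in the example above would call import three times.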

ActiveRecord Models

The import method can take an array of models. The attributes will be pulled off from each model by looking at the columns available on the model.

books = [
  Book.new(title: "Book 1", author: "George Orwell"),
  Book.new(title: "Book 2", author: "Bob Jones")
]

# without validations
Book.import books, validate: false

# with validations
Book.import books, validate: true

# when not specified :validate defaults to true
Book.import books

The import method can take an array of column names and an array of models. The column names are used to determine what fields of data should be imported. The following example will only import books with the title field:

books = [
  Book.new(title: "Book 1", author: "George Orwell"),
  Book.new(title: "Book 2", author: "Bob Jones")
]
columns = [ :title ]

# without validations
Book.import columns, books, validate: false

# with validations
Book.import columns, books, validate: true

# when not specified :validate defaults to true
Book.import columns, books

# result in table books
# title  | author
#--------|--------
# Book 1 | NULL
# Book 2 | NULL

Batching

The import method can take a batch_size option to control the number of rows to insert per INSERT statement. The default is the total number of records being inserted so there is a single INSERT statement.

books = [
  Book.new(title: "Book 1", author: "George Orwell"),
  Book.new(title: "Book 2", author: "Bob Jones"),
  Book.new(title: "Book 1", author: "John Doe"),
  Book.new(title: "Book 2", author: "Richard Wright")
]
columns = [ :title ]

# 2 INSERT statements for 4 records
Book.import columns, books, batch_size: 2

If your import is particularly large or slow (possibly due to callbacks) whilst batch importing, you might want a way to report back on progress. This is supported by passing a callable as the batch_progress option. For example:

my_proc = ->(rows_size, num_batches, current_batch_number, batch_duration_in_secs) {
  # Using the arguments provided to the callable, you can
  # send an email, post to a websocket,
  # update slack, alert if import is taking too long, etc.
}

Book.import columns, books, batch_size: 2, batch_progress: my_proc
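The following sketch mimics how a batched loop might invoke such a callable. simulate_batched_import is a hypothetical stand-in that performs no database work, and it assumes rows_size is the total number of rows being imported; check the gem's source if the exact meaning of each argument matters to you.

```ruby
# Hypothetical stand-in for a batched import loop. It only demonstrates the
# four arguments handed to the batch_progress callable; nothing is inserted.
def simulate_batched_import(rows, batch_size:, batch_progress:)
  batches = rows.each_slice(batch_size).to_a
  batches.each_with_index do |_batch, index|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    # ... one INSERT statement for this batch would be issued here ...
    duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    batch_progress.call(rows.size, batches.size, index + 1, duration)
  end
  batches.size
end

progress_log = []
my_proc = ->(rows_size, num_batches, current_batch_number, batch_duration_in_secs) {
  progress_log << "batch #{current_batch_number}/#{num_batches} of #{rows_size} rows " \
                  "(#{batch_duration_in_secs.round(3)}s)"
}

simulate_batched_import(%w[a b c d], batch_size: 2, batch_progress: my_proc)
puts progress_log
```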

Recursive

Note: This only works with PostgreSQL and ActiveRecord objects. This won't work with hashes or arrays as recursive inputs.

Assume that Books has_many Reviews.

books = []
10.times do |i|
  book = Book.new(name: "book #{i}")
  book.reviews.build(title: "Excellent")
  books << book
end
Book.import books, recursive: true

Options

| Key | Options | Default | Description |
| --- | --- | --- | --- |
| :validate | true/false | true | Whether or not to run ActiveRecord validations (uniqueness skipped). This option will always be true when using import!. |
| :validate_uniqueness | true/false | false | Whether or not to run ActiveRecord uniqueness validations. Beware this will incur an SQL query per record (N+1 queries). Requires >= v0.27.0. |
| :validate_with_context | Symbol | :create/:update | Allows passing an ActiveModel validation context for each model. Default is :create for new records and :update for existing ones. |
| :track_validation_failures | true/false | false | When this is set to true, failed_instances will be an array of arrays, with each inner array having the form [:index_in_dataset, :object_with_errors]. |
| :on_duplicate_key_ignore | true/false | false | Allows skipping records with duplicate keys. See Duplicate Key Ignore for more details. |
| :ignore | true/false | false | Alias for :on_duplicate_key_ignore. |
| :on_duplicate_key_update | :all, Array, Hash | N/A | Allows upsert logic to be used. See Duplicate Key Update for more details. |
| :synchronize | Array | N/A | An array of ActiveRecord instances. This synchronizes existing instances in memory with updates from the import. |
| :timestamps | true/false | true | Enables/disables timestamps on imported records. |
| :recursive | true/false | false | Imports has_many/has_one associations (PostgreSQL only). |
| :recursive_on_duplicate_key_update | Hash | N/A | Allows upsert logic to be used for recursive associations. The hash key is the association name and the value has the same options as :on_duplicate_key_update. See Duplicate Key Update for more details. |
| :batch_size | Integer | total # of records | Max number of records to insert per INSERT statement. |
| :raise_error | true/false | false | Raises an exception at the first invalid record. This means there will not be a result object returned. The import! method is a shortcut for this. |
| :all_or_none | true/false | false | Will not import any records if there is a record with validation errors. |

Duplicate Key Ignore

MySQL, SQLite, and PostgreSQL (9.5+) support on_duplicate_key_ignore which allows you to skip records if a primary or unique key constraint is violated.

For Postgres 9.5+ it adds ON CONFLICT DO NOTHING, for MySQL it uses INSERT IGNORE, and for SQLite it uses INSERT OR IGNORE. Cannot be enabled on a recursive import. For database adapters that normally support setting primary keys on imported objects, this option prevents that from occurring.

book = Book.create! title: "Book1", author: "George Orwell"
book.title = "Updated Book Title"
book.author = "Bob Barker"

Book.import [book], on_duplicate_key_ignore: true

book.reload.title  # => "Book1"     (stayed the same)
book.reload.author # => "George Orwell" (stayed the same)

The option :on_duplicate_key_ignore is bypassed when :recursive is enabled for PostgreSQL imports.

Duplicate Key Update

MySQL, PostgreSQL (9.5+), and SQLite (3.24.0+) support on duplicate key update (also known as "upsert") which allows you to specify fields whose values should be updated if a primary or unique key constraint is violated.

One big difference between MySQL and PostgreSQL support is that MySQL will handle any conflict that happens, but PostgreSQL requires that you specify which columns the conflict would occur over. SQLite models its upsert support after PostgreSQL.

This will use MySQL's ON DUPLICATE KEY UPDATE or Postgres/SQLite ON CONFLICT DO UPDATE to do upsert.
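As a rough illustration of the difference between the two dialects, the helpers below build the trailing upsert clause for a list of columns. They are hypothetical, do no identifier quoting, and are not the gem's SQL generator; they only show why PostgreSQL and SQLite need a conflict target while MySQL does not.

```ruby
# Hypothetical helpers showing the shape of each dialect's upsert clause.
def mysql_upsert_clause(columns)
  assignments = columns.map { |column| "#{column}=VALUES(#{column})" }
  "ON DUPLICATE KEY UPDATE #{assignments.join(', ')}"
end

# PostgreSQL and SQLite name the conflicting column(s) explicitly.
def postgres_upsert_clause(conflict_target, columns)
  assignments = columns.map { |column| "#{column}=EXCLUDED.#{column}" }
  "ON CONFLICT (#{conflict_target.join(', ')}) DO UPDATE SET #{assignments.join(', ')}"
end

puts mysql_upsert_clause([:title])
puts postgres_upsert_clause([:id], [:title])
```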

Basic Update

book = Book.create! title: "Book1", author: "George Orwell"
book.title = "Updated Book Title"
book.author = "Bob Barker"

# MySQL version
Book.import [book], on_duplicate_key_update: [:title]

# PostgreSQL version
Book.import [book], on_duplicate_key_update: {conflict_target: [:id], columns: [:title]}

# PostgreSQL shorthand version (conflict target must be primary key)
Book.import [book], on_duplicate_key_update: [:title]

book.reload.title  # => "Updated Book Title" (changed)
book.reload.author # => "George Orwell"          (stayed the same)

Using the value from another column

book = Book.create! title: "Book1", author: "George Orwell"
book.title = "Updated Book Title"

# MySQL version
Book.import [book], on_duplicate_key_update: {author: :title}

# PostgreSQL version (no shorthand version)
Book.import [book], on_duplicate_key_update: {
  conflict_target: [:id], columns: {author: :title}
}

book.reload.title  # => "Book1"              (stayed the same)
book.reload.author # => "Updated Book Title" (changed)

Using Custom SQL

book = Book.create! title: "Book1", author: "George Orwell"
book.author = "Bob Barker"

# MySQL version
Book.import [book], on_duplicate_key_update: "author = values(author)"

# PostgreSQL version
Book.import [book], on_duplicate_key_update: {
  conflict_target: [:id], columns: "author = excluded.author"
}

# PostgreSQL shorthand version (conflict target must be primary key)
Book.import [book], on_duplicate_key_update: "author = excluded.author"

book.reload.title  # => "Book1"      (stayed the same)
book.reload.author # => "Bob Barker" (changed)

PostgreSQL Using partial indexes

book = Book.create! title: "Book1", author: "George Orwell", published_at: Time.now
book.author = "Bob Barker"

# in migration
execute <<-SQL
      CREATE INDEX books_published_at_index ON books (published_at) WHERE published_at IS NOT NULL;
    SQL

# PostgreSQL version
Book.import [book], on_duplicate_key_update: {
  conflict_target: [:id],
  index_predicate: "published_at IS NOT NULL",
  columns: [:author]
}

book.reload.title  # => "Book1"          (stayed the same)
book.reload.author # => "Bob Barker"     (changed)
book.reload.published_at # => 2017-10-09 (stayed the same)

PostgreSQL Using constraints

book = Book.create! title: "Book1", author: "George Orwell", edition: 3, published_at: nil
book.published_at = Time.now

# in migration
execute <<-SQL
      ALTER TABLE books
        ADD CONSTRAINT for_upsert UNIQUE (title, author, edition);
    SQL

# PostgreSQL version
Book.import [book], on_duplicate_key_update: {constraint_name: :for_upsert, columns: [:published_at]}


book.reload.title  # => "Book1"          (stayed the same)
book.reload.author # => "George Orwell"      (stayed the same)
book.reload.edition # => 3               (stayed the same)
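book.reload.published_at # => 2017-10-09 (changed)

Uniqueness Validation

Uniqueness validations are skipped by default (see :validate_uniqueness under Options). To run them during an import, pass validate_uniqueness: true. Note that this incurs one SQL query per record and requires v0.27.0 or later.

Book.import books, validate_uniqueness: true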

Return Info

The import method returns a Result object that responds to failed_instances and num_inserts. Additionally, for users of Postgres, there will be two arrays ids and results that can be accessed.

articles = [
  Article.new(author_id: 1, title: 'First Article', content: 'This is the first article'),
  Article.new(author_id: 2, title: 'Second Article', content: ''),
  Article.new(author_id: 3, content: '')
]

demo = Article.import(articles, returning: :title) # => #<struct ActiveRecord::Import::Result

demo.failed_instances
=> [#<Article id: 3, author_id: 3, title: nil, content: "", created_at: nil, updated_at: nil>]

demo.num_inserts
=> 1

demo.ids
=> ["1", "2"] # for Postgres
=> [] # for other DBs

demo.results
=> ["First Article", "Second Article"] # for Postgres
=> [] # for other DBs
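A small sketch of how the returned object might be consumed follows. ImportResult is a stand-in Struct defined here so the snippet runs without a database; the real object comes from the gem. The summary also assumes that num_inserts counts INSERT statements rather than rows, which is consistent with the example above (two rows inserted, num_inserts of 1).

```ruby
# Stand-in for the result object; member names mirror the methods described
# above. summarize_import is a hypothetical helper for illustration.
ImportResult = Struct.new(:failed_instances, :num_inserts, :ids, :results)

def summarize_import(result, attempted)
  failed = result.failed_instances.size
  "#{attempted - failed} of #{attempted} records passed validation " \
    "using #{result.num_inserts} INSERT statement(s)"
end

demo = ImportResult.new([:invalid_article], 1, ["1", "2"], ["First Article", "Second Article"])
puts summarize_import(demo, 3) # prints "2 of 3 records passed validation using 1 INSERT statement(s)"
```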

Counter Cache

When running import, activerecord-import does not automatically update counter cache columns. To update these columns, you will need to do one of the following:

  • Provide values to the column as an argument on your object that is passed in.
  • Manually update the column after the record has been imported.
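For the first option, the counts can be computed in memory before importing. The sketch below uses a plain Struct in place of a model; ReviewRow, review_counts_by_book, and the idea of a reviews_count column are all assumptions made for this illustration.

```ruby
# ReviewRow stands in for a Review model with a book_id foreign key.
ReviewRow = Struct.new(:book_id, :title)

# Returns a hash of book_id => number of reviews. The values could then be
# assigned to a counter column (called reviews_count here as an assumption)
# on each book before it is passed to import.
def review_counts_by_book(reviews)
  reviews.each_with_object(Hash.new(0)) do |review, counts|
    counts[review.book_id] += 1
  end
end

reviews = [ReviewRow.new(1, "Excellent"), ReviewRow.new(1, "Fine"), ReviewRow.new(2, "Good")]
puts review_counts_by_book(reviews).inspect
```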

ActiveRecord Timestamps

If you're familiar with ActiveRecord you're probably familiar with its timestamp columns: created_at, created_on, updated_at, updated_on, etc. When importing data the timestamp fields will continue to work as expected and each timestamp column will be set.

Should you wish to set those columns yourself, you may use the option timestamps: false.

However, it is also possible to set just :created_at in specific records. In this case despite using timestamps: true, :created_at will be updated only in records where that field is nil. Same rule applies for record associations when enabling the option recursive: true.
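The created_at rule described above can be pictured with plain hashes. fill_created_at is a hypothetical helper that mirrors the described behaviour for illustration only; it is not the gem's implementation and it deliberately says nothing about the other timestamp columns.

```ruby
# Plain hashes stand in for records. A created_at value that is already set
# is kept; only nil values are filled in with the import time.
def fill_created_at(rows, now)
  rows.map { |row| row.merge(created_at: row[:created_at] || now) }
end

now    = Time.utc(2024, 1, 1)
preset = Time.utc(2020, 6, 15)
rows = fill_created_at(
  [{ title: "Book 1", created_at: preset }, { title: "Book 2", created_at: nil }],
  now
)
puts rows.map { |row| row[:created_at].year }.inspect # prints [2020, 2024]
```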

If you are using custom time zones, these will be respected when performing imports as well as long as ActiveRecord::Base.default_timezone is set, which for practically all Rails apps it is.

Note: If you are using ActiveRecord 7.0 or later, please use ActiveRecord.default_timezone instead.

Callbacks

ActiveRecord callbacks related to creating, updating, or destroying records (other than before_validation and after_validation) will NOT be called when calling the import method. This is because it is mass importing rows of data and doesn't necessarily have access to in-memory ActiveRecord objects.

If you do have a collection of in-memory ActiveRecord objects you can do something like this:

books.each do |book|
  book.run_callbacks(:save) { false }
  book.run_callbacks(:create) { false }
end
Book.import(books)

This will run before_create and before_save callbacks on each item. The block returns false to prevent after_save from being run, which wouldn't make sense prior to bulk import. Note that in this example the before_create and before_save callbacks will run before the validation callbacks.

If that is an issue, another possible approach is to loop through your models first to do validations and then only run callbacks on and import the valid models.

valid_books = []
invalid_books = []

books.each do |book|
  if book.valid?
    valid_books << book
  else
    invalid_books << book
  end
end

valid_books.each do |book|
  book.run_callbacks(:save) { false }
  book.run_callbacks(:create) { false }
end

Book.import valid_books, validate: false
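The split into valid and invalid records can also be written with Enumerable#partition. The sketch below uses a Struct with a hand-written valid? so it runs without ActiveRecord; FakeBook and split_by_validity are illustrative names, and with real models valid? would run the model's validations instead.

```ruby
# FakeBook is a stand-in for a model; its valid? is a toy presence check.
FakeBook = Struct.new(:title) do
  def valid?
    !title.to_s.strip.empty?
  end
end

# Returns [valid_records, invalid_records] in a single pass.
def split_by_validity(records)
  records.partition(&:valid?)
end

books = [FakeBook.new("Book 1"), FakeBook.new(""), FakeBook.new("Book 2")]
valid_books, invalid_books = split_by_validity(books)
puts valid_books.size   # prints 2
puts invalid_books.size # prints 1
```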

Supported Adapters

The following database adapters are currently supported:

  • MySQL - supports core import functionality plus on duplicate key update support (included in activerecord-import 0.1.0 and higher)
  • MySQL2 - supports core import functionality plus on duplicate key update support (included in activerecord-import 0.2.0 and higher)
  • PostgreSQL - supports core import functionality (included in activerecord-import 0.1.0 and higher)
  • SQLite3 - supports core import functionality (included in activerecord-import 0.1.0 and higher)
  • Oracle - supports core import functionality through DML trigger (available as an external gem: activerecord-import-oracle_enhanced)
  • SQL Server - supports core import functionality (available as an external gem: activerecord-import-sqlserver)

If your adapter isn't listed here, please consider creating an external gem as described in the README to provide support. If you do, feel free to update this wiki to include a link to the new adapter's repository!

To test which features are supported by your adapter, use the following methods on a model class:

  • supports_import?(*args)
  • supports_on_duplicate_key_update?
  • supports_setting_primary_key_of_imported_objects?

Additional Adapters

Additional adapters can be provided by gems external to activerecord-import by providing an adapter that matches the naming convention setup by activerecord-import (and subsequently activerecord) for dynamically loading adapters. This involves also providing a folder on the load path that follows the activerecord-import naming convention to allow activerecord-import to dynamically load the file.

When ActiveRecord::Import.require_adapter("fake_name") is called the require will be:

require 'activerecord-import/active_record/adapters/fake_name_adapter'

This allows an external gem to dynamically add an adapter without the need to add any file/code to the core activerecord-import gem.

Requiring

Note: These instructions will only work if you are using version 0.2.0 or higher.

Autoloading via Bundler

If you are using Rails or otherwise autoload your dependencies via Bundler, all you need to do is add the gem to your Gemfile like so:

gem 'activerecord-import'

Manually Loading

You may want to manually load activerecord-import for one reason or another. First, add the require: false argument like so:

gem 'activerecord-import', require: false

This will allow you to load up activerecord-import in the file or files where you are using it and only load the parts you need. If you are doing this within Rails and ActiveRecord has established a database connection (such as within a controller), you will need to do extra initialization work:

require 'activerecord-import/base'
# load the appropriate database adapter (postgresql, mysql2, sqlite3, etc)
require 'activerecord-import/active_record/adapters/postgresql_adapter'

If your gem dependencies aren’t autoloaded, and your script will be establishing a database connection, then simply require activerecord-import after ActiveRecord has been loaded, i.e.:

require 'active_record'
require 'activerecord-import'

Load Path Setup

To understand how rubygems loads code you can reference the following:

http://guides.rubygems.org/patterns/#loading-code

And an example of how active_record dynamically loads adapters:

https://github.com/rails/rails/blob/master/activerecord/lib/active_record/connection_adapters/connection_specification.rb

In summary, when a gem is loaded, rubygems adds the lib folder of the gem to the global load path $LOAD_PATH so that all require lookups will not propagate through all of the folders on the load path. When a require is issued, each folder on the $LOAD_PATH is checked for the file and/or folder referenced. This allows a gem (like activerecord-import) to push the activerecord-import folder (or namespace) onto the $LOAD_PATH, and any adapters provided by activerecord-import will be found by rubygems when the require is issued.

If fake_name adapter is needed by a gem (potentially called activerecord-import-fake_name) then the folder structure should look as follows:

activerecord-import-fake_name/
|-- activerecord-import-fake_name.gemspec
|-- lib
|   |-- activerecord-import-fake_name.rb
|   |-- activerecord-import-fake_name
|   |   |-- version.rb
|   |-- activerecord-import
|   |   |-- active_record
|   |   |   |-- adapters
|   |   |       |-- fake_name_adapter.rb

When rubygems pushes the lib folder onto the load path a require will now find activerecord-import/active_record/adapters/fake_name_adapter as it runs through the lookup process for a ruby file under that path in $LOAD_PATH
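The lookup can be tried out without building a gem. The sketch below creates a throwaway lib folder shaped like the tree above, puts it on $LOAD_PATH by hand (standing in for what rubygems does when a gem is activated), and requires the adapter by its conventional path. demo_adapter_lookup is a hypothetical helper and the adapter file it writes is an empty placeholder.

```ruby
require "tmpdir"
require "fileutils"

# Builds lib/activerecord-import/active_record/adapters/<name>_adapter.rb in a
# temporary directory, adds lib/ to $LOAD_PATH, and requires the file.
def demo_adapter_lookup(adapter_name)
  Dir.mktmpdir do |root|
    lib = File.join(root, "lib")
    relative = "activerecord-import/active_record/adapters/#{adapter_name}_adapter"
    file = File.join(lib, "#{relative}.rb")
    FileUtils.mkdir_p(File.dirname(file))
    File.write(file, "# adapter code would live here\n")

    $LOAD_PATH.unshift(lib)
    begin
      require relative # true when the file is found and loaded for the first time
    ensure
      $LOAD_PATH.delete(lib) # leave the global load path as we found it
    end
  end
end

puts demo_adapter_lookup("fake_name")
```

If the lib folder were not on $LOAD_PATH, the same require would raise LoadError, which is why the folder layout inside the external gem matters.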

Conflicts With Other Gems

Activerecord-Import adds the .import method onto ActiveRecord::Base. There are other gems, such as elasticsearch-rails, that do the same thing. In conflicts such as this, there is an aliased method named .bulk_import that can be used interchangeably.

If you are using the apartment gem, there is a weird triple interaction between that gem, activerecord-import, and activerecord involving caching of the sequence_name of a model. This can be worked around by explicitly setting this value within the model. For example:

class Post < ActiveRecord::Base
  self.sequence_name = "posts_seq"
end

Another way to work around the issue is to call .reset_sequence_name on the model. For example:

schemas.all.each do |schema|
  Apartment::Tenant.switch! schema.name
  ActiveRecord::Base.transaction do
    Post.reset_sequence_name

    Post.import posts
  end
end

See #233 for further discussion.

More Information

For more information on Activerecord-Import please see its wiki: https://github.com/zdennis/activerecord-import/wiki

To document new information, please add to the README instead of the wiki. See #397 for discussion.

Contributing

Running Tests

The first thing you need to do is set up your database(s):

  • copy test/database.yml.sample to test/database.yml
  • modify test/database.yml for your database settings
  • create databases as needed

After that, you can run the tests. They run against multiple databases and ActiveRecord versions.

This is one example of how to run the tests:

rm Gemfile.lock
AR_VERSION=7.0 bundle install
AR_VERSION=7.0 bundle exec rake test:postgresql test:sqlite3 test:mysql2

Once you have pushed up your changes, you can find your CI results here.

Docker Setup

Before you begin, make sure you have Docker and Docker Compose installed on your machine. If you don't, you can install both via Homebrew using the following command:

brew install docker && brew install docker-compose

Steps

  1. In your terminal run docker-compose up --build
  2. In another tab/window run docker-compose exec app bash
  3. In that same terminal run the mysql2 test by running bundle exec rake test:mysql2

Issue Triage

You can triage issues which may include reproducing bug reports or asking for vital information, such as version numbers or reproduction instructions. If you would like to start triaging issues, one easy way to get started is to subscribe to activerecord-import on CodeTriage.

License

This is licensed under the MIT license.

Author

Zach Dennis

activerecord-import's People

Contributors

abrandoned, ahmohsen46, amatsuda, aquajach, arashm, aristat, codeodor, diclophis, dillonwelch, dombesz, dougo, empact, gee-forr, imtayadeway, jkowens, johnnaegle, jturkel, leonidkroka, mishina2228, mizukami234, mseal, ramblex, seanlinsley, sebcoetzee, sferik, spectator, stokarenko, y-yagi, zdennis, zmariscal

activerecord-import's Issues

hash instead of arrays for 'fastest' mode?

Already supported:

columns = [ :title, :author ]
values = [ ['Book1', 'FooManChu'], ['Book2', 'Bob Jones'] ]

# Importing without model validations

Book.import columns, values, :validate => false

Would it make sense to support a basic hash import too, which would presumably be nearly as fast (no AR models), but in some cases easier to read/write?

hashes = []
hashes << {:title => 'Book1', :author => 'FooManChu'}
hashes << {:title => 'Book2', :author => 'Bob Jones'}

Book.import hashes, :validate => false

?

Or I guess the danger there is that the different hashes in the array might have different keys, messing up the bulkness of the bulk import.

Okay, so how about explicit columns, but it will still take an array of hashes, instead of an array of order-dependent arrays, as the second arg?

Book.import [:title, :author], hashes, :validate => false

Gem depends on Rails 3.0.0 release candidate

The gem seems to work with AR 3.0.0, but the gemspec depends on 'rails' '3.0.0rc' (so I don't think it would even work with rails 3.0.0 final).

I edited the gemspec to get it to work in my non-Rails app, since the documentation seemed to imply it should work without Rails, but if import is intended to work with AR3 (or even just Rails3) and not just Rails3 release candidates, it should be changed in the pushed gemspec. Thanks.

#import should dup column_names

array_of_attributes is dup'ed, but not column_names, leading to a couple cases where the array passed by the user could unexpectedly be mutated. For example, I was passing the same array for column names and the :on_duplicate_key_update option, which led to the :updated_at column name getting specified twice in the SQL statement.

Have option to import all or none records

Want to know what do you think about having an option to tell #import method to insert all or none records to the db.

This may be useful for example on rails when you are receiving bulk data from the user (e.g. large csv file which may be converted to db records) and you want to show validations errors without saving anything, and enforce all valid records in order to commit the data.

Is there any reason this is not supported yet? Thanks!

establish_connection on boot?

When activerecord-import is loaded, ActiveRecord's establish_connection seems to be called on boot (at least in dev mode), in a way that it isn't without active-record import.

I'm still trying to debug all the details. But does anyone recognize why this might be?

Why does AR-import need to monkey patch establish_connection, as well as call its custom ActiveRecord::Import.load_from_connection_pool in an init hook?

#import adds timestamps to input array

ActiveRecord::Base#import should not mutate (change) the array of data being imported, and there is a call to Object#dup to do this:
https://github.com/zdennis/activerecord-import/blob/master/lib/activerecord-import/import.rb#L197

Unfortunately, it doesn't seem to be working, as I experienced it adding timestamps to an array.

I haven't been able to figure out the cause; I suspect maybe ActiveRecord is being a little too helpful with its magic.

Sometimes calling dup on my dataset /before/ the call to #import avoided the problem, sometimes not; it seemed to differ between reboots of my Resque worker.

Reproduced on Ruby 1.8.7 & 1.9.2 with Rails 3.0.1 & 3.0.7.

I have written a failing test here:
jamiecobbett@7e45e23

Test output:
1) Failure:
test: #import ActiveRecord timestamps when the timestamps columns are present should not alter the input dataset(#import ActiveRecord timestamps when the timestamps columns are present)
[./test/import_test.rb:201:in `test: #import ActiveRecord timestamps when the timestamps columns are present should not alter the input dataset'
/home/jamie/.rvm/gems/ruby-1.8.7-p334/gems/activesupport-3.0.1/lib/active_support/testing/setup_and_teardown.rb:67:in `send'
/home/jamie/.rvm/gems/ruby-1.8.7-p334/gems/activesupport-3.0.1/lib/active_support/testing/setup_and_teardown.rb:67:in `run'
/home/jamie/.rvm/gems/ruby-1.8.7-p334/gems/activesupport-3.0.1/lib/active_support/callbacks.rb:423:in `_run_setup_callbacks'
/home/jamie/.rvm/gems/ruby-1.8.7-p334/gems/activesupport-3.0.1/lib/active_support/testing/setup_and_teardown.rb:65:in `run']:
<[["LDAP", "Big Bird", "Del Rey"]]> expected but was
<[["LDAP",
"Big Bird",
"Del Rey",
Mon May 09 13:12:57 +0000 2011,
Mon May 09 13:12:57 +0000 2011,
Mon May 09 13:12:57 +0000 2011,
Mon May 09 13:12:57 +0000 2011]]>.

97 tests, 163 assertions, 1 failures, 0 errors
rake aborted!
Command failed with status (1): [/home/jamie/.rvm/rubies/ruby-1.8.7-p334/bi...]

Associations for import

Hi,

I have two tables that look like:

products

id
name
manufacturer

prices

id
product_id
size
price

And am importing from a CSV like (from a supplier):

manufacturer,name,size,price
Wattyl,Paint,2L,20
Wattyl,Paint,10L,100
Dulux,Paint,2L,25
Dulux,Paint,10L,200
Dulux,Primer,2L,30

Which I parse into a group of Products and their associated Prices.

Is there an easy way to import this with the associations? My problem is that I don't know how to set the product_id field without importing all of the products and retrieving them all again.

Currently I've been importing all of the products, then retrieving them all to get the ids, then adding the ids to all of the prices, and then importing the prices. This seems much harder than it should be!

Is there an easier way?

Jeff

SQLite3: Too many terms in compound SELECT

According to this page in the SQLite docs, the maximum number of items that can be inserted in a compound SELECT or INSERT statement is 500, so if you try to insert more records than that in the latest activerecord-import, you get the following error:

 Failure/Error: Jobs::CreateDataset.new(:user_id => @user.to_param,
 ActiveRecord::StatementInvalid:
   SQLite3::SQLException: too many terms in compound SELECT:
     INSERT INTO "dataset_entries" ("shasum","dataset_id","created_at","updated_at")
     VALUES ('00040b66948f49c3a6c6c0977530e2014899abf9',1,
         '2013-03-04 22:28:49.262801','2013-03-04 22:28:49.263236'),
     (a few thousand more items)

It looks like you need to add something like a NO_MAX_RECORDS in addition to NO_MAX_PACKET in AbstractAdapter, and increment that record-by-record as well. If you'd like, I could probably put a pull request together, but I'm a bit swamped and it would take me quite a while to get to it.

Support for rails 3.1?

Hey,

Awesome gem. I tried to use activerecord-import on a rails 3.1RC project, only to get this (#7) issue again. Do you have any plans to support/port to 3.1, and if so, is there any sort of timeline?

Thanks!
Alexey

NoMethodError: undefined method `=' for && NoMethodError: undefined method `type_cast' for nil:NilClass

I don't know how best to put this. When I started building the application the code was working fine, but now it fails with the following 2 errors:

  1. NoMethodError: undefined method `='
  2. NoMethodError: undefined method `type_cast' for nil:NilClass (when used with :validate => false)

Here is the code:

PartPriceRecord.import(a1,b1)

Here are the values of a1 and b1:

a1 = ["supplier_id", "part_number", "part_description", "core_indicator", "part_price", "core_price", "fleet_price", "distributor_price", :basic_file_id]

b1 = [["Supplier0", "PartNumber0", "LoremIpsums", "Y", "1300", "1400", "12", "10", 7], ["Supplier1", "PartNumber1", "Loremipsum", nil, "10", "1300", "1400", "12", "10", 7], ["Supplier2", "PartNumber2", "Loremipsum", nil, "11", "1300", "1400", "12", "10", 7], ["Supplier3", "PartNumber3", "Loremipsum", nil, "12", "1300", "1400", "12", "10", 7], ["Supplier4", "PartNumber4", "Loremipsum", nil, "13", "1300", "1400", "12", "10", 7], ["Supplier5", "PartNumber5", "Loremipsum", "Y", "14", "1300", "1400", "12", "10", 7]]
=> [["Supplier0", "PartNumber0", "LoremIpsums", "Y", "1300", "1400", "12", "10", 7], ["Supplier1", "PartNumber1", "Loremipsum", nil, "10", "1300", "1400", "12", "10", 7], ["Supplier2", "PartNumber2", "Loremipsum", nil, "11", "1300", "1400", "12", "10", 7], ["Supplier3", "PartNumber3", "Loremipsum", nil, "12", "1300", "1400", "12", "10", 7], ["Supplier4", "PartNumber4", "Loremipsum", nil, "13", "1300", "1400", "12", "10", 7], ["Supplier5", "PartNumber5", "Loremipsum", "Y", "14", "1300", "1400", "12", "10", 7]]

Here are PartPriceRecord Attributes

t.string :supplier_id
t.string :part_number
t.text :part_description
t.string :core_indicator
t.float :part_price
t.float :core_price
t.float :fleet_price
t.float :distributor_price
t.integer :basic_file_id
t.timestamps(update_at,created_at)

There are no ActiveRecord validations on the model for now.

Also, when I run the code below in the Ruby console:

PartPriceRecord.import(a1,[b1[0]])

the import was successful and a single record was inserted into the database,

but soon after that I ran this:

PartPriceRecord.import(a1,b1)

I see the records getting inserted into the database, i.e. 6, which is the total count in b1.

I can attach a screenshot if you're interested.
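Counting the entries in the data above suggests one possible cause, though the thread does not confirm it: a1 lists 9 columns and b1[0] has 9 values, but every other row of b1 has 10. A quick check like the following (shortened to two of the rows from the report) flags rows whose width does not match the column list.

```ruby
columns = ["supplier_id", "part_number", "part_description", "core_indicator",
           "part_price", "core_price", "fleet_price", "distributor_price",
           :basic_file_id]

rows = [
  ["Supplier0", "PartNumber0", "LoremIpsums", "Y", "1300", "1400", "12", "10", 7],
  ["Supplier1", "PartNumber1", "Loremipsum", nil, "10", "1300", "1400", "12", "10", 7]
]

# Collect the index and width of every row that does not line up with the columns.
bad_rows = rows.each_with_index
               .reject { |row, _index| row.size == columns.size }
               .map { |row, index| { index: index, size: row.size } }
```

Running a check of this kind before the import makes a mismatch visible as a plain list instead of an error raised deep inside the gem.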

using activerecord-import with Rails 2.3.10

When trying to use activerecord-import with Rails 2.3.10, the gem installation installs the 3.0.3 versions of activerecord, activesupport, and activemodel. Can the minimum required version be dropped to 2.3.something?

importing with 'id' column

Is there any way to import with the 'id' column? I have many big tables that have an 'id' column. Right now the id column is always auto-incremented, even if 'id' is set.

Requiring section of wiki to be updated for Isolated usage

In some situations you might need the import functionality only in a specific code path, with Bundler configured not to require the gem by default:

gem 'activerecord-import', '~> 0.2.9', :require => false

Simply calling require 'activerecord-import' in the file does not work in this setup.

A manual require is still possible, though, using the following snippet instead of a plain require 'activerecord-import':

if !ActiveRecord.const_defined?(:Import) || !ActiveRecord::Import.respond_to?(:load_from_connection_pool)
  require "activerecord-import/base"
end
ActiveRecord::Import.require_adapter('mysql2')

Here mysql2 is your adapter. It could be obtained dynamically from ActiveRecord::Base.configurations.

This allows the gem to be used in a more isolated fashion, affecting only the file where it is needed.

Import doesn't care about serializable attributes

I think this example says it all:

class User
  serialize :properties
end

User.create!(:properties => {:property => 'value'}).properties
# => {:property => 'value'}

User.import([User.new(:properties => {:property => 'value'})])
User.last.properties
# => "propertyvalue" 

So it seems it just casts any serialized column to a string instead of serializing it to YAML.
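To make the difference concrete, here is a small sketch using only the standard library. It contrasts a YAML round trip, which is what a serialized column is expected to store, with plain string concatenation of keys and values, which matches the "propertyvalue" output above and is how Hash#to_s behaved on Ruby 1.8. String keys are used to keep the sketch simple; this is an illustration, not the gem's code.

```ruby
require "yaml"

properties = { "property" => "value" }

# Expected for a serialized column: store a YAML document.
stored = YAML.dump(properties)

# Reading the column back restores the original hash.
restored = YAML.load(stored)

# What the report shows instead: keys and values run together,
# so the structure cannot be recovered.
flattened = properties.to_a.flatten.join
```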

`load_from_connection': undefined method `[]' for nil:NilClass (NoMethodError)

Version 0.2.7 crashes my Rails app entirely; I can run neither the console nor rake. I am using an Oracle database with ruby-oci8 and activerecord-oracle_enhanced-adapter 1.3.1.

/var/lib/gems/1.8/gems/activerecord-import-0.2.7/lib/activerecord-import/base.rb:19:in `load_from_connection': undefined method `[]' for nil:NilClass (NoMethodError)
from /var/lib/gems/1.8/gems/activerecord-import-0.2.7/lib/activerecord-import.rb:15
from /var/lib/gems/1.8/gems/activesupport-3.0.1/lib/active_support/lazy_load_hooks.rb:36:in `instance_eval'
from /var/lib/gems/1.8/gems/activesupport-3.0.1/lib/active_support/lazy_load_hooks.rb:36:in `execute_hook'
from /var/lib/gems/1.8/gems/activesupport-3.0.1/lib/active_support/lazy_load_hooks.rb:43:in `run_load_hooks'
from /var/lib/gems/1.8/gems/activesupport-3.0.1/lib/active_support/lazy_load_hooks.rb:42:in `each'
from /var/lib/gems/1.8/gems/activesupport-3.0.1/lib/active_support/lazy_load_hooks.rb:42:in `run_load_hooks'
from /var/lib/gems/1.8/gems/activerecord-import-0.2.7/lib/activerecord-import.rb:5:in `establish_connection'
from /var/lib/gems/1.8/gems/activerecord-3.0.1/lib/active_record/connection_adapters/abstract/connection_specification.rb:80:in `establish_connection_without_activerecord_import'
from /var/lib/gems/1.8/gems/activerecord-import-0.2.7/lib/activerecord-import.rb:4:in `establish_connection'
from /var/lib/gems/1.8/gems/activerecord-3.0.1/lib/active_record/connection_adapters/abstract/connection_specification.rb:60:in `establish_connection_without_activerecord_import'
from /var/lib/gems/1.8/gems/activerecord-import-0.2.7/lib/activerecord-import.rb:4:in `establish_connection'
from /var/lib/gems/1.8/gems/activerecord-3.0.1/lib/active_record/connection_adapters/abstract/connection_specification.rb:55:in `establish_connection_without_activerecord_import'
from /var/lib/gems/1.8/gems/activerecord-import-0.2.7/lib/activerecord-import.rb:4:in `establish_connection'
from /var/lib/gems/1.8/gems/activerecord-3.0.1/lib/active_record/railtie.rb:59
from /var/lib/gems/1.8/gems/activesupport-3.0.1/lib/active_support/lazy_load_hooks.rb:36:in `instance_eval'
from /var/lib/gems/1.8/gems/activesupport-3.0.1/lib/active_support/lazy_load_hooks.rb:36:in `execute_hook'
from /var/lib/gems/1.8/gems/activesupport-3.0.1/lib/active_support/lazy_load_hooks.rb:26:in `on_load'
from /var/lib/gems/1.8/gems/activerecord-3.0.1/lib/active_record/railtie.rb:57
from /var/lib/gems/1.8/gems/railties-3.0.1/lib/rails/initializable.rb:25:in `instance_exec'
from /var/lib/gems/1.8/gems/railties-3.0.1/lib/rails/initializable.rb:25:in `run'
from /var/lib/gems/1.8/gems/railties-3.0.1/lib/rails/initializable.rb:50:in `run_initializers'
from /var/lib/gems/1.8/gems/railties-3.0.1/lib/rails/initializable.rb:49:in `each'
from /var/lib/gems/1.8/gems/railties-3.0.1/lib/rails/initializable.rb:49:in `run_initializers'
from /var/lib/gems/1.8/gems/railties-3.0.1/lib/rails/application.rb:134:in `initialize!'
from /var/lib/gems/1.8/gems/railties-3.0.1/lib/rails/application.rb:77:in `send'
from /var/lib/gems/1.8/gems/railties-3.0.1/lib/rails/application.rb:77:in `method_missing'
from /home/rails3/weblive/wwwroot/rails/alumni/config/environment.rb:5
from /var/lib/gems/1.8/gems/activesupport-3.0.1/lib/active_support/dependencies.rb:239:in `require'
from /var/lib/gems/1.8/gems/activesupport-3.0.1/lib/active_support/dependencies.rb:239:in `require'
from /var/lib/gems/1.8/gems/activesupport-3.0.1/lib/active_support/dependencies.rb:225:in `load_dependency'
from /var/lib/gems/1.8/gems/activesupport-3.0.1/lib/active_support/dependencies.rb:591:in `new_constants_in'
from /var/lib/gems/1.8/gems/activesupport-3.0.1/lib/active_support/dependencies.rb:225:in `load_dependency'
from /var/lib/gems/1.8/gems/activesupport-3.0.1/lib/active_support/dependencies.rb:239:in `require'
from /var/lib/gems/1.8/gems/railties-3.0.1/lib/rails/application.rb:103:in `require_environment!'
from /var/lib/gems/1.8/gems/railties-3.0.1/lib/rails/commands.rb:22
from script/rails:6:in `require'
from script/rails:6

Problems with default_scope in 0.3.0

Upgrading from 0.2.11, I'm running into an issue importing a model with a default_scope. Specifically, all of my models have a default_scope of the logged in user's organization_id.

With 0.3.0, I get the following whenever I try an import:

ActiveRecord::StatementInvalid:
  PG::Error: ERROR:  column "organization_id" specified more than once
  LINE 1: ...ernal_id","depth","property_id","organization_id","organizat...
                                                               ^
  : INSERT INTO "categories" ("id","name","parent_id","created_at","updated_at","external_id","depth","property_id","organization_id","organization_id") VALUES (nextval('categories_id_seq'),'v2',NULL,'2013-02-11 22:31:39.347565','2013-02-11 22:31:39.347580','v2',NULL,5,3,3) RETURNING "id"

unable to create db

If my database is already created/migrated, and I add this gem to my Gemfile, everything is fine. If, however, I run rake db:drop; rake db:create with this gem in my Gemfile, I get the error "Unknown database 'myproject_dev'". Removing this gem from the Gemfile and re-running bundle install fixes the issue.

It seems like a class loading issue caused by this gem. I get a similar error if I set config.cache_classes = true in development.rb. Hopefully this issue can be resolved, because I like the solution. Thanks.

Attaching stack trace:
rake db:create --trace
rake aborted!
Unknown database 'myproject_dev'
.../gems/activerecord-3.0.3/lib/active_record/connection_adapters/mysql_adapter.rb:600:in `real_connect'
.../gems/activerecord-3.0.3/lib/active_record/connection_adapters/mysql_adapter.rb:600:in `connect_without_redhillonrails_core'
.../gems/redhillonrails_core-1.0.8/lib/red_hill_consulting/core/active_record/connection_adapters/mysql_adapter.rb:11:in `connect'
.../gems/activerecord-3.0.3/lib/active_record/connection_adapters/mysql_adapter.rb:164:in `initialize'
.../gems/activerecord-3.0.3/lib/active_record/connection_adapters/mysql_adapter.rb:36:in `new'
../gems/activerecord-3.0.3/lib/active_record/connection_adapters/mysql_adapter.rb:36:in `mysql_connection'
.../gems/activerecord-3.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:228:in `send'
.../gems/activerecord-3.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:228:in `new_connection'
../gems/activerecord-3.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:236:in `checkout_new_connection'
.../gems/activerecord-3.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:190:in `checkout'
.../gems/activerecord-3.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:186:in `loop'
.../gems/activerecord-3.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:186:in `checkout'
~/.rvm/rubies/ruby-1.8.7-p174/lib/ruby/1.8/monitor.rb:242:in `synchronize'
.../gems/activerecord-3.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:185:in `checkout'
.../gems/activerecord-3.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:93:in `connection'
.../gems/activerecord-3.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:316:in `retrieve_connection'
.../gems/activerecord-3.0.3/lib/active_record/connection_adapters/abstract/connection_specification.rb:97:in `retrieve_connection'
.../gems/activerecord-3.0.3/lib/active_record/connection_adapters/abstract/connection_specification.rb:89:in `connection'
.../gems/activerecord-import-0.2.3/lib/activerecord-import.rb:5:in `establish_connection'
.../gems/activerecord-3.0.3/lib/active_record/connection_adapters/abstract/connection_specification.rb:80:in `establish_connection_without_activerecord_import'
.../gems/activerecord-import-0.2.3/lib/activerecord-import.rb:4:in `establish_connection'
.../gems/activerecord-3.0.3/lib/active_record/connection_adapters/abstract/connection_specification.rb:60:in `establish_connection_without_activerecord_import'
.../gems/activerecord-import-0.2.3/lib/activerecord-import.rb:4:in `establish_connection'
.../gems/activerecord-3.0.3/lib/active_record/connection_adapters/abstract/connection_specification.rb:55:in `establish_connection_without_activerecord_import'
.../gems/activerecord-import-0.2.3/lib/activerecord-import.rb:4:in `establish_connection'
.../gems/activerecord-3.0.3/lib/active_record/railtie.rb:59
~/.rvm/gems/ruby-1.8.7-p174@trakstar/gems/activesupport-3.0.3/lib/active_support/lazy_load_hooks.rb:36:in `instance_eval'
.../gems/activesupport-3.0.3/lib/active_support/lazy_load_hooks.rb:36:in `execute_hook'
.../gems/activesupport-3.0.3/lib/active_support/lazy_load_hooks.rb:26:in `on_load'
.../gems/activerecord-3.0.3/lib/active_record/railtie.rb:57
.../gems/railties-3.0.3/lib/rails/initializable.rb:25:in `instance_exec'
.../gems/railties-3.0.3/lib/rails/initializable.rb:25:in `run'
.../gems/railties-3.0.3/lib/rails/initializable.rb:50:in `run_initializers'
.../gems/railties-3.0.3/lib/rails/initializable.rb:49:in `each'
.../gems/railties-3.0.3/lib/rails/initializable.rb:49:in `run_initializers'
.../gems/railties-3.0.3/lib/rails/application.rb:134:in `initialize!'
.../gems/railties-3.0.3/lib/rails/application.rb:77:in `send'
.../gems/railties-3.0.3/lib/rails/application.rb:77:in `method_missing'
./config/environment.rb:5
...

gem license

Hi,
I want to use the gem in a commercial project and am not sure about the implications.
Is there a special reason why the license is the 'Ruby license'?
Could you dual-license it under the MIT license as well?

thanks,
d.

inserted id

How can I get the inserted ids for my array of objects?
Can I do something like this:
ids = Entry.import entries

The gem published went with wrong code for sqlite version

def supports_import?(current_version=self.sqlite_version)
    minimum_supported_version = "3.2.11"   # <<<<<<< 3.7.11 right?
    if current_version >= minimum_supported_version
        true
    else
        false
    end
end

In the repository, the version being compared against is "3.7.11".
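Separately from the wrong constant, comparing version numbers as plain strings is fragile, because strings are compared character by character: "3.10.0" sorts before "3.7.11". A hedged sketch of a safer check using Gem::Version from RubyGems follows; the explicit current_version parameter replaces the adapter's sqlite_version default and is an assumption made so the sketch can run standalone.

```ruby
require "rubygems"

# Compare versions segment by segment instead of character by character.
def supports_import?(current_version, minimum_supported_version = "3.7.11")
  Gem::Version.new(current_version) >= Gem::Version.new(minimum_supported_version)
end

newer_release   = supports_import?("3.10.0")
exact_minimum   = supports_import?("3.7.11")
older_release   = supports_import?("3.7.10")

# The same check done on raw strings gives the wrong answer for 3.10.0.
string_compare  = "3.10.0" >= "3.7.11"
```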

:has_many support

I've tried out activerecord-import and it doesn't seem to support has_many relationships. Is this feature not supported, or am I missing something?

Thanks.

Activerecord-import is not working for Mysql If sqlserver 2005 is included in application

Hi Zdennis,

I am using two databases in my application: one is SQL Server 2005 and the other is MySQL. I want to use activerecord-import for MySQL, but it throws the error below.

no such file to load -- /usr/local/rvm/gems/ruby-1.9.2-p290/gems/activerecord-import-0.2.11/lib/activerecord-import/active_record/adapters/sqlserver_adapter
(in /usr/local/rvm/gems/ruby-1.9.2-p290/gems/activerecord-import-0.2.11)
rake aborted!

And with --trace added:

no such file to load -- /usr/local/rvm/gems/ruby-1.9.2-p290/gems/activerecord-import-0.2.11/lib/activerecord-import/active_record/adapters/sqlserver_adapter
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/activesupport-3.2.0/lib/active_support/dependencies.rb:251:in `require'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/activesupport-3.2.0/lib/active_support/dependencies.rb:251:in `block in require'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/activesupport-3.2.0/lib/active_support/dependencies.rb:236:in `load_dependency'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/activesupport-3.2.0/lib/active_support/dependencies.rb:251:in `require'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/activerecord-import-0.2.11/lib/activerecord-import/base.rb:11:in `require_adapter'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/activerecord-import-0.2.11/lib/activerecord-import/base.rb:16:in `load_from_connection_pool'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/activerecord-import-0.2.11/lib/activerecord-import.rb:15:in `block in <top (required)>'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/activesupport-3.2.0/lib/active_support/lazy_load_hooks.rb:36:in `instance_eval'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/activesupport-3.2.0/lib/active_support/lazy_load_hooks.rb:36:in `execute_hook'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/activesupport-3.2.0/lib/active_support/lazy_load_hooks.rb:43:in `block in run_load_hooks'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/activesupport-3.2.0/lib/active_support/lazy_load_hooks.rb:42:in `each'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/activesupport-3.2.0/lib/active_support/lazy_load_hooks.rb:42:in `run_load_hooks'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/activerecord-import-0.2.11/lib/activerecord-import.rb:5:in `establish_connection_with_activerecord_import'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/activerecord-3.2.0/lib/active_record/railtie.rb:76:in `block (2 levels) in <class:Railtie>'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/activesupport-3.2.0/lib/active_support/lazy_load_hooks.rb:36:in `instance_eval'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/activesupport-3.2.0/lib/active_support/lazy_load_hooks.rb:36:in `execute_hook'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/activesupport-3.2.0/lib/active_support/lazy_load_hooks.rb:26:in `on_load'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/activerecord-3.2.0/lib/active_record/railtie.rb:74:in `block in <class:Railtie>'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/railties-3.2.0/lib/rails/initializable.rb:30:in `instance_exec'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/railties-3.2.0/lib/rails/initializable.rb:30:in `run'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/railties-3.2.0/lib/rails/initializable.rb:55:in `block in run_initializers'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/railties-3.2.0/lib/rails/initializable.rb:54:in `each'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/railties-3.2.0/lib/rails/initializable.rb:54:in `run_initializers'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/railties-3.2.0/lib/rails/application.rb:136:in `initialize!'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/railties-3.2.0/lib/rails/railtie/configurable.rb:30:in `method_missing'
/mnt/hgfs/nuscentralbilling/config/environment.rb:5:in `<top (required)>'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/activesupport-3.2.0/lib/active_support/dependencies.rb:251:in `require'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/activesupport-3.2.0/lib/active_support/dependencies.rb:251:in `block in require'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/activesupport-3.2.0/lib/active_support/dependencies.rb:236:in `load_dependency'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/activesupport-3.2.0/lib/active_support/dependencies.rb:251:in `require'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/railties-3.2.0/lib/rails/application.rb:103:in `require_environment!'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/railties-3.2.0/lib/rails/application.rb:292:in `block (2 levels) in initialize_tasks'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/task.rb:205:in `call'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/task.rb:205:in `block in execute'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/task.rb:200:in `each'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/task.rb:200:in `execute'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/task.rb:158:in `block in invoke_with_call_chain'
/usr/local/rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/monitor.rb:201:in `mon_synchronize'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/task.rb:151:in `invoke_with_call_chain'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/task.rb:176:in `block in invoke_prerequisites'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/task.rb:174:in `each'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/task.rb:174:in `invoke_prerequisites'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/task.rb:157:in `block in invoke_with_call_chain'
/usr/local/rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/monitor.rb:201:in `mon_synchronize'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/task.rb:151:in `invoke_with_call_chain'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/task.rb:144:in `invoke'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/resque-scheduler-2.0.0/lib/resque_scheduler/tasks.rb:30:in `block (2 levels) in <top (required)>'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/task.rb:205:in `call'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/task.rb:205:in `block in execute'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/task.rb:200:in `each'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/task.rb:200:in `execute'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/task.rb:158:in `block in invoke_with_call_chain'
/usr/local/rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/monitor.rb:201:in `mon_synchronize'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/task.rb:151:in `invoke_with_call_chain'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/task.rb:176:in `block in invoke_prerequisites'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/task.rb:174:in `each'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/task.rb:174:in `invoke_prerequisites'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/task.rb:157:in `block in invoke_with_call_chain'
/usr/local/rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/monitor.rb:201:in `mon_synchronize'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/task.rb:151:in `invoke_with_call_chain'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/task.rb:144:in `invoke'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/application.rb:116:in `invoke_task'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/application.rb:94:in `block (2 levels) in top_level'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/application.rb:94:in `each'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/application.rb:94:in `block in top_level'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/application.rb:133:in `standard_exception_handling'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/application.rb:88:in `top_level'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/application.rb:66:in `block in run'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/application.rb:133:in `standard_exception_handling'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/lib/rake/application.rb:63:in `run'
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/rake-0.9.2.2/bin/rake:33:in `<top (required)>'
/usr/local/rvm/gems/ruby-1.9.2-p290/bin/rake:19:in `load'
/usr/local/rvm/gems/ruby-1.9.2-p290/bin/rake:19:in `<main>'
Tasks: TOP => resque:setup => environment

Regards,

Vishakha

Mysql2::Error: Column 'column_name' specified twice

It appears that when using :synchronize_keys, column names are listed twice.

Gemfile.lock

...
  activerecord-import (0.3.0)
    activerecord (~> 3.0)
    activerecord (~> 3.0)
...

In the model

# Messages are polymorphic to source

import messages, :synchronize => messages, :synchronize_keys => [:source_id, :source_type], :on_duplicate_key_update => { :action_count => :action_count, :last_updated_at => :updated_at }

MySQL Statement that generated the error

INSERT INTO `messages` (`id`,`source_id`,`tweet_id`,`author`,`body`,`action_count`,`read`,`posted_at`,`created_at`,`updated_at`,`source_type`,`status`,`mentioned_id`,`internal_origin`,`auto_labeled`,`last_updated_at`,`source_id`,`source_type`)
VALUES (NULL,17,'772',NULL,NULL,NULL,0,NULL,'2013-02-08 21:57:36','2013-02-08 21:57:36','TwitterAccount','active',NULL,0,0,NULL,17,'TwitterAccount')
ON DUPLICATE KEY UPDATE `messages`.`action_count`=VALUES( `action_count` ),`messages`.`last_updated_at`=VALUES( `updated_at` ),`messages`.`updated_at`=VALUES( `updated_at` )

Newest version breaks Rails < 3.1

Our application uses Rails 3.0.20. However, a commit on Sept. 20 breaks earlier Rails 3 applications by its use of scope_attributes. This method was introduced in Rails 3.1 and will break earlier versions.

There is also the use of "dump" in commit 1ae5122 on Dec. 14 - I don't know where this method is coming from but it does not appear to be part of the hash that serialized_attributes returns in Rails 3.0. Not sure if it's part of Rails 3.1 or some other gem.

Output SQL as string

It would be nice if there was a method like import_to_sql or something that would return the SQL string and not run the command against the database.

I'm not sure if this is the case, but I have a hunch that it would be less overhead for Ruby to output the SQL string to a file so that I can use mysql < the_outputted.sql to import the file.

Thoughts?

max_allowed_packet

The code from ar-extensions for max_allowed_packet: https://github.com/zdennis/ar-extensions/blob/master/ar-extensions/lib/ar-extensions/adapters/mysql.rb#L4
seems to be missing in activerecord-import. Instead there's a reference to it: https://github.com/zdennis/activerecord-import/blob/master/lib/activerecord-import/adapters/abstract_adapter.rb#L121 which always returns 0.

Is the plan to remove this function, or eventually implement it? Is there really no max allowed packet limit on any database anymore?

Import with has_many associations

Hi,
is there a way to import records with has_many associations set?

For example

u = User.new(name: 'John')
u.addresses << Address.new(city: 'NY')

User.import [u]

In this case, addresses are ignored and only user is inserted.

Is there a way to insert addresses also, since I need to import large number of users with associations?

WARNING: Can't mass-assign protected attributes: id

Using the Array version of import, i.e.

posts = [
  BlogPost.new(:author_name => 'Zach Dennis', :title => 'AREXT'),
  BlogPost.new(:author_name => 'Zach Dennis', :title => 'AREXT2'),
  BlogPost.new(:author_name => 'Zach Dennis', :title => 'AREXT3')
]
BlogPost.import posts

The :id attribute should be removed from the attribute_array prior to insert.

Support REPLACE and INSERT IGNORE

The :on_duplicate_key_update option is fine when only parts of existing records need to change. It would be helpful to also support the native REPLACE syntax directly when the whole record needs to be replaced, and INSERT IGNORE when existing records should remain untouched.

Ideas for syntax?

:on_duplicate_key => :ignore

:on_duplicate_key => :replace

On MySQL, "SHOW VARIABLES like 'max_allowed_packet';" is called all the time

hi,

Every time you run BlogPost.import (for example) a query goes out:

SHOW VARIABLES like 'max_allowed_packet';

which is because this method gets called:

module ActiveRecord::Import::MysqlAdapter
  # Returns the maximum number of bytes that the server will allow
  # in a single packet
  def max_allowed_packet # :nodoc:
    result = execute( "SHOW VARIABLES like 'max_allowed_packet';" )
    # original Mysql gem responds to #fetch_row while Mysql2 responds to #first
    val = result.respond_to?(:fetch_row) ? result.fetch_row[1] : result.first[1]
    val.to_i
  end
end

Do you want a pull request that memoizes this?

Thanks for your work!
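For reference, the memoization proposed above could look roughly like this. FakeConnection is a made-up stand-in so the sketch runs without MySQL; it counts how many times the query is issued. Only the `||=` caching inside max_allowed_packet is the point, and the returned value is invented for the example.

```ruby
# Hypothetical stand-in for a database connection.
class FakeConnection
  attr_reader :query_count

  def initialize
    @query_count = 0
  end

  # Pretends to run the query and records that it was called.
  def execute(_sql)
    @query_count += 1
    [["max_allowed_packet", "1048576"]]
  end

  # Memoized: the query is sent once per connection object.
  def max_allowed_packet
    @max_allowed_packet ||= begin
      result = execute("SHOW VARIABLES like 'max_allowed_packet';")
      result.first[1].to_i
    end
  end
end

conn = FakeConnection.new
3.times { conn.max_allowed_packet }
```

The trade-off is that the cached value goes stale if the server variable is changed while the connection object is still alive.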

Breaks with `can't convert Time into String` error

Here is full log:

ciembor@ciembor ~/p/App> rake investors:seed
rake aborted!
can't convert Time into String
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/activerecord-import-0.3.1/lib/activerecord-import/import.rb:355:in `block (2 levels) in add_special_rails_stamps'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/activerecord-import-0.3.1/lib/activerecord-import/import.rb:355:in `each'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/activerecord-import-0.3.1/lib/activerecord-import/import.rb:355:in `block in add_special_rails_stamps'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/activerecord-import-0.3.1/lib/activerecord-import/import.rb:347:in `each_pair'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/activerecord-import-0.3.1/lib/activerecord-import/import.rb:347:in `add_special_rails_stamps'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/activerecord-import-0.3.1/lib/activerecord-import/import.rb:215:in `import'
/home/ciembor/projekty/App/lib/tasks/investors.rake:61:in `block (2 levels) in <top (required)>'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/rake-10.0.4/lib/rake/task.rb:246:in `call'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/rake-10.0.4/lib/rake/task.rb:246:in `block in execute'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/rake-10.0.4/lib/rake/task.rb:241:in `each'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/rake-10.0.4/lib/rake/task.rb:241:in `execute'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/rake-10.0.4/lib/rake/task.rb:184:in `block in invoke_with_call_chain'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/rake-10.0.4/lib/rake/task.rb:177:in `invoke_with_call_chain'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/rake-10.0.4/lib/rake/task.rb:170:in `invoke'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/rake-10.0.4/lib/rake/application.rb:143:in `invoke_task'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/rake-10.0.4/lib/rake/application.rb:101:in `block (2 levels) in top_level'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/rake-10.0.4/lib/rake/application.rb:101:in `each'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/rake-10.0.4/lib/rake/application.rb:101:in `block in top_level'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/rake-10.0.4/lib/rake/application.rb:110:in `run_with_threads'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/rake-10.0.4/lib/rake/application.rb:95:in `top_level'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/rake-10.0.4/lib/rake/application.rb:73:in `block in run'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/rake-10.0.4/lib/rake/application.rb:160:in `standard_exception_handling'
/usr/local/rvm/gems/ruby-1.9.3-p392/gems/rake-10.0.4/lib/rake/application.rb:70:in `run'
Tasks: TOP => investors:seed
(See full trace by running task with --trace)

and here is part of my code:

investors = ['A', 'B', 'C', 'D']
Investor.import([:name], investors, validation: false)

Rails 3.2.12
ruby 1.9.3p392 (2013-02-22 revision 39386) [x86_64-linux]
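A likely cause, judging from the trace ending in add_special_rails_stamps: the columns-and-values form of import expects one array per row, but investors is a flat array of strings, so a timestamp ends up being appended to a String. A sketch of the reshaping, which is an assumption about the fix rather than a confirmed resolution:

```ruby
investors = ["A", "B", "C", "D"]
columns   = [:name]

# One inner array per row, one entry per column.
values = investors.map { |name| [name] }

# With rows as arrays, appending a timestamp value to each row works;
# appending a Time to a bare String is what raises the TypeError.
now = Time.now
rows_with_stamp = values.map { |row| row + [now] }
```

The call would then be Investor.import(columns, values). Note also that the option appears as :validate elsewhere on this page, so validation: false may not be recognized.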

Failing datetime tests

I'm trying to verify and fix an unrelated bug.

I tried running the tests for mysql and sqlite3, and got these 2 errors:

  1) Error:
test: #import importing a datetime field should import a date with MM/DD/YYYY format just fine(#<Class:0x000001016fc910>):
ArgumentError: invalid date
    /Users/jamie/.rvm/gems/ruby-1.9.2-p0/gems/activesupport-3.0.0/lib/active_support/core_ext/string/conversions.rb:44:in `to_date'
    test/import_test.rb:239:in `block (3 levels) in <top (required)>'
    /Users/jamie/.rvm/gems/ruby-1.9.2-p0/gems/activesupport-3.0.0/lib/active_support/testing/setup_and_teardown.rb:35:in `block in run'
    /Users/jamie/.rvm/gems/ruby-1.9.2-p0/gems/activesupport-3.0.0/lib/active_support/callbacks.rb:418:in `_run_setup_callbacks'
    /Users/jamie/.rvm/gems/ruby-1.9.2-p0/gems/activesupport-3.0.0/lib/active_support/testing/setup_and_teardown.rb:34:in `run'

  2) Error:
test: #import importing a datetime field should import a date with YYYY/MM/DD format just fine(#<Class:0x000001016fc910>):
ArgumentError: invalid date
    /Users/jamie/.rvm/gems/ruby-1.9.2-p0/gems/activesupport-3.0.0/lib/active_support/core_ext/string/conversions.rb:44:in `to_date'
    test/import_test.rb:244:in `block (3 levels) in <top (required)>'
    /Users/jamie/.rvm/gems/ruby-1.9.2-p0/gems/activesupport-3.0.0/lib/active_support/testing/setup_and_teardown.rb:35:in `block in run'
    /Users/jamie/.rvm/gems/ruby-1.9.2-p0/gems/activesupport-3.0.0/lib/active_support/callbacks.rb:418:in `_run_setup_callbacks'
    /Users/jamie/.rvm/gems/ruby-1.9.2-p0/gems/activesupport-3.0.0/lib/active_support/testing/setup_and_teardown.rb:34:in `run'

Trying this in Rails console gives a "ArgumentError: invalid date":
"05/14/2010".to_date

I'm in the UK, so maybe there's some difference in locale setting that means it passes for you?

Ruby 1.9.2, Rails 3.0.1, bundler 1.0.11, CentOS & OS X.
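
A possible explanation other than locale: on Ruby 1.9 and later, `Date.parse` reads a slash-separated date as day/month/year, so "05/14/2010" is taken as day 5 of month 14 and rejected. The sketch below shows the difference between letting the parser guess and stating the format explicitly with `Date.strptime`. The helper names `parse_us_date` and `ambiguous_parse_fails?` are made up for illustration and are not part of the gem or its test suite.

```ruby
require 'date'

# Hypothetical helper: parse a US-style MM/DD/YYYY string with an explicit
# format instead of letting Date.parse guess the field order.
def parse_us_date(str)
  Date.strptime(str, '%m/%d/%Y')
end

# Returns true when Date.parse rejects the string. On Ruby 1.9+ it reads
# "05/14/2010" as day 05 of month 14, which is not a valid date.
def ambiguous_parse_fails?(str)
  Date.parse(str)
  false
rescue ArgumentError
  true
end

puts parse_us_date('05/14/2010')          # 2010-05-14
puts ambiguous_parse_fails?('05/14/2010') # true
```

If this is the cause, the MM/DD/YYYY test would fail on any Ruby 1.9 machine regardless of where it is located.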

Synchronize doesn't seem to be working

I'm attempting to use AR-I to update existing columns in a database, but the synchronize option doesn't seem to work for me. When I run this:

user = User.first
columns = [:name]
values = [["foo"]]
User.import columns, values, :synchronize=> [user]

I get an error:

ArgumentError: Synchronization needs a mutex. Supply an options hash with a :with key as the last argument (e.g. synchronize :hello, :with => :@mutex).

I tried modifying import.rb to pass in a mutex, following http://rails.rubyonrails.org/classes/Module.html, but still couldn't get it to work.

Update wiki to say "requires Rails 3" rather than that it's compatible with Rails 3

The description in the wiki for activerecord-import didn't immediately convey to me that it's not just compatible with Rails 3, but requires it. Perhaps that could be made plain in the README. I mistakenly tried to use it with a Rails 2.3 project and scratched my head for a few minutes before I realized I needed to use the older ar-extensions gem until that project is updated to Rails 3.

Caching validation data? Possible?

Hi,

I've just started using AR-import, and it's a fantastic gem. However, something is unclear to me.
It seems that it's running over and validating all the records before inserting them in one giant statement (this is great, I love that I'm still getting validations happening).

However, validating each record causes AR to perform heaps of SELECTs against the database (obviously to verify the relationships exist). This is fine too, but in my case it's largely the same data:

"SELECT TOP (1) [categories].* FROM [categories] WHERE [categories].[id] = 10"

This runs hundreds of times, as all these records will belong to the same category by default. Is there any way to tell AR-import to 1) cache the data it has, or 2) specify the possible values for relationships/validations?

I guess I could just turn off validations and this would speed it up, but I'd like to keep validations happening if possible.
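
For option 1, one approach that does not depend on the gem is to memoize the lookup yourself, so each distinct category id is fetched once and reused for every record that refers to it. The sketch below shows the idea in plain Ruby. `CategoryStore` is a stand-in for the database (each `find` call represents one SELECT), and `CachedCategoryLookup` and `count_queries_for` are hypothetical names, not part of ActiveRecord or activerecord-import.

```ruby
# Stand-in for the database: every call to #find represents one SELECT.
class CategoryStore
  attr_reader :query_count

  def initialize(rows)
    @rows = rows
    @query_count = 0
  end

  def find(id)
    @query_count += 1
    @rows[id]
  end
end

# Memoizes lookups so each distinct id reaches the store only once.
# Missing ids are cached too (as nil), so they are not re-queried either.
class CachedCategoryLookup
  def initialize(store)
    @cache = Hash.new { |hash, id| hash[id] = store.find(id) }
  end

  def exists?(id)
    !@cache[id].nil?
  end
end

# Checks every id through the cache and reports how many queries were made.
def count_queries_for(category_ids)
  store = CategoryStore.new(10 => 'Books')
  lookup = CachedCategoryLookup.new(store)
  category_ids.each { |id| lookup.exists?(id) }
  store.query_count
end

puts count_queries_for([10] * 500) # 1
```

In a real model this kind of lookup would have to be called from a custom validation in place of the built-in association check; how that fits a given schema is left as an assumption here.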

OracleEnhancedAdapter connection.instance_variable_get :@config

It seems the gem isn't working with the ActiveRecord OracleEnhanced adapter.

I'm getting a nil exception in self.load_from_connection(connection) because the @config instance variable isn't defined in the connection class, which leads to a nil exception in:
require_adapter config[:adapter]

My question: the gem has no Oracle-specific import adapter at all. Is it safe to just skip requiring one for Oracle, or is a DB-specific adapter a must for the gem to work?

Add sqlite support with transactions

http://www.sqlite.org/speed.html shows that SQLite inserts are much faster when they are enclosed in a transaction.

Could import support for SQLite be added by using the pre_sql_statements and post_sql_statements methods?

Rails uses transactions internally, so BEGIN..COMMIT may not work, but SQLite can nest transactions using the SAVEPOINT and RELEASE commands: http://www.sqlite.org/lang_savepoint.html

Rails does this internally with:
Model.connection.transaction(:requires_new => true) do # CREATE SAVEPOINT active_record_1
  # operations
  # active_record_1 now automatically released
end # RELEASE SAVEPOINT active_record_1
(from http://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/DatabaseStatements.html )

Over an import of 108466 records in batches of 5000, wrapping my import statement in a transaction decreased processing time from 1584 seconds to 834 seconds. A saving of nearly 50%.
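
The shape of that change can be sketched without a database: open one transaction, then send every batch inside it, so the commit happens once rather than once per batch. `RecordingConnection` below is a stand-in that only logs the statements it would issue, and `import_in_batches` is a hypothetical helper; in an application the outer block would be something like `Model.transaction do ... end` with the import call inside the batch loop.

```ruby
# Stand-in connection: records the statements it would send instead of
# talking to SQLite.
class RecordingConnection
  attr_reader :log

  def initialize
    @log = []
  end

  # Runs the block between BEGIN and COMMIT, or logs ROLLBACK and re-raises
  # if the block fails.
  def transaction
    @log << 'BEGIN'
    yield
    @log << 'COMMIT'
  rescue StandardError
    @log << 'ROLLBACK'
    raise
  end

  def insert_batch(rows)
    @log << "INSERT (#{rows.size} rows)"
  end
end

# Wraps all batches in a single transaction and returns the statement log.
def import_in_batches(connection, rows, batch_size)
  connection.transaction do
    rows.each_slice(batch_size) { |batch| connection.insert_batch(batch) }
  end
  connection.log
end

log = import_in_batches(RecordingConnection.new, (1..108_466).to_a, 5_000)
puts log.length # 24: BEGIN, 22 batch inserts, COMMIT
```

With the figures above, 108466 records in batches of 5000 come to 22 inserts (the last one holding 3466 rows) between a single BEGIN and a single COMMIT.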

Error "cannot load em_postgresql_adapter" when using gem em-postgresql-adapter

My Gemfile:

gem 'pg', '>=0.14.0'
gem 'em-postgresql-adapter', :git => 'git://github.com/leftbee/em-postgresql-adapter.git'
gem 'em-synchrony', :git     => 'git://github.com/igrigorik/em-synchrony.git',
    :require => ['em-synchrony','em-synchrony/activerecord', 'em-synchrony/em-http']
gem 'rack-fiber_pool',  :require => 'rack/fiber_pool'
gem 'activerecord-import', '>= 0.3.0'

database.yml:

  database: sqoffers_development
  adapter: em_postgresql
  pool: 20
  connections: 20
  encoding: unicode
  port: 5432
  host: 127.0.0.1
  username: user
  password: 123

Error when starting the application:

Uncaught exception: cannot load such file -- /home/charger/.rvm/gems/ruby-1.9.3-p374@global/gems/activerecord-import-0.3.0/lib/activerecord-import/active_record/adapters/em_postgresql_adapter
    /home/charger/.rvm/gems/ruby-1.9.3-p374@global/gems/activesupport-3.2.11/lib/active_support/dependencies.rb:251:in `require'
    /home/charger/.rvm/gems/ruby-1.9.3-p374@global/gems/activesupport-3.2.11/lib/active_support/dependencies.rb:251:in `block in require'
    /home/charger/.rvm/gems/ruby-1.9.3-p374@global/gems/activesupport-3.2.11/lib/active_support/dependencies.rb:236:in `load_dependency'
    /home/charger/.rvm/gems/ruby-1.9.3-p374@global/gems/activesupport-3.2.11/lib/active_support/dependencies.rb:251:in `require'
...

How can I fix this?

Model callbacks not being called

I've had some problems trying to get this gem to work with my app. Eventually I figured out that my models' callbacks were not being called. Then I noticed this comment in the source of import.rb: "It does not utilize the ActiveRecord::Callbacks during creation/modification while performing the import."

I think this should be mentioned in the wiki documentation.

Is there some way round this, other than not using activerecord-import or modifying my model? Could it be made optional, like validation?

Problems with Unicode Characters?

I am getting an ActiveRecord error trying to process a file with this line:

columns = [ :id, :country_id, :region_id, :name ]
values = [ [285,10,122,"Arrufó"] ]
City.import columns, values, :validate => false

ActiveRecord::ActiveRecordError: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''Arruf���' at line 1: INSERT INTO cities (id,country_id,region_id,name) VALUES(285,10,122,'Arrufó')

But when I try to run this line via the command prompt, no problem:

mysql> INSERT INTO cities (id,country_id,region_id,name) VALUES(285,10,122,'Arrufó');
Query OK, 1 row affected (0.02 sec)
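
One thing worth checking, as a guess rather than a confirmed diagnosis, is whether the string is valid UTF-8 by the time it is interpolated into the SQL. If the input file is Latin-1 but Ruby treats its bytes as UTF-8, the "ó" becomes an invalid byte sequence, which would be consistent with the mangled characters in the error message. The sketch below reproduces that mismatch and its repair using only core String methods; `mislabeled_latin1` and `repair_latin1` are hypothetical helper names.

```ruby
NAME = "Arruf\u00F3" # "Arrufó" as a UTF-8 string

# Simulates reading Latin-1 bytes while believing they are UTF-8:
# the bytes are re-labeled, not converted.
def mislabeled_latin1(str)
  str.encode('ISO-8859-1').force_encoding('UTF-8')
end

# Repairs such a string by labeling the bytes correctly, then converting.
def repair_latin1(bytes)
  bytes.dup.force_encoding('ISO-8859-1').encode('UTF-8')
end

broken = mislabeled_latin1(NAME)
puts broken.valid_encoding?         # false
puts repair_latin1(broken) == NAME  # true
```

Calling `valid_encoding?` on each value before the import would show whether the data is already damaged on the Ruby side, in which case the file's real encoding and the connection's encoding setting are the places to look.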
