Code Monkey home page Code Monkey logo

bulk_insert's People

Contributors

bencolon avatar dave-slash-null avatar glaucocustodio avatar huacnlee avatar jamis avatar jfiander avatar knu avatar lukeledet avatar mberlanda avatar mehdi-farsi avatar mstruve avatar renspr avatar sobstel avatar therusskiy avatar thoughtafter avatar varyform avatar zjohl avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

bulk_insert's Issues

before_create not working, Is there any solution?

rails: 6.0
ruby: 2.7.2
bulk_insert: 1.8.1

used

WrittenPadQuestion.bulk_insert(.....) do |worker|
   .....
   .....
   .....
end

app/models/questions.rb

xxxxs.rb:

belongs_to :original_question, optional: true

has_many :question_tag_relations, dependent: :destroy
has_many :tags, through: :question_tag_relations

before_create :setup_tags

private

def setup_tags
  self.tag_ids = original_question.tag_ids
end

before_create not working, Is there any solution?

Update duplicates option currently updates all columns which may not be desirable

The current implementation of update_duplicates is such that it updates all columns if there is a conflict during the insert phase. This is not always desirable. A classic case is the created_at.

If an insert fails, and we want the update to happen instead we would typically want created_at to keep its existing value. An option to control this behaviour would be great.

POC: Use Ruby inherited callback instead InsertBulk module

Firstly, I really like your project ! Really helpful 👍

I would suggest you to move the code inside InsertBulk module ActiveRecord::Base by using inherited callback.

  • That permit to don't pollute the looong ancestor chain of ActiveRecord::Base.
  • ActiveRecord Models already inherit from ActiveRecord::Base.
  • You have access to the model's name via the inherited callback argument.

I've coded a POC there: inherited_callback

I've decided to create a bulk_insert per Model. But it's possible to create the method only once in ActiveRecord::Base

module ActiveRecord
  class Base
    def self.inherited(child_class)
      super
      return if methods.include?(:insert_bulk)
      ...
    end
  end
end

So, you have the same flexibility and behaviour with less constraints.
What do you think about this design ? Maybe I missed something ? :)

Bulk Insert Concurrency

When trying to insert 60k records in a go , the total results in the database is 60k * 7

ActiveRecord::Base.establish_connection(adapter: 'sqlserver', host: "xxx", username: "xxx", password: "xxx", database: "xxx", azure: true, port: 1433, timeout: 5000)

class Report < ActiveRecord::Base
  self.primary_key = 'id'
end

my_array = [] #count of 100000 records
Report.bulk_insert(:account_owner_id) do |worker|
  my_array.each do |arr|
    worker.add account_owner_id: arr[0]
  end
end

This issue occurs when I add this to a cron/scheduler. This also occurs when the table to which insertion happens have a lot of data

Rails dependency indirectly required

I might be wrong but on bulk_insert/lib/bulk_insert/worker.rb:67 you seem to be requiring rails instead of activerecord and that, when saving a worker causes:

*** NameError Exception: uninitialized constant BulkInsert::Worker::Rails

I discovered it while using the gem in a project where I only use ActiveRecord.
That can be bypassed if a module Rails that respond to version is defined inside the project

Result returned from INSERT is nil

Upgrading from v1.6.0 to v1.7.0, I'm seeing the following failure when trying to call save! on a bulk worker (using mysql2 v0.4.5 database connector against mysql v5.7.23 server):

D, [2018-10-11T09:17:07.739444 #72035] DEBUG -- : 2018-10-11 09:17:07.734938 D [72035:70195788170560] (0.727ms) ActiveRecord::Base -- SQL -- {:sql=>"INSERT  INTO `recent_files` (`repo_id`,`md5`,`size`,`created_at`,`updated_at`) VALUES (1,'ae3e83e2fab3a7d8683d8eefabd1e74d',3,'2018-10-11 16:17:07','2018-10-11 16:17:07')"}
E, [2018-10-11T09:17:07.740832 #72035] ERROR -- : 2018-10-11 09:17:07.735079 E [72035:70195788170560 bulk_insert_thread.rb:29] BulkInsertThread -- Exception: NoMethodError: undefined method `fields' for nil:NilClass
.../gems/activerecord-4.2.10/lib/active_record/connection_adapters/mysql2_adapter.rb:222:in `exec_query'
.../gems/bulk_insert-1.7.0/lib/bulk_insert/worker.rb:92:in `execute_query'
.../gems/bulk_insert-1.7.0/lib/bulk_insert/worker.rb:82:in `save!'
.../lib/modules/bulk_insert_thread.rb:27:in `block in initialize'

Make it work properly without *_id when it's a relation

When not using *_id on a relation when building the attr hash, .bulk_insert runs without errors, but no records are inserted.

Proposal:

  1. Take advantage of AR to discover if it's a relation and properly format with *_id when building the SQL
  2. Throw an error if something goes wrong

Incorrect default value inserted for JSON column types

In case of a JSON column, bulk_insert will incorrectly insert the default value. It inserts "{}" (ie. a string) instead of {} (ie. an object).

Here's a failing test:

begin
  require "bundler/inline"
rescue LoadError => e
  $stderr.puts "Bundler version 1.10 or later is required. Please update your Bundler"
  raise e
end

gemfile(true) do
  source "https://rubygems.org"
  gem "rails"
  gem "sqlite3"
  gem "bulk_insert"
end

require "active_record"
require "minitest/autorun"
require "logger"

# This connection will do for database-independent bug reports.
ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
ActiveRecord::Base.logger = Logger.new(STDOUT)

ActiveRecord::Schema.define do
  create_table :profiles, force: true do |t|
    t.string :name
    t.json :json_data, default: {}, null: false
  end
end

class Profile < ActiveRecord::Base
end

class BugTest < Minitest::Test
  def test_bulk_insert_json_default
    worker = Profile.bulk_insert
    worker.add(name: "Foo", json_data: {})
    worker.add(name: "Bar")
    worker.save!

    profile1 = Profile.find_by!(name: "Foo")
    profile2 = Profile.find_by!(name: "Bar")
    assert_equal profile1.json_data, {}
    assert_equal profile2.json_data, {}
  end
end

The reason is the following code:

mapped = @columns.map.with_index do |column, index|
value_exists = values.is_a?(Hash) ? values.key?(column.name) : (index < values.length)
if !value_exists
if column.default.present?
column.default
elsif column.name == "created_at" || column.name == "updated_at"
:__timestamp_placeholder
else
nil
end
else
values.is_a?(Hash) ? values[column.name] : values[index]
end
end

In case a value does not exist, the column default is used. The column default in rails is expressed as a string (ie. in the example above: Profile.columns_hash["json_data"].default == {}). This is problematic for JSON columns, as both "{}" and {} are valid JSON (one a string, the other an object).

Falsey column defaults not used

Column default values like '' or false aren't used when a value is omitted for that column.

For example, if I have a table foo with a column bar that has a default value of '', and I don't provide a value for bar in my call to bulk_insert, the newly-inserted row will have a bar value of NULL, and not '' like I expect.

The issue seems to be the use of #present? to determine a column has a default value:

if column.default.present?

counter_cache values are not updated

It seems like counter_cache columns aren't updated after a bulk insert. What's the best way to remedy this?

It would be bad to have to run reset_counter on every record that's affected.

Error when default column names are different

I am using bulk insert in Rails Table.
My id column is named as PRODUCT_ID and created_at and updated_at are respectively RECORD_CREATE_DATE and RECORD_UPDATE_DATE.

The bulk insert takes these values as NULL and I get an error because these fields have not null constraint.

How do I solve these?

CI: fix ruby 2.7 build

Make sure that ruby 2.7 builds can run.
Currently there is a minor issue with bundler installation in travis:

The command "eval bundle install --jobs=3 --retry=3 --deployment " failed. Retrying, 2 of 3.

Traceback (most recent call last):

	4: from /home/travis/.rvm/gems/ruby-2.7.1/bin/ruby_executable_hooks:24:in `<main>'

	3: from /home/travis/.rvm/gems/ruby-2.7.1/bin/ruby_executable_hooks:24:in `eval'

	2: from /home/travis/.rvm/gems/ruby-2.7.1/bin/bundle:23:in `<main>'

	1: from /home/travis/.rvm/rubies/ruby-2.7.1/lib/ruby/2.7.0/rubygems.rb:294:in `activate_bin_path'

/home/travis/.rvm/rubies/ruby-2.7.1/lib/ruby/2.7.0/rubygems.rb:275:in `find_spec_for_exe': Could not find 'bundler' (1.17.2) required by your /home/travis/build/jamis/bulk_insert/Gemfile.lock. (Gem::GemNotFoundException)

To update to the latest version installed on your system, run `bundle update --bundler`.

To install the missing version, run `gem install bundler:1.17.2`

https://travis-ci.com/github/jamis/bulk_insert/jobs/337238175

NoMethodError: undefined method 'result_sets' for nil:NilClass

Hello there!

I just updated to latest version v1.7.0 because I'm trying to use Return Primary Keys feature.

So, I just wrote down:

while my_condition
  packs << [some_value, another_value, Time.now, Time.now]
  # decrement my_condition
end

destination_columns = [:column1, :column2, :created_at, :updated_at]
worker = Pack.bulk_insert(*destination_columns, return_primary_keys: true) { |worker|
  packs.each { |pack|
    worker.add pack
  }
  # That's how I use bulk_insert everywhere
}

I'm doing that because I need to associate these packs with another model. Sometime I need to write pack.my_model_associated << [ids] or just something like that. I do not know if it is possible not using Return Primary Keys feature. I was thinking about something like:

worker.result_sets.each { |pack|
  pack.my_model_associated << [1, 2, 3, 4]
  pack.save!
}

I know worker.result_sets is an ActiveRecord::Result and does not contain each method. But like I said, it is just an idea. I was going to code something like that.
Anyway, I get the error:
NoMethodError: undefined method 'result_sets' for nil:NilClass, because my worker is nil.

I cloned the repository and modified line 22 in lib/bulk_insert.rb. I returned worker instead of nil even on block_given? is true and the error stopped happening. But, I really don't know why I did that and I was not confident, so I did not submit a pull request or something else (I'm not an expert ruby developer). I just know worker.result_sets was not nil anymore. Although I couldn't do what I wanted (bulk_insert packs and relate all them with my_model_associated quickly).

Could you help me?

Thanks.

Drop support for Ruby < 2.4

Ruby 2.4 has recently reached its EOL.
All previous versions can be safely considered unsupported

  • Update CI
  • Update .gemspec

bulk_insert does not return a worker object

I am using version 1.8.1, ruby 2.3.8, rails 4.2.5
I am trying to use non-block mode, passing an array of attribute hashes. My model is named Gap:

worker = Gap.bulk_insert(return_primary_keys: true)

however, bulk_insert returns nil. (block mode also returns nil).
Looking at the code it looks like it only returns a worker if bulk_insert is only called without values or a block:

    def bulk_insert(*columns, values: nil, set_size:500, ignore: false, update_duplicates: false, return_primary_keys: false)
      columns = default_bulk_columns if columns.empty?
      worker = BulkInsert::Worker.new(connection, table_name, primary_key, columns, set_size, ignore, update_duplicates, return_primary_keys)

      if values.present?
        transaction do
          worker.add_all(values)
          worker.save!
        end
        nil
      elsif block_given?
        transaction do
          yield worker
          worker.save!
        end
        nil
      else
        worker
      end
    end

How would I access the worker object in order to look at result_sets?
thanks, fw

Support for psql ON CONFLICT

Would it be possible to extend the ignore: true option for PostgreSQL adapter as well?
When I added it to my Rails application, I got the following error:

ActiveRecord::StatementInvalid: PG::SyntaxError: ERROR:  syntax error at or near "IGNORE"
LINE 1: INSERT IGNORE INTO "table_name" ("date","i...

As a suggestion/benchmark, you can find below the way it was implemented in activerecord-import gem:

# postgresql_adapter.rb
# ...
  def post_sql_statements( table_name, options ) # :nodoc:
    sql = []

    if supports_on_duplicate_key_update?
      # Options :recursive and :on_duplicate_key_ignore are mutually exclusive
      if (options[:ignore] || options[:on_duplicate_key_ignore]) && !options[:on_duplicate_key_update] && !options[:recursive]
        sql << sql_for_on_duplicate_key_ignore( table_name, options[:on_duplicate_key_ignore] )
      end
    elsif options[:on_duplicate_key_ignore] && !options[:on_duplicate_key_update]
      logger.warn "Ignoring on_duplicate_key_ignore because it is not supported by the database."
    end

    sql += super(table_name, options)

    unless options[:no_returning] || options[:primary_key].blank?
      primary_key = Array(options[:primary_key])
      sql << " RETURNING \"#{primary_key.join('", "')}\""
    end

    sql
  end
  def sql_for_on_duplicate_key_ignore( table_name, *args ) # :nodoc:
    arg = args.first
    conflict_target = sql_for_conflict_target( arg ) if arg.is_a?( Hash )
    " ON CONFLICT #{conflict_target}DO NOTHING"
  end

I haven't found any CONTRIBUTING guideline in your project. Thank you

bulk_insert version 1.8.1 doesn't play nice with activerecord and mysql2

I have created a test project here: https://github.com/grdw/bulk_insert_bug

When cloning said project and running ruby main.rb the following bug appears:

Traceback (most recent call last):
	23: from main.rb:18:in `<main>'
	22: from /Users/gerard/.ruby/gems/bulk_insert-1.8.1/lib/bulk_insert.rb:18:in `bulk_insert'
/Users/gerard/.ruby/gems/bulk_insert-1.8.1/lib/bulk_insert/worker.rb:112:in `block (2 levels) in compose_insert_query': 
undefined method `type_cast_from_column' for #<ActiveRecord::ConnectionAdapters::Mysql2Adapter:0x00007ffc9488e788> (NoMethodError)

It seems something has changed from ActiveRecord 6.0.3.4 to 6.1.0.

Not null contraint but PARAM exists

Hello, I am trying to use bulk_insert on a model whose not null constraint is ID, the problema is I have an object with ID on it's value

image

But when I try to use bulk_insert on it:

image

My schema.rb of Tweet

image

Any ideas how to fix this?

Abstract database-specific statements

I think that library structure would probably benefit from having database-specific behaviour abstracted to some sort of statement adapters. Currently it's heavily relying on ifs which is not really easy to maintain or modify.

It might be also good to consider keyword arguments. It's always easy mix an order of long list of arguments.

Sample code ans structure below.

module BulkInsert
  module StatementAdapters
    class GenericAdapter
      def insert_ignore_statement
      end

      def on_conflict_ignore_statement
      end

      def on_conflict_update_statement
      end

      def primary_key_return_statement
      end
    end
  end
end
require 'bulk_insert/statement_adapters/generic_adapter'
require 'bulk_insert/statement_adapters/mysql_adapter'
require 'bulk_insert/statement_adapters/postgresql_adapter'
require 'bulk_insert/statement_adapters/sqlite_adapter'

module StatementAdapter
  def self.for(connection)
    case connection.adapter_name
    when /^mysql/i
      MySQLAdapter.new
    when /\APost(?:greSQL|GIS)/i
      PostgreSQLAdapter.new
    when /\ASQLite/i
      SQLiteAdapter.new
    else
      GenericAdapter.new
    end
  end
end

screenshot 2018-11-22 at 19 07 28

screenshot 2018-11-22 at 19 07 47

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.