Comments (5)

jkowens commented on August 15, 2024

I would expect it to perform about the same doing each_slice(1000) as providing batch_size: 1000. It would be interesting to see some benchmarks tho.
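A quick sketch of such a benchmark (MyModel, the record count, and the import options mirror the example discussed below and are purely illustrative):

require "benchmark"

records = 500_000.times.map { |i| MyModel.new(id: i, name: i.to_s) }

# Time the manual each_slice approach.
slice_time = Benchmark.realtime do
  records.each_slice(1000) do |slice|
    MyModel.import! slice, validate: false, on_duplicate_key_update: :all
  end
end

# Time the built-in batch_size option on the same data.
# The second run upserts over the same rows via on_duplicate_key_update.
batch_time = Benchmark.realtime do
  MyModel.import! records, validate: false, on_duplicate_key_update: :all, batch_size: 1000
end

puts format("each_slice: %.2fs, batch_size: %.2fs", slice_time, batch_time)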

jkowens commented on August 15, 2024

Yeah, the built-in batch_size option is not efficient. You already have all 500K records in memory at that point; it then tries to slice them down into batches. I recommend managing the batching yourself so that a smaller number of models is initialized at a time. I would batch inserts into groups of at least 1K, if not more.
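For illustration, a minimal sketch of managing the batching yourself, assuming the rows come from a CSV file (the file name and columns are hypothetical), so that only about 1K models exist in memory at a time:

require "csv"

# Only one slice of 1,000 models is built and imported at a time.
CSV.foreach("records.csv", headers: true).each_slice(1000) do |rows|
  models = rows.map { |row| MyModel.new(id: row["id"], name: row["name"]) }
  MyModel.import! models, validate: false, on_duplicate_key_update: :all
end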

james-em commented on August 15, 2024

Yeah, the built-in batch_size option is not efficient. You already have all 500K records in memory at that point; it then tries to slice them down into batches. I recommend managing the batching yourself so that a smaller number of models is initialized at a time. I would batch inserts into groups of at least 1K, if not more.

Thanks for the quick reply! It's indeed a valid suggestion!

Just wondering out of curiosity: when you say the batch_size option is not efficient, would you say that doing

records = 500000.times.map { |i| MyModel.new(id: i, name: i.to_s) }

# Doing this
records.each_slice(1000) do |slice|
  MyModel.import! slice, validate: false, on_duplicate_key_update: :all # Import to Postgres DB
end

# Is more efficient than doing
MyModel.import! records, validate: false, on_duplicate_key_update: :all, batch_size: 1000 # Import to Postgres DB

Would the slice approach be more efficient than simply passing everything at once with the batch_size option? In this case, all 500K records are pre-initialized either way.

jkowens commented on August 15, 2024

Yeah, that's true: if you have to load all the records at once while importing, that's probably not going to help. I was thinking of the case where you're reading them incrementally from another source.
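For example (a sketch, assuming the records are being copied over from another ActiveRecord model; LegacySource is a hypothetical name):

# Read the source incrementally so only one batch of models is initialized at a time.
LegacySource.in_batches(of: 1000) do |batch|
  models = batch.map { |src| MyModel.new(id: src.id, name: src.name) }
  MyModel.import! models, validate: false, on_duplicate_key_update: :all
end

in_batches keeps only one batch of source rows loaded at a time, so memory use stays roughly proportional to the batch size rather than the total record count.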

james-em commented on August 15, 2024

Yeah, that's true: if you have to load all the records at once while importing, that's probably not going to help. I was thinking of the case where you're reading them incrementally from another source.

I don't have to, but if I did, would each_slice(1000) be exactly the same as doing batch_size: 1000, or does the gem build the SQL queries for all batches in RAM first before executing them?
