Code Monkey home page Code Monkey logo

spout-pdo-example's Introduction

Spout & PDO Example

The code in this repository demonstrates how to efficiently extract large data set from a MySQL database and export it to an Excel file. It turns out that this is not as easy at it seems. Both of these tasks often leads to terrible performance and out of memory crashes with big data sets.

This example uses PDO as the MySQL driver as well as Spout to create XLSX files. [Read why.]

How to run the example?

  1. Clone or download this repository.
  2. Add data in your MySQL database. This example provides test data that can be found in sql/fixtures.sql. To create a new database and import the test data into it, you can run mysql -u {USERNAME} -p {password} < sql/fixtures.sql. You may need to comment the first line of the sql file for the first import.
  3. Run the example: php index.php

Benchmarks

These metrics have been obtained by fetching and writing 50,000 rows (200,000 cells) to a XLSX file.

Row fetching mode Elapsed time Memory peak
One by one 8.32s 1MB
Batch (500) 8.02s 2.25MB
Batch (1000) 9.72s 3.75MB
All at once 9.24s 73.25MB

These metrics have been obtained by fetching and writing 200,000 rows (800,000 cells) to a XLSX file.

Row fetching mode Elapsed time Memory peak
One by one 32.95s 1MB
Batch (500) 36.86s 2.25MB
Batch (1000) 38.70s 3.75MB
All at once 40.61s 289MB

FAQ

Why Spout? Why not PHPExcel?

Even though PHPExcel is a great and popular library, it reaches its limits with large data sets. Spout was designed to support any size of data sets, making it super easy to scale without needing to worry about caching, optimizations, etc.

To support the previous claim, the same set of metrics was collected, but using PHPExcel instead of Spout. Here are the results obtained:

Num written rows Metric PHPExcel Spout Ratio
50,000 Elapsed time 33.56s 8.02s 4.2x
Memory peak 186MB 2.25MB 83x
200,000 Elapsed time 172.91s 36.86s 4.5x
Memory peak 720.5MB 2.25MB 320x
How can the example consume so little memory?

In this example, several techniques are used to achieve this goal without compromising the performance.

The single row fetching mode uses unbuffered queries. Contrary to buffered queries where the whole result set is immediately sent to the client and stored in memory, the results of unbuffered queries are stored on the MySQL server and can be fetched one row at a time. This greatly reduces the amount of memory needed by the PHP script since only one row is stored in memory at any given time.

The batch fetching mode uses a primary key to split the final result set in small chunks. Most importantly, it does not use OFFSET to fetch next batches but remembers the ID of the last fetched row and uses this as the offset (WHERE id > $lastFetchedId). This works great for very large tables and prevents a bad full table scan. By using LIMIT XXX, only few rows are stored in memory at any given time.

Finally, Spout uses streams to write the data it is being passed. Streaming means that Spout only acts as a passthrough, and once the data is written, it just forgets about it freeing the memory.

How to generate a smaller/bigger data set?

The data set is generated by a script: gen-fixtures.php

This script contains the number of rows to generate (by default 50,000). You can update this number and run php scripts/gen-fixtures.php to re-generate the fixtures.sql file.

spout-pdo-example's People

Contributors

adrilo avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.