Perl script that can be used to copy a lot of objects (files) between MongoDB GridFS, PostgreSQL BLOBs, and Amazon S3.
-
Copy configuration template file:
cp sample-config/gridfs-to-s3.yml.dist gridfs-to-s3.yml
-
Edit configuration file
gridfs-to-s3.yml
to set the MongoDB GridFS / Amazon S3 connection settings and other properties. -
To copy files from GridFS to S3, run:
perl bin/copy-kvs.pl gridfs-to-s3.yml mongodb_gridfs amazon_s3
-
To copy files from S3 to GridFS, run:
perl bin/copy-kvs.pl gridfs-to-s3.yml amazon_s3 mongodb_gridfs
Install perltidy
Git hook to automatically fix script formatting:
cpanm --installdeps .
githook-perltidy install
When copying from GridFS, the script will sort files roughly by insertion date. The script uses object's ObjectId
for sorting. As the default ObjectId
contains an insertion timestamp (rounded to seconds), the objects are sorted in 1 second precision.
For example, if your GridFS deployment contains the following files:
> db.fs.files.find( { }, { filename: 1, uploadDate: 1 } )
{ "uploadDate" : ISODate("2013-06-11T08:00:00Z"), "filename" : "1" }
{ "uploadDate" : ISODate("2013-06-11T08:00:00Z"), "filename" : "2" }
{ "uploadDate" : ISODate("2013-06-11T08:00:00Z"), "filename" : "3" }
{ "uploadDate" : ISODate("2013-06-11T08:00:00Z"), "filename" : "4" }
{ "uploadDate" : ISODate("2013-06-11T08:00:00Z"), "filename" : "5" }
{ "uploadDate" : ISODate("2013-06-11T08:00:00Z"), "filename" : "6" }
{ "uploadDate" : ISODate("2013-06-11T08:00:00Z"), "filename" : "7" }
{ "uploadDate" : ISODate("2013-06-11T08:00:00Z"), "filename" : "8" }
{ "uploadDate" : ISODate("2013-06-11T08:00:01Z"), "filename" : "9" }
{ "uploadDate" : ISODate("2013-06-11T08:00:01Z"), "filename" : "10" }
{ "uploadDate" : ISODate("2013-06-11T08:00:01Z"), "filename" : "11" }
{ "uploadDate" : ISODate("2013-06-11T08:00:01Z"), "filename" : "12" }
{ "uploadDate" : ISODate("2013-06-11T08:00:01Z"), "filename" : "13" }
{ "uploadDate" : ISODate("2013-06-11T08:00:01Z"), "filename" : "14" }
{ "uploadDate" : ISODate("2013-06-11T08:00:01Z"), "filename" : "15" }
{ "uploadDate" : ISODate("2013-06-11T08:00:01Z"), "filename" : "16" }
...the script will copy files 1-8
in any order (because they were inserted during the same second), but will upload files 1-8
before uploading files 9-16
(because the latter were inserted in the next second).
When copying from PostgreSQL, the script will sort files by primary key.
So, unless you do something funky with your primary keys, the objects will be sorted in an insertion order.
When copying from S3, the script will sort files in lexicographical order.
For example, if your S3 bucket contains the following files:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
...these files will be copied from S3 to GridFS in the following order:
1
10
11
12
13
14
15
16
2
3
4
5
6
7
8
9
This is because lexicographical order is the only one Amazon supports for sorting S3 contents (at the time of writing).
Execute:
COPY_KVS_S3_ACCESS_KEY_ID="AKIAIOSFODNN7EXAMPLE" \
COPY_KVS_S3_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG/bPxRfiCYzEXAMPLEKEY" \
COPY_KVS_S3_BUCKET_NAME="copy-kvs-unit-tests" \
\
COPY_KVS_POSTGRES_HOST="localhost" \
COPY_KVS_POSTGRES_USERNAME="copy_kvs_test" \
COPY_KVS_POSTGRES_PASSWORD="copy_kvs_test" \
COPY_KVS_POSTGRES_DATABASE="copy_kvs_test" \
\
make test
to run unit tests provided with the script.