google-open-image-download's Introduction

Google open image download

A Python 2/3 script for downloading and rescaling the Open Images dataset in parallel. Here it is maxing out a 200 Mbit/s pipe over 5 days.

[Image: maxing out a 200 Mbit/s pipe]

setup

To install the dependencies, run:

pip install -r requirements

Follow the instructions in the Open Images dataset repo to get the list of image URLs.

usage

The two required arguments are input and output. Input is the CSV file of URLs from the Open Images dataset. Output is a directory where the scaled images will be saved.

By default, the images are scaled so that the smallest dimension equals 256 pixels (controlled by the --min-dim arg). The saved images are spread across sub-directories for efficiency (the number of which is controlled by the --sub-dirs arg). Each saved image is named after Google's ImageID, which can be used to look up labels in the Open Images dataset.
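
As a rough illustration of the two behaviours described above, the sketch below computes the output size for a given min-dim and picks a sub-directory for an ImageID. The function names `scaled_size` and `output_path`, the rounding, the treatment of a non-positive min-dim as "no rescaling", the CRC-based bucket assignment, and the `.jpg` extension are all assumptions made for the example; the script itself may do any of these differently.

```python
import os
import zlib


def scaled_size(width, height, min_dim=256):
    """Return (width, height) scaled so the smaller side equals min_dim.

    Assumption: a non-positive min_dim means "leave the image as it is".
    """
    if min_dim <= 0:
        return width, height
    scale = float(min_dim) / min(width, height)
    # Round to whole pixels and never return a zero-sized side.
    return (max(1, int(round(width * scale))),
            max(1, int(round(height * scale))))


def output_path(output_dir, image_id, sub_dirs):
    """Build a path of the form <output_dir>/<bucket>/<image_id>.jpg.

    Assumption: the bucket is a stable checksum of the ImageID modulo the
    number of sub-directories, so files spread roughly evenly.
    """
    checksum = zlib.crc32(image_id.encode("utf-8")) & 0xFFFFFFFF
    bucket = checksum % sub_dirs
    return os.path.join(output_dir, str(bucket), image_id + ".jpg")
```

Spreading files over many sub-directories keeps each directory small, which matters because listing or opening files in a directory with millions of entries is slow on many filesystems.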

Use --help to see the other optional args.
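
For orientation, here is a hedged reconstruction of what the command-line interface might look like with `argparse`. The two positional arguments and the `--min-dim`, `--sub-dirs` and `--timeout` flags appear in this document; the flag spellings `--consumers`, `--queue-size` and `--force` are guesses inferred from the `Namespace(...)` line in the issue log further down, and every default other than min-dim's 256 is a placeholder borrowed from that log. Treat `--help` on the real script as the authority.

```python
import argparse


def build_parser():
    """Sketch of the CLI; flag names marked 'guess' are not confirmed."""
    parser = argparse.ArgumentParser(
        description="Download and rescale images listed in a CSV file.")
    parser.add_argument("input", help="CSV file of image URLs")
    parser.add_argument("output", help="directory for the scaled images")
    parser.add_argument("--min-dim", type=int, default=256,
                        help="size of the smallest dimension after scaling")
    # Placeholder defaults below: taken from the issue log, not the script.
    parser.add_argument("--sub-dirs", type=int, default=100,
                        help="number of sub-directories to spread images over")
    parser.add_argument("--timeout", type=float, default=10.0,
                        help="per-request timeout in seconds")
    parser.add_argument("--consumers", type=int, default=5,
                        help="number of worker processes (guess)")
    parser.add_argument("--queue-size", type=int, default=1000,
                        help="maximum number of queued URLs (guess)")
    parser.add_argument("--force", action="store_true",
                        help="meaning not documented here (guess)")
    return parser
```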

notes

I'm not using asyncio because the worker processes also scale the images, so we wouldn't see much of a speed-up.

google-open-image-download's People

Contributors

ejlb


google-open-image-download's Issues

UTF-8 characters on Windows

Thanks for this script! I am running it on Windows and it seems to dump out when it hits some of the foreign characters in the description (?) column. Consequently, I am only able to download about 90 image files. Is there any way to make it decode those characters? Or is there another issue?

No row.iteritems()

Thanks for the fix; I see what you did with the UTF-8 chars. However, the new code gives me the following error: AttributeError: 'dict' object has no attribute 'iteritems'. See details below.

python download.py --timeout 10 --sub-dirs 100 --min-dim -1 "S:\Google OpenImages\images_2016_08\train\images.csv" "S:\Google OpenImages\images_2016_08\train\images"

8044 @ 2016-10-06 09:11:05,259 (266) download - DEBUG - Namespace(consumers=5, force=False, input='S:\Google OpenImages\images_2016_08\train\images.csv', min_dim=-1, output='S:\Google OpenImages\images_2016_08\train\images', queue_size=1000, sub_dirs=100, timeout=10.0)
Process Process-1:
Traceback (most recent call last):
  File "C:\Python35\lib\multiprocessing\process.py", line 249, in _bootstrap
    self.run()
  File "C:\Python35\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "S:\Google OpenImages\download\download.py", line 145, in producer
    for row in unicode_dict_reader(f):
  File "S:\Google OpenImages\download\download.py", line 58, in unicode_dict_reader
    yield {key: unicode(value, 'utf-8') for key, value in row.iteritems()}
AttributeError: 'dict' object has no attribute 'iteritems'
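
The traceback points at `dict.iteritems()`, which exists only in Python 2; Python 3 dicts provide `items()` instead, and the `unicode` built-in used on the same line is also gone in Python 3. A minimal sketch of a reader that yields text on both versions is shown below. It reuses the name `unicode_dict_reader` from the traceback, but the body is only an illustration and not necessarily the fix that went into the repository; taking a `path` rather than an open file, and the `encoding` parameter, are assumptions made for the example.

```python
import csv
import io
import sys

PY2 = sys.version_info[0] == 2


def unicode_dict_reader(path, encoding="utf-8"):
    """Yield each CSV row as a dict with text (unicode) keys and values."""
    if PY2:
        # Python 2: the csv module works on byte strings, so read bytes
        # and decode every key and value afterwards.
        with open(path, "rb") as f:
            for row in csv.DictReader(f):
                yield {key.decode(encoding): value.decode(encoding)
                       for key, value in row.items()}
    else:
        # Python 3: open in text mode with an explicit encoding so the
        # result does not depend on the platform's default encoding.
        with io.open(path, "r", encoding=encoding, newline="") as f:
            for row in csv.DictReader(f):
                yield dict(row)
```

Passing the encoding explicitly is what matters on Windows, where the default text encoding is usually not UTF-8.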
