
Contributors

arvinds-ds, dustinvtran, kashif, zepx


observations's Issues

Porting 1000+ R datasets to observations

Hi Dustin,
I have written a Python script to generate observations-ready Python files for over 1100 datasets available in R and related packages. The project page is https://github.com/Arvinds-ds/datasets_r2py. Could you kindly verify/comment on the following?

  1. Whether the generated Python source files conform to edward/observations requirements. The files are automatically generated in 'observations/rdata/', so you can look at them there. If you have changes, kindly let me know the modifications to make to the templated Python files init_template.py, template.py, or test_template.py.
  2. If we were to use the files, how would they be structured in edward? I currently generate files in the observations/rdata folder, with tests in observations/rdata/tests.

Let me know.

Error when trying to download the mnist dataset

In file Util.py at line 129 (version 0.1.4):

file_size = int(response.headers.get('content-range').split('/')[1])

I get this error:

AttributeError: 'NoneType' object has no attribute 'split'
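The error occurs because `response.headers.get('content-range')` returns `None` when the server omits that header. A minimal sketch of a defensive fix (the helper name `get_file_size` is hypothetical, standing in for the inline expression in util.py) falls back to `content-length`:

```python
# Hypothetical defensive replacement for the file-size lookup.
# If 'content-range' is missing (the cause of the AttributeError),
# fall back to 'content-length', then to 0.
def get_file_size(response):
    content_range = response.headers.get('content-range')
    if content_range is not None:
        # e.g. 'bytes 0-99/1234' -> 1234
        return int(content_range.split('/')[1])
    return int(response.headers.get('content-length', 0))
```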

uci data sets

print/logging messages in data loading functions

All functions are silent, whether loading, preprocessing, or saving the data. Currently, the only step that prints to stdout is downloading. The other steps can sometimes be very expensive, such as the preprocessing in small32_imagenet.py.

We should consider adding stdout messages to the loading, preprocessing, and saving steps, depending on how long each may take. And we should establish a standard that applies across all data sets.
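One way to standardize this is a small context manager that every loader wraps its expensive steps in, so messages have a uniform shape. This is only a sketch; `log_step` is a hypothetical helper, not an existing observations utility:

```python
import time
from contextlib import contextmanager

# Hypothetical helper: wraps each expensive step (loading,
# preprocessing, saving) so it reports uniformly to stdout.
@contextmanager
def log_step(name):
    print("{}...".format(name))
    start = time.time()
    yield
    print("{} done ({:.1f}s)".format(name, time.time() - start))

# Usage inside a hypothetical loader:
# with log_step("Preprocessing small32_imagenet"):
#     images = preprocess(raw_images)
```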

decide on standard for saving/loading data sets that fit in memory

For data sets like multi-MNIST and small ImageNet, we preprocess the data and cache it by writing to disk so that future calls can load it into memory. More generally, we need to save and load data whenever a function requires preprocessing and the resulting data fits in memory.

We should decide on a specific option such as pickle, np.savez, or hdf5.
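Whichever format is chosen, the caching pattern looks the same. A sketch using `np.savez` (one of the options under discussion; the function name, cache path, and array names are illustrative):

```python
import os
import numpy as np

# Illustrative cache-to-disk pattern: preprocess once, then reload
# from the .npz cache on subsequent calls.
def load_or_preprocess(cache_path, preprocess_fn):
    if os.path.exists(cache_path):
        with np.load(cache_path) as data:
            return data['x_train'], data['x_test']
    x_train, x_test = preprocess_fn()
    np.savez(cache_path, x_train=x_train, x_test=x_test)
    return x_train, x_test
```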

standardize use of open()

Searching open( in the repo shows that we use Python's open() function sometimes with the 'rb' arg, sometimes with no arg, and sometimes with 'w'. We should make its usage consistent.
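One possible convention, shown here only as a sketch: always use a context manager with an explicit mode string, binary for archives and text for plain files, so intent is visible at every call site:

```python
# Proposed convention (illustrative): explicit mode, always a
# context manager so file handles are closed deterministically.
def read_text(path):
    with open(path, 'r') as f:
        return f.read()

def read_binary(path):
    with open(path, 'rb') as f:
        return f.read()
```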

write 4-page paper for JMLR software track

We're currently in (pre-)alpha. Writing the 4-page paper will (a) announce the library for an official public release; (b) formalize design principles; (c) formalize design details such as how we handle various data domains, data structures, and data sizes; (d) provide statistics on Observations' collection of data sets.

All contributors are authors.

tensorflow.contrib.data in observations

I am heavily using the tf.contrib.data Dataset API for image-based tasks. With observations for images (LSUN/celebA etc.) being no more than a downloader for these datasets, would it be worthwhile to return a TensorFlow Dataset, something along the lines of

lsun_bedroom_x_train = lsun('~/data', category='bedroom', set='training',
                            batch_size=32, shuffle=True)
training_data = lsun_bedroom_x_train.make_one_shot_iterator()
...
for i in range(inference.n_iter):
    x_batch = training_data.get_next()
    inference.update(feed_dict={x_ph: x_batch})

allow for optional filename

All functions check for the default filename in path. This doesn't allow the user to load from a renamed file. We should enable an optional filename argument.

Some care is needed when the filename refers to a group of files or a directory. I don't know how to handle the arbitrary case.
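For the single-file case, the change is just a keyword argument with the current default. A sketch (the loader name and default filename are hypothetical, standing in for any observations loader):

```python
import os

# Hypothetical signature change: `filename` defaults to the
# standard download name but lets the user point at a renamed file.
def load_dataset(path, filename='mnist.pkl.gz'):
    path = os.path.expanduser(path)
    filepath = os.path.join(path, filename)
    if not os.path.exists(filepath):
        raise IOError("no such file: {}".format(filepath))
    return filepath  # a real loader would parse the file here
```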

add API docs to edwardlib.org/api/observations

  • on TOC, add sidebar, then observations link with description in TOC linking to all other functions
  • references
    • use [@] and @
    • store bibtex in observations/ repo
    • have edward link to it in pandoc command inside compile.sh
  • add link in README.md

How do we do unit testing?

Testing each loading function requires downloading data from the URL(s) and verifying the returned data objects. How do we test without having to download many files for each Travis build? Is storing every file on a Travis server feasible? (No.)
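One common answer is to split the network step from the parsing step, so tests stub the download and exercise parsing on a tiny local fixture. A sketch, with `load_csv` and its `download_fn` parameter as hypothetical names rather than the library's actual helpers:

```python
import os

# Sketch: if the file already exists (a checked-in fixture in
# tests), the network step is skipped entirely, so CI never
# downloads anything.
def load_csv(path, download_fn):
    if not os.path.exists(path):
        download_fn(path)  # would normally fetch from the URL
    with open(path) as f:
        return [line.strip().split(',') for line in f]
```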

text data sets
