Dataget

Dataget is an easy-to-use, framework-agnostic dataset library that gives you quick access to a collection of Machine Learning datasets through a simple API.

Main features:

  • Minimal: Downloads entire datasets with just one line of code.
  • Framework Agnostic: Loads data as NumPy arrays or pandas DataFrames, which work with most Machine Learning frameworks.
  • Transparent: By default, stores the data inside your current project so you can easily inspect it.
  • Memory Efficient: When a dataset doesn't fit in memory, it returns metadata instead so you can load it iteratively.
  • Integrates with Kaggle: Supports loading datasets directly from Kaggle in a variety of formats.
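The Memory Efficient feature means that for large datasets you may get metadata back instead of loaded arrays. The exact contents of that metadata aren't described here, so the sketch below simply assumes you end up with a path to a large CSV file, and shows one common way to process it iteratively with pandas' `chunksize` option. The `summarize_in_chunks` helper is hypothetical and not part of dataget.

```python
import pandas as pd


def summarize_in_chunks(csv_path, column, chunksize=10_000):
    """Compute the mean of `column` without reading the whole file at once.

    Hypothetical helper: assumes `csv_path` points at a CSV file that is
    too large to load into memory in one go.
    """
    total = 0.0
    count = 0
    # With chunksize, read_csv returns a reader that yields DataFrames;
    # using it as a context manager closes the file when we are done.
    with pd.read_csv(csv_path, chunksize=chunksize) as reader:
        for chunk in reader:
            total += chunk[column].sum()
            count += len(chunk)
    return total / count if count else float("nan")
```

Only one chunk of at most `chunksize` rows is held in memory at a time, so the same pattern works for files much larger than RAM.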

Check out the documentation for the list of available datasets.

Getting Started

With dataget you only have to do two things:

  • Instantiate a Dataset from our collection.
  • Call the get method to download the data to disk and load it into memory.

Both are usually done in one line:

import dataget


X_train, y_train, X_test, y_test = dataget.image.mnist().get()

This example downloads the MNIST dataset to ./data/image_mnist and loads it as NumPy arrays.
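Because `get` returns plain NumPy arrays, ordinary NumPy code works on the result with no framework-specific wrappers. As an illustration, the sketch below scales images to [0, 1] and one-hot encodes the labels. It uses small synthetic arrays instead of the real download and assumes MNIST-style data (`uint8` images shaped (N, 28, 28) and integer labels from 0 to 9); the exact shapes dataget returns may differ, and `preprocess` is a hypothetical helper.

```python
import numpy as np


def preprocess(X, y, num_classes=10):
    """Scale uint8 images to float32 in [0, 1] and one-hot encode integer labels."""
    X = X.astype(np.float32) / 255.0
    # Indexing rows of an identity matrix turns each label into a one-hot vector
    y_onehot = np.eye(num_classes, dtype=np.float32)[y]
    return X, y_onehot


# Synthetic stand-in for MNIST-shaped data (an assumption, not real MNIST)
rng = np.random.default_rng(0)
X_train = rng.integers(0, 256, size=(4, 28, 28)).astype(np.uint8)
y_train = np.array([3, 0, 9, 1])

X, y = preprocess(X_train, y_train)
print(X.shape, X.dtype, y.shape)  # (4, 28, 28) float32 (4, 10)
```

The resulting arrays can be passed to most Machine Learning frameworks as-is.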

Kaggle Support

Kaggle promotes the use of CSV files, and dataget loves them! With dataget you can quickly download any dataset from the platform and get immediate access to the data:

import dataget

df_train, df_test = dataget.kaggle(dataset="cristiangarcia/pointcloudmnist2d").get(
    files=["train.csv", "test.csv"]
)
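Since the Kaggle loader returns pandas DataFrames, separating features from targets takes a couple of lines of pandas. The sketch below uses a tiny synthetic frame in place of `df_train`; the `label`, `x0`, and `y0` column names are placeholders and not necessarily the columns used in the `pointcloudmnist2d` files, and `split_features_target` is a hypothetical helper.

```python
import pandas as pd


def split_features_target(df, target="label"):
    """Return (features, target) as NumPy arrays; `target` is a placeholder column name."""
    X = df.drop(columns=[target]).to_numpy()
    y = df[target].to_numpy()
    return X, y


# Synthetic stand-in for df_train (column names are assumptions)
df_train = pd.DataFrame({"label": [1, 7], "x0": [0.1, 0.5], "y0": [0.2, 0.9]})
X_train, y_train = split_features_target(df_train)
print(X_train.shape, y_train.tolist())  # (2, 2) [1, 7]
```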

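Loading from Kaggle only works once the Kaggle API can find your credentials, so a quick pre-flight check can save a confusing failure. This is a minimal sketch, assuming the Kaggle API's usual credential lookup: the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables, or a `kaggle.json` file in `~/.kaggle` (or in the directory named by `KAGGLE_CONFIG_DIR`). The helper name is hypothetical, and newer Kaggle releases may search additional locations.

```python
import os
from pathlib import Path


def kaggle_credentials_configured(env=None):
    """Best-effort check for Kaggle API credentials (hypothetical helper)."""
    env = os.environ if env is None else env
    # Option 1: credentials supplied through environment variables
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return True
    # Option 2: a kaggle.json file in the config directory (default ~/.kaggle)
    config_dir = Path(env.get("KAGGLE_CONFIG_DIR", Path.home() / ".kaggle"))
    return (config_dir / "kaggle.json").is_file()


if not kaggle_credentials_configured():
    print("Kaggle API credentials not found; configure them before loading Kaggle datasets.")
```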
To start using Kaggle datasets, make sure you have installed and configured the Kaggle API. In the future we want to expand Kaggle support in the following ways:

  • Load any file format that NumPy or pandas can read.
  • Provide generic support for other kinds of datasets such as images, audio, and video.
    • e.g. dataget.data.kaggle(..., type="image").get(...)
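One possible way to "load any file that NumPy or pandas can read" is a loader that dispatches on the file extension. This is purely an illustrative sketch of the idea, not dataget's API: the `LOADERS` table and the `load_file` function are hypothetical.

```python
from pathlib import Path

import numpy as np
import pandas as pd

# Hypothetical mapping from file extension to a NumPy or pandas reader
LOADERS = {
    ".csv": pd.read_csv,
    ".json": pd.read_json,
    ".parquet": pd.read_parquet,  # requires pyarrow or fastparquet
    ".npy": np.load,
    ".npz": np.load,
}


def load_file(path):
    """Load `path` with the reader registered for its extension."""
    suffix = Path(path).suffix.lower()
    try:
        reader = LOADERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix!r}") from None
    return reader(path)
```

Keeping the readers in a table means supporting a new format is a one-line registration rather than a new code path.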

Installation

pip install dataget

Contributing

Adding a new dataset is easy! If you are interested in contributing one, read our guide on Creating a Dataset.

License

MIT License

Contributors

cgarciae, charlielito