Code Monkey home page Code Monkey logo

red-datasets's Introduction

Red Datasets

Build Status Gem Version

Description

Red Datasets provides classes that provide common datasets such as iris dataset.

You can use datasets easily because you can access each dataset with multiple ways such as #each and Apache Arrow Record Batch.

Install

% gem install red-datasets

Available datasets

  • Adult Dataset
  • Aozora Bunko
  • California Housing
  • CIFAR-10 Dataset
  • CIFAR-100 Dataset
  • CLDR language plural rules
  • Communities and crime
  • Diamonds Dataset
  • E-Stat Japan
  • Fashion-MNIST
  • Fuel Economy Dataset
  • Geolonia Japanese Addresses
  • Hepatitis
  • Iris Dataset
  • Libsvm
  • MNIST database
  • Mushroom
  • Penguins
  • The Penn Treebank Project
  • PMJT - Pre-Modern Japanese Text dataset list
  • Postal Codes in Japan
  • Rdatasets
  • Seaborn
  • Sudachi Synonym Dictionary
  • Wikipedia
  • Wine Dataset

Usage

Here is an example to access Iris Data Set by #each or Table#to_h or Table#fetch_values.

require "datasets"

iris = Datasets::Iris.new
iris.each do |record|
  p [
     record.sepal_length,
     record.sepal_width,
     record.petal_length,
     record.petal_width,
     record.label,
  ]
end
# => [5.1, 3.5, 1.4, 0.2, "Iris-setosa"]
# => [4.9, 3.0, 1.4, 0.2, "Iris-setosa"]
  :
# => [7.0, 3.2, 4.7, 1.4, "Iris-versicolor"]


iris_hash = iris.to_table.to_h
p iris_hash[:sepal_length]
# => [5.1, 4.9, .. , 7.0, ..
p iris_hash[:sepal_width]
# => [3.5, 3.0, .. , 3.2, ..
p iris_hash[:petal_length]
# => [1.4, 1.4, .. , 4.7, ..
p iris_hash[:petal_width]
# => [0.2, 0.2, .. , 1.4, ..
p iris_hash[:label]
# => ["Iris-setosa", "Iris-setosa", .. , "Iris-versicolor", ..


iris_table = iris.to_table
p iris_table.fetch_values(:sepal_length, :sepal_width, :petal_length, :petal_width).transpose
# => [[5.1, 3.5, 1.4, 0.2],
      [4.9, 3.0, 1.4, 0.2],
      :
      [7.0, 3.2, 4.7, 1.4],
      :

p iris_table[:label]
# => ["Iris-setosa", "Iris-setosa", .. , "Iris-versicolor", ..

Here is an example to access The CIFAR-10/100 dataset by #each:

CIFAR-10

require "datasets"

cifar = Datasets::CIFAR.new(n_classes: 10, type: :train)
cifar.metadata
#=> #<struct Datasets::Metadata name="CIFAR-10", url="https://www.cs.toronto.edu/~kriz/cifar.html", licenses=nil, description="CIFAR-10 is 32x32 image dataset">licenses=nil, description="CIFAR-10 is 32x32 image datasets">
cifar.each do |record|
  p record.pixels
  # => [59, 43, 50, 68, 98, 119, 139, 145, 149, 143, .....]
  p record.label
  # => 6
end

CIFAR-100

require "datasets"

cifar = Datasets::CIFAR.new(n_classes: 100, type: :test)
cifar.metadata
#=> #<struct Datasets::Metadata name="CIFAR-100", url="https://www.cs.toronto.edu/~kriz/cifar.html", licenses=nil, description="CIFAR-100 is 32x32 image dataset">
cifar.each do |record|
  p record.pixels
  #=> [199, 196, 195, 195, 196, 197, 198, 198, 199, .....]
  p record.coarse_label
  #=> 10
  p record.fine_label
  #=> 49
end

MNIST

require "datasets"

mnist = Datasets::MNIST.new(type: :train)
mnist.metadata
#=> #<struct Datasets::Metadata name="MNIST-train", url="http://yann.lecun.com/exdb/mnist/", licenses=nil, description="a training set of 60,000 examples">

mnist.each do |record|
  p record.pixels
  # => [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, .....]
  p record.label
  # => 5
end

NArray compatibility

How to develop Red Datasets

  1. Fork https://github.com/red-data-tools/red-datasets
  2. Create a feature branch from master
  3. Develop in the feature branch
  4. Pull request from the feature branch to https://github.com/red-data-tools/red-datasets

License

The MIT license. See LICENSE.txt for details.

red-datasets's People

Contributors

kou avatar yahonda avatar rilmayer avatar heronshoes avatar a-tommyyy avatar bkmgit avatar abcdefg-1234567 avatar kojix2 avatar strviola avatar otegami avatar mrkn avatar chimame avatar naitoh avatar kazu-5 avatar colspan avatar masayuki14 avatar hatappi avatar okadakk avatar youchan avatar

Stargazers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.