Arquero

Arquero is a JavaScript library for query processing and transformation of array-backed data tables. Following the relational algebra and inspired by the design of dplyr, Arquero provides a fluent API for manipulating column-oriented data frames. Arquero supports a range of data transformation tasks, including filter, sample, aggregation, window, join, and reshaping operations.

Fast: process data tables with million+ rows.
Flexible: query over arrays, typed arrays, array-like objects, or Apache Arrow columns.
Full-Featured: perform a variety of wrangling and analysis tasks.
Extensible: add new column types or functions, including aggregate & window operations.
Lightweight: small size, minimal dependencies.

To get up and running, start with the Introducing Arquero tutorial, part of the Arquero notebook collection.

Have a question or need help? Post to the Arquero GitHub Discussions board.

Arquero is Spanish for "archer": if datasets are arrows, Arquero helps their aim stay true. 🏹 Arquero also refers to a goalkeeper: safeguard your data from analytic "own goals"! 🥅 ✋ ⚽

API Documentation

Top-Level API - All methods in the top-level Arquero namespace.
Table - Table access and output methods.
Verbs - Table transformation verbs.
Op Functions - All functions, including aggregate and window functions.
Expressions - Parsing and generation of table expressions.
Extensibility - Extend Arquero with new expression functions or table verbs.

Example

The core abstractions in Arquero are data tables, which model each column as an array of values, and verbs that transform data and return new tables. Verbs are table methods, allowing method chaining for multi-step transformations. Though each table is unique, many verbs reuse the underlying columns to limit duplication.

import { all, desc, op, table } from 'arquero';

// Average hours of sunshine per month, from https://usclimatedata.com/.
const dt = table({
  'Seattle': [69,108,178,207,253,268,312,281,221,142,72,52],
  'Chicago': [135,136,187,215,281,311,318,283,226,193,113,106],
  'San Francisco': [165,182,251,281,314,330,300,272,267,243,189,156]
});

// Sorted differences between Seattle and Chicago.
// Table expressions use arrow function syntax.
dt.derive({
    month: d => op.row_number(),
    diff:  d => d.Seattle - d.Chicago
  })
  .select('month', 'diff')
  .orderby(desc('diff'))
  .print();

// Is Seattle more correlated with San Francisco or Chicago?
// Operations accept column name strings outside a function context.
dt.rollup({
    corr_sf:  op.corr('Seattle', 'San Francisco'),
    corr_chi: op.corr('Seattle', 'Chicago')
  })
  .print();

// Aggregate statistics per city, as output objects.
// Reshape (fold) the data to a two column layout: city, sun.
dt.fold(all(), { as: ['city', 'sun'] })
  .groupby('city')
  .rollup({
    min:  d => op.min(d.sun), // functional form of op.min('sun')
    max:  d => op.max(d.sun),
    avg:  d => op.average(d.sun),
    med:  d => op.median(d.sun),
    // functional forms permit flexible table expressions
    skew: ({sun: s}) => (op.mean(s) - op.median(s)) / op.stdev(s) || 0
  })
  .objects();
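
To make the point about column reuse concrete, here is a minimal, dependency-free sketch. It is not Arquero's implementation: `MiniTable` and its `select` method are hypothetical names used only for illustration. It shows how a verb can return a new table object while sharing the underlying column arrays instead of copying them.

```javascript
// Hypothetical illustration only: MiniTable is NOT part of Arquero.
class MiniTable {
  constructor(columns) {
    // columns: object mapping column name -> array of values
    this.columns = columns;
  }

  // Return a new table holding only the named columns.
  // The arrays themselves are shared, not copied.
  select(...names) {
    const picked = {};
    for (const name of names) {
      picked[name] = this.columns[name];
    }
    return new MiniTable(picked);
  }

  // Number of rows, taken from the first column (0 if there are none).
  numRows() {
    const first = Object.values(this.columns)[0];
    return first ? first.length : 0;
  }
}

const base = new MiniTable({
  Seattle: [69, 108, 178],
  Chicago: [135, 136, 187]
});
const selected = base.select('Seattle');

// The tables are distinct objects, but the column array is the same one.
console.log(selected !== base);
console.log(selected.columns.Seattle === base.columns.Seattle);
```

Sharing arrays this way keeps each transformation step cheap, at the cost that columns must be treated as immutable once a table has been created.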

Usage

In Browser

To use in the browser, you can load Arquero from a content delivery network:

<script src="https://cdn.jsdelivr.net/npm/arquero@latest"></script>

Arquero will be imported into the aq global object. The default browser bundle does not include the Apache Arrow library. To perform Arrow encoding using toArrow() or binary file loading using loadArrow(), import Apache Arrow first:

<script src="https://cdn.jsdelivr.net/npm/apache-arrow@latest"></script>
<script src="https://cdn.jsdelivr.net/npm/arquero@latest"></script>

Alternatively, you can build and import arquero.min.js from the dist directory, or build your own application bundle. When building custom application bundles for the browser, the module bundler should draw from the browser property of Arquero's package.json file. For example, if using rollup, pass the browser: true option to the node-resolve plugin.
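
As a rough sketch of the rollup setup described above, a configuration file might look like the following. The input and output paths and the bundle name are placeholders chosen for illustration, and the example assumes the `@rollup/plugin-node-resolve` package is installed.

```javascript
// rollup.config.js (sketch; paths and bundle name are placeholders)
import { nodeResolve } from '@rollup/plugin-node-resolve';

export default {
  input: 'src/index.js',
  output: {
    file: 'build/bundle.js',
    format: 'iife',
    name: 'app'
  },
  plugins: [
    // Resolve dependencies using the "browser" property of their package.json.
    nodeResolve({ browser: true })
  ]
};
```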

Arquero uses modern JavaScript features, and so will not work with some outdated browsers. To use Arquero with older browsers including Internet Explorer, set up your project with a transpiler such as Babel.

In Node.js or Application Bundles

First install arquero as a dependency, for example via npm install arquero --save. Arquero assumes Node version 12 or higher.

Import using CommonJS module syntax:

const aq = require('arquero');

Import using ES module syntax, collecting all exports into a single object:

import * as aq from 'arquero';

Import using ES module syntax, with targeted imports:

import { op, table } from 'arquero';

Build Instructions

To build and develop Arquero locally:

Clone https://github.com/uwdata/arquero.
Run npm i to install dependencies.
Run npm test to run test cases, npm run perf to run performance benchmarks, and npm run build to build output files.

/**
 * Format this table as a JavaScript Object Notation (JSON) string.
 * @param {JSONFormatOptions} options The formatting options.
 * @return {string} A JSON string.
 */
toJSON(options) {
  return toJSON(this, options);
}

 * @param {object} [config] Configuration settings for the new table:
 *  - data: The data payload to use.
 *  - names: An ordered list of column names.
 *  - filter: An additional filter bitset to apply.
 *  - groups: The groupby specification to use (null for no groups).
 *  - order: The orderby comparator to use (null for no order).
 *  - params: Table expression parameters.

/**
 * Returns the row order comparator function, if specified.
 * @return {Function} The row order comparator function.
 */
comparator() {
  return this._order;
}

x	y	z
01	1	123
01	2	256
01	3	854
02	1	652
02	2	734
02	3	222

1	2	3	x
123	256	854	01
652	734	222	02
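
The two tables above look like a long-format table and its wide (pivoted) counterpart: each distinct y value becomes a column holding the matching z value for a given x. The following is a dependency-free sketch of that reshaping, not Arquero's pivot implementation; `pivotRows` is a hypothetical helper. It also shows one plausible reason (an assumption, since the cause is not stated here) why the numeric columns are listed before x: JavaScript enumerates integer-like property keys in ascending order ahead of other string keys.

```javascript
// Hypothetical helper for illustration; not Arquero's pivot implementation.
// rows:  array of row objects in long format
// key:   field that identifies an output row
// name:  field whose values become new column names
// value: field whose values fill the new columns
function pivotRows(rows, key, name, value) {
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row[key])) {
      // Start each output row with the key field.
      groups.set(row[key], { [key]: row[key] });
    }
    groups.get(row[key])[row[name]] = row[value];
  }
  return [...groups.values()];
}

const long = [
  { x: '01', y: 1, z: 123 },
  { x: '01', y: 2, z: 256 },
  { x: '01', y: 3, z: 854 },
  { x: '02', y: 1, z: 652 },
  { x: '02', y: 2, z: 734 },
  { x: '02', y: 3, z: 222 }
];

const wide = pivotRows(long, 'x', 'y', 'z');
console.log(wide);

// Even though "x" was inserted first, the integer-like keys come first.
console.log(Object.keys(wide[0]));
```

When column order matters, an explicit ordered list of names avoids relying on object key enumeration.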

/**
 * Create a new table for a set of named columns.
 * @param {object} columns
 *  The set of named column arrays.
 *  Object keys are the column names.
 *  The enumeration order of the keys determines the column indices.
 *  Object values must be arrays (or array-like values) of identical length.
 * @param {string[]} [names] An ordered list of column names.
 * @return {ColumnTable} the instantiated table
 * @example table({ colA: ['a', 'b', 'c'], colB: [3, 4, 5] })
 */
export function table(columns, names) {
  return new ColumnTable(mapObject(columns, x => x), names);
}
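
The documentation comment above requires that all column values be arrays (or array-like values) of identical length. A minimal sketch of such a check follows; `describeColumns` is a hypothetical helper written for illustration, and Arquero's own validation may differ.

```javascript
// Hypothetical validation helper; Arquero's own checks may differ.
// Returns the column names (in key enumeration order unless an explicit
// list is given) and the shared row count.
function describeColumns(columns, names = Object.keys(columns)) {
  let length = -1;
  for (const name of names) {
    const col = columns[name];
    const arrayLike = col != null
      && typeof col !== 'string'
      && typeof col.length === 'number';
    if (!arrayLike) {
      throw new TypeError(`Column "${name}" is not array-like`);
    }
    if (length === -1) {
      length = col.length;
    } else if (col.length !== length) {
      throw new RangeError(`Column "${name}" has a different length`);
    }
  }
  return { names, numRows: Math.max(length, 0) };
}

// Plain arrays and typed arrays are both accepted.
const info = describeColumns({
  colA: ['a', 'b', 'c'],
  colB: Float64Array.of(3, 4, 5)
});
console.log(info.names, info.numRows);
```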
