accurat / data-juggler
A data wrapper and enricher 🤹‍♂️
License: MIT License
```js
const datasetRaw = [
  { height: 190, gender: 'male', timeOfMeasure: 1552397833139 },
  { height: 170, gender: 'female', age: 22, timeOfMeasure: 1552397832139 },
  { height: 164, gender: 'female', age: 20, timeOfMeasure: 15523912333139 },
  { height: 176, gender: 'female', age: 12 }
]

// Infer type and metadata of a single column from its values
function autoTypeColumn(datasetColumn) {
  if (...) return { type: 'continuous', max: 100, min: 0, nullable: true }
  else if (...) return { type: 'categorical', enum: ['male', 'female', 'neutral'], nullable: false }
  else if (...) return { type: 'date', max: 000, min: 000, readFormat: 'Y-m-d', nullable: true }
}

function autoCleanData(dataset, missingData = []) {
  /*
    - Transform missing data into `null` (do it also for `undefined`)
    - Transform strings containing numbers into Numbers
    - Missing days or temporal periods? Maybe add nulls
  */
  return [...]
}

// Every column name that appears in at least one row
const columnNames = [...new Set(datasetRaw.flatMap(d => Object.keys(d)))]

// Columns can be declared by hand, e.g.
//   height: { type: 'continuous', min: 164, max: 190 },
//   gender: { type: 'categorical', enum: ['male', 'female', 'neutral'], nullable: false },
//   age: { type: 'continuous', ... },
//   timeOfMeasure: { type: 'date', max: 000, min: 000, readFormat: 'Y-m-d', nullable: true },
// or inferred column by column from the data:
const columnTypes = Object.fromEntries(
  columnNames.map(x => [x, autoTypeColumn(datasetRaw.map(d => d[x]))])
)

const dataset = dataJuggle({
  dataset: autoCleanData(datasetRaw, ['', 'N/A']),
  columnTypes,
  formatters: null, // the default formatters for each data type will be added
  // virtualColumns, // FUTURE
})

// Expected shape of the first datum:
// dataset[0] → {
//   height: { raw: 190, scaled: 1 },
//   timeOfMeasure: { ... },
// }
```
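`autoTypeColumn` and `autoCleanData` are left as pseudocode above. As a rough TypeScript sketch of what they could do, the code below implements only the first two cleaning steps (missing markers and `undefined` become `null`, numeric strings become numbers) plus a naive type inference. The rules used here (all non-null values numeric means `continuous`, anything else `categorical`) and the helper `cleanValue` are our own assumptions, not data-juggler behaviour; date detection and gap filling are not attempted.

```typescript
// A cell as it might arrive from a CSV parser or a JSON API
type RawValue = string | number | null | undefined
type CleanValue = string | number | null
type CleanRow = { [column: string]: CleanValue }

// Missing markers and undefined → null, numeric strings → numbers, everything else unchanged
function cleanValue(value: RawValue, missingData: string[]): CleanValue {
  if (value === undefined || value === null) return null
  if (typeof value === 'string') {
    if (missingData.indexOf(value) !== -1) return null
    const trimmed = value.trim()
    // Number('') is 0, so empty strings must not reach the conversion
    if (trimmed !== '' && !isNaN(Number(trimmed))) return Number(trimmed)
  }
  return value
}

function autoCleanData(
  dataset: Array<{ [column: string]: RawValue }>,
  missingData: string[] = []
): CleanRow[] {
  // Union of all column names, so a row missing a key gets an explicit null
  const columns: string[] = []
  dataset.forEach(row => {
    Object.keys(row).forEach(key => {
      if (columns.indexOf(key) === -1) columns.push(key)
    })
  })
  return dataset.map(row => {
    const cleaned: CleanRow = {}
    columns.forEach(column => {
      cleaned[column] = cleanValue(row[column], missingData)
    })
    return cleaned
  })
}

type ColumnType =
  | { type: 'continuous'; min: number; max: number; nullable: boolean }
  | { type: 'categorical'; enum: string[]; nullable: boolean }

// Naive inference: every non-null value numeric → continuous, otherwise categorical
function autoTypeColumn(datasetColumn: CleanValue[]): ColumnType {
  const present = datasetColumn.filter((v): v is string | number => v !== null)
  const nullable = present.length < datasetColumn.length
  if (present.length > 0 && present.every(v => typeof v === 'number')) {
    const numbers = present as number[]
    return { type: 'continuous', min: Math.min(...numbers), max: Math.max(...numbers), nullable }
  }
  // Distinct values, in order of first appearance
  const values: string[] = []
  present.forEach(v => {
    const s = String(v)
    if (values.indexOf(s) === -1) values.push(s)
  })
  return { type: 'categorical', enum: values, nullable }
}
```

Timestamps stored as plain numbers are typed as `continuous` by this heuristic, which is one reason an explicit `columnTypes` declaration remains useful.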
Inside `tsconfig.json` there is an attribute called `"module"`. Its value can be `commonjs`, `es6`, ...
If the value in data-juggler is `module: "commonjs"`, you can run:
yarn test
yarn build
npx ts-node ./benchmark-test/create-dataset.ts
npx ts-node ./benchmark-test/main.ts
but if you then run views, you get this error:
So change `module: "commonjs"` to `module: "es6"`, then run:
yarn build
views
Hello sir, how do I use this library with the really popular library mobx-state-tree?
Here is an example of what I'm doing today:
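import { types as t } from 'mobx-state-tree'
// VALUES and ValueType come from the application code
export const Data = t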
.model('Data', {
values: t.frozen(VALUES as ValueType[]),
})
export const State = t
.model('State', {
data: t.optional(Data, {}),
// other substates...
})
const state = State.create({})
The library is coming along and this is a way to gather some feedback before implementing the more advanced features.
The purpose of the library is to make us define, in a single place, the properties of the data we want to propagate through the application, and to abstract away some of the getters and scaling. When you open a project, you know that every component has access to all the properties of the datum, including a scaled value between 0 and 1, which could become the standard. There is no need to browse an ad hoc dataStore, because you know the properties can only have been declared in one place.
This library could solve issues like the following: you plotted yet another ugly scatterplot, now you have to make a tooltip, but you did not propagate the raw or formatted data, so you have to change the code in multiple places. With this library, in theory, either you already have that data inside, or you just add it as a property at the source.
So far, the library works with three basic inputs, two of them required: an instance of CSV-like data and an object describing the variables' "metadata". The third, optional, input is a set of functions that define custom datum properties. For example:
const data = [
{ height: 190, gender: 'male', timeOfMeasure: 1552397833139 },
{ height: 170, gender: 'female', age: 22, timeOfMeasure: 1552397832139 },
{ height: 164, gender: 'female', age: 20, timeOfMeasure: 15523912333139 },
{ height: 176, gender: 'female', age: 12 }
];
const types = {
height: 'continuous',
gender: 'categorical',
age: 'continuous',
timeOfMeasure: 'date'
};
const formatter = {
height: [{
property: 'feet',
compute: (datum) => datum * 0.0328084
},
{
property: 'rescaled',
compute: (datum, min, max) => datum / max
}],
timeOfMeasure: [{
property: 'year',
compute: (day) => day.format('YYYY')
}]
}
// assumes dataStoreFactory has already been imported from the library
const storeStateAndVariousNames = dataStoreFactory(data, types, formatter)
This gives you nice properties. Getters per "column"...
storeStateAndVariousNames.height[0] // { raw: 190, scaled: 1, feet: 6.233596, rescaled: 1 }
storeStateAndVariousNames.timeOfMeasure[0] // { dateTime: dayjs.Dayjs(this.raw), isValid: true, iso: '2019-03-12T14:37:13+01:00', raw: 1552397833139, scaled: 1 }
... some stats ...
storeStateAndVariousNames.stats.height // { min: 164, max: 190 }
storeStateAndVariousNames.stats.gender // { frequencies: { male: 1, female: 3 } }
... and more coming up hopefully.
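To make the getters and stats above concrete, here is a minimal TypeScript sketch that reproduces the documented numbers for `height` and `gender`. It is not the library's code: the min-max scaling and the helper names `continuousStats`, `categoricalStats` and `continuousColumn` are our assumptions, chosen because they yield `scaled: 1`, `{ min: 164, max: 190 }` and `{ male: 1, female: 3 }` on the example data. The formatter signature `(datum, min, max)` follows the `formatter` object above.

```typescript
type Cell = string | number | null | undefined

// One entry of the `formatter` object: a property name and how to compute it
interface Formatter {
  property: string
  compute: (datum: number, min: number, max: number) => number | string
}

type Datum = { [property: string]: number | string | null }

// Min and max over the numeric cells of a column (Infinity / -Infinity if there are none)
function continuousStats(column: Cell[]): { min: number; max: number } {
  const numbers = column.filter((v): v is number => typeof v === 'number')
  return { min: Math.min(...numbers), max: Math.max(...numbers) }
}

// How often each category occurs, ignoring missing cells
function categoricalStats(column: Cell[]): { frequencies: { [category: string]: number } } {
  const frequencies: { [category: string]: number } = {}
  column.forEach(v => {
    if (v === null || v === undefined) return
    const key = String(v)
    frequencies[key] = (frequencies[key] || 0) + 1
  })
  return { frequencies }
}

// Continuous column getter: raw value, min-max scaled value in [0, 1], one property per formatter
function continuousColumn(column: Cell[], formatters: Formatter[] = []): Datum[] {
  const { min, max } = continuousStats(column)
  const range = max - min
  return column.map((raw): Datum => {
    if (typeof raw !== 'number') return { raw: null, scaled: null }
    const value: number = raw
    // A constant column has range 0; map it to 0 instead of dividing by zero
    const datum: Datum = { raw: value, scaled: range === 0 ? 0 : (value - min) / range }
    formatters.forEach(f => {
      datum[f.property] = f.compute(value, min, max)
    })
    return datum
  })
}

// Usage with the README example data
const rows: Array<{ [column: string]: Cell }> = [
  { height: 190, gender: 'male', timeOfMeasure: 1552397833139 },
  { height: 170, gender: 'female', age: 22, timeOfMeasure: 1552397832139 },
  { height: 164, gender: 'female', age: 20, timeOfMeasure: 15523912333139 },
  { height: 176, gender: 'female', age: 12 },
]

const heightColumn = continuousColumn(rows.map(r => r.height), [
  { property: 'feet', compute: d => d * 0.0328084 },
  { property: 'rescaled', compute: (d, _min, max) => d / max },
])
```

Rows without a value (for example `age` in the first row) come back as `{ raw: null, scaled: null }` in this sketch; how the real library represents them may differ.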
Going forward... there are certain things that we would like to implement and that we are certain will come:
Virtual columns, computed with a function that takes the whole "row" as its only argument, in the fashion of the `formatter` object (as proposed by @YeasterEgg)
Detaching from mobx-state-tree
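A virtual column could be modelled like a `formatter` entry whose `compute` receives the whole row instead of a single value. The sketch below assumes that shape; `VirtualColumn`, `addVirtualColumns` and the example `label` column are hypothetical names, not part of the current API.

```typescript
type Row = { [column: string]: string | number | null | undefined }

// A virtual column: shaped like a `formatter` entry, but computed from the whole row
interface VirtualColumn {
  property: string
  compute: (row: Row) => string | number | null
}

// Return new rows with every virtual column appended; the input rows are not mutated
function addVirtualColumns(rows: Row[], virtualColumns: VirtualColumn[]): Row[] {
  return rows.map(row => {
    const extended: Row = { ...row }
    virtualColumns.forEach(vc => {
      // Each compute sees the original row, so virtual columns cannot depend on each other
      extended[vc.property] = vc.compute(row)
    })
    return extended
  })
}

// Hypothetical virtual column combining two real columns of the README data
const label: VirtualColumn = {
  property: 'label',
  compute: row => {
    const gender = row.gender
    const height = row.height
    return gender == null || height == null ? null : `${gender}, ${height} cm`
  },
}

const inputRows: Row[] = [
  { height: 190, gender: 'male' },
  { height: 176, gender: 'female', age: 12 },
]
const withVirtual = addVirtualColumns(inputRows, [label])
```

Passing the original row, rather than the partially extended one, keeps the result independent of the order in which virtual columns are listed; allowing virtual columns to build on each other would need an explicit ordering.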
In short, I would like to know which use cases and problems from your projects you would like to see addressed, and what would speed up your workflow. For example, would you like to pass a fetching function in order to abstract away this junk?
The datum should also contain a log-scaled value:
{...a, logScaled: number}
where `a` is the existing datum and `logScaled` is the logarithmically scaled, normalised value.
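One plausible definition, by analogy with the min-max `scaled` property, is logScaled = (ln x - ln min) / (ln max - ln min). This maps `min` to 0 and `max` to 1 and is only defined for strictly positive values. The sketch assumes that definition and assumes the datum exposes its value as `raw`; `logScaled` and `withLogScaled` are hypothetical helper names.

```typescript
// Log-scaled value normalised to [0, 1] between the column's min and max.
// Defined only for strictly positive inputs; returns null otherwise.
function logScaled(value: number, min: number, max: number): number | null {
  if (value <= 0 || min <= 0 || max <= 0) return null
  // A constant column has no spread; map it to 0 like a degenerate min-max scale
  if (max === min) return 0
  return (Math.log(value) - Math.log(min)) / (Math.log(max) - Math.log(min))
}

// Extend an existing datum `a`, as in `{...a, logScaled}`
function withLogScaled<T extends { raw: number }>(
  a: T,
  min: number,
  max: number
): T & { logScaled: number | null } {
  return { ...a, logScaled: logScaled(a.raw, min, max) }
}
```

Because the logarithm compresses large values, a strongly skewed column (for example population counts) spreads out more evenly under `logScaled` than under the linear `scaled`.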
Currently the available types are set via strings; maybe export them as constants.
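A minimal sketch of what exporting the types could look like, using an `as const` object so TypeScript users get a union type and plain JavaScript users can rely on a runtime check. `DataTypes`, `DataType` and `isDataType` are hypothetical names; the three values are the ones used in the `types` example.

```typescript
// Hypothetical exported constants replacing bare strings such as 'continuous'
export const DataTypes = {
  continuous: 'continuous',
  categorical: 'categorical',
  date: 'date',
} as const

// 'continuous' | 'categorical' | 'date'
export type DataType = typeof DataTypes[keyof typeof DataTypes]

const ALL_DATA_TYPES: string[] = Object.keys(DataTypes).map(
  k => DataTypes[k as keyof typeof DataTypes]
)

// Runtime guard, useful for callers that still pass plain strings
export function isDataType(value: string): value is DataType {
  return ALL_DATA_TYPES.indexOf(value) !== -1
}

// The README `types` object, rewritten with the constants
const exampleTypes: { [column: string]: DataType } = {
  height: DataTypes.continuous,
  gender: DataTypes.categorical,
  age: DataTypes.continuous,
  timeOfMeasure: DataTypes.date,
}
```

With the union type, a typo such as `'continous'` becomes a compile-time error instead of a silently unknown type.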
Allow the creation of custom types
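Custom types could be supported with a small registry, where each type says how to summarise a column (`stats`) and which properties each datum gets (`datum`). This is a hypothetical design sketch: `registerType`, `juggleColumn` and the `boolean` example type are illustrative names, not existing data-juggler functions.

```typescript
// A custom type: how to summarise a column and which properties each datum gets
interface CustomType<S> {
  stats(column: unknown[]): S
  datum(raw: unknown, stats: S): { [property: string]: unknown }
}

const customTypes: { [typeName: string]: CustomType<unknown> } = {}

function registerType<S>(typeName: string, definition: CustomType<S>): void {
  if (customTypes[typeName]) throw new Error(`Type "${typeName}" is already registered`)
  customTypes[typeName] = definition
}

// Apply a registered type to one column: column stats first, then one datum per value
function juggleColumn(typeName: string, column: unknown[]) {
  const definition = customTypes[typeName]
  if (!definition) throw new Error(`Unknown type "${typeName}"`)
  const stats = definition.stats(column)
  return column.map(raw => ({ raw, ...definition.datum(raw, stats) }))
}

// Hypothetical boolean type: counts TRUE/FALSE and gives each datum a label and a share
registerType<{ trueCount: number; falseCount: number }>('boolean', {
  stats(column) {
    let trueCount = 0
    let falseCount = 0
    column.forEach(v => {
      if (v === true) trueCount++
      else if (v === false) falseCount++
    })
    return { trueCount, falseCount }
  },
  datum(raw, stats) {
    if (typeof raw !== 'boolean') return { label: null, share: null }
    // Share of non-missing rows holding the same value as this one
    const total = stats.trueCount + stats.falseCount
    const same = raw ? stats.trueCount : stats.falseCount
    return { label: raw ? 'yes' : 'no', share: same / total }
  },
})
```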
The NPM package https://www.npmjs.com/package/data-juggler should be owned by the @accurat organization: https://www.npmjs.com/settings/accurat/packages
Thanks!
a1, a2, 3, 4, 5
{ a1: 1, a2: 1 }
(no 3, 4, 5 values)
TRUE, FALSE, FALSE
[]
(Will fill the form later.)
Steps to reproduce
Actual behavior
[...]
Expected behavior
[...]
Device info:
Data Juggler converts dates into Unix timestamps (in seconds), so we lose the milliseconds.
Take a look at Day.js `unix()`.
Related to pull request #360 in Views.
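Day.js `unix()` returns whole seconds (the value is floored), while `valueOf()` returns milliseconds, so any ms to s to ms round trip drops the sub-second part. The sketch reproduces the loss with the built-in `Date` to stay dependency-free; `toUnixSeconds` mirrors `unix()` and, like `lostMilliseconds`, is a hypothetical helper name. Keeping the millisecond value from `valueOf()` would avoid the problem, though exactly where data-juggler performs the conversion is not shown here.

```typescript
// Mirrors Day.js `unix()`: whole seconds since the epoch, fractional part dropped
function toUnixSeconds(ms: number): number {
  return Math.floor(ms / 1000)
}

// Milliseconds that disappear in a ms → s → ms round trip
function lostMilliseconds(ms: number): number {
  return ms - toUnixSeconds(ms) * 1000
}

const timeOfMeasure = 1552397833139 // first row of the README example
const viaSeconds = new Date(toUnixSeconds(timeOfMeasure) * 1000)
const viaMilliseconds = new Date(timeOfMeasure)

console.log(lostMilliseconds(timeOfMeasure)) // 139
console.log(viaMilliseconds.getTime() - viaSeconds.getTime()) // 139
```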