Comments (9)
While Arquero does not provide a built-in operator for this, you can add your own by extending Arquero via the addFunction method. The following code block can be pasted directly into an Observable cell:
{
  // generate a recode function for the given recoding parameters
  function recoder(map, other) {
    return value => map.has(value) ? map.get(value)
      : other === undefined ? value
      : other;
  }
  // register a custom recode function
  aq.addFunction(
    'my_recode',
    recoder(new Map([['foo', 'farp'], ['bar', 'borp']]), 'other'),
    { override: true } // suppress errors if we re-evaluate this code
  );
  // apply the recode function to a data table
  return aq.table({ x: ['foo', 'bar', 'baz'] })
    .derive({ z: d => op.my_recode(d.x) })
    .toJSON();
  // returns: '{"x":["foo","bar","baz"],"z":["farp","borp","other"]}'
}
from arquero.
Yes, absolutely, I should have mentioned it but I already use this possibility. I also added custom (quick and dirty) "batch_derive" and "batch_recode" functions to tables thanks to Arquero's awesome customisability.
Many thanks for your detailed answer. Your recoder code is of course better than mine, so I'll gladly take it!
from arquero.
An op.recode() method is now staged for v1.2.0. Unlike my earlier example, it uses a vanilla object to specify the value map, as this can be specified and serialized more easily in Arquero table expressions:
table.derive({ val: d => op.recode(d.val, { oldA: 'newA', oldB: 'newB' }, '?') });
This also avoids the need to register a new function, though you can still define the map outside of Arquero and then bind it as a parameter:
const map = { oldA: 'newA', oldB: 'newB' };
table.params({ map }).derive({ val: (d, $) => op.recode(d.val, $.map, '?') });
from arquero.
Thanks a lot !
Please excuse me if I am wrong, but one reason I used a Map instead of an Object is that a Map accepts values of any type as keys, so with it I can recode numbers or even undefined values. I think this would not be possible with a vanilla object?
from arquero.
Any value that can be coerced to a string (including numbers, booleans, null, undefined, dates, etc) can be recoded with the new function, which suffices for many use cases I've encountered. However, there can of course be collisions (e.g., the string 'true' and boolean true map to the same key). If strict object equality is a requirement for you, I would stick with the earlier solution.
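To make the collision concrete, here is a minimal sketch in plain JavaScript, independent of Arquero. The helpers lookupObject and lookupMap are hypothetical names used only for illustration; they are not part of the Arquero API and do not claim to mirror its internals.

```javascript
// Hypothetical helpers for illustration only (not Arquero functions).
// Object lookup: the value is coerced to a string property key.
const lookupObject = (map, value, fallback) =>
  Object.prototype.hasOwnProperty.call(map, value) ? map[value] : fallback;

// Map lookup: keys are compared without string coercion.
const lookupMap = (map, value, fallback) =>
  map.has(value) ? map.get(value) : fallback;

const objectMap = { true: 'yes', 2: 'two' };
const strictMap = new Map([[true, 'yes'], [2, 'two']]);

const inputs = [true, 'true', 2, '2'];
const viaObject = inputs.map(v => lookupObject(objectMap, v, '?'));
const viaMap = inputs.map(v => lookupMap(strictMap, v, '?'));

console.log(viaObject); // [ 'yes', 'yes', 'two', 'two' ]
console.log(viaMap);    // [ 'yes', '?', 'two', '?' ]
```

With the object, true and 'true' (and 2 and '2') share a key; with the Map, only the exact keys match.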
from arquero.
On the one hand, a vanilla object is easier to create, whereas a Map is a bit more complicated with new Map([[..]]). But I think being able to declare precise recodings, not prone to errors or confusion due to conversion to strings, is really important.
I imagine accepting both and testing for the type before recoding would add unnecessary code complexity?
Many thanks for looking into it.
from arquero.
Can you provide a specific example (illustrative of a common real-world use case) where the string conversion is notably problematic? If you are recoding primitive types or dates, string coercion works reasonably well. Whether loose equivalence (e.g., 2 and '2' being equivalent for recoding) is a "feature" or a "bug" is use case dependent.
Meanwhile, here is an example where Map's use of strict object equality can lead to potentially unexpected results:
const d1 = new Date(2000, 0, 1);
const d2 = new Date(2000, 0, 1);
const m = new Map().set(d1, 'foo');
m.get(d1) // 'foo'
m.get(d2) // undefined
const o = {[d1]: 'foo'};
o[d1] // 'foo'
o[d2] // 'foo'
And you can always use the alternative method above (with custom function registration) if you need it!
from arquero.
I thought about this some more and realized there is no reason op.recode can't support either an Object or a Map. I've updated the v1.2.0 branch accordingly. Note that (for the time being) when using a Map you must define it externally and bind it as a parameter, as Arquero table expressions do not permit use of the new operator.
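As a rough illustration of what supporting both container types involves, here is a standalone sketch in plain JavaScript. The function recodeSketch is a hypothetical name and an assumption about the general approach; it is not the actual op.recode source. It assumes that an omitted fallback means the input value is returned unchanged, as in the earlier recoder example.

```javascript
// Hypothetical sketch, not Arquero's implementation of op.recode.
function recodeSketch(value, map, fallback) {
  if (map instanceof Map) {
    // Map: exact key match, no string coercion
    if (map.has(value)) return map.get(value);
  } else if (Object.prototype.hasOwnProperty.call(map, value)) {
    // plain object: value is coerced to a string key
    return map[value];
  }
  // assumption: with no fallback, pass the input through unchanged
  return fallback !== undefined ? fallback : value;
}

// A Map can tell the value undefined apart from the string 'undefined'.
const asMap = new Map([[undefined, 'missing'], ['undefined', 'literal']]);
const asObject = { undefined: 'missing' };

const values = [undefined, 'undefined', 'x'];
const fromMap = values.map(v => recodeSketch(v, asMap, '?'));
const fromObject = values.map(v => recodeSketch(v, asObject, '?'));
const passThrough = recodeSketch('x', asObject);

console.log(fromMap);     // [ 'missing', 'literal', '?' ]
console.log(fromObject);  // [ 'missing', 'missing', '?' ]
console.log(passThrough); // x
```

The type check is a single instanceof branch, so the added complexity is small compared with maintaining two separate functions.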
from arquero.
I think that would be perfect: a simpler solution with a vanilla object that suits most use cases, and the ability to use a Map when exact key matching is needed.
You're right, though, Map use cases should be quite rare. I can't think of any cases other than the one you mentioned, where the same value appears with different types in the same variable (such as undefined and "undefined") and you want to recode them to different values.
Many thanks again for implementing this.
from arquero.