Comments (5)
It sounds interesting! I'll need a bit to get my head around what it entails, but it seems quite reasonable to (1) limit the use of assumptions about time, and (2) be as serious as possible about `usize` sizes.
If it's alright, I'll take a few to read up on things. Mostly, I'd love to avoid having code in TD/DD (timely and differential dataflow) that I don't understand, because it's largely on me if something is broken elsewhere. But I have to imagine that, if nothing else, a feature flag for time would be really easy, as should fixing DD's 64-bit assumption (which is probably a bug anyhow).
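Roughly what such a feature flag could look like, as a minimal sketch only: `time` is a hypothetical Cargo feature name (it would be declared as `time = []` under `[features]`), and `clock::Timer` is a hypothetical wrapper, not an existing TD/DD type. With the feature on, the timer wraps `std::time::Instant`; with it off, it compiles to a no-op that reports no elapsed time, so callers have to treat time as optional.

```rust
// Hypothetical `time` feature: real clock when enabled.
#[cfg(feature = "time")]
mod clock {
    use std::time::{Duration, Instant};

    pub struct Timer(Instant);

    impl Timer {
        pub fn start() -> Self {
            Timer(Instant::now())
        }
        /// Wall-clock time since `start`, when a clock is available.
        pub fn elapsed(&self) -> Option<Duration> {
            Some(self.0.elapsed())
        }
    }
}

// Feature disabled (e.g. for wasm32-unknown-unknown): never touches the clock.
#[cfg(not(feature = "time"))]
mod clock {
    use std::time::Duration;

    pub struct Timer;

    impl Timer {
        pub fn start() -> Self {
            Timer
        }
        /// No clock available, so no elapsed time is reported.
        pub fn elapsed(&self) -> Option<Duration> {
            None
        }
    }
}
```

Returning `Option<Duration>` pushes the "no clock" case into the type, so code such as logging has to decide explicitly what to do without time rather than panicking.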
from differential-dataflow.
I think all the changes look fine at a high level. The time changes don't seem to change any behavior on non-WASM targets. The RHH (Robin Hood hashing) code isn't used anywhere at this point. Before committing to supporting WASM, I'd like to know if there's a regression test we could have. Testing seems less straightforward than for other targets, but if we can get it into CI then I don't think anything is blocking this.
As @frankmcsherry points out, there are some parts of the code that might assume `usize` to have the same width as `u64`. Removing this assumption would be a win in itself.
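Until that assumption is removed, one stopgap is to make it loud rather than silent. A minimal sketch, assuming a module that genuinely still relies on a 64-bit `usize` (the guard is hypothetical, not existing DD code): a compile-time assertion turns a wasm32 build into a clear build error instead of a runtime surprise.

```rust
use std::mem::size_of;

// Hypothetical guard for a module that still assumes `usize` is as wide as `u64`.
// On a 32-bit target such as wasm32 this fails at compile time with the message
// below, instead of misbehaving (or panicking) at runtime.
const _: () = assert!(
    size_of::<usize>() >= size_of::<u64>(),
    "this module assumes a 64-bit usize"
);
```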
I've started to take a look (sorry for the delay) and have some quick thoughts:
- The DD changes .. could probably just be deletions of the associated code instead. At least, the time changes are (a) an unused `_timer`, and (b) a `YieldingIter` that isn't used other than in some commented-out Kafka code (it could be commented out too, placing the same obligation on anyone who uncomments that code as the `rdkafka` connections already do). If we clean this all up, then nothing in here needs timers at all.
- The DD changes miss an important use in the `dogsdogsdogs` project, which is the only load-bearing subproject (though I can understand that it isn't obvious). Specifically, `half_join.rs` uses `Instant` as an argument to the `yield_function`, allowing the user to call `.elapsed()` if they are interested and not otherwise. We could also just not have dogs3 compile on WebAssembly, or delay that work until someone needs it.
- The TD changes are harder to avoid, but do seem mostly harmless. There's some weirdly dead-ish code (viz. `Worker::timer()`), but things like scheduling, sequencing, and activation seem to want to do things based on external notions of time, and logging just seems somewhat wedged without access to time.
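The `Instant` argument matters because it forces the caller to read the clock. A minimal sketch of the pattern, with a hypothetical `process_with_yield` driver standing in for an operator like `half_join` (the real operator's signature may differ): a time-based policy calls `.elapsed()`, a work-based one ignores the `Instant`, but the driver calls `Instant::now()` either way, and on `wasm32-unknown-unknown` that call panics at runtime.

```rust
use std::time::{Duration, Instant};

/// Hypothetical driver standing in for an operator like `half_join`:
/// it processes `items` and asks `yield_fn(start, work_done)` after each one
/// whether it should stop and yield control back to the scheduler.
fn process_with_yield<F>(items: &[u64], yield_fn: F) -> usize
where
    F: Fn(Instant, usize) -> bool,
{
    // The driver reads the clock even if `yield_fn` never looks at it;
    // on wasm32-unknown-unknown this call panics.
    let start = Instant::now();
    let mut done = 0;
    for _item in items {
        done += 1;
        if yield_fn(start, done) {
            break;
        }
    }
    done
}

/// Time-based policy: yield once `budget` of wall-clock time has passed.
fn time_budget(budget: Duration) -> impl Fn(Instant, usize) -> bool {
    move |start, _work| start.elapsed() >= budget
}

/// Work-based policy: yield after `max_work` items, ignoring the clock.
fn work_budget(max_work: usize) -> impl Fn(Instant, usize) -> bool {
    move |_start, work| work >= max_work
}
```

If the yield function took only the amount of work done, operators built on it would have no unconditional clock read, at the cost of losing wall-clock budgets.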
I'm going to look into a DD PR that essentially tracks your changes (fixing RHH) but removes the time-based code rather than modifying it. I'll report back here about that.
The RHH fix seems more subtle than it first appears. Elsewhere, the code also has this:
```rust
/// Indicates both the desired location and the hash signature of the key.
fn desired_location<K: Hashable>(&self, key: &K) -> usize {
    let hash: usize = key.hashed().into().try_into().unwrap();
    hash / self.divisor
}
```
where the `try_into()` is from `u64` to `usize`.
The code is not currently active, though .. it could become so in the future when I get some time. Ideally it wouldn't silently break the WASM builds, though this is an example of where it could/would.
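To see why this breaks on WASM: a uniformly distributed 64-bit hash is almost always larger than `u32::MAX`, so the conversion fails for nearly every key. A small simulation, with `u32` standing in for a 32-bit `usize` (the function name is hypothetical):

```rust
use std::convert::TryFrom;

/// Simulates `desired_location` on a target with a 32-bit `usize`,
/// using `u32` as a stand-in for that `usize`. Returns `None` exactly
/// where the original `try_into().unwrap()` would panic.
fn desired_location_32(hash: u64, divisor: u32) -> Option<u32> {
    let hash = u32::try_from(hash).ok()?;
    Some(hash / divisor)
}
```

Small hashes convert fine, which is why the problem can hide in light testing; any hash above `u32::MAX` hits the failure path.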
My understanding of WASM is that 64-bit types are fine, just that `usize` is 32 bits. So, probably the right thing here (and perhaps elsewhere) is to be more serious about either using `u64` throughout, or pivoting to `usize` earlier (and using it throughout).
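The two directions above might look like this for the `desired_location` computation. Both functions are hypothetical sketches, not DD code. Option 2 keeps the *high* bits of the hash on the assumption that desired locations should stay ordered by hash value, which truncating to the low bits would not preserve; the divisor would then have to be computed for the narrower range.

```rust
use std::convert::TryFrom;

/// Option 1: keep the arithmetic in `u64` and convert only the final quotient.
/// Assumes `divisor` is chosen so the quotient is bounded by the table size,
/// so it fits in `usize` whenever the table itself does.
fn location_u64(hash: u64, divisor: u64) -> usize {
    usize::try_from(hash / divisor).expect("table larger than usize")
}

/// Option 2: pivot to `usize` early by keeping the high bits of the hash,
/// which preserves hash ordering on both 32- and 64-bit targets.
fn location_usize(hash: u64, divisor: usize) -> usize {
    let shift = 64 - usize::BITS; // 0 on 64-bit targets, 32 on 32-bit ones
    let hash = (hash >> shift) as usize;
    hash / divisor
}
```

Option 1 changes the least code; option 2 makes every later computation native-width, which matches the "use it throughout" half of the suggestion.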
Apologies for my lack of reply sooner - I've been away. Thanks for making progress on these changes 😃.
> The DD changes miss an important use in the dogsdogsdogs project
Ah yes, I haven't used that before.
> We could also just not have dogs3 compile on web assembly / delay that work until someone needs it.
I'd be very happy with that - I have had a lot of fun composing DD operators without having to reach for what dogsdogsdogs provides.
> My understanding of WASM is that 64-bit types are fine, just that usize is 32 bits
Yep, that's right; it's similar to using `u128` on a 64-bit machine. There may be other issues that aren't exposed until some code paths actually run. Full disclosure: I made the surgical changes required to get DD working in WASM iteratively (find issues, fix, rinse and repeat), rather than reading through all the code to find where a 32-bit `usize` might be problematic. The left shift was picked up by the compiler; you can reproduce it with `cargo build --target wasm32-unknown-unknown`.
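As an illustration of the kind of shift the compiler catches (the actual shift in DD isn't quoted here, so this is hypothetical): a constant expression like `1usize << 32` is rejected on 32-bit targets by rustc's deny-by-default `arithmetic_overflow` lint. Portable alternatives either keep the value in `u64` or use `checked_shl` and handle the `None` case. The helper names below are made up for the sketch.

```rust
/// Portable power of two in `usize`: `None` if `exp` is at least the
/// width of `usize` (e.g. 32 on wasm32), instead of a compile error or overflow.
fn pow2_usize(exp: u32) -> Option<usize> {
    1usize.checked_shl(exp)
}

/// When the value is conceptually 64-bit, keep it in `u64` on every target.
fn pow2_u64(exp: u32) -> Option<u64> {
    1u64.checked_shl(exp)
}
```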
Let me know if there's something I can help with that doesn't involve changing production "load-bearing" code, e.g. setting up some tests as suggested (details for WASM: https://rustwasm.github.io/wasm-bindgen/wasm-bindgen-test/index.html). Note that this runs the compiled WASM using Node.js. Or I can provide a basic WASM example for motivation: play around with an interactive dataflow by visiting a web page and clicking some buttons.
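One way such a regression test could be shared between native CI and WASM is to register the same function with `#[test]` natively and with `wasm_bindgen_test` (from the `wasm-bindgen-test` crate linked above) on wasm32. A sketch, assuming `wasm-bindgen-test` is added as a wasm32-only dev-dependency; the helper and test names are hypothetical.

```rust
use std::convert::TryFrom;

/// Hypothetical property worth pinning down in CI: whether a 64-bit value
/// always fits in `usize` on the current target.
fn usize_can_hold_u64() -> bool {
    usize::try_from(u64::MAX).is_ok()
}

#[cfg(test)]
mod tests {
    // Natively this is an ordinary `#[test]`; on wasm32 it is registered with
    // `wasm-bindgen-test` instead, so the same check runs in both CI jobs.
    #[cfg_attr(not(target_arch = "wasm32"), test)]
    #[cfg_attr(target_arch = "wasm32", wasm_bindgen_test::wasm_bindgen_test)]
    fn usize_width_matches_target() {
        assert_eq!(super::usize_can_hold_u64(), cfg!(target_pointer_width = "64"));
    }
}
```

Natively this runs under `cargo test`; for WASM, the linked guide describes running tests with `wasm-pack test --node`, which executes the compiled module in Node.js.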