Comments (11)
OK, so I talked with Eunji (our new lab member) and got some great insights on this.
In traditional compilers, JIT compilation is usually used for code blocks that are executed multiple times (think loops). The compiler generates a complete execution plan before execution, and then, as the iterations go on, re-writes the plan based on the stats it has collected.
In contrast, the DAGs we execute do not contain cycles, and the runtime optimization (e.g., dynamic partitioning) of an operator usually depends on the results of previous operators. This lets us get away with a simpler design where the compiler generates the execution plan piece by piece, without having to re-write the plan across multiple layers (IR/runtime-logical/runtime-physical).
For example, let's say we're applying a dynamic partitioning optimization to a MapReduce application. First, the compiler generates a partial execution plan for the Map and the dynamically added KeyHistogram operators. Once those have run, it can generate the rest of the execution plan for Reduce. We do not need to do any re-writes during the process.
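To make the piece-by-piece idea concrete, here is a rough Python sketch of the MapReduce example. It is only an illustration, not our actual compiler code: the function names (`plan_map_stage`, `run_map_stage`, `plan_reduce_stage`, `run_reduce_stage`) and the greedy key assignment are assumptions made up for this comment.

```python
from collections import Counter


def plan_map_stage():
    """First partial plan: Map plus a dynamically added KeyHistogram (hypothetical names)."""
    return ["Map", "KeyHistogram"]


def run_map_stage(records):
    """Run a word-count style Map and collect the key histogram as a side output."""
    mapped = [(word, 1) for line in records for word in line.split()]
    histogram = Counter(key for key, _ in mapped)
    return mapped, histogram


def plan_reduce_stage(histogram, num_reducers):
    """Second partial plan, generated only after the histogram is known.

    Assumed policy: greedily place the heaviest keys first on the
    least-loaded reducer. Nothing planned earlier is re-written.
    """
    loads = [0] * num_reducers
    assignment = {}
    for key, count in sorted(histogram.items(), key=lambda kv: (-kv[1], kv[0])):
        target = loads.index(min(loads))
        assignment[key] = target
        loads[target] += count
    return ["Reduce"], assignment, loads


def run_reduce_stage(mapped, assignment, num_reducers):
    """Sum the values per key inside the partition chosen by the plan."""
    partitions = [Counter() for _ in range(num_reducers)]
    for key, value in mapped:
        partitions[assignment[key]][key] += value
    return partitions


if __name__ == "__main__":
    records = ["a a a a b c", "a b d"]
    plan = plan_map_stage()
    mapped, histogram = run_map_stage(records)
    reduce_plan, assignment, loads = plan_reduce_stage(histogram, num_reducers=2)
    plan += reduce_plan  # the plan only grows; no earlier piece is changed
    print(plan, assignment, loads)
    print(run_reduce_stage(mapped, assignment, num_reducers=2))
```

The point of the sketch is that the Reduce part of the plan is appended after the Map part has produced its stats, so the earlier part of the plan never has to be touched.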
I know Optimus does re-writing. But my feeling is that it has more to do with the system's legacy code than with the actual requirements.
Of course, we might want to do a re-write while executing an operator (e.g., change the number of reducers while executing the reducer), especially for streaming applications. So I guess we need the re-writing mechanism after all? In that case, we really need to put in the effort to make re-writing across multiple layers easy. What are your thoughts?
from nemo.
Hmm... I guess we need the re-writing mechanism after all. Even the Beam applications that we have now (MLR, ALS) have implicit conditional cycles/loops. It'd be great if we could explicitly express the cycles/loops in our IR, and use traditional compiler techniques on them.
How do we handle the cycles/loops currently?
It'd be nice to have cycles/loops in our IR.
Currently, we do not have cycles/loops in our IR. I'll file an issue for it.
Note that Beam also does not have cycles/loops in its language. I suppose we need to identify them ourselves in our Beam frontend.
Then, how do we handle cycles (iterations) now?
Right now, we just have a long DAG with duplicate sets of operators.
Because Beam does not have the concept of a conditional loop, we have a fixed number of iterations for ALS and MLR.
For example, if we have 2 operators in a loop and there are 10 iterations, we simply have a long chain of 20 operators.
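A tiny Python sketch of what that unrolling amounts to, assuming a loop body is just an ordered list of operator names. The `unroll` helper and the operator names are hypothetical, used only to show the 2 x 10 = 20 arithmetic.

```python
def unroll(loop_body, iterations):
    """Duplicate the loop body once per iteration to get one long chain of operators."""
    return [f"{op}_{i}" for i in range(iterations) for op in loop_body]


if __name__ == "__main__":
    # Hypothetical operator names; 2 operators x 10 iterations.
    chain = unroll(["ComputeGradient", "UpdateModel"], 10)
    print(len(chain))
    print(chain[:4])
```

Since the iteration count is baked in when the DAG is built, the chain cannot stop early on a convergence condition.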
@johnyangk Do you plan to add a loop in the current IR? This can be a good topic to discuss in our meeting.
Sure, this is a good discussion topic. If we have concrete optimization techniques to use for loops, then we can make this a high-priority task, since this is something the Optimus paper did not have. However, if we don't, then I think it'd be better to postpone this task.
If we decide to add a loop, there are two potential approaches:
- Add a high-level loop construct
- Add a low-level jump/condition construct to create a loop
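To compare the two options, here is a rough Python sketch of how each could look. Everything in it (`LoopVertex`, `Op`, `JumpIf`, and the two interpreters) is a hypothetical illustration, not a proposal for the actual IR classes.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


# Option 1: a high-level loop construct that wraps the body and its condition.
@dataclass
class LoopVertex:
    body: List[Callable[[int], int]]
    should_continue: Callable[[int, int], bool]  # (iteration, state) -> bool


def run_loop_vertex(loop: LoopVertex, state: int) -> int:
    iteration = 0
    while loop.should_continue(iteration, state):
        for op in loop.body:
            state = op(state)
        iteration += 1
    return state


# Option 2: low-level jump/condition constructs; the loop is implicit in the jumps.
@dataclass
class Op:
    fn: Callable[[int], int]


@dataclass
class JumpIf:
    condition: Callable[[int], bool]
    target: int  # index to jump to when the condition holds


def run_program(program: List[Union[Op, JumpIf]], state: int) -> int:
    pc = 0
    while pc < len(program):
        instr = program[pc]
        if isinstance(instr, JumpIf):
            pc = instr.target if instr.condition(state) else pc + 1
        else:
            state = instr.fn(state)
            pc += 1
    return state


def double(x: int) -> int:
    return x * 2


if __name__ == "__main__":
    # Same loop in both styles: keep doubling while the value is below 100.
    high_level = LoopVertex(body=[double], should_continue=lambda i, s: s < 100)
    low_level = [
        JumpIf(lambda s: s >= 100, target=3),  # exit when done
        Op(double),
        JumpIf(lambda s: True, target=0),      # jump back to the check
    ]
    print(run_loop_vertex(high_level, 1), run_program(low_level, 1))
```

In the first style the loop is a single explicit unit that an optimization pass can find directly. In the second the loop has to be recovered from the jumps, but the construct is more general.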
Loops will be handled with #121
See issues marked with DynOpt