Comments (9)
@gregory-marton question for you :)
from crosscat.
Sounds like you've done some investigation already. Can you share the details? In what way did it not work, which settings did you find, etc.?
Oh, I thought it was intentional.
Line 27 in d7765df
There appears to have been no crosscat.settings since v0.1.25.
The change that broke it seems sensible and straightforward, but I would ask whether the implementation, if fixed, would still be functional in the current version.
I'm actually not sure of the history. I came on board around v0.1.40, and this was not on my radar.
@riastradh-probcomp if you have time, can you give more context?
@kadwanev, pull requests are always welcome!
Nobody has touched the Hadoop code in years, and it apparently requires various moving parts that were customized for one developer's setup years ago, with some private network layout and Amazon S3 account and local Hadoop installation &c.
I expect it would be easier to start from scratch than to try to revive what's there.
Given that context, I expect the appropriate "fix" would be to remove HadoopEngine. @kadwanev, if you want to take on reviving it instead, we would absolutely welcome it. If you're interested, could you let me know a time frame for checking back with you?
Understood. Thanks for the responses.
Just want to ask:
Is the distribution technique still sound?
Did it ever work?
I ask this because the only reducer reference I see is /bin/cat, which leads me to question whether it ever collected results back into a single response.
I want to know if the current implementation is a good starting point or not.
There is no 'reduce' step because Crosscat's job is just to apply a transition operator to each of a number of independent states -- it's all 'map', and it is embarrassingly parallelizable, so any parallelism you throw at it should stick, no matter how trivial.
The MultiprocessingEngine is just LocalEngine with Python's built-in map replaced by multiprocessing.Pool().map to transition the states in separate processes. Doing the same on different computers will certainly work just fine.
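To make the "it's all map" point concrete, here is a minimal sketch of the pattern described above. It is not CrossCat's actual engine code: the `transition` function, the `(seed, value)` state layout, and the helper names are hypothetical stand-ins chosen for illustration. Only `map` and `multiprocessing.Pool().map` come from the comment itself.

```python
import functools
import multiprocessing
import random


def transition(state, n_steps):
    """Hypothetical stand-in for applying a transition operator to one state.

    A "state" here is just a (seed, value) pair; real CrossCat states are
    far richer. Each state carries its own seed, so the result does not
    depend on which process handles it.
    """
    seed, value = state
    rng = random.Random(seed)
    for _ in range(n_steps):
        value += rng.gauss(0.0, 1.0)
    # Hand back a fresh seed so a later call continues deterministically.
    return (rng.getrandbits(32), value)


def transition_states(states, n_steps, map_fn=map):
    """Transition every state independently. There is no reduce step."""
    step = functools.partial(transition, n_steps=n_steps)
    return list(map_fn(step, states))


def transition_states_parallel(states, n_steps, processes=None):
    """Same computation, with the built-in map swapped for Pool.map."""
    with multiprocessing.Pool(processes) as pool:
        return transition_states(states, n_steps, map_fn=pool.map)
```

Because the states never interact, the serial and parallel versions should return the same list for the same inputs. When trying `transition_states_parallel`, call it from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely on platforms that do not fork.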
Thanks. That definitely answers my question. I'll be looking to contribute some code as soon as possible.