Comments (18)
I suggest creating a starter repo using the structure and styling of the computing-tidymodels repo. Once you get all the Python bits set up, ping me.
from website.
We would make a computing-python repo and keep the same CSS and organization. The owner of that repo would have to decide between Jupyter notebooks and a more markdown-based approach, whichever is the more Pythonic.
For libraries... ¯\_(ツ)_/¯ I'd like to avoid extra complexity but would defer to the python community for those decisions.
from website.
I would also like to work on the Python supplement. @bmreiniger, may we collaborate on this?
from website.
I'll also toss in my hat for a collaboration on sklearn/python code. Could be a fun project!
from website.
I'd like to work on this too. I have a decent knowledge of Python for ML (Kaggle notebooks GM). As already mentioned, it is difficult to imagine working without pandas, sklearn, and matplotlib. If plotnine is mentioned as a replacement for matplotlib, I should mention polars, which has a grammar closer to the tidyverse and is significantly better than pandas.
from website.
There is another plotting option, lets-plot
from website.
I too would like to work on the Python code. @bmreiniger, let's collaborate on this?
from website.
I'll recant my previous statement:
"I'd like to avoid extra complexity but would defer to the python community for those decisions."
Use whatever libraries you see fit. We use a ton of R packages to make the book (that's the way R is); use anything that you think makes the best results.
from website.
Also, I can export the data sets to a format that is easier for Python to ingest. What do you suggest? CSV?
from website.
I'll probably be more useful on content, but I have a little site deployment experience; when I get some time I'll draft something. If anybody else knows more and/or has more time, jump in. My first thoughts:
0. Quarto in the same repo as the tidyverse code, with panelsets. I still think this is attractive enough to do a demo of. On the other hand, fully rebuilding the site would require both an R and a Python env...
1. Quarto with qmd files and Python snippets. This mirrors the tidyverse version most closely, and the styling should be easy to keep very close as well.
2. Quarto with ipynb files. Nice that the Jupyter notebooks could be downloaded and executed directly, but git diffs will be unpleasant.
3. sphinx-gallery, which I think is how sklearn generates its examples. Straight Python means easy diffs and easily runnable code, with markup in comments for text sections. But styling will be harder, I imagine.
4. ...?
As for data format, CSV is probably fine, at least until something comes up to suggest otherwise.
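One thing to watch with CSV is that it carries no column types, so anything that was a factor on the R side has to be declared again when reading. A minimal sketch of that, assuming a made-up three-column file (the column names and values here are hypothetical, not one of the book's data sets):

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for an exported data set.
# "NA" is how R's write.csv() encodes missing values by default.
csv_text = "id,class,value\n1,low,0.5\n2,high,1.25\n3,low,NA\n"

# CSV has no type information, so the factor-like column is declared explicitly.
# pandas treats the string "NA" as missing by default.
df = pd.read_csv(io.StringIO(csv_text), dtype={"class": "category"})

print(df.dtypes)
print(df)
```

If the type bookkeeping turns out to be a burden across many data sets, that might be the "something" that suggests a different format.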
On plotting, I'd lean toward starting out with matplotlib (and using the plotting functionality of pandas and sklearn), and if anyone can make much nicer plots much more easily with another package, then make a PR for us all to look at. Similarly, I'd start with pandas, but if @lcrmorin or others can make something look nicer (or much faster, even for the toy datasets I imagine we'll have here?) using polars, then let's see that and decide together.
from website.
A (very) rough demo for option 1: https://bmreiniger.github.io/aml4td-demo-computing-python/chapters/whole-game.html
from website.
I like that! Sphinx-gallery from option 3 looks nice as well, but this is an area I'm not well versed in, so I don't have a strong opinion.
On the subject of plotting libraries, another option I'm fond of is using the Seaborn objects API: https://seaborn.pydata.org/tutorial/objects_interface.html
This allows one to approximate a ggplot-like grammar of graphics using method chaining. As it says in the docs, it's still early in development but might be worth trying out.
from website.
We've experimented with side-by-side R/python code and I've never seen it work all that well. I think that it should be Python only.
Based on other things that I've done, many of the people consuming the main site and these computing pages are not going to be well versed in Python or R. We'll need to strike a balance between helpful content for beginners and more experienced readers (including "how to install" docs).
That said, I think that @bmreiniger's options 1 and 2 are good (but I've never seen Sphinx-gallery until now and don't know if that works with Quarto).
from website.
The demo looks good!
There are some nice Posit Python packages for tables and interactivity and many others unrelated to Posit (obviously).
Data splitting. sklearn’s train_test_split doesn’t support stratifying on a continuous outcome.
I was asked to discuss a PR or maybe a pip about this pre-pandemic. ¯\_(ツ)_/¯
There will be a lot of inconsistencies where R or Python have different (or more extensive) capabilities. It doesn't have to be a perfect reproduction of what is on the main site.
from website.
The best way to go is usually to bin the outcome and stratify on the bins: pd.cut(df.target, n_grp, labels=False)
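A minimal sketch of that suggestion, on a made-up data frame (the `feature` column, the sample size, and `n_grp = 4` are arbitrary choices for illustration): bin the continuous target with `pd.cut`, then pass the bin labels to `stratify`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data with a continuous outcome.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature": rng.normal(size=200),
    "target": rng.uniform(0, 100, size=200),
})

# Equal-width bins over the outcome; labels=False returns integer bin codes.
n_grp = 4
bins = pd.cut(df["target"], n_grp, labels=False)

# Stratify on the bin codes rather than on the raw continuous target.
train, test = train_test_split(
    df, test_size=0.25, stratify=bins, random_state=0
)

# Compare the share of each bin in the full data and in the test set.
print(bins.value_counts(normalize=True).sort_index())
print(bins.loc[test.index].value_counts(normalize=True).sort_index())
```

Note that `pd.cut` makes equal-width bins, so a skewed outcome can leave a bin with too few rows to split; `pd.qcut` (equal-frequency bins) may be the safer choice in that case.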
... regarding code translation, I have found LLMs to be very good at the task. It might be interesting to try this solution.
from website.
I think Sphinx would be instead of Quarto. I'd like to put the same sort of demo together for that, but I suspect it'll end up being a similar amount of setup/work, with a very slight benefit of being pure .py scripts, and the detriment of being styled very differently from the rest of the project (barring a lot of work in defining a sphinx style/template).
I had some trouble getting renv set up, but now have a working demo of R+python in tabsets. Since it's in a branch of this repo, I don't know how to most readily make it viewable; you can download the HTML and view it here. But (1) it requires managing both envs (python inside of reticulate), (2) during render both sets of code run, effectively doubling the runtime and memory usage, and (3) switching between the rendered tabsets makes the rest of the page jump around when they're of different lengths; so I agree with @topepo that it's not worth it.
So it seems approach (1) is probably best, and I'll try to clean it up, complete with a python env. (Maybe I'll still demo sphinx for the sake of having done it.) So, another early question: which environment manager? I'd suggest conda or Pipenv; I find conda more intuitive, and Pipenv more rigorous.
from website.
I want to keep the repos on Quarto just so that they are in one format. :-/
You can use Jupyter notebooks or basic Python chunks; you won't need R for anything.
from website.