
Comments (18)

topepo commented on May 26, 2024

I suggest creating a starter repo using the structure and styling of the computing-tidymodels repo. Once you get all the Python bits set up, ping me.

from website.

topepo commented on May 26, 2024

We would make a computing-python repo and keep the same css and organization. The owner of that repo would have to decide whether Jupyter Notebooks or a more markdown-based approach is the most Pythonic.

For libraries... ¯\_(ツ)_/¯ I'd like to avoid extra complexity but would defer to the python community for those decisions.


mermast commented on May 26, 2024

I would also like to work on the python supplement. @bmreiniger may we collaborate on this?


ddixonAI commented on May 26, 2024

I'll also toss in my hat for a collaboration on sklearn/python code. Could be a fun project!


lcrmorin commented on May 26, 2024

I'd like to work on this too. I have a decent knowledge of python for ML (Kaggle notebooks GM). As already mentioned, it is difficult to imagine working without pandas, sklearn, and matplotlib. If plotnine is mentioned as a replacement for matplotlib, I should also mention polars, which has a grammar closer to the tidyverse and is significantly better than pandas.


mermast commented on May 26, 2024

There is another plotting option, lets-plot


sulphatet commented on May 26, 2024

I too would like to work on the python code. @bmreiniger, shall we collaborate on this?


topepo commented on May 26, 2024

I'll recant my previous statement:

"I'd like to avoid extra complexity but would defer to the python community for those decisions."

Use whatever libraries you see fit. We use a ton of R packages to make the book (that's the way R is); use anything that you think makes the best results.


topepo commented on May 26, 2024

Also, I can export the data sets to a format that is easier for Python to ingest. What do you suggest? csv?


bmreiniger commented on May 26, 2024

I'll probably be more useful on content, but I have a little site deployment experience; when I get some time I'll draft something. If anybody else knows more and/or has more time, jump in. My first thoughts:

  1. Quarto in the same repo as tidyverse coding, with panelsets. I still think this is attractive enough to do a demo of. On the other hand, fully rebuilding the site would require both an R and a python env...
  2. Quarto with qmd files and python snippets. This mirrors the tidyverse version the closest, and styling should be trivially very close as well.
  3. Quarto with ipynb files. Nice that the jupyter notebooks could be downloaded and executed directly, but git diffs will be unpleasant.
  4. sphinx-gallery I think is how sklearn generates its examples. Straight python means easy diffs and easily runnable, markup in comments for text sections. But styling will be harder, I imagine.
  5. ...?

As for data format, csv is probably fine. At least until something comes up to suggest otherwise.

On plotting, I'd lean toward starting out with matplotlib (and using the plotting functionality of pandas and sklearn), and if anyone can make much nicer plots much easier with another package, then make a PR for us all to look at. Similarly, I'd start with pandas, but if @lcrmorin or others can make something look nicer (or much faster, even for the toy datasets I imagine we'll have here?) using polars then let's see that and decide together.


bmreiniger commented on May 26, 2024

A (very) rough demo for option 1: https://bmreiniger.github.io/aml4td-demo-computing-python/chapters/whole-game.html


ddixonAI commented on May 26, 2024

I like that! Sphinx-gallery from option 4 looks nice as well, but this is an area I'm not well versed in, so I don't have a strong opinion.

On the subject of plotting libraries, another option I'm fond of is using the Seaborn objects API: https://seaborn.pydata.org/tutorial/objects_interface.html

This allows one to approximate a ggplot-like grammar of graphics using method chaining. As it says in the docs, it's still early in development but might be worth trying out.


topepo commented on May 26, 2024

We've experimented with side-by-side R/python code and I've never seen it work all that well. I think that it should be Python only.

Based on other things that I've done, many of the people consuming the main site and these computing pages are not going to be well versed in Python or R. We'll need to strike a balance between helpful content for beginners and more experienced readers (including "how to install" docs).

That said, I think that @bmreiniger's options 1 and 2 are good (but I've never seen Sphinx-gallery until now and don't know if that works with Quarto).


topepo commented on May 26, 2024

The demo looks good!

There are some nice Posit Python packages for tables and interactivity and many others unrelated to Posit (obviously).

Data splitting. sklearn’s train_test_split doesn’t support stratifying on a continuous outcome.

I was asked to discuss a PR or maybe a pip about this pre-pandemic. ¯\_(ツ)_/¯

There will be a lot of inconsistencies where R or Python have different (or more extensive) capabilities. It doesn't have to be a perfect reproduction of what is on the main site.


lcrmorin commented on May 26, 2024

The best way to go is usually to stratify by pd.cut(df.target, n_grp, labels=False)... Regarding code translation, I have found LLMs to be very good at the task. It might be interesting to try this solution.
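A minimal sketch of that binning workaround, for anyone who wants to try it. The data are synthetic, and the column names, `n_grp = 4`, and the 25% test size are arbitrary choices for illustration, not anything decided in this thread.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: one predictor and a continuous outcome.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.normal(size=400),
    "target": rng.normal(loc=50.0, scale=10.0, size=400),
})

# Bin the continuous outcome into n_grp equal-width groups.
# labels=False returns integer bin codes instead of interval labels.
n_grp = 4
bins = pd.cut(df["target"], n_grp, labels=False)

# Stratify on the bin codes rather than on the raw outcome.
train_df, test_df = train_test_split(
    df, test_size=0.25, stratify=bins, random_state=0
)

# Compare the share of each bin overall and in the test set.
overall_share = bins.value_counts(normalize=True).sort_index()
test_share = bins.loc[test_df.index].value_counts(normalize=True).sort_index()
print(pd.DataFrame({"overall": overall_share, "test": test_share}))
```

Equal-width bins can leave very few rows in the tails of a skewed outcome, and `train_test_split` raises an error when a stratum has fewer than two members. Swapping in `pd.qcut` (quantile bins) is one way around that.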


bmreiniger commented on May 26, 2024

I think Sphinx would be instead of Quarto. I'd like to put the same sort of demo together for that, but I suspect it'll end up being a similar amount of setup/work, with a very slight benefit of being pure .py scripts, and the detriment of being styled very differently from the rest of the project (barring a lot of work in defining a sphinx style/template).

I had some trouble getting renv set up, but now have a working demo of R+python in tabsets. Since it's in a branch of this repo, I don't know how to most readily make it viewable; you can download the html and view it here. But (1) it requires managing both envs (python inside of reticulate), (2) during render both sets of code run, effectively doubling the runtime and memory usage, and (3) switching between the rendered tabsets makes the rest of the page jump around when they're of different lengths; so I agree with @topepo that it's not worth it.

So it seems approach (1) is probably best, and I'll try to clean it up, complete with a python env. (Maybe I'll still demo sphinx for the sake of having done it.) So, another early question: which environment manager? I'd suggest conda or Pipenv; I find conda more intuitive, and Pipenv more rigorous.


topepo commented on May 26, 2024

I want to keep the repos on Quarto just so that they are in one format. :-/

You can use Jupyter notebooks or basic Python chunks; you won't need R for anything.

