coiled-resources

Notebooks that support content like blogs and videos.

Project goals

  • Make it easy for you to reproduce any computations covered in blogs or videos
  • Show best practices for organizing a repo with hundreds of notebooks & tens of environments

Here are some notable blog posts that are backed by notebooks in this repo:

Blog post                                     Notebook link
Speed up pandas query 10x                     Notebook
Convert Dask DataFrame to pandas DataFrame    Notebook
Convert Parquet to CSV                        Notebook

Repo organization

This repo contains notebooks that are used in blogs and other content. The notebooks are cleanly organized, so you can easily find the notebook that corresponds to a blog post. For example, the blogs/save-numpy-dask-array-to-zarr.ipynb notebook corresponds with the coiled.io/blog/save-numpy-dask-array-to-zarr/ blog post. Notice how the notebook name aligns with the blog post URL.
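The naming convention can be written down as a small helper. This is a minimal sketch, assuming every blog URL has the form coiled.io/blog/<slug>/ and that its notebook lives at blogs/<slug>.ipynb, as in the example above. The function name notebook_path_for is hypothetical and is not part of this repo.

```python
from urllib.parse import urlparse


def notebook_path_for(blog_url: str) -> str:
    """Map a blog post URL to the notebook path implied by the naming convention."""
    # urlparse only separates host from path when a scheme is present,
    # so add one for bare URLs such as "coiled.io/blog/...".
    if "://" not in blog_url:
        blog_url = "https://" + blog_url
    # Keep the non-empty path segments, e.g. ["blog", "save-numpy-dask-array-to-zarr"].
    segments = [s for s in urlparse(blog_url).path.split("/") if s]
    if len(segments) < 2 or segments[-2] != "blog":
        raise ValueError(f"not a blog post URL: {blog_url}")
    # Assumption: notebooks for blog posts live in the blogs/ directory.
    return f"blogs/{segments[-1]}.ipynb"


print(notebook_path_for("coiled.io/blog/save-numpy-dask-array-to-zarr/"))
```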

The instructions for creating an environment to run each notebook are at the top of every notebook. The following setup instructions will work for most of the notebooks.

Setting up your machine

You can install the dependencies on your local machine to run these notebooks by creating a conda environment:

conda env create -f envs/crt-004.yml

crt stands for coiled-runtime, which pins a set of Dask runtime dependencies that are known to happily coexist.

Activate the environment with conda activate crt-004.

Open the project in your browser with jupyter lab.

Create Coiled software environments

To create a Coiled software environment that matches your local environment, run a command like this: coiled env create -n crt-004 --conda envs/crt-004.yml.

Your Coiled software environment should always match your local environment exactly.
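One way to keep the two environments in step is to generate both commands from the same YAML file. The sketch below only assembles the conda and coiled commands already shown in this README; it assumes the environment name equals the file stem (envs/crt-004.yml gives crt-004), and the helper name setup_commands is hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def setup_commands(env_file: str) -> dict[str, list[str]]:
    """Build the local and Coiled environment commands from one YAML file."""
    # Assumption: the environment name is the YAML file name without its suffix.
    env_name = Path(env_file).stem
    return {
        # Creates the local conda environment.
        "local": ["conda", "env", "create", "-f", env_file],
        # Creates the Coiled software environment from the same file.
        "coiled": ["coiled", "env", "create", "-n", env_name, "--conda", env_file],
    }


for label, cmd in setup_commands("envs/crt-004.yml").items():
    print(f"{label}: {' '.join(cmd)}")
```

Each list can be passed to subprocess.run(cmd, check=True) to execute it. Because both commands read the same file, the local and Coiled environments are built from identical pins.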

Here's how to create a cluster that uses the coiled-runtime software environment: cluster = coiled.Cluster(name="powers-crt-004", software="crt-004", n_workers=5).
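The one-liner above can be expanded into a slightly fuller sketch that also connects a Dask client to the cluster. It assumes coiled and dask.distributed are installed and that you are logged in to Coiled. The helper names cluster_options and start_cluster are hypothetical, the "powers" prefix is simply the cluster name used above, and the keyword arguments are the ones shown in this README, so check them against the version of coiled you have installed.

```python
def cluster_options(owner: str = "powers", env_name: str = "crt-004", n_workers: int = 5) -> dict:
    """Keyword arguments for coiled.Cluster, mirroring the example in this README."""
    return {
        "name": f"{owner}-{env_name}",  # e.g. "powers-crt-004"
        "software": env_name,           # the Coiled software environment to use
        "n_workers": n_workers,
    }


def start_cluster(**overrides):
    """Create a Coiled cluster and a Dask client connected to it."""
    # Imported lazily so this module can be loaded without coiled installed.
    import coiled
    from dask.distributed import Client

    cluster = coiled.Cluster(**cluster_options(**overrides))
    client = Client(cluster)
    return cluster, client
```

Calling cluster, client = start_cluster() provisions cloud machines, so call client.close() and cluster.close() when you are done.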

Notebooks

Some of the notebooks are designed to run locally and others run on cloud machines via Coiled.

You can follow the Coiled getting started guide to get your machine set up. Coiled gives you some free credits, so you can easily try out the platform.

Some notebooks in this repo require conda environments with additional customization. You can find environment.yml files to build those environments in the respective directories.
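To see which directories ship their own environment file, you can search the checkout. This is a minimal sketch using only the standard library; the function name find_environment_files is hypothetical, and it assumes the files are named exactly environment.yml as described above.

```python
from __future__ import annotations

from pathlib import Path


def find_environment_files(repo_root: str = ".") -> list[Path]:
    """Return every environment.yml under repo_root, relative to repo_root."""
    root = Path(repo_root)
    # rglob walks the tree recursively; sorting keeps the output stable.
    return sorted(p.relative_to(root) for p in root.rglob("environment.yml"))


if __name__ == "__main__":
    # Print the conda command that would build each custom environment.
    for env_file in find_environment_files():
        print(f"conda env create -f {env_file}")
```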

Contributing

We welcome community contributions, especially MCVE analyses that others will find useful.

Feel free to create an issue and we'll be happy to brainstorm contributions.

coiled-resources's Issues

Reorganize notebooks that correspond with blog posts

It'd be nice to organize the notebooks in accordance with the blog URLs, so they're easy to find.

The notebook for https://coiled.io/blog/dask-read-parquet-into-dataframe/ could be stored in blog/dask-read-parquet-into-dataframe for example.

That'd make it easy to find all the notebooks that correspond with the blog posts.

Separate coiled-datasets to separate repo

The coiled-datasets code should be moved to a separate repo. That'll make the datasets easier to document and will make it easier to set up best practices. Lots of the notebooks in this repo depend on datasets created by coiled-datasets, but we don't need the coiled-datasets creation code in this repo.

Repo organization

This repo has lots of top-level directories, which makes it hard to follow. We want to have fewer top-level directories, but also minimize breaking changes to notebook links.

The repo name indicates that the notebooks should pertain to Coiled. We may want to abstract "Dask only" notebooks to another repo to facilitate organization.

Here's a possible organization structure:

dask/
  bag/
    json-to-parquet.ipynb
  dataframe/
    parquet/
      column-pruning.ipynb
      predicate-pushdown.ipynb
      memory-usage.ipynb
integrations/
  prefect/
    example-workflow.ipynb
  xgboost/
datasets/
pandas/

Let me know what you think.
