MrPowers commented on August 16, 2024

@jrbourbeau - Thanks for opening this ticket.

A lot of the benchmarking work will overlap between dev advocacy and OSS engineering (e.g. generating the datasets, writing the code that runs the benchmarks, and making sure the Dask queries are optimized), but the actual infrastructure will differ a bit.

For the h2o benchmarks, we'll need to replicate their infrastructure exactly: a single r3.8xlarge node. I'm hoping to write up an issue showing that if we structure the code like this and use the exact same infrastructure, the Dask queries run 30 times faster (exact numbers to be confirmed later).

For the Databricks benchmarks, which I'd also like to address, we'll need to use the environments listed in their blog post.

The GitHub Actions / Coiled infrastructure you'll set up will be great for other dev advocacy content down the road. In the near term, I'm hoping to pair with the product engineers to replicate the h2o / Databricks environments exactly and rerun their benchmarks with properly structured code.

Let me know if this plan sounds all right to you. I get the feeling that there is a widespread impression that "Dask is slow", but I'm not seeing that in the benchmarks I'm running. I'm hoping to dig up the truth here and write content that clears up this misinformation.

from benchmarks.

shughes-uk commented on August 16, 2024

Happy to help with any questions on getting the infrastructure you need. The #engineering-clusters channel would also be a good place to ask.

MrPowers commented on August 16, 2024

Cool, joined #engineering-clusters, thanks @shughes-uk!!

ncclementi commented on August 16, 2024

Closing this as obsolete. We have a separate tracking ticket for a blog post on the h2o benchmarks, and we are about to invest work in improving the performance of the queries.

Closing in favor of https://github.com/orgs/coiled/projects/12/views/3
