
Comments (15)

weixuanfu commented on August 26, 2024

OK, will do :-)

from pmlb.

weixuanfu commented on August 26, 2024

I converted the regression data to TSV files for pmlb, but I have not merged them into the pmlb repo yet. A few of the datasets are very large, e.g. >200 MB in TSV format. You can check them in the LPC path I sent over in Slack. Should we exclude them?

Another issue with pmlb: the target column of all current classification benchmarks in pmlb is named class. I don't think that name fits regression benchmarks, so I used pmlb_reg_target instead. Is that OK?


rhiever commented on August 26, 2024

How big are the files when gzipped?

Let's call the class equivalent target.


weixuanfu commented on August 26, 2024

Even the gzipped file is more than 110 MB. It is a benchmark with 1 million samples.
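For a rough sense of how much gzip helps on a numeric TSV file, here is a minimal sketch using only the Python standard library. The table is randomly generated placeholder data, not one of the datasets discussed here, and `make_tsv` and `tsv_and_gzip_sizes` are hypothetical helper names. Real benchmark files may compress better or worse depending on their values.

```python
import csv
import gzip
import io
import random


def make_tsv(n_rows, n_features=3, seed=0):
    """Build a small placeholder regression table as TSV text (not real PMLB data)."""
    rng = random.Random(seed)
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    # Header: feature columns followed by a single target column.
    writer.writerow([f"x{i}" for i in range(n_features)] + ["target"])
    for _ in range(n_rows):
        xs = [round(rng.uniform(-1, 1), 4) for _ in range(n_features)]
        writer.writerow(xs + [round(sum(xs), 4)])
    return buf.getvalue()


def tsv_and_gzip_sizes(tsv_text):
    """Return (uncompressed size, gzipped size) in bytes for the given TSV text."""
    raw = tsv_text.encode("utf-8")
    return len(raw), len(gzip.compress(raw))


raw_bytes, gz_bytes = tsv_and_gzip_sizes(make_tsv(1000))
print(f"raw: {raw_bytes} bytes, gzipped: {gz_bytes} bytes")
```

Running the same size check on the actual files would show which ones stay too large even after compression.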

target sounds good


rhiever commented on August 26, 2024

We might end up changing the PMLB standard such that all target columns are named target.
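A change like that could be sketched roughly as follows, assuming each dataset is a tab-separated file with a single header row. `rename_target_column` is a hypothetical helper, not part of pmlb, and it works on text in memory rather than on the repo's files.

```python
import csv
import io


def rename_target_column(tsv_text, old="class", new="target"):
    """Rename one header column in TSV text, leaving all data rows unchanged."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        return tsv_text
    header = rows[0]
    if old not in header:
        raise ValueError(f"no column named {old!r}")
    if new != old and new in header:
        raise ValueError(f"column {new!r} already exists")
    header[header.index(old)] = new
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


example = "a\tb\tclass\n1\t2\t0\n"
print(rename_target_column(example))
```

Failing loudly when the old column is missing or the new name is already taken avoids silently producing a file with two target columns.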


lacava commented on August 26, 2024

If you want, I can add a collection of benchmark problems used in GP (mostly synthetic).


weixuanfu commented on August 26, 2024

@lacava sounds good.


rhiever commented on August 26, 2024

Sure @lacava, send over a PR and describe them for us. 👍


lacava commented on August 26, 2024

I was thinking it would also be nice to have a notebook for generating the synthetic datasets, for example for the Korns / Keijzer / Pagie problems. Where should that go?
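As an illustration of what such a generator might look like, here is a minimal sketch for the Pagie-1 problem. It uses the commonly cited form f(x, y) = 1 / (1 + x^-4) + 1 / (1 + y^-4). The sampling grid (-5 to 5 in steps of 0.4), the column names, and the helper names are assumptions for this example, not something settled in this thread.

```python
import csv
import io


def pagie1(x, y):
    """Pagie-1 target value.

    Algebraically equal to 1 / (1 + x**-4) + 1 / (1 + y**-4), but written as
    x**4 / (1 + x**4) so that x = 0 or y = 0 does not divide by zero.
    """
    x4, y4 = x ** 4, y ** 4
    return x4 / (1.0 + x4) + y4 / (1.0 + y4)


def grid(lo=-5.0, hi=5.0, step=0.4):
    """Evenly spaced sample points from lo to hi inclusive (assumed sampling)."""
    n = int(round((hi - lo) / step)) + 1
    return [round(lo + i * step, 10) for i in range(n)]


def pagie1_tsv():
    """Return the full grid as TSV text with columns x, y, target."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["x", "y", "target"])
    for x in grid():
        for y in grid():
            writer.writerow([x, y, pagie1(x, y)])
    return out.getvalue()


tsv_text = pagie1_tsv()
print(tsv_text.count("\n") - 1, "data rows")
```

Keeping the generator deterministic (a fixed grid rather than random samples) means the file in the archive can be regenerated and compared exactly.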


rhiever commented on August 26, 2024

Maybe that would be a separate project, unless we were explicitly creating datasets to upload to PMLB.


lacava commented on August 26, 2024

Yeah, they would correspond to datasets in the archive.


rhiever commented on August 26, 2024

Cool, send a separate PR for it and we'll take a look. :-)


rhiever commented on August 26, 2024

@weixuanfu, can you please re-send the PR with the regression benchmarks? Please take care when moving the existing classification datasets around.


weixuanfu commented on August 26, 2024

OK, will do


weixuanfu commented on August 26, 2024

Please check PR #15

