opal's Introduction

I am Data Magician Eve-ning! 🪄

opal's People

Contributors

eve-ning

opal's Issues

Remove Influence of SV maps through SV-ness estimation

Currently, SV maps such as Backbeat Maniac & Perthed are highly overrated by the rankings.
For future beatmap difficulty estimation, we should simply remove them.

To do so, we'll estimate the SV-ness of each map and find an optimal threshold that decides whether a map is "SV" or not.

Implement Lazy Dataset loading

Currently, the pipeline script in #29 ALWAYS runs the dataset preprocessing. This is wasteful, especially if

  • the preprocessing SQL is the same
  • the dataset is the same

One idea is to hash the SQL + dataset string into a unique identifier for the dataset. It would also be good to record how the dataset was generated, i.e. metadata for the .csv.
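A minimal sketch of the hashing idea, assuming the SQL text and a dataset name are both available as strings. The function names, the use of SHA-256, and the idea of keeping a set of already-seen keys are illustrative assumptions, not part of the pipeline in #29.

```python
import hashlib
import json


def dataset_cache_key(sql: str, dataset_name: str) -> str:
    """Return a stable hex digest identifying one (SQL, dataset) combination."""
    # Serialise both inputs as JSON so that ("ab", "c") and ("a", "bc")
    # cannot collide the way plain string concatenation would.
    payload = json.dumps({"sql": sql, "dataset": dataset_name}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_preprocessing(key: str, known_keys: set[str]) -> bool:
    """Preprocessing is only needed when this key has not been seen before."""
    return key not in known_keys
```

The same dictionary that is hashed could also be written next to the .csv as its metadata, so the key stays explainable later.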

Perform Uniform sampling for equally weighted sample space training & evaluation

Currently, we just use the whole dataset, which

  1. Has a few too many samples
  2. Is heavily biased against scores that are less common but more significant, such as 90%, in contrast to 99.5%, which doesn't say much

To even out training, we should sample uniformly across the sample space for a more representative evaluation and training process.
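One way to read "uniform sampling across the sample space" is to bin scores by accuracy and cap how many are drawn from each bin. The sketch below assumes scores arrive as `(score_id, accuracy)` pairs with accuracy in [0, 1]; the function name, the bin width, and the per-bin cap are hypothetical choices for illustration.

```python
import random
from collections import defaultdict


def sample_uniform_by_accuracy(scores, bin_width=0.01, per_bin=100, seed=0):
    """Draw at most `per_bin` scores from each accuracy bin.

    `scores` is a list of (score_id, accuracy) pairs with accuracy in [0, 1].
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for score_id, accuracy in scores:
        # Scores in the same accuracy band share a bin index.
        bins[int(accuracy / bin_width)].append((score_id, accuracy))

    sampled = []
    for index in sorted(bins):
        members = bins[index]
        # Crowded bins are cut down to `per_bin`; sparse bins are kept whole.
        sampled.extend(rng.sample(members, min(per_bin, len(members))))
    return sampled
```

With a cap like this, a crowded band around 99.5% contributes no more samples than a sparse band around 90%, which also shrinks the dataset.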

2023 09 is trained on non-filtered 0 support dataset

I just found that the newest model didn't filter out non-active maps and players. This was a mistake caused by the main repo using 0 for samples.

On this topic, maybe we should have a debug option for pipeline.sh and its stages?

SR thresholding should consider maps with mods

Currently, we threshold maps with a hard SR, which is fine. However,

  1. DT versions of the thresholded maps end up with a hard threshold at whatever SR their DT lands on
  2. HT versions of those maps will go below this threshold

We need to threshold w.r.t. both map and speed.
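A rough sketch of thresholding per (map, speed) pair rather than per map. It assumes an SR value is already known for every map at every speed; the function name and the dictionary layout are assumptions made for illustration.

```python
def filter_by_sr(sr_by_map_speed, min_sr):
    """Keep the (map_id, speed) pairs whose own SR is at least `min_sr`.

    `sr_by_map_speed` maps (map_id, speed) -> SR at that speed.
    """
    return {
        key: sr
        for key, sr in sr_by_map_speed.items()
        if sr >= min_sr
    }
```

Under this scheme a map whose normal-speed SR is below the threshold can still contribute its DT entry, and a map above the threshold loses its HT entry if that falls below it.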

Wrap preprocessing in docker compose

Currently in #29, the preprocessing is done via a run.sh, while the others are done with docker compose up --build.

This inconsistency is ugly, and it can be annoying to dig into a long run.sh.

Implement Shell Model Fetching

Currently, train.py sets the model path in the .env file.

I don't think that's a great approach, as it widens the scope of train, which should just train the model, make the model, and possibly return the model path. The problem is that returning the model path is not trivial, so we resorted to writing it to the .env file.

A better solution is to give the model a unique name, namely the pipeline run id; then we can grep the model from opal/models.
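The lookup by run id could look roughly like the sketch below. The issue proposes doing this from the shell with grep; this is the same idea in Python. It assumes the run id appears somewhere in the model's file name; the function name and the "exactly one match" rule are hypothetical.

```python
from pathlib import Path


def find_model(models_dir, run_id):
    """Return the single file under `models_dir` whose name contains `run_id`."""
    matches = sorted(
        p for p in Path(models_dir).rglob("*")
        if p.is_file() and run_id in p.name
    )
    if len(matches) != 1:
        # Zero or several matches means the run id is not a unique model name.
        raise LookupError(
            f"expected exactly one model for {run_id!r}, found {len(matches)}"
        )
    return matches[0]
```

This keeps train.py free of any .env writing: the caller only needs to know the run id it started the pipeline with.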

Adjust influence of unpopular maps

Not sure why, but The Living Tombstone - Nippontradamus (Everest Hope) [October's 7K Insane].osu is highly rated.
This caused heavy bias towards maps that are non-competitive.
