eve-ning / opal

Rhythm Game Score Estimation through Neural Collaborative Filtering

Python 77.50% Dockerfile 1.26% Shell 21.24%

opal's Issues

Try Elementwise prod for model instead of dotprod

2023 09 is trained on non-filtered 0 support dataset

I just found that the newest model didn't filter out non-active maps and players. This was a mistake caused by the main repo using 0 for samples.

On this topic, maybe we should have a debug option for pipeline.sh and its stages?

Automatically update conf.py pathing on building

Perform Uniform sampling for equally weighted sample space training & evaluation

Currently, we just use the whole dataset, which has two problems:

It has too many samples.
It's heavily biased against scores that are less common but more significant, such as 90%, in contrast to 99.5%, which doesn't say much.

To even out the training, we should try to sample uniformly across the sample space for a more representative training and evaluation process.
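One way to picture the idea is to bin samples by score and cap how many rows each bin contributes. This is only a minimal sketch: the function name, the 0 to 1 score scale, the bin width, and the per-bin cap are all assumptions for illustration, not part of the opal codebase.

```python
import random
from collections import defaultdict


def sample_uniform_by_score(scores, bin_width=0.01, per_bin=2, seed=0):
    """Return row indices so that each score bin contributes at most `per_bin` rows.

    Hypothetical helper: `scores` is assumed to hold accuracies in [0, 1].
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for i, s in enumerate(scores):
        # Group row indices by score bin, e.g. 0.90 to 0.91 when bin_width=0.01.
        bins[int(s / bin_width)].append(i)
    picked = []
    for key in sorted(bins):
        members = bins[key]
        # Crowded bins are downsampled; sparse bins are kept whole.
        picked.extend(rng.sample(members, min(per_bin, len(members))))
    return picked
```

With a cap like this, a crowded region such as 99.5% no longer outweighs a sparse region such as 90%, at the cost of discarding data from the crowded bins.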

Implement Input via `.env` instead of manually editing `pipeline.sh`

Additionally, if it's possible, fetch env vars from shell
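A rough sketch of that behaviour, written in Python for illustration: read `KEY=VALUE` pairs from a `.env` file and let variables already set in the shell take precedence. The function name and the variable names in the usage note are hypothetical, and the parsing is deliberately simpler than what a real `.env` loader supports.

```python
import os


def load_env(path=".env", environ=None):
    """Parse KEY=VALUE lines from `path`; values already set in the shell win."""
    environ = os.environ if environ is None else environ
    values = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            # Skip blanks, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    # Shell-provided variables override the defaults from the file.
    for key in values:
        if key in environ:
            values[key] = environ[key]
    return values
```

For example, if `.env` contains a hypothetical `MODEL_NAME=a` and the shell exports `MODEL_NAME=b`, the returned mapping carries `b`.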

Adjust influence of unpopular maps

Not sure why, but The Living Tombstone - Nippontradamus (Everest Hope) [October's 7K Insane].osu is highly rated.
This causes a heavy bias towards non-competitive maps.

Implement Shell Model Fetching

Currently, we have train.py setting the model path on the .env file.

I don't think it's a great approach, as it widens the scope of train, which should just build and train the model, and possibly return the model path. The problem is that returning the model path is not trivial, so we resorted to writing it to the .env file.

A better solution is to specify a unique model name, which is the pipeline run id, then we can grep the model from opal/models.
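The lookup could look roughly like the following, shown as a Python analogue of the grep. It assumes each model is a single file whose name contains the pipeline run id; the function name and the example file names are hypothetical.

```python
from pathlib import Path


def find_model(models_dir, run_id):
    """Return the single file under `models_dir` whose name contains `run_id`."""
    matches = sorted(
        p for p in Path(models_dir).rglob("*") if p.is_file() and run_id in p.name
    )
    # Fail loudly on zero or several hits instead of silently picking one.
    if len(matches) != 1:
        raise LookupError(f"expected one model for {run_id!r}, found {len(matches)}")
    return matches[0]
```

This keeps train unaware of the .env file: the caller only needs the run id it already chose.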

Remove Influence of SV maps through SV-ness estimation

Currently, maps such as Backbeat Maniac & Perthed are highly overrated by the rankings.
For future beatmap difficulty estimation, we should simply remove them.

To do so, we'll estimate the SV-ness of each map and find an optimal threshold that decides whether a map is "sv" or not.
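The issue does not say how SV-ness is computed or how the threshold is chosen. As one possible sketch, assume every map already has a numeric SV-ness score and that a handful of maps have been labelled by hand; the cut-off can then be chosen by scanning candidate values. The function name and both inputs are assumptions.

```python
def best_threshold(svness, is_sv):
    """Pick the cut-off that best separates hand-labelled SV maps from the rest.

    Hypothetical inputs: `svness` holds one score per map, `is_sv` the manual labels.
    """
    best_t, best_correct = None, -1
    for t in sorted(set(svness)):
        # A map is predicted "sv" when its score is at or above the cut-off.
        correct = sum((s >= t) == label for s, label in zip(svness, is_sv))
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t
```

Maps scoring at or above the returned value would then be dropped before training.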

Wrap preprocessing in docker compose

Currently in #29, the preprocessing is done via a run.sh, while the other stages are run with docker compose up --build.

This inconsistency is ugly, and furthermore, it can be annoying to dig into a long run.sh

Implement Lazy Dataset loading

Currently, the pipeline script in #29 ALWAYS runs the dataset preprocessing. This can be wasteful, especially if:

the preprocessing SQL is the same
the dataset is the same

An idea would be to hash the SQL+Dataset string as a unique identifier for the dataset. However, it'd be good if we can also include how the dataset was generated, i.e. metadata for the .csv.
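A minimal sketch of the hashing idea, assuming the SQL is available as a string and the dataset is identified by a name. All function names, the 16-character key, and the JSON sidecar layout are illustrative choices, not the repository's actual design.

```python
import hashlib
import json
from pathlib import Path


def dataset_id(sql_text, dataset_name):
    """Hash the SQL and dataset name into a short, stable identifier."""
    digest = hashlib.sha256()
    digest.update(sql_text.encode("utf-8"))
    digest.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") differ
    digest.update(dataset_name.encode("utf-8"))
    return digest.hexdigest()[:16]


def needs_preprocessing(out_dir, sql_text, dataset_name):
    """Return (csv_path, True) when no CSV for this SQL+dataset exists yet."""
    csv_path = Path(out_dir) / f"{dataset_id(sql_text, dataset_name)}.csv"
    return csv_path, not csv_path.exists()


def write_metadata(csv_path, sql_text, dataset_name):
    """Record how the CSV was generated in a JSON file next to it."""
    meta = {"sql": sql_text, "dataset": dataset_name}
    meta_path = Path(csv_path).with_suffix(".json")
    meta_path.write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return meta_path
```

The pipeline would call `needs_preprocessing` first and skip the preprocessing stage when it reports that the CSV already exists.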

SR thresholding should consider maps with mods

Maps with DT on these thresholded maps would have a hard threshold at whatever SR their DT version is at.
Maps with HT will go below this threshold.

We need to threshold maps w.r.t. both map and speed.

Monte Carlo Dropout Evaluation for Confidence Intervals

Test Issue

SR thresholding should consider maps with mods

Dataset export stops halfway
