Balance the train and test sets (deep-packet issue, closed)

munhouiani commented on September 23, 2024

Balance the train and test sets

Comments (3)

munhouiani commented on September 23, 2024

"If we have almost the same number of packets for every label, can we skip the undersampling?"

Yes.

"The question is how to get almost the same number of packets per label not only in the train set, but also in the test set."

I need more context on why you want to do this.
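For readers unfamiliar with the term, undersampling here means cutting every label down to the size of the smallest one. Below is a minimal sketch in plain Python, not taken from the deep-packet code; the function name `undersample`, the `label_of` accessor, and the toy rows are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def undersample(rows, label_of, seed=0):
    """Randomly keep the same number of rows for every label.

    The target size is the row count of the rarest label.
    `label_of` is a hypothetical accessor that returns a row's label.
    """
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed so the sample is reproducible

    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    return balanced


# Toy data (made up): 10 packets of label "a", 3 packets of label "b".
rows = [("pkt%d" % i, "a") for i in range(10)]
rows += [("pkt%d" % i, "b") for i in range(3)]

balanced = undersample(rows, label_of=lambda r: r[1])
counts = Counter(label for _, label in balanced)
print(counts)
```

If the per-label counts are already close to each other, this step changes very little, which is why it can be skipped in that case.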

dimitrov89 commented on September 23, 2024

"The question is how to get almost the same number of packets per label not only in the train set, but also in the test set."

"I need more context on why you want to do this."

I have an unbalanced set.

/application_classification/train.parquet
label count
  16761
  16761
  ...
/application_classification/test.parquet
label count
  57476
  4232

I now have around 15 labels (my own dataset for another application), and the test set is very unbalanced, ranging from about 4k to 57k packets per label. I suppose an evaluation done this way is not precise.

munhouiani commented on September 23, 2024

I presume the distribution of your dataset (test set) is similar to the one in your actual environment, so I would suggest you keep the exact distribution of the test set.

After the model is trained, you can compute the evaluation result for each individual label. For example, to get the precision/recall of label 1, treat the data with label 1 as the "positive samples" and the data with all other labels as the "negative samples". Repeat this for every label, and you will know how your model performs on each one, regardless of how many test packets each label has.
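The per-label evaluation described above can be sketched as follows. This is a minimal plain-Python illustration, not code from deep-packet; the function name `per_label_precision_recall` and the toy labels are hypothetical. If scikit-learn is available, `sklearn.metrics.classification_report` should report the same kind of per-label figures.

```python
from collections import Counter


def per_label_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} using one-vs-rest counting.

    For each label, samples with that label are the positives and
    samples with any other label are the negatives.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label, but it was wrong
            fn[truth] += 1  # missed a sample of the true label

    results = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        results[label] = (precision, recall)
    return results


# Toy, unbalanced test set (made up): label 0 dominates.
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 0, 2]

results = per_label_precision_recall(y_true, y_pred)
for label, (precision, recall) in results.items():
    print(label, precision, recall)
```

Because each label gets its own precision and recall, a large label cannot hide poor performance on a small one, so the test set can keep its real-world distribution.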
