Prepare dataset EEG Utrecht

Prep data Utrecht about mcfly HOT 6 CLOSED

nlesc commented on May 28, 2024

Prep data Utrecht

from mcfly.

Comments (6)

vincentvanhees commented on May 28, 2024

I completed the code for converting the African data (Utrecht group) to np arrays; the arrays are now on the shared drive. The code is in the notebook preproces_Guinea-Biseau.ipynb.

Data explanation:

I created a training, validation and test dataset for every experimental condition (eyes closed or eyes open) and for both 4-second and 10-second time series (3 x 2 x 2 = 12 datasets). Each dataset has an X file and a y file.
The test and validation datasets each contain 20 individuals, with the same proportion of individuals with epilepsy as in the total dataset, and never more than one time series (epoch) per individual.
The training dataset consists of all remaining individuals and all their available epochs, so it contains multiple time series for some individuals.
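
The splitting rule described above can be sketched roughly as follows. This is a minimal illustration and not the code from the notebook: the function name `split_by_individual`, its parameters, and the toy data are all hypothetical, and it assumes each epoch carries a subject identifier and a binary label (1 = epilepsy, 0 = control) that is constant within an individual.

```python
import numpy as np

def split_by_individual(subject_ids, labels, n_holdout=20, seed=0):
    """Return (train_idx, val_idx, test_idx) as arrays of epoch indices.

    Validation and test each get n_holdout individuals, stratified by label,
    with exactly one randomly chosen epoch per individual. Training gets all
    epochs of all remaining individuals.
    """
    rng = np.random.default_rng(seed)
    subject_ids = np.asarray(subject_ids)
    labels = np.asarray(labels)

    subjects = np.unique(subject_ids)
    # One label per individual (assumed constant across that individual's epochs).
    subj_label = np.array([labels[subject_ids == s][0] for s in subjects])

    # Number of epilepsy cases per held-out set, matching the overall proportion.
    n_pos = int(round(n_holdout * subj_label.mean()))
    n_neg = n_holdout - n_pos

    pos = rng.permutation(subjects[subj_label == 1])
    neg = rng.permutation(subjects[subj_label == 0])
    val_subj = np.concatenate([pos[:n_pos], neg[:n_neg]])
    test_subj = np.concatenate([pos[n_pos:2 * n_pos], neg[n_neg:2 * n_neg]])

    def one_epoch_each(subjs):
        # Pick a single random epoch index for every individual in subjs.
        return np.array([rng.choice(np.flatnonzero(subject_ids == s)) for s in subjs])

    val_idx = one_epoch_each(val_subj)
    test_idx = one_epoch_each(test_subj)
    held_out = np.concatenate([val_subj, test_subj])
    train_idx = np.flatnonzero(~np.isin(subject_ids, held_out))
    return train_idx, val_idx, test_idx

# Toy data (made up): 100 individuals, 55 with epilepsy, 1 to 4 epochs each.
toy_rng = np.random.default_rng(42)
subj_labels = np.array([1] * 55 + [0] * 45)
epochs_per_subj = toy_rng.integers(1, 5, size=100)
subject_ids = np.repeat(np.arange(100), epochs_per_subj)
labels = np.repeat(subj_labels, epochs_per_subj)

train_idx, val_idx, test_idx = split_by_individual(subject_ids, labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Because the held-out sets are drawn per individual rather than per epoch, no individual can appear in more than one of the three sets.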

The log.csv file is for my own reference. In it I keep track of which pre-processed csv files I used for every experimental condition. This way I can make sure that I use the same data for the shallow learning in R.

vincentvanhees commented on May 28, 2024

The way I selected the data means that the proportion of individuals with epilepsy will be the same in the training, test and validation sets, but the proportion of epochs is slightly different because for some individuals we include multiple epochs in the training dataset.

The proportion of epochs from controls (no epilepsy) out of the total number of epochs ranges between 37% and 44% in the training sets. In the test and validation datasets this is always 45%. However, the advantage of including all the epochs is that we have between 34 and 81 epochs per group (control or epilepsy) per experimental condition in the training dataset (compare this with the 9 controls and 11 epilepsy patients in the test and validation sets).

vincentvanhees commented on May 28, 2024

Possibly relevant for mcfly:
The preliminary performance of my shallow learning approach on the test set:
Cohen's kappa coefficient: 0.27
Area under the curve: 0.808
(Cohen's kappa coefficient in the model training phase was 0.47)

However, I am not using the validation set at the moment. The code as I have it defines its own validation set as a subsample of the training dataset. This is obviously something I will have to address. Nonetheless, I hope that these performance estimates will only improve after further enhancements of the code.

vincentvanhees commented on May 28, 2024

I just discovered that there is a bug in how I generated the data. I am now fixing this and will put new data on the shared drive soon.

vincentvanhees commented on May 28, 2024

I just updated my analyses following the bug fix earlier today.
For protocol = eyes open:
New shallow learning results in the test set: Kappa = 0.596 and AUC = 0.778
For protocol = eyes closed:
New shallow learning results in the test set: Kappa = 0.490 and AUC = 0.833

Seems like a nice benchmark for Keras to compete with.

vincentvanhees commented on May 28, 2024

More elaborate overview of the shallow learning results in the Guinea-Bissau dataset, now with set.seed constant (I forgot to do that in the previous run). All results are based on random forest classification; AUC = area under the ROC curve:

Protocol: eyes closed
Timewindow: 4 seconds
Wavelet: la10
AUC in test set: 0.78
Kappa coefficient in test set: 0.39
Accuracy in test set: 0.70
Confusion matrix (prediction in row, truth in columns):
            control  epilepsy
control           6         3
epilepsy          3         8

Protocol: eyes open
Timewindow: 4 seconds
Wavelet: la16
AUC in test set: 0.85
Kappa coefficient in test set: 0.38
Accuracy in test set: 0.70
Confusion matrix (prediction in row, truth in columns):
            control  epilepsy
control           5         4
epilepsy          2         9

Protocol: eyes closed
Timewindow: 10 seconds
Wavelet: d2
AUC in test set: 0.90
Kappa coefficient in test set: 0.69
Accuracy in test set: 0.85
Confusion matrix (prediction in row, truth in columns):
            control  epilepsy
control           7         2
epilepsy          1        10

Protocol: eyes open
Timewindow: 10 seconds
Wavelet: d10
AUC in test set: 0.83
Kappa coefficient in test set: 0.29
Accuracy in test set: 0.65
Confusion matrix (prediction in row, truth in columns):
            control  epilepsy
control           5         4
epilepsy          3         8
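
For reference, the accuracy and kappa values above can be recomputed directly from the four confusion matrices. The sketch below uses plain NumPy and the standard definition of Cohen's kappa (observed agreement corrected for the agreement expected from the row and column totals). The function name `accuracy_and_kappa` is just a label chosen for this example, and this is not the R code that produced the results.

```python
import numpy as np

def accuracy_and_kappa(cm):
    """Accuracy and Cohen's kappa from a square confusion matrix of counts."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_observed = np.trace(cm) / n
    # Chance agreement: product of matching row and column totals, summed over classes.
    p_expected = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, kappa

# Confusion matrices as listed above, keyed by (protocol, time window).
matrices = {
    ("eyes closed", "4 s"): [[6, 3], [3, 8]],
    ("eyes open", "4 s"): [[5, 4], [2, 9]],
    ("eyes closed", "10 s"): [[7, 2], [1, 10]],
    ("eyes open", "10 s"): [[5, 4], [3, 8]],
}

results = {}
for key, cm in matrices.items():
    acc, kappa = accuracy_and_kappa(cm)
    results[key] = (acc, kappa)
    print(key, f"accuracy={acc:.2f}", f"kappa={kappa:.2f}")
```

Both quantities are unchanged if a matrix is transposed, so the row/column orientation does not affect them. With only 20 individuals in the test set, a single reclassified individual shifts accuracy by 0.05.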
