
2024-01-ml-for-autism's Introduction

26 Mar 2024

figures-same-other/ contains CSV files and figures which show that it is not just size that matters.

figures-same-other/NSCH_autism_error_mean_sd_more.png

26 Feb 2024

HOCKING-slides-2024-02-26-ml-for-autism.tex makes the HOCKING-slides-2024-02-26-ml-for-autism.pdf slides, which include new drawings

drawing-cv-feature-sets.svg makes drawing-cv-feature-sets.pdf

drawing-cv-same-other-years.svg makes drawing-cv-same-other-years-1.pdf, drawing-cv-same-other-years-2.pdf, drawing-cv-same-other-years-3.pdf, and drawing-cv-same-other-years-4.pdf

23 Feb 2024

download-nsch-mlr3batchmark.R launches jobs. Here is a preliminary analysis of how much time and memory they take:

> usage.wide[order(megabytes_max), .(learner_id, task_id, megabytes_min, megabytes_median, megabytes_max, megabytes_length)]
                   learner_id        task_id megabytes_min megabytes_median megabytes_max megabytes_length
                       <char>         <char>         <num>            <num>         <num>            <int>
 1:         classif.cv_glmnet    behavior.15        0.0000           0.0000        0.0000               60
 2:         classif.cv_glmnet comorbidity.30        0.0000           0.0000        0.0000               60
 3:         classif.cv_glmnet     culture.14        0.0000           0.0000        0.0000               60
 4:       classif.featureless comorbidity.30        0.0000           0.0000        0.0000               60
 5:       classif.featureless  healthcare.88        0.0000           0.0000        0.0000               60
 6:             classif.rpart       birth.24        0.0000           0.0000        0.0000               60
 7:             classif.rpart comorbidity.30        0.0000           0.0000        0.0000               60
 8:             classif.rpart     culture.14        0.0000           0.0000        0.0000               60
 9:             classif.rpart  healthcare.88        0.0000           0.0000        0.0000               60
10:       classif.featureless     culture.14        0.0000           0.0000      184.3555               60
11:       classif.featureless       birth.24        0.0000           0.0000      185.0703               60
12:             classif.rpart    behavior.15        0.0000           0.0000      195.0234               60
13:       classif.featureless    behavior.15        0.0000           0.0000      196.5000               60
14:         classif.cv_glmnet       birth.24        0.0000           0.0000      419.1250               60
15:           classif.xgboost     culture.14      410.0664         425.7168      516.3867               60
16:           classif.xgboost       birth.24      411.4688         446.2695      518.8477               60
17:           classif.xgboost    behavior.15      413.1992         431.9512      519.3633               60
18:           classif.xgboost comorbidity.30      411.9727         451.4375      520.8359               60
19: classif.nearest_neighbors     culture.14      405.4688         465.7988      531.1367               60
20: classif.nearest_neighbors    behavior.15      401.6992         462.6016      552.0781               60
21: classif.nearest_neighbors       birth.24      409.3086         472.2266      588.5117               60
22: classif.nearest_neighbors comorbidity.30      435.0664         480.6035      594.1562               60
23:         classif.cv_glmnet  healthcare.88        0.0000         453.3457      606.5117               60
24:           classif.xgboost  healthcare.88      519.7617         614.1836      747.3711               60
25: classif.nearest_neighbors  healthcare.88      536.2422         613.3730      843.5859               60
26:            classif.ranger  healthcare.88     1192.5625        1192.5625     1192.5625                1
27:            classif.ranger comorbidity.30     1201.4414        1347.5469     1944.3164               30
28:            classif.ranger     culture.14      898.6367        1336.7637     1966.7070               60
29:            classif.ranger    behavior.15     1003.0703        1372.0977     2167.9062               60
30:            classif.ranger       birth.24     1244.2656        1758.0156     2780.9922               43
                   learner_id        task_id megabytes_min megabytes_median megabytes_max megabytes_length
> usage.wide[order(hours_max), .(learner_id, task_id, hours_min, hours_median, hours_max, hours_length)]
                   learner_id        task_id    hours_min hours_median    hours_max hours_length
                       <char>         <char>        <num>        <num>        <num>        <int>
 1:       classif.featureless     culture.14 0.0005555556 0.0008333333  0.001111111           60
 2:             classif.rpart     culture.14 0.0005555556 0.0008333333  0.001111111           60
 3:       classif.featureless    behavior.15 0.0005555556 0.0011111111  0.001388889           60
 4:       classif.featureless       birth.24 0.0005555556 0.0008333333  0.001388889           60
 5:             classif.rpart comorbidity.30 0.0008333333 0.0008333333  0.001388889           60
 6:             classif.rpart    behavior.15 0.0008333333 0.0011111111  0.001666667           60
 7:             classif.rpart       birth.24 0.0005555556 0.0008333333  0.001666667           60
 8:       classif.featureless comorbidity.30 0.0005555556 0.0011111111  0.001944444           60
 9:       classif.featureless  healthcare.88 0.0005555556 0.0009722222  0.001944444           60
10:             classif.rpart  healthcare.88 0.0008333333 0.0011111111  0.002222222           60
11:         classif.cv_glmnet     culture.14 0.0011111111 0.0016666667  0.002500000           60
12:         classif.cv_glmnet    behavior.15 0.0019444444 0.0025000000  0.003333333           60
13:         classif.cv_glmnet       birth.24 0.0013888889 0.0019444444  0.004722222           60
14:         classif.cv_glmnet comorbidity.30 0.0016666667 0.0027777778  0.005000000           60
15:         classif.cv_glmnet  healthcare.88 0.0047222222 0.0094444444  0.020000000           60
16:           classif.xgboost     culture.14 0.0102777778 0.0166666667  0.027777778           60
17:           classif.xgboost    behavior.15 0.0169444444 0.0254166667  0.048888889           60
18:           classif.xgboost comorbidity.30 0.0252777778 0.0477777778  0.080833333           60
19: classif.nearest_neighbors    behavior.15 0.0138888889 0.0291666667  0.084722222           60
20:           classif.xgboost       birth.24 0.0241666667 0.0366666667  0.087222222           60
21: classif.nearest_neighbors     culture.14 0.0122222222 0.0268055556  0.096666667           60
22: classif.nearest_neighbors       birth.24 0.0150000000 0.0306944444  0.099444444           60
23: classif.nearest_neighbors comorbidity.30 0.0183333333 0.0398611111  0.170277778           60
24:           classif.xgboost  healthcare.88 0.0608333333 0.1200000000  0.213333333           60
25: classif.nearest_neighbors  healthcare.88 0.0566666667 0.1898611111  0.798888889           60
26:            classif.ranger  healthcare.88 5.3941666667 5.3941666667  5.394166667            1
27:            classif.ranger     culture.14 1.1869444444 2.5109722222  6.713055556           60
28:            classif.ranger    behavior.15 1.5277777778 3.2013888889  8.618611111           60
29:            classif.ranger comorbidity.30 3.6255555556 4.6951388889 10.774444444           30
30:            classif.ranger       birth.24 2.4188888889 5.0616666667 12.538888889           43
                   learner_id        task_id    hours_min hours_median    hours_max hours_length
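The tables above summarize per-job usage records (usage.long) into one row per learner and task (usage.wide). The base R sketch below shows one way such a summary could be computed; it is not the code used to produce the output above, the column names hours and megabytes are assumed from the printed tables, and the four job records are made-up values for illustration only.

```r
# Made-up job records in long format, one row per CV iteration; the
# column names follow the printed tables above, the values do not.
usage.long <- data.frame(
  learner_id = c("classif.xgboost", "classif.xgboost", "classif.rpart", "classif.rpart"),
  task_id    = rep("birth.24", 4),
  hours      = c(0.024, 0.087, 0.0006, 0.0017),
  megabytes  = c(411.5, 518.8, 0, 0)
)

# One data frame per learner/task combination (drop = TRUE skips empty ones).
groups <- split(usage.long, list(usage.long$learner_id, usage.long$task_id), drop = TRUE)

# Summarize time and memory within each group, then stack the results.
usage.wide <- do.call(rbind, lapply(groups, function(g) data.frame(
  learner_id    = g$learner_id[1],
  task_id       = g$task_id[1],
  hours_min     = min(g$hours),
  hours_median  = median(g$hours),
  hours_max     = max(g$hours),
  megabytes_max = max(g$megabytes),
  hours_length  = nrow(g)  # number of jobs in this group
)))

# Sort from smallest to largest worst-case time, as in the hours table above.
usage.wide <- usage.wide[order(usage.wide$hours_max), ]
rownames(usage.wide) <- NULL
print(usage.wide)
```

The printed output suggests data.table was used; there the same grouping is usually written as a single `[` call with `by = .(learner_id, task_id)`.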

It looks like ranger is by far the slowest and most memory-intensive learner, so for now I will omit it.

Below we see that the total time for the CV experiment with 2700 iterations is about 240 hours; since we ran it within a 4-hour time limit, this is about a 60x speedup.

2700: 3.194722222  1810.023 classif.nearest_neighbors     all.364
> sum(usage.long$hours)
[1] 240.7103
> sum(usage.long$hours)/4
[1] 60.17757
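Written as a formula, the speedup is the compute time summed over all jobs divided by the elapsed time. Here the 4-hour time limit is treated as the elapsed time (an assumption), and h_j denotes the hours used by job j:

```latex
\text{speedup} \approx \frac{\sum_{j=1}^{2700} h_j}{4\ \text{hours}}
  = \frac{240.7103\ \text{hours}}{4\ \text{hours}} \approx 60.18
```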

22 Feb 2024

download-nsch-convert-do.R makes download-nsch-convert-do-2019-2020.csv

> out.dt[, table(survey_year, Autism)]
           Autism
survey_year   Yes    No
       2019   859 28003
       2020  1255 40826
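The counts above imply strong class imbalance. A quick check in R, using only the numbers from the table, of the percentage of children labeled Yes in each survey year:

```r
# Counts copied from the table above (rows: survey year, columns: Autism).
counts <- matrix(c(859, 28003,
                   1255, 40826),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(survey_year = c("2019", "2020"),
                                 Autism = c("Yes", "No")))

# Row-wise proportions: within each year, the fraction labeled Yes and No.
prop <- prop.table(counts, margin = 1)

# Percentage labeled Yes per year (about 2.98% in both years).
round(100 * prop[, "Yes"], 2)
```

So a baseline that always predicts No would already be about 97% accurate on these data.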

download-nsch-counts.R was separated out from download-nsch.R.

18 Dec 2023

https://docs.google.com/spreadsheets/d/19Tm75T4wNN4yITlXuUMNVc22yzHmmzVcMY1GBVGsEnQ/edit#gid=0 is the source file for NSCH_categories.csv

download-nsch.R makes download-nsch-nrow-ncol.csv, download-nsch-column-counts.csv, and NSCH_categories_NA_counts.csv. I then manually added different categories for the least missing columns, saved as NSCH_categories_NA_counts_TDH.csv.
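A minimal sketch of counting missing values per column and sorting so that the least missing columns come first. The data frame survey and its columns q_a, q_b, q_c are hypothetical stand-ins rather than actual NSCH variables, and this is not necessarily how download-nsch.R computes NSCH_categories_NA_counts.csv.

```r
# Hypothetical survey extract; NA marks a missing answer.
survey <- data.frame(
  q_c = c(NA, NA, 3, 4),
  q_a = c(5, 12, 7, 8),
  q_b = c(1, 2, NA, 2)
)

# Count NA values in each column, then sort from least to most missing.
na_counts <- sort(colSums(is.na(survey)))

# Tidy two-column table, similar in spirit to an NA counts CSV.
na.df <- data.frame(column = names(na_counts), NA_count = unname(na_counts))
print(na.df)
```

The first rows of na.df are then the candidate columns to categorize by hand.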
