
cogconstruct-datadriven

A data-driven framework for mapping domains of human neurobiology

Code repository for the manuscript by Elizabeth Beam, Christopher Potts, Russell Poldrack, & Amit Etkin

Abstract

Functional neuroimaging has been a mainstay of human neuroscience for the past 25 years. Interpretation of fMRI data has often occurred within knowledge frameworks crafted by experts, which have the potential to amplify biases that limit the replicability of findings. Here, we employ a computational approach to derive a data-driven framework for neurobiological domains that synthesizes the texts and data of nearly 20,000 human neuroimaging articles. Across multiple levels of domain specificity, the structure-function links within domains better replicate in held-out articles than those mapped from dominant frameworks in neuroscience and psychiatry. We further show that the data-driven framework partitions the literature into modular subfields, for which domains serve as generalizable prototypes of structure-function patterns in single articles. The approach to computational ontology we present here is the most comprehensive characterization of human brain circuits quantifiable with fMRI and may be extended to synthesize other scientific literatures.

Pipelines

Data-driven framework

(Figure: overview of the data-driven framework pipeline)

Approach to computational ontology. A data-driven framework was generated in an integrative manner in a training set of 12,708 human neuroimaging articles with brain coordinate data. First, 118 brain structures were clustered by k-means according to their co-occurrences with 1,683 terms for mental functions. The co-occurrence matrix was weighted by pointwise mutual information (PMI). Second, the top 25 terms for mental functions were assigned to each circuit based on the point-biserial correlation (rpb) of their binarized occurrences with the centroid of occurrences across structures. Third, the number of terms was selected to maximize the average ROC-AUC of logistic regression classifiers predicting structure occurrences from term occurrences (forward inference) and term occurrences from structure occurrences (reverse inference) over a range of term list lengths from 5 to 25. Fourth, the number of domains was selected based on the average ROC-AUC of forward and reverse inference classifiers. Occurrences were summed across terms in each list and structures in each circuit, then thresholded by their mean across articles. In the fifth and final step, each domain was named by the mental function term with the highest degree centrality of co-occurrences with other terms in the domain.
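
As a rough illustration of steps 1 and 2, the sketch below builds a PMI-weighted structure-by-term co-occurrence matrix, clusters structures into circuits with k-means, and ranks terms for each circuit by point-biserial correlation. It is a minimal sketch with placeholder data and illustrative variable names, not the repository's code; the actual implementation is in the ontology/ notebooks and ontology/ontology.py.

```python
# Minimal sketch of steps 1-2 of the data-driven pipeline (PMI weighting, k-means
# clustering of structures, and term assignment by point-biserial correlation).
# The occurrence matrices below are random placeholders and the variable names
# are illustrative; the repository's implementation lives in ontology/.
import numpy as np
from sklearn.cluster import KMeans
from scipy.stats import pointbiserialr

# Assumed binary occurrence matrices over training articles:
# structures -> (n_articles, 118), terms -> (n_articles, 1683).
rng = np.random.default_rng(0)
structures = rng.binomial(1, 0.10, size=(1000, 118))
terms = rng.binomial(1, 0.05, size=(1000, 1683))

# Step 1: PMI-weighted co-occurrence of structures with mental function terms.
p_joint = (structures.T @ terms) / len(structures)            # (118, 1683)
p_struct = structures.mean(axis=0)[:, None]
p_term = terms.mean(axis=0)[None, :]
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log(p_joint / (p_struct * p_term))
pmi[~np.isfinite(pmi)] = 0.0

# Cluster the 118 structures into k circuits by their PMI profiles
# (k = 6 is one of the domain counts evaluated in the manuscript).
k = 6
circuits = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(pmi)

# Step 2: assign the top 25 terms to each circuit by the point-biserial correlation
# of each term's binarized occurrences with the circuit's occurrence centroid.
top_terms = {}
for c in range(k):
    centroid = structures[:, circuits == c].mean(axis=1)      # per-article circuit signal
    r = np.array([pointbiserialr(terms[:, t], centroid)[0] for t in range(terms.shape[1])])
    top_terms[c] = np.argsort(r)[::-1][:25]                    # indices of the top 25 terms
```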

Expert-determined frameworks

(Figure: overview of the expert-determined framework pipeline)

Approach to mapping expert-determined frameworks for brain function (RDoC) and mental illness (DSM). Seed terms from the RDoC and DSM frameworks were translated into the language of the human neuroimaging literature through a computational linguistics approach. Term embeddings of length 100 were trained using GloVe. For RDoC, embeddings were trained on a general human neuroimaging corpus of 29,828 articles (Supplementary Fig. 1b). For the DSM, embeddings were trained on a psychiatric human neuroimaging corpus of 26,070 articles (Supplementary Fig. 1c). Candidate synonyms included terms for mental functions in the case of RDoC and for both mental functions and psychopathology in the case of the DSM, as detailed in Supplementary Table 2. In the first step, the closest synonyms of seed terms were identified based on the cosine similarity of synonym term embeddings with the centroid of embeddings across seed terms in each domain. Second, the number of terms for each domain was selected to maximize cosine similarity with the centroid of seed terms. Third, the mental function term lists for each domain were mapped onto brain circuits based on positive pointwise mutual information (PPMI) of term and structure co-occurrences across the corpus of 18,155 articles with activation coordinate data (Supplementary Fig. 1a). Structures were included in the circuit if the FDR of the observed PPMI was less than 0.01, determined by comparison to a null distribution generated by shuffling term list features over 10,000 iterations.
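
The sketch below illustrates the two core computations in this mapping: selecting the closest synonyms of a domain's seed terms by cosine similarity to their embedding centroid, and thresholding PPMI values against a permutation null. It assumes GloVe embeddings are already available as a term-to-vector dictionary; the function names, the permutation scheme, and the Benjamini-Hochberg stand-in for the FDR correction are illustrative rather than the repository's implementation (see the ontology/ and lexicon/ directories for the actual code).

```python
# Minimal sketch of the expert-framework mapping, assuming GloVe-style embeddings
# are available as a {term: 100-d vector} dict. All names here are illustrative.
import numpy as np

def closest_synonyms(seed_terms, candidates, embeddings, n_terms=25):
    """Rank candidate terms by cosine similarity to the centroid of the seed-term embeddings."""
    centroid = np.mean([embeddings[t] for t in seed_terms if t in embeddings], axis=0)
    centroid /= np.linalg.norm(centroid)
    sims = {t: float(embeddings[t] @ centroid / np.linalg.norm(embeddings[t]))
            for t in candidates if t in embeddings}
    return sorted(sims, key=sims.get, reverse=True)[:n_terms]

def ppmi_circuit(term_occ, struct_occ, n_iter=1000, fdr_threshold=0.01, seed=0):
    """Map a domain's term list onto brain structures via PPMI, keeping structures
    whose observed PPMI survives an FDR threshold against a permutation null.
    (The manuscript uses 10,000 iterations; fewer are used here for brevity.)"""
    rng = np.random.default_rng(seed)
    domain = (term_occ.sum(axis=1) > 0).astype(float)   # article-level occurrence of the term list

    def ppmi(x, y):
        p_joint, denom = (x * y).mean(), x.mean() * y.mean()
        return max(np.log(p_joint / denom), 0.0) if p_joint > 0 and denom > 0 else 0.0

    observed = np.array([ppmi(domain, struct_occ[:, s]) for s in range(struct_occ.shape[1])])
    null = np.array([[ppmi(rng.permutation(domain), struct_occ[:, s])
                      for s in range(struct_occ.shape[1])] for _ in range(n_iter)])
    p_vals = (null >= observed).mean(axis=0)

    # Benjamini-Hochberg adjustment as a simple stand-in for the FDR correction.
    m = len(p_vals)
    order = np.argsort(p_vals)
    adjusted = np.minimum.accumulate((p_vals[order] * m / np.arange(1, m + 1))[::-1])[::-1]
    fdr = np.empty(m)
    fdr[order] = adjusted
    return np.where(fdr < fdr_threshold)[0]              # indices of structures in the circuit
```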

Index of Figures

Main Text

| Figure | Files |
| --- | --- |
| 1b | ontology/ontol_data-driven_lr.ipynb, ontology/ontology.py |
| 1c | partition/part_splits.ipynb, partition/partition.py |
| 1d | modularity/mod_kvals_lr.ipynb |
| 1e | prototype/proto_kvals_lr.ipynb |
| 2a | ontology/ontol_data-driven_lr.ipynb |
| 2b | prediction/comp_frameworks_lr_k*.ipynb, modularity/comp_frameworks_lr_k*.ipynb, prototype/comp_frameworks_lr_k*.ipynb |
| 2c | hierarchy/hier_data-driven_lr_k6-8-22.ipynb |
| 3b | ontology/ontol_rdoc.ipynb, ontology/ontology.py |
| 4a | ontology/ontol_rdoc.ipynb, ontol_sim_lr.ipynb, ontology/ontology.py |
| 4b | ontology/ontol_data-driven_lr.ipynb, ontol_sim_lr.ipynb, ontology/ontology.py |
| 4c | ontology/ontol_dsm.ipynb, ontol_sim_lr.ipynb, ontology/ontology.py |
| 5b, e | prediction/pred_data-driven_lr.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py |
| 5c, f | prediction/pred_rdoc.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py |
| 5d, g | prediction/pred_dsm.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py |
| 5h | prediction/comp_frameworks_lr.ipynb |
| 6a-f | mds/mds.ipynb, mds/mds.py |
| 6g | modularity/mod_data-driven_lr.ipynb, modularity/modularity.py |
| 6h | modularity/mod_rdoc.ipynb, modularity/modularity.py |
| 6i | modularity/mod_dsm.ipynb, modularity/modularity.py |
| 6j | modularity/comp_frameworks_lr.ipynb, modularity/modularity.py |
| 6k | prototype/proto_data-driven_lr.ipynb, prototype/prototype.py |
| 6l | prototype/proto_rdoc.ipynb, prototype/prototype.py |
| 6m | prototype/proto_dsm.ipynb, prototype/prototype.py |
| 6n | prototype/comp_frameworks_lr.ipynb, prototype/prototype.py |

Extended Data

| Figure | Files |
| --- | --- |
| 1 | corpus/cohorts.ipynb |
| 2-3 | ontology/ontol_kvals_lr.ipynb, ontology/ontology.py |
| 4a-b | ontology/ontol_data-driven_nn.ipynb, ontology/ontology.py |
| 4c | mds/mds.ipynb, mds/mds.py |
| 4d | modularity/mod_data-driven_nn.ipynb, modularity/modularity.py |
| 4e | prototype/proto_data-driven_nn.ipynb, prototype/prototype.py |
| 5a | ontology/ontol_data-driven_terms.ipynb, ontology/ontol_sim_terms.ipynb, ontology/ontology.py |
| 5b-e | ontology/ontol_sim_terms.ipynb |
| 6a, d | prediction/comp_frameworks_lr_k09.ipynb |
| 6b-c, e-f | prediction/pred_data-driven_lr_k09.ipynb |
| 6g-h | partition/part_data-driven_lr_k09.ipynb, mds/mds.ipynb |
| 6i (left) | modularity/comp_frameworks_lr_k09.ipynb |
| 6i (right) | modularity/mod_data-driven_lr_k09.ipynb |
| 6j (left) | prototype/comp_frameworks_lr_k09.ipynb |
| 6j (right) | prototype/proto_data-driven_lr_k09.ipynb |
| 7b, e | prediction/pred_data-driven_lr.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py |
| 7c, f | prediction/pred_rdoc.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py |
| 7d, g | prediction/pred_dsm.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py |
| 7h-j | prediction/comp_frameworks_lr.ipynb |
| 8b, e; 9b, e | prediction/pred_data-driven_nn.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py |
| 8c, f; 9c, f | prediction/pred_rdoc.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py |
| 8d, g; 9d, g | prediction/pred_dsm.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py |
| 8h; 9h-j | prediction/comp_frameworks_nn.ipynb |
| 10a | partition/part_data-driven_lr.ipynb, partition/partition.py |
| 10b | partition/part_rdoc.ipynb, partition/partition.py |
| 10c | partition/part_dsm.ipynb, partition/partition.py |
| 10d-f | tsne/tsne.ipynb |

Supplementary Material

| Figure | Files |
| --- | --- |
| 1 | validation/val_brainmap_top.ipynb |
| 2 | validation/val_brainmap_sims.ipynb |
| 3-4 | ontology/ontol_kvals_nn.ipynb, ontology/ontology.py |
| 5 | stability/stab_data-driven_lr_top.ipynb |
| 6a, d; 7a, d | prediction/pred_data-driven_lr.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py |
| 6b, e; 7b, e | prediction/pred_rdoc.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py |
| 6c, f; 7c, f | prediction/pred_dsm.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py |
| 6g; 7g-i | prediction/comp_frameworks_lr.ipynb |
| 8a, d; 9a, d | prediction/pred_data-driven_nn.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py |
| 8b, e; 9b, e | prediction/pred_rdoc.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py |
| 8c, f; 9c, f | prediction/pred_dsm.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py |
| 8g; 9g-i | prediction/comp_frameworks_nn.ipynb |

| Table | Files |
| --- | --- |
| 1 | data/data_table_coord.ipynb |
| 2 | lexicon/preproc_cogneuro.py, lexicon/preproc_psychiatry.py, lexicon/preproc_rdoc.py, lexicon/preproc_dsm.py |
| 3 | data/text/pubmed/gen_190428/query.txt, data/text/pubmed/psy_190428/query.txt |
| 4-5 | prediction/table_lr-nn.ipynb |

