neuro-knowledge-engine's Introduction

A data-driven framework for mapping domains of human neurobiology

Code repository for the article in Nature Neuroscience by Elizabeth Beam, Christopher Potts, Russell Poldrack, & Amit Etkin

Abstract

Functional neuroimaging has been a mainstay of human neuroscience for the past 25 years. Interpretation of fMRI data has often occurred within knowledge frameworks crafted by experts, which have the potential to amplify biases that limit the replicability of findings. Here, we employ a computational approach to derive a data-driven framework for neurobiological domains that synthesizes the texts and data of nearly 20,000 human neuroimaging articles. Across multiple levels of domain specificity, the structure-function links within domains better replicate in held-out articles than those mapped from dominant frameworks in neuroscience and psychiatry. We further show that the data-driven framework partitions the literature into modular subfields, for which domains serve as generalizable prototypes of structure-function patterns in single articles. The approach to computational ontology we present here is the most comprehensive characterization of human brain circuits quantifiable with fMRI and may be extended to synthesize other scientific literatures.

Pipelines

Data-driven framework

[Figure: data-driven pipeline]

Approach to computational ontology. A data-driven framework was generated in an integrative manner in a training set of 12,708 human neuroimaging articles with brain coordinate data. First, 118 brain structures were clustered by k-means according to their co-occurrences with 1,683 terms for mental functions. The co-occurrence matrix was weighted by pointwise mutual information (PMI). Second, the top 25 terms for mental functions were assigned to each circuit based on the point-biserial correlation (rpb) of their binarized occurrences with the centroid of occurrences across structures. Third, the number of terms was selected to maximize average ROC-AUC of logistic regression classifiers predicting structure occurrences from term occurrences (forward inference) and term occurrences from structure occurrences (reverse inference) over a range of term list lengths from 5 to 25. Fourth, the number of domains was selected based on the average ROC-AUC of forward and reverse inference classifiers. Occurrences were summed across terms in each list and structures in each circuit, then thresholded by their mean across articles. In the fifth and final step, each domain was named by the mental function term with highest degree centrality of co-occurrences with other terms in the domain.
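The PMI weighting in the first step can be sketched as follows. This is a minimal illustration in plain Python, not the repository's implementation: the function name `pmi_matrix`, the toy counts, and the choice to set cells with zero co-occurrence to 0.0 (where PMI is undefined) are all assumptions made for this example.

```python
import math

def pmi_matrix(counts, positive=False):
    """Weight a structure-by-term co-occurrence count matrix by PMI.

    counts: list of rows (one per brain structure), each a list of
    non-negative co-occurrence counts (one per term).
    Assumption of this sketch: cells with a zero count, where PMI is
    undefined, are set to 0.0.
    """
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("co-occurrence matrix is empty")
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    weighted = []
    for i, row in enumerate(counts):
        out = []
        for j, c in enumerate(row):
            if c == 0:
                out.append(0.0)
                continue
            # PMI = log(p(i, j) / (p(i) * p(j)))
            #     = log(c * total / (row_sum_i * col_sum_j))
            value = math.log(c * total / (row_sums[i] * col_sums[j]))
            # With positive=True, negative values are clipped (PPMI).
            out.append(max(value, 0.0) if positive else value)
        weighted.append(out)
    return weighted

# Toy example: 2 structures x 3 terms (made-up counts).
toy_counts = [[8, 1, 1], [1, 6, 3]]
toy_pmi = pmi_matrix(toy_counts)
```

Each row of the weighted matrix could then serve as the feature vector of one structure for k-means clustering. Passing `positive=True` gives the PPMI variant used when mapping the expert-determined frameworks.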

Expert-determined frameworks

[Figure: expert-determined pipeline]

Approach to mapping expert-determined frameworks for brain function (RDoC) and mental illness (DSM). Seed terms from the RDoC and DSM frameworks were translated into the language of the human neuroimaging literature through a computational linguistics approach. Term embeddings of length 100 were trained using GloVe. For RDoC, embeddings were trained on a general human neuroimaging corpus of 29,828 articles (Supplementary Fig. 1b). For the DSM, embeddings were trained on a psychiatric human neuroimaging corpus of 26,070 articles (Supplementary Fig. 1c). Candidate synonyms included terms for mental functions in the case of RDoC and for both mental functions and psychopathology in the case of the DSM, as detailed in Supplementary Table 2. In the first step, the closest synonyms of seed terms were identified based on the cosine similarity of synonym term embeddings with the centroid of embeddings across seed terms in each domain. Second, the number of terms for each domain was selected to maximize cosine similarity with the centroid of seed terms. Third, the mental function term lists for each domain were mapped onto brain circuits based on positive pointwise mutual information (PPMI) of term and structure co-occurrences across the corpus of 18,155 articles with activation coordinate data (Supplementary Fig. 1a). Structures were included in the circuit if the FDR of the observed PPMI was less than 0.01, determined by comparison to a null distribution generated by shuffling term list features over 10,000 iterations.
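The synonym search in the first step can be sketched as follows: rank candidate terms by the cosine similarity of their embeddings to the centroid of the seed-term embeddings. This is a minimal illustration in plain Python under stated assumptions. The function names and the three-dimensional toy vectors are hypothetical; the embeddings described above have length 100 and were trained with GloVe.

```python
import math

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(dim) / n for dim in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_candidates(embeddings, seed_terms, candidate_terms):
    """Sort candidate terms by similarity to the seed-term centroid."""
    seed_center = centroid([embeddings[t] for t in seed_terms])
    scored = [(t, cosine(embeddings[t], seed_center)) for t in candidate_terms]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy embeddings (made-up values, for illustration only).
toy_embeddings = {
    "fear": [1.0, 0.0, 0.0],
    "threat": [0.9, 0.1, 0.0],
    "anxiety": [0.8, 0.2, 0.1],
    "reward": [0.0, 1.0, 0.0],
    "memory": [0.0, 0.1, 1.0],
}
ranking = rank_candidates(
    toy_embeddings,
    seed_terms=["fear", "threat"],
    candidate_terms=["anxiety", "reward", "memory"],
)
```

In this toy case, "anxiety" ranks first because its vector points in nearly the same direction as the centroid of "fear" and "threat". Truncating the ranked list at different lengths would give the candidate term lists from which the second step selects.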

Index of Figures

Main Text

Figure        Files
1b            ontology/ontol_data-driven_lr.ipynb, ontology/ontology.py
1c            partition/part_splits.ipynb, partition/partition.py
1d            modularity/mod_kvals_lr.ipynb
1e            prototype/proto_kvals_lr.ipynb
2a            ontology/ontol_data-driven_lr.ipynb
2b            prediction/comp_frameworks_lr_k*.ipynb, modularity/comp_frameworks_lr_k*.ipynb, prototype/comp_frameworks_lr_k*.ipynb
2c            hierarchy/hier_data-driven_lr_k6-8-22.ipynb
3b            ontology/ontol_rdoc.ipynb, ontology/ontology.py
4a            ontology/ontol_rdoc.ipynb, ontol_sim_lr.ipynb, ontology/ontology.py
4b            ontology/ontol_data-driven_lr.ipynb, ontol_sim_lr.ipynb, ontology/ontology.py
4c            ontology/ontol_ontol_dsm.ipynb, ontol_sim_lr.ipynb, ontology/ontology.py
5b, e         prediction/pred_data-driven_lr.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py
5c, f         prediction/pred_rdoc.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py
5d, g         prediction/pred_dsm.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py
5h            prediction/comp_frameworks_lr.ipynb
6a-f          mds/mds.ipynb, mds/mds.py
6g            modularity/mod_data-driven_lr.ipynb, modularity/modularity.py
6h            modularity/mod_rdoc.ipynb, modularity/modularity.py
6i            modularity/mod_dsm.ipynb, modularity/modularity.py
6j            modularity/comp_frameworks_lr.ipynb, modularity/modularity.py
6k            prototype/proto_data-driven_lr.ipynb, prototype/prototype.py
6l            prototype/proto_rdoc.ipynb, prototype/prototype.py
6m            prototype/proto_dsm.ipynb, prototype/prototype.py
6n            prototype/comp_frameworks_lr.ipynb, prototype/prototype.py

Extended Data

Figure        Files
1             corpus/cohorts.ipynb
2-3           ontology/ontol_kvals_lr.ipynb, ontology/ontology.py
4a-b          ontology/ontol_data-driven_nn.ipynb, ontology/ontology.py
4c            mds/mds.ipynb, mds/mds.py
4d            modularity/mod_data-driven_nn.ipynb, modularity/modularity.py
4e            prototype/proto_data-driven_nn.ipynb, prototype/prototype.py
5a            ontology/ontol_data-driven_terms.ipynb, ontology/ontol_sim_terms.ipynb, ontology/ontology.py
5b-e          ontology/ontol_sim_terms.ipynb
6a, d         prediction/comp_frameworks_lr_k09.ipynb
6b-c, e-f     prediction/pred_data-driven_lr_k09.ipynb
6g-h          partition/part_data-driven_lr_k09.ipynb, mds/mds.ipynb
6i Left       modularity/comp_frameworks_lr_k09.ipynb
6i Right      modularity/mod_data-driven_lr_k09.ipynb
6j Left       prototype/comp_frameworks_lr_k09.ipynb
6j Right      prototype/proto_data-driven_lr_k09.ipynb
7b, e         prediction/pred_data-driven_lr.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py
7c, f         prediction/pred_rdoc.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py
7d, g         prediction/pred_dsm.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py
7h-j          prediction/comp_frameworks_lr.ipynb
8b, e; 9b, e  prediction/pred_data-driven_nn.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py
8c, f; 9c, f  prediction/pred_rdoc.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py
8d, g; 9d, g  prediction/pred_dsm.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py
8h; 9h-j      prediction/comp_frameworks_nn.ipynb
10a           partition/part_data-driven_lr.ipynb, partition/partition.py
10b           partition/part_rdoc.ipynb, partition/partition.py
10c           partition/part_dsm.ipynb, partition/partition.py
10d-f         tsne/tsne.ipynb

Supplementary Material

Figure        Files
1             validation/val_brainmap_top.ipynb
2             validation/val_brainmap_sims.ipynb
3-4           ontology/ontol_kvals_nn.ipynb, ontology/ontology.py
5             stability/stab_data-driven_lr_top.ipynb
6a, d; 7a, d  prediction/pred_data-driven_lr.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py
6b, e; 7b, e  prediction/pred_rdoc.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py
6c, f; 7c, f  prediction/pred_dsm.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py
6g; 7g-i      prediction/comp_frameworks_lr.ipynb
8a, d; 9a, d  prediction/pred_data-driven_nn.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py
8b, e; 9b, e  prediction/pred_rdoc.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py
8c, f; 9c, f  prediction/pred_dsm.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py
8g; 9g-i      prediction/comp_frameworks_nn.ipynb

Table         Files
1             data/data_table_coord.ipynb
2             lexicon/preproc_cogneuro.py, lexicon/preproc_psychiatry.py, lexicon/preproc_rdoc.py, lexicon/preproc_dsm.py
3             data/text/pubmed/gen_190428/query.txt, data/text/pubmed/psy_190428/query.txt
4-5           prediction/table_lr-nn.ipynb

neuro-knowledge-engine's People

Contributors

ehbeam

neuro-knowledge-engine's Issues

cannot download the pre-trained glove embeddings

Hi Team,
I have been reading your awesome paper, and realized that you have all the data and code available on GitHub. (Many thanks for that.)
I want to explore the pre-trained GloVe embeddings that you have kindly made public,

but I'm not able to access them via git-lfs.
It throws the error below:

batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
error: failed to fetch some objects from 'https://github.com/ehbeam/neuro-knowledge-engine.git/info/lfs'

Is there some other way to get my hands on this awesome resource?

need overall makefile

It would be nice to have a guide so that someone could run the entire code base from beginning to end (except perhaps the parts that rely on a cluster). One approach would be to generate a Makefile, or at least a complete recipe for the user.

Dependency error when loading the pickled fits (ontology.py)

Hi,

First of all, thanks a lot for sharing the code & data for this awesome study!

While trying to recreate the results from the publication with plot_main.py, I came across the following error:

  File "<mypath>/neuro-knowledge-engine/ontology/ontology.py", line 102, in load_fits
    fits[direction][k] = pickle.load(open(fit_file, "rb"))
ModuleNotFoundError: No module named 'sklearn.linear_model.logistic'

This seems to be because the pickle files were created with an older version of sklearn. However, trying out several versions did not fix the error.

Would it be possible to share the package versions you used when creating the pickled fits, so one could set up a suitable environment for running the script?

Many thanks!
L
