minds's Introduction

MINDS is a framework designed to integrate multimodal oncology data. It queries and integrates data from multiple sources, including clinical data, genomic data, and imaging data from the NIH NCI CRDC and TCIA portals.

Installation

The cloud version of MINDS is currently in closed beta, but you can still recreate the MINDS database locally. To run it locally, you need to set up a MySQL database and populate it with the MINDS schema, which is easiest to do with a Docker container. First, install Docker; installation instructions for each operating system are available in the Docker documentation. Next, pull the MySQL Docker image and start a container with the following command (docker run downloads the image automatically if it is not already present).

NOTE: Replace my-secret-pw with your own password and port with the host port you want to use to access the database. The default MySQL port is 3306. The command will not work until port is replaced with a valid port number.

docker run -d --name minds -e MYSQL_ROOT_PASSWORD=my-secret-pw -e MYSQL_DATABASE=minds -p port:3306 mysql
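MySQL can take a few seconds to initialize after docker run -d returns. A quick way to confirm that the container is accepting connections on the host port you mapped is a plain TCP check. The sketch below uses only Python's standard library; mysql_port_open is a hypothetical helper, not part of MINDS, and a successful check only shows that the port is open, not that the password is correct.

```python
import socket


def mysql_port_open(host: str = "127.0.0.1", port: int = 3306, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper: it only proves the port is reachable and does not
    log in to MySQL or check credentials.
    """
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

For example, mysql_port_open("127.0.0.1", 3306) should return True once the container is listening on host port 3306; use whatever port you passed to -p.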

Finally, install the MINDS Python package with pip:

pip install git+https://github.com/lab-rasool/MINDS.git

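After installing the package, create a .env file in the root directory of your project with the following variables. PORT must match the host port you passed to docker run, and PASSWORD must match the value of MYSQL_ROOT_PASSWORD: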

HOST=127.0.0.1
PORT=3306
DB_USER=root
PASSWORD=my-secret-pw
DATABASE=minds   
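A missing or misspelled variable in .env is an easy mistake to make. The sketch below is a hypothetical standard-library check (read_env_file and missing_keys are not part of MINDS) that parses simple KEY=VALUE lines and reports which of the five variables above are absent or empty. It does not handle quoting or export prefixes the way a full dotenv parser would.

```python
from __future__ import annotations

from pathlib import Path

# The variable names from the .env example above.
REQUIRED_KEYS = ("HOST", "PORT", "DB_USER", "PASSWORD", "DATABASE")


def read_env_file(path: str | Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip stray whitespace such as trailing spaces after a value.
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required variables that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

For example, missing_keys(read_env_file(".env")) returns an empty list when all five variables are set.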

Usage

Initial setup and automated updates

If you have set up the MINDS database locally, you need to populate it with data. To do this, or to update an existing database with the latest data, run:

# Import the minds package
import minds

# Update the database with the latest data
minds.update()

Querying the MINDS database

The MINDS Python package provides a Python interface to the MINDS database. You can use it to query the database and get the results back as a pandas DataFrame.

import minds

# get a list of all the tables in the database
tables = minds.get_tables()

# get a list of all the columns in a table
columns = minds.get_columns("clinical")

# Query the database directly
query = "SELECT * FROM minds.clinical WHERE project_id = 'TCGA-LUAD' LIMIT 10"
df = minds.query(query)
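In the example above, minds.query receives the whole SQL statement as a single string, so any value taken from user input ends up inside the query text. When building queries programmatically, one cautious approach is to validate each value before formatting it in. clinical_query below is a hypothetical helper that reproduces the example query; the project-ID pattern is an assumption based on IDs such as TCGA-LUAD, not an official GDC rule.

```python
import re

# Assumed shape of a project ID, e.g. "TCGA-LUAD" or "CPTAC-3":
# uppercase letters/digits separated by hyphens.
_PROJECT_ID = re.compile(r"[A-Z0-9]+(-[A-Z0-9]+)*")


def clinical_query(project_id: str, limit: int = 10) -> str:
    """Build a SELECT on minds.clinical after validating both inputs.

    Hypothetical helper: values are checked against strict rules instead of
    being interpolated into the SQL string blindly.
    """
    if not _PROJECT_ID.fullmatch(project_id):
        raise ValueError(f"unexpected project_id: {project_id!r}")
    if not isinstance(limit, int) or isinstance(limit, bool) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT * FROM minds.clinical "
        f"WHERE project_id = '{project_id}' LIMIT {limit}"
    )
```

Usage would then look like df = minds.query(clinical_query("TCGA-LUAD")).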

Building the cohort and downloading the data

# Build a cohort from the SQL query defined above
query_cohort = minds.build_cohort(query=query, output_dir="./data")

# or supply a cohort file exported from the GDC portal directly
gdc_cohort = minds.build_cohort(gdc_cohort="cohort_Unsaved_Cohort.2024-02-12.tsv", output_dir="./data")

# to get the cohort details
gdc_cohort.stats()

# to download the data from the cohort to the output directory specified
# you can also specify the number of threads to use and the modalities to exclude or include
gdc_cohort.download(threads=12, exclude=["Slide Image"])
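To make the threads and exclude arguments concrete, the sketch below shows one common way such options are implemented: filter a file manifest by modality, then fetch the remaining files concurrently with concurrent.futures.ThreadPoolExecutor. It is an illustration under assumptions, not MINDS's actual download code; download_all, the data_type key, and the fetch callback are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def download_all(
    files: Iterable[dict],
    fetch: Callable[[dict], str],
    threads: int = 4,
    exclude: Iterable[str] = (),
) -> list[str]:
    """Download every file whose modality is not excluded, using a thread pool.

    files is a hypothetical manifest of dicts with a "data_type" key, and
    fetch is any function that downloads one file and returns its local path.
    """
    skipped = set(exclude)
    selected = [f for f in files if f["data_type"] not in skipped]
    # Downloads are I/O-bound, so threads can overlap network waits.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map returns results in the same order as the selected files.
        return list(pool.map(fetch, selected))
```

Threads suit this workload because each worker spends most of its time waiting on the network rather than on the CPU.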

Please cite our work

@Article{s24051634,
    AUTHOR = {Tripathi, Aakash and Waqas, Asim and Venkatesan, Kavya and Yilmaz, Yasin and Rasool, Ghulam},
    TITLE = {Building Flexible, Scalable, and Machine Learning-Ready Multimodal Oncology Datasets},
    JOURNAL = {Sensors},
    VOLUME = {24},
    YEAR = {2024},
    NUMBER = {5},
    ARTICLE-NUMBER = {1634},
    URL = {https://www.mdpi.com/1424-8220/24/5/1634},
    ISSN = {1424-8220},
    DOI = {10.3390/s24051634}
}

Contributing

We welcome contributions from the community. If you would like to contribute to the MINDS project, please read our contributing guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.

minds's Issues

Demo video

It would be great if someone could record their screen while using MINDS to build a cohort. Ideally, the video should be under one minute.

Add a CLI version of MINDS

Is your feature request related to a problem? Please describe.
Having to set up a full Python project can be annoying. I'd like to be able to directly download the cohort I select from GDC.

Describe the solution you'd like
A CLI version of MINDS that takes the GDC cohort file and an output directory as arguments and downloads all the data.
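Until such a command exists, a thin wrapper around the Python API shown in the Usage section can approximate it. The sketch below is hypothetical: the argument names and defaults are invented for illustration, and it only calls minds.build_cohort(gdc_cohort=..., output_dir=...), stats(), and download(threads=..., exclude=...) as documented above.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line options for a hypothetical MINDS download command."""
    parser = argparse.ArgumentParser(
        description="Download all data for a GDC cohort file using MINDS."
    )
    parser.add_argument("cohort", help="cohort TSV exported from the GDC portal")
    parser.add_argument("output_dir", help="directory to download the data into")
    parser.add_argument("--threads", type=int, default=4, help="number of download threads")
    parser.add_argument(
        "--exclude", nargs="*", default=[], help='modalities to skip, e.g. "Slide Image"'
    )
    return parser


def main(argv: list[str] | None = None) -> None:
    """Parse arguments, build the cohort, and download its data."""
    args = build_parser().parse_args(argv)
    # Imported here so the parser can be built and tested without MINDS installed.
    import minds

    cohort = minds.build_cohort(gdc_cohort=args.cohort, output_dir=args.output_dir)
    cohort.stats()
    # Only pass exclude when the user asked to skip something.
    extra = {"exclude": args.exclude} if args.exclude else {}
    cohort.download(threads=args.threads, **extra)
```

To try it, save it as a script, add an if __name__ == "__main__": main() guard, and run it with the cohort file and output directory, for example followed by --threads 12 --exclude "Slide Image".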

Help Write Test Functions

Anything that can go wrong will go wrong.

Help Write Test Functions for MINDS

I'm working on expanding test coverage for the MINDS framework to validate functionality and prevent regressions. Help writing unit and integration tests would be greatly appreciated!

Areas Needing Tests

Some key modules and flows that could benefit from testing:

  • minds.update()
    • Validate data is imported correctly
    • Test incremental updates
  • minds.build_cohort()
    • Test different query and filter combinations
    • Validate generated cohort data
  • minds.download()
    • Test download of different cohort formats
    • Validate correct data is downloaded
  • Database query interface
    • Unit test query building
    • Validate queries return expected results
  • Core data processing modules
    • etl/ - Validate data transformations
    • integrate/ - Test multimodal integrations

Guidelines

When writing tests, please:

  • Use pytest for unit tests and pytest-docker for integration tests
  • Follow file naming test_<module>.py
  • Use descriptive assert statements
  • Use fixtures and parametrize where applicable
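As a starting point that follows these guidelines, here is a sketch of a test file using a fixture and parametrize. The helper load_case_ids and the case_id/project_id columns are hypothetical stand-ins; real tests would import the code under test from the minds package instead of defining it in the test file.

```python
# test_cohort.py: named after the test_<module>.py convention above
import csv
from pathlib import Path

import pytest


def load_case_ids(path: Path) -> list:
    """Hypothetical helper: read the case_id column from a cohort TSV."""
    with open(path, newline="") as handle:
        return [row["case_id"] for row in csv.DictReader(handle, delimiter="\t")]


@pytest.fixture
def cohort_file(tmp_path: Path) -> Path:
    """Write a tiny cohort TSV into pytest's per-test temporary directory."""
    path = tmp_path / "cohort.tsv"
    path.write_text("case_id\tproject_id\ncase-1\tTCGA-LUAD\ncase-2\tTCGA-LUAD\n")
    return path


def test_case_ids_are_unique(cohort_file: Path) -> None:
    case_ids = load_case_ids(cohort_file)
    assert len(case_ids) == len(set(case_ids)), f"duplicate case IDs: {case_ids}"


@pytest.mark.parametrize("expected", ["case-1", "case-2"])
def test_expected_case_present(cohort_file: Path, expected: str) -> None:
    assert expected in load_case_ids(cohort_file), f"{expected} missing from cohort"
```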

Assistance Appreciated

Additional test coverage will go a long way to improving MINDS reliability and maintainability. Any help is greatly appreciated!
