gdc-tcga-file-fetcher

File UUID fetcher for GDC/TCGA

These two scripts together fetch all cases in the GDC that have both a Primary Solid Tumor sample and a Solid Tissue Normal sample sequenced.

You can filter by sequencing strategy and by tumor type (primary site).
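The scripts themselves are not reproduced here, so the following is only a sketch of how such filters might be expressed against the public GDC API. It assumes the `/files` endpoint and its `experimental_strategy`, `cases.primary_site` and `cases.samples.sample_type` fields; the helper names `build_filters`, `build_params` and `fetch_files` are hypothetical, and the actual scripts may build their queries differently.

```python
import json
import urllib.parse
import urllib.request

# Public GDC API files endpoint (the original scripts' exact query is not shown).
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"

def build_filters(strategies, primary_sites, sample_type):
    """Hypothetical helper: nested GDC-style filter for one sample type."""
    return {
        "op": "and",
        "content": [
            {"op": "in", "content": {"field": "experimental_strategy", "value": list(strategies)}},
            {"op": "in", "content": {"field": "cases.primary_site", "value": list(primary_sites)}},
            {"op": "in", "content": {"field": "cases.samples.sample_type", "value": [sample_type]}},
        ],
    }

def build_params(filters, size=1000):
    """Query parameters for /files; the filter is sent as a JSON string."""
    return {
        "filters": json.dumps(filters),
        "fields": "file_id,cases.case_id,cases.samples.sample_type",
        "format": "JSON",
        "size": str(size),
    }

def fetch_files(params, timeout=120):
    """GET /files and return the list of hits (requires network access)."""
    url = GDC_FILES_ENDPOINT + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["data"]["hits"]
```

For example, `fetch_files(build_params(build_filters(["WXS"], ["Kidney"], "Primary Solid Tumor")))` would request tumor WXS files for kidney cases; a second call with `"Solid Tissue Normal"` would fetch the matching normals.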

Detailed instructions for using these scripts:

If possible, run the scripts from an educational network such as eduroam. I'm not certain of the cause, but I've seen significant delays in API responses on private networks (AT&T, in my case).

Step 1: Set the data_path variable to a directory of your choice. It is defined immediately after the comment section at the top of the script.
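A minimal Python sketch of this step, assuming you want the data directory created automatically if it does not exist yet. The default location and the helper `prepare_data_path` are hypothetical and not part of the original scripts.

```python
from pathlib import Path

# Hypothetical default; point this at wherever the JSON and link files should go.
data_path = Path.home() / "gdc_data"

def prepare_data_path(path):
    """Create the directory (and any parents) if missing and return it as a Path."""
    path = Path(path).expanduser()
    path.mkdir(parents=True, exist_ok=True)
    return path
```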

Step 2: Decide which sequencing strategies you want, and set the strategies variable accordingly.
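To see which strategy names are available, one option is to ask the GDC API for facet counts. This sketch assumes the `/files` endpoint's `facets` parameter, which returns counts under `data.aggregations.<field>.buckets`; the function names are hypothetical, and the `_missing` bucket is skipped on the assumption that GDC may report files without a value under that key.

```python
import json
import urllib.request

# Facet query: counts of files per experimental_strategy, no file records returned.
FACET_URL = "https://api.gdc.cancer.gov/files?facets=experimental_strategy&size=0&format=JSON"

def fetch_json(url, timeout=60):
    """Download and decode a JSON response (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

def available_strategies(response):
    """(strategy, file count) pairs from a facets response, most common first."""
    buckets = response["data"]["aggregations"]["experimental_strategy"]["buckets"]
    pairs = [(b["key"], b["doc_count"]) for b in buckets if b["key"] != "_missing"]
    return sorted(pairs, key=lambda kv: kv[1], reverse=True)
```

Usage would be `available_strategies(fetch_json(FACET_URL))`; the same facet approach should work for `cases.primary_site` when choosing values for Step 3.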

Step 3: Put the primary sites you want in the primary_sites variable. Some examples include Kidney, Ovary, and Brain.

Step 4: To get the complete intersection of files, set if_files to FALSE and run the script once.
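The idea behind the intersection can be sketched in Python as below. This is our reading of what the first pass computes, not the script's confirmed logic: it assumes file hits shaped like GDC `/files` results requested with `cases.case_id` and `cases.samples.sample_type`, and the helper names are hypothetical.

```python
TUMOR = "Primary Solid Tumor"
NORMAL = "Solid Tissue Normal"

def case_sample_types(hits):
    """Map each case ID to the set of sample types seen across its files."""
    seen = {}
    for hit in hits:
        for case in hit.get("cases", []):
            types = seen.setdefault(case["case_id"], set())
            for sample in case.get("samples", []):
                types.add(sample.get("sample_type"))
    return seen

def paired_cases(hits):
    """Case IDs that have both a tumor and a normal sample among the hits."""
    return {cid for cid, types in case_sample_types(hits).items() if {TUMOR, NORMAL} <= types}
```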

Step 5: Then set the if_files flag back to TRUE and run the script again. This should create some JSON files in your data directory; please check that they are there.
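One way to do that check is a short Python sketch like the following, which lists the `.json` files in the data directory and flags any that are empty or fail to parse. The helper names are hypothetical; nothing here assumes a particular schema for the files.

```python
import json
from pathlib import Path

def list_json_files(data_path):
    """Names of all .json files directly inside data_path, sorted."""
    return sorted(p.name for p in Path(data_path).glob("*.json"))

def check_json_files(data_path):
    """Names of .json files that are empty or not valid JSON."""
    bad = []
    for p in sorted(Path(data_path).glob("*.json")):
        try:
            with p.open() as fh:
                json.load(fh)
        except (json.JSONDecodeError, OSError):
            bad.append(p.name)
    return bad
```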

Step 6: Run the "gather_gdc_uuid_links.py" script, which creates the link files.

Note: These link files contain the cases that were sequenced for both normal and tumor samples.
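As an illustration only, a link file of this kind could be produced as below. We assume a "link" is a GDC data-endpoint download URL (`https://api.gdc.cancer.gov/data/<file UUID>`) and that the input hits carry `file_id` and `cases.case_id`; the actual output format of "gather_gdc_uuid_links.py" may differ, and `paired_links` and `write_links` are hypothetical names.

```python
from pathlib import Path

# GDC data endpoint; appending a file UUID gives that file's download URL.
GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data/"

def paired_links(hits, paired_case_ids):
    """Download URLs for files whose case has both tumor and normal data."""
    links = set()
    for hit in hits:
        case_ids = {c["case_id"] for c in hit.get("cases", [])}
        if case_ids & paired_case_ids:
            links.add(GDC_DATA_ENDPOINT + hit["file_id"])
    return sorted(links)

def write_links(path, links):
    """Write one URL per line."""
    Path(path).write_text("".join(link + "\n" for link in links))
```

Keeping only cases in `paired_case_ids` mirrors the note above: files from cases lacking either a tumor or a normal sample never reach the link file.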

Hope this helps! Thanks.

Repository: shivamsharma13/gdc-tcga-file-fetcher on GitHub