3dgenomes / tadbit

License: GNU General Public License v3.0


tadbit's Introduction

Current version: v1.1

TADbit is a complete Python library that covers all the steps needed to analyze, model and explore 3C-based data. With TADbit the user can map FASTQ files to obtain raw binned interaction matrices (Hi-C like matrices), normalize and correct interaction matrices, identify and compare Topologically Associating Domains (TADs), build 3D models from the interaction matrices and, finally, extract structural properties from the models. TADbit is complemented by TADkit for visualizing 3D models.

Hi-C experiments capture genomic interactions between loci located on the same or on different chromosomes. TADbit is built around the concept of a chromosome, and uses it as the central item to store and compare different Hi-C experiments. The library has been designed to be used by researchers with no expertise in computer science. The all-in-one scripts provided with TADbit allow the full analysis to be run with a single command line; advanced users may write their own programs using TADbit as a complementary library.

Contributors

TADbit is currently developed at the MarciusLab with contributions from François Serra, David Castillo, Marco Di Stefano, Irene Farabella, Mike Goodstadt and many other members of the lab.

Documentation

Feedback

If you have any remaining questions, we would be happy to answer them informally:

Join the chat at https://gitter.im/3DGenomes/tadbit

Frequently asked questions

Check the issues labeled FAQ in the TADbit issue tracker.

If your question is still unanswered, feel free to open a new issue.

Docker/Singularity Containers

Recipe files (Dockerfile and Singularity recipe) to generate containers are available in the containers folder.

  • Docker

Build the image using the Dockerfile from inside an empty folder with docker build -t tadbit . (~20 minutes)

Once built, run it as docker run tadbit tadbit map -h

This image contains all the dependencies for TADbit, and also Jupyter.

To run a notebook from inside the Docker container, start the tadbit image as:

docker run -it -p 8888:8888 -v /LOCAL_PATH:/mnt tadbit

LOCAL_PATH would be, for example, a local folder with data (e.g. FASTQs or reference genomes), and /mnt is the directory inside the Docker container where LOCAL_PATH is mounted.

From inside the container, run:

jupyter notebook --ip 0.0.0.0 --allow-root --NotebookApp.token=''

Finally, open http://localhost:8888 in your browser.

Note: this can also be done with a single command that runs in the background:

docker run -d -p 8888:8888 -v /LOCAL_PATH:/mnt tadbit jupyter notebook --ip 0.0.0.0 --allow-root --NotebookApp.token='' > /dev/null &

  • Singularity

Build the image using the Singularity recipe from inside an empty folder with sudo singularity build tadbit.simg Singularity (~20 minutes)

Once built, run it as singularity run tadbit.simg

You can also install Jupyter inside the Singularity image by uncommenting the corresponding line in the recipe file.

Citation

Please cite this article if you use TADbit.

Serra, F., Baù, D., Goodstadt, M., Castillo, D., Filion, G., & Marti-Renom, M.A. (2017). Automatic analysis and 3D-modelling of Hi-C data using TADbit reveals structural features of the fly chromatin colors. PLOS Comp Bio 13(7) e1005665. doi:10.1371/journal.pcbi.1005665

Methods implemented in TADbit

In addition to the general citation for the TADbit library, please cite these articles if you used TADbit for:

Applications

TADbit has been previously used for modeling genomes and genomic domains. Here is the list of published articles:

Other programs

TADbit uses other major software packages in biology. Here is the list of their articles:

TADbit training

Next editions

  • To be announced.

Past editions

Bibliography

Ay2015

Ay, F., Vu, T.H., Zeitz, M.J., Varoquaux, N., Carette, J.E., Vert, J.-P., Hoffman, A.R. and Noble, W.S. 2015. Identifying multi-locus chromatin contacts in human cells using tethered multiple 3C. BMC Genomics 16, p. 121.

Baù2011

Baù, D., Sanyal, A., Lajoie, B.R., Capriotti, E., Byron, M., Lawrence, J.B., Dekker, J. and Marti-Renom, M.A. 2011. The three-dimensional folding of the α-globin gene domain reveals formation of chromatin globules. Nature Structural & Molecular Biology 18(1), pp. 107–114.

BaùMarti-Renom2012

Baù, D. and Marti-Renom, M.A. 2012. Genome structure determination via 3C-based data integration by the Integrative Modeling Platform. Methods 58(3), pp. 300–306.

Belton2015

Belton, J.-M., Lajoie, B.R., Audibert, S., Cantaloube, S., Lassadi, I., Goiffon, I., Baù, D., Marti-Renom, M.A., Bystricky, K. and Dekker, J. 2015. The conformation of yeast chromosome III is mating type dependent and controlled by the recombination enhancer. Cell reports 13(9), pp. 1855–1867.

Cattoni2017

Cattoni, D.I., Cardozo-Gizz, A.M., Georgieva, M., Di Stefano, M., Valeri, A., Chamousset, D., Houbron, C., Dejardin, S., Fiche, J-B., Marti-Renom, M.A., Bantignies, F., Cavalli, G. and Nollmann, M. (2017) Single-cell absolute contact probability detection reveals that chromosomes are organized by modulated stochasticity. Nature Communications 8 pp 1753

Cuadrado2019

Cuadrado, A., Giménez-Llorente, D., Kojic, A., Rodríguez-Corsino, M., Cuartero, Y., Martín-Serrano, G., Gómez-López, G., Marti-Renom, M.A. and Losada, A. (2019) Specific contributions of cohesin-SA1 and cohesin-SA2 to TADs and Polycomb domains in embryonic stem cells. Cell Reports, in press

Enright2002

Enright, A. J., Van Dongen, S., & Ouzounis, C. A. (2002). An efficient algorithm for large-scale detection of protein families. Nucleic Acids Research, 30(7), 1575–1584.

Imakaev2012

Imakaev, M., Fudenberg, G., McCord, R.P., Naumova, N., Goloborodko, A., Lajoie, B.R., Dekker, J. and Mirny, L.A. 2012. Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nature Methods 9(10), pp. 999–1003.

Kojic2018

Kojic, A., Cuadrado, A., Koninck, A.M., Gomez-Lopez, G., Rodriguez-Corsino, M., Le Dily, F., Marti-Renom, M.A. and Losada, A. (2018) Distinct roles of cohesin-SA1 and cohesin-SA2 in 3D chromosome organization. Nature Structural and Molecular Biology 25 pp 496–504

Le_Dily2014

Le Dily, F., Baù, D., Pohl, A., Vicent, G.P., Serra, F., Soronellas, D., Castellano, G., Wright, R.H.G., Ballare, C., Filion, G., Marti-Renom, M.A. and Beato, M. 2014. Distinct structural transitions of chromatin topological domains correlate with coordinated hormone-induced gene regulation. Genes & Development 28(19), pp. 2151–2162.

Lieberman-Aiden2009

Lieberman-Aiden, E., van Berkum, N.L., Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B.R., Sabo, P.J., Dorschner, M.O., Sandstrom, R., Bernstein, B., Bender, M.A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L.A., Lander, E.S. and Dekker, J. 2009. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326(5950), pp. 289–293.

Marco-Sola2012

Marco-Sola, S., Sammeth, M., Guigo, R. and Ribeca, P. 2012. The GEM mapper: fast, accurate and versatile alignment by filtration. Nat Methods 9(12), pp. 1185-1188.

Mas2018

Mas, G., Blanco, E., Ballaré, C., Sansó, M., Spill, Y.G., Hu, D., Aoi, Y., Le Dily, F., Shilatifard, A., Marti-Renom, M.A. and Di Croce, L. (2018) Promoter bivalency favors an open architecture of the stem cell genome. Nature Genetics 50 pp 1452–1462

Miguel-Escalada2019

Miguel-Escalada, I., Bonàs-Guarch, S., Cebola, I., Ponsa-Cobas, J., Mendieta-Esteban, J. , Rolando, D., Javierre, B.M., Atla, G., Farabella, I., Morgan, C.C., García-Hurtado, J., Beucher, A., Morán, I., Pasquali, L., Ramos, M., Appel, E.V.R., Linneberg, L., Gjesing, A.P., Witte, D.R., Pedersen, O., Grarup, N., Ravassard, P., Mercader, J.M., Torrents, D., Piemonti, L., Berney, T., de Koning E., Kerr-Conte, J., Pattou, F., Hansen, T., Marti-Renom, M.A., Fraser, P. and Ferrer, J. (2019) Human pancreatic islet 3D chromatin architecture provides insights into the genetics of type 2 diabetes. Nature Genetics, in press

Morf2019

Morf, J., Wingett, S.W., Farabella, I., Cairns, J., Furlan-Magaril, M., Jiménez-García, L.F., Liu, X., Craig, F.F., Walker, S., Segons-Pichon, A., Andrews, S., Marti-Renom, M.A. and Fraser, P. (2019) RNA proximity sequencing reveals properties of spatial transcriptome organization in the nucleus. Nature Biotechnology, in press

Nir2018

Nir, G., Farabella, I., Pérez Estrada, C., Ebeling, C.G., Beliveau, B.J., Sasaki, H.M., Lee, S.H., Nguyen, S.C., McCole, R.B., Chattoraj, S., Erceg, J., Abed, J.A., Martins, N.M.C., Nguyen, H.Q., Hannan, M.A., Russell, S., Durand, N.C., Rao, S.S.P., Kishi, J.Y., Soler-Vila, P., Di Pierro, M., Onuchic, J.N., Callahan, S., Schreiner, J., Stuckey, J., Yin, P., Lieberman Aiden, E., Marti-Renom, M.A. and Wu, C.T. (2018) Walking along chromosomes with super-resolution imaging, contact maps, and integrative modeling. PLOS Genetics 14(12) pp e1007872

Pascual-Reguant2018

Pascual-Reguant, L., Blanco, E., Galan, S., Le Dily, F., Cuartero, Y., Serra-Bardenys, G., di Carlo, V., Iturbide, A., Cebrià-Costa, J.P., Nonell, L., García de Herreros, A., Di Croce, L., Marti-Renom, M.A. and Peiró, S. (2018) Genome-wide mapping of lamin B1 reveals the existence of dynamic and functional euchromatin lamin B1 domains (eLADs) during epithelial-to-mesenchymal transition (EMT). Nature Communications 9(1) pp 3420

Rao2014

Rao, S.S.P., Huntley, M.H., Durand, N.C., Stamenova, E.K., Bochkov, I.D., Robinson, J.T., Sanborn, A.L., Machol, I., Omer, A.D., Lander, E.S. and Aiden, E.L. 2014. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159(7), pp. 1665–1680.

Russel2011

Russel, D., Lasker, K., Webb, B., Velázquez-Muriel, J., Tjioe, E., Schneidman-Duhovny, D., et al. (2011). Putting the Pieces Together: Integrative Modeling Platform Software for Structure Determination of Macromolecular Assemblies. PLoS Biology, 10(1), e1001244.

Stadhouders2018

Stadhouders, R., Vidal, E., Serra, F., Di Stefano, B., Le Dily, F., Quilez, J., Gomez, A., Collombet, S., Berenguer, C., Cuartero, Y., Hecht, J., Filion, G., Beato, M., Marti-Renom, M.A. and Graf, T. (2018) Transcription factors orchestrate dynamic interplay between genome topology and gene regulation during cell reprogramming. Nature Genetics 50 pp 238–249

Trussart2015

Trussart, M., Serra, F., Baù, D., Junier, I., Serrano, L. and Marti-Renom, M.A. 2015. Assessing the limits of restraint-based 3D modeling of genomes and genomic domains. Nucleic Acids Research 43(7), pp. 3465–3477.

Trussart2017

Trussart, M., Yus, E., Martinez, S., Baù, D., Tahara, Y.O., Pengo, T., Widjaja, M., Kretschmer, S., Swoger, J., Djordjevic, S., Turnbull, L., Whitchurch, C., Miyata, M., Marti-Renom, M.A., Lluch-Senar, M. and Serrano, L. 2017. Defined chromosome structure in the genome-reduced bacterium Mycoplasma pneumoniae. Nature Communications 8, p. 14665.

Umbarger2011

Umbarger, M.A., Toro, E., Wright, M.A., Porreca, G.J., Baù, D., Hong, S.-H., Fero, M.J., Zhu, L.J., Marti-Renom, M.A., McAdams, H.H., Shapiro, L., Dekker, J. and Church, G.M. 2011. The three-dimensional architecture of a bacterial genome and its alteration by genetic perturbation. Molecular Cell 44(2), pp. 252–264.

tadbit's Issues

check correlation value

Check the Spearman correlation between the original data and the modelled data, and compare the value with the one given by the original script.
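As a reference point for such a check, here is a minimal, dependency-free sketch of a Spearman rank correlation. It is a generic textbook implementation, not TADbit's code: the function names (`rank`, `pearson`, `spearman`) are hypothetical, and it assumes that both matrices have already been flattened into two equal-length lists of values.

```python
import math


def rank(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / float(n)
    mean_y = sum(y) / float(n)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return pearson(rank(x), rank(y))
```

Comparing the output of a function like this with the value reported by the modelling code on the same flattened input is one way to spot a discrepancy.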

Visualize Centroid

Would be nice to have a function (models.visualize_centroid()) that makes a drawing of the centroid similar to the models you show on the deconvolution analysis.

Deconvolution analysis

Add a function that, given a set of models and clusters, provides a deconvolution analysis of the input interaction matrix.

calc_eqv_rmsd slow

Everything should be passed to the function in a single call, rather than making one call per pairwise comparison (check whether this is truly faster).
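The call shape proposed here can be sketched as follows. This is only an illustration under stated assumptions: `pairwise_rmsd` and `rmsd` are hypothetical names, the models are plain lists of (x, y, z) tuples, and the RMSD is computed without any superposition, which may differ from what `calc_eqv_rmsd` actually does.

```python
import math
from itertools import combinations


def rmsd(model_a, model_b):
    """Plain RMSD between two models with the same number of particles.

    Assumption: no superposition or alignment is performed.
    """
    if len(model_a) != len(model_b):
        raise ValueError("models must have the same number of particles")
    squared = sum((p - q) ** 2
                  for point_a, point_b in zip(model_a, model_b)
                  for p, q in zip(point_a, point_b))
    return math.sqrt(squared / float(len(model_a)))


def pairwise_rmsd(models):
    """Receive all models at once and return {(i, j): rmsd} for every i < j."""
    return {(i, j): rmsd(models[i], models[j])
            for i, j in combinations(range(len(models)), 2)}
```

In pure Python the batched version is not faster by itself; any gain would come from moving the whole pairwise loop into compiled code so that the call overhead is paid only once, which is what remains to be measured.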

new plot for z-scores

It would be useful to have an extra z-score plot showing, on the right, the matrix of the z-scores used (i.e. the ones between the lower and upper bounds should be grayed out) and, on the left, the z-score distribution.

Default CPU use too high

Requesting all the CPUs by default when calling the tadbit C function is excessive and sometimes causes the machine to crash or freeze.
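One possible policy, shown here only as a sketch, is to leave one core free by default and to cap explicit requests at the number of available cores. The function name `default_cpus` and the `reserve` parameter are hypothetical; this is not how TADbit currently selects its CPU count.

```python
import os


def default_cpus(requested=None, reserve=1):
    """Pick a CPU count that does not saturate the machine.

    Assumed policy: with no explicit request, keep `reserve` cores free;
    otherwise never exceed the number of available cores.
    """
    available = os.cpu_count() or 1  # cpu_count() may return None
    if requested is None:
        return max(1, available - reserve)
    if requested < 1:
        raise ValueError("requested must be a positive integer")
    return min(requested, available)
```

A caller that really wants every core can still ask for it explicitly, for example with `default_cpus(os.cpu_count())`.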

IMP installation and sudo

After following the installation guidelines, the following line fails:

sudo python setup.py install

The reason is that root PYTHONPATH does not contain IMP (but user PYTHONPATH does).

cutoff should be resolution-dependent

At a resolution of 100 Kb (100,000 bp), a cutoff of 200 nm does not make sense; it should be around 1000 nm.

This implies that we should set a global default cutoff value for a given "StructuralModels" object. This predetermined cutoff will be used as the default for correlations unless something else is specified.

Find a cutoff formula, something like:

cutoff = resolution * 3 / 200

With the resolution in base pairs and the cutoff in nm, this gives:

a cutoff of 150 nm at a resolution of 10 Kb
a cutoff of 300 nm at a resolution of 20 Kb
a cutoff of 1500 nm at a resolution of 100 Kb

... something like this
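The formula proposed above can be written as a small helper. This is a sketch of the suggestion only: the function name `default_cutoff` is hypothetical, and the resolution is assumed to be given in base pairs so that the result comes out in nm.

```python
def default_cutoff(resolution_bp):
    """Resolution-dependent distance cutoff, in nm.

    Implements the proposed rule cutoff = resolution * 3 / 200,
    with the resolution expressed in base pairs (assumption).
    """
    if resolution_bp <= 0:
        raise ValueError("resolution must be positive")
    return resolution_bp * 3 / 200.0


# The three cases listed in the issue
for resolution in (10000, 20000, 100000):
    print(resolution, default_cutoff(resolution))
```

A value computed this way would only act as the default; passing an explicit cutoff should still override it.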

calculation of the volume occupied by the model (Marc idea)

One analysis of the models that we are not really doing well is the calculation of the volume occupied by the model, something similar to calculating what is called the Accessible Surface Area. Therefore, we will need to get measures such as:

  • Radius of Gyration
  • Longest X,Y and Z paths
  • Volume
  • Accessible Surface
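Some of these measures can be sketched with the standard library alone. The code below is an illustration under assumptions, not a TADbit API: a model is taken to be a list of (x, y, z) tuples, the "longest X, Y and Z paths" are interpreted as the extent of the model along each axis, and the volume is approximated very crudely by the axis-aligned bounding box. The accessible surface is not attempted here.

```python
import math


def radius_of_gyration(coords):
    """Root mean square distance of the particles to their centroid."""
    n = float(len(coords))
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    squared = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
                  for x, y, z in coords)
    return math.sqrt(squared / n)


def axis_extents(coords):
    """Extent of the model along X, Y and Z (max minus min on each axis)."""
    return tuple(max(p[k] for p in coords) - min(p[k] for p in coords)
                 for k in range(3))


def bounding_box_volume(coords):
    """Crude upper bound on the occupied volume: the bounding-box volume."""
    dx, dy, dz = axis_extents(coords)
    return dx * dy * dz
```

The bounding box overestimates the volume of any model that is not box-shaped, so a convex hull or a grid-based estimate would be the natural next step.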

Create Genome class

To link chromosomes together.

Genome objects may contain

  • inter chromosomal interactions, or a summary of the genomic Hi-C map (lower resolution)
  • estimate of nucleus size
  • tools to convert coordinates between species
  • ...

conflict between search for centromere option and max_tad_size

max_tad_size is only used when searching for the centromere; this needs to change.

  • The search for the centromere does not work well when the beginning of a TAD matches the beginning of the centromere (check whether the same occurs with the TAD end).
  • max_tad_size should by default be equal to the chromosome size, or to infinity.
