

ChromeBat: A Bio-Inspired Approach to Genome Reconstruction


OluwadareLab, University of Colorado, Colorado Springs


Developed by:
              Brandon Collins
              Department of Computer Science
              University of Colorado, Colorado Springs
              Email: [email protected]

              Philip Brown, PhD
              Department of Computer Science
              University of Colorado, Colorado Springs
              Email: [email protected]

              Oluwatosin Oluwadare, PhD
              Department of Computer Science
              University of Colorado, Colorado Springs
              Email: [email protected]


1. Content of folders:

  • src: Source code and parameter file for ChromeBat
  • results: output structures on GM06990 and GM12878

3. Input matrix file format:

Square Matrix Input format: The square matrix is a whitespace-separated N by N intra-chromosomal contact matrix derived from Hi-C data, where N is the number of equal-sized regions of a chromosome.
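
As an illustration of this format, the sketch below writes a small symmetric matrix to a whitespace-separated text file and loads it back with NumPy, checking that it is square. The file name toy_contacts.txt, the helper load_contact_matrix and the toy values are hypothetical and are not part of ChromeBat.

```python
import numpy as np

def load_contact_matrix(path):
    """Load a whitespace-separated N x N contact matrix (hypothetical helper)."""
    matrix = np.loadtxt(path, ndmin=2)  # the default delimiter is any whitespace
    rows, cols = matrix.shape
    if rows != cols:
        raise ValueError(f"expected a square matrix, got {rows} x {cols}")
    return matrix

# Toy 3 x 3 symmetric contact counts, made up for illustration.
toy = np.array([[0.0, 12.0, 3.0],
                [12.0, 0.0, 7.0],
                [3.0, 7.0, 0.0]])
np.savetxt("toy_contacts.txt", toy, fmt="%g")  # space-delimited by default

loaded = load_contact_matrix("toy_contacts.txt")
print(loaded.shape)
```

A file written this way has one matrix row per line, with the values on each line separated by single spaces.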

4. Dependencies Installation:

    python - 3.8.5
    scipy - 1.5.2
    numpy - 1.19.2
    scikit-learn - 0.23.2

5. Usage:

python ChromeBat.py contact_matrix parameter_file

where contact_matrix is a text file representing a whitespace-delimited square contact matrix derived from a Hi-C experiment, and parameter_file is a text file of parameters as described in section 6.

6. Parameters:

All parameters must be specified in a text file, as in the parameters_heavy.txt example. Default values from parameters_heavy.txt are given in []. Note that during the parameter search phase of the algorithm, the heavy parameter file will run 30 processes concurrently. Because this could be computationally intensive on most local machines, we also provide a parameters_light.txt file that will run only 6 processes concurrently.

Searched Parameters:

  • alpha: [0.1,0.3,0.5,0.7,0.9,1.0] This is the conversion factor used to convert the contact matrix to a distance matrix. If no alpha value is found, a search across alphas in [0,1] will be performed.
  • perturbation: [0.002,0.004,0.006,0.008,0.01] This determines the size of the random walk a bat takes after pulsing.

These parameters may have comma-delimited values in the parameter file. Doing so will result in a search over all combinations of the given perturbation and alpha values.
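
To make the size of this search concrete, the sketch below enumerates every alpha/perturbation combination from the default lists using itertools.product. Six alpha values and five perturbation values give 30 combinations, which lines up with the 30 concurrent processes of the heavy parameter file. That ChromeBat assigns exactly one combination to each process is an assumption here, not something stated above.

```python
from itertools import product

# Default searched values from parameters_heavy.txt, as listed above.
alphas = [0.1, 0.3, 0.5, 0.7, 0.9, 1.0]
perturbations = [0.002, 0.004, 0.006, 0.008, 0.01]

# Every (alpha, perturbation) pair; the pairing order is only illustrative.
combinations = list(product(alphas, perturbations))

print(len(combinations))  # 6 * 5 = 30
print(combinations[0])    # (0.1, 0.002)
```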

Normal Parameters:

  • output_file: [bat] This is the name that both output files will have.
  • num_bats: [10] How many bats the algorithm will simulate.
  • generations: [10] How many iterations the algorithm will perform.
  • min_freq: [0] The minimum frequency that a bat can have. Low frequency means a bat will explore more than exploit.
  • max_freq: [0.1] The maximum frequency a bat will be simulated with. Bats with high frequency exploit more than explore.
  • volume: [0.9] A bat's volume determines how willing it is to accept new solutions. A loud bat will accept solutions with high probability.
  • pulse: [0.9] When a bat pulses it teleports to the current best known solution. High pulse means it teleports with high probability.
  • structs: [10] This is how many structures in addition to the alpha search structures the algorithm should generate. If an alpha search is performed only the structures generated with the optimal alpha will be written to files.

These parameters may only have 1 value.
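
To show how these parameters could interact, here is a minimal sketch of a generic bat-algorithm-style loop on a toy objective. It is written only from the parameter descriptions above and is not ChromeBat's implementation: the function names sphere and bat_step, the velocity update, the uniform random walk and the rule of accepting only improving candidates are all assumptions made for illustration.

```python
import random

def sphere(position):
    """Toy objective: squared distance from the origin (lower is better)."""
    return sum(x * x for x in position)

def bat_step(position, velocity, best, rng, *, min_freq=0.0, max_freq=0.1,
             volume=0.9, pulse=0.9, perturbation=0.01, objective=sphere):
    """One illustrative update for a single bat; returns (position, velocity)."""
    # Draw a frequency; higher values pull the bat harder toward the best solution.
    freq = min_freq + (max_freq - min_freq) * rng.random()
    new_velocity = [v + (b - x) * freq for v, x, b in zip(velocity, position, best)]
    candidate = [x + v for x, v in zip(position, new_velocity)]

    # With probability `pulse`, jump to the best solution and take a small random walk.
    if rng.random() < pulse:
        candidate = [b + perturbation * rng.uniform(-1.0, 1.0) for b in best]

    # Accept an improving candidate with probability `volume`.
    if objective(candidate) < objective(position) and rng.random() < volume:
        return candidate, new_velocity
    return position, velocity

rng = random.Random(0)
bats = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(10)]  # num_bats = 10
velocities = [[0.0] * 3 for _ in bats]
best = min(bats, key=sphere)
start_cost = sphere(best)

for _ in range(10):  # generations = 10
    for i in range(len(bats)):
        bats[i], velocities[i] = bat_step(bats[i], velocities[i], best, rng)
    best = min(bats, key=sphere)

print(start_cost, sphere(best))
```

Because a bat in this sketch only ever moves to a better position, the best cost cannot increase from one generation to the next.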

7. Output:

ChromeBat.py will produce two files per structure generated, whose names are specified by the output_file parameter in the parameter file.

These files are

  • output_fileX.pdb : contains the model that may be visualized using PyMol
  • output_fileX.log : contains the input file name, Spearman's and Pearson's Correlation Coefficients and the Root Mean Squared Error
  • output_filecoordinate_mapping.txt : contains the mapping of old loci indices present in the original data to the new indices used after preprocessing (row column pairs that were all zero are removed as a preprocessing step)

Here X indicates the structure's number. For example, if structs=2 and an alpha/perturbation search is performed, 3 structures with the optimal alpha/perturbation will be generated, along with 3 pairs of .pdb and .log files.
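
The metrics named in the .log file can be sketched as follows, assuming they compare a structure's pairwise distances with target distances derived from the contact matrix. Both that pairing and the conversion distance = 1 / contact^alpha are assumptions (the conversion is a common choice in 3D genome reconstruction, not something confirmed above). The helpers contacts_to_distances and evaluate_structure and all toy values are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import pearsonr, spearmanr

def contacts_to_distances(contacts, alpha):
    """Assumed conversion: distance = 1 / contact**alpha for nonzero contacts."""
    distances = np.full(contacts.shape, np.nan)
    nonzero = contacts > 0
    distances[nonzero] = 1.0 / contacts[nonzero] ** alpha
    return distances

def evaluate_structure(coords, target):
    """Return (Spearman, Pearson, RMSE) over locus pairs with a defined target."""
    upper = np.triu_indices(len(coords), k=1)      # each pair of loci once
    model = squareform(pdist(coords))[upper]       # distances in the 3D model
    wanted = target[upper]
    valid = np.isfinite(wanted)                    # skip pairs with zero contacts
    model, wanted = model[valid], wanted[valid]
    spearman = spearmanr(model, wanted)[0]
    pearson = pearsonr(model, wanted)[0]
    rmse = float(np.sqrt(np.mean((model - wanted) ** 2)))
    return spearman, pearson, rmse

# Toy data, made up for illustration: four loci and a symmetric contact matrix.
contacts = np.array([[0, 9, 4, 1],
                     [9, 0, 9, 4],
                     [4, 9, 0, 9],
                     [1, 4, 9, 0]], dtype=float)
target = contacts_to_distances(contacts, alpha=0.5)
coords = np.array([[0.0, 0.0, 0.0],
                   [0.3, 0.0, 0.0],
                   [0.6, 0.1, 0.0],
                   [1.0, 0.0, 0.0]])

print(evaluate_structure(coords, target))
```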
