Code Monkey home page Code Monkey logo

chrom_mcsa's Introduction

chrom_MCSA

The latest version of the chromatin MC code. Scales with system size efficiently to 300 processors on 750-nucleosome system. Single fibers only, variable NRL, explicit tails and 28 bead LH model. Usage/Build Notes for Chromatin Monte Carlo Mesoscale Modeling by gavinbascom Schlick Lab Nov 2014 Updated for LANL machines 5/15******

For details of the theory and basic model motif see the following references:

1.Collepardo-Guevara, Rosana, and Tamar Schlick. "Crucial role of dynamic linker histone binding and divalent ions for DNA accessibility and gene regulation revealed by mesoscale modeling of oligonucleosomes." Nucleic acids research 40.18 (2012): 8803-8817.

  1. Collepardo-Guevara, Rosana, and Tamar Schlick. "Chromatin fiber polymorphism triggered by variations of DNA linker lengths." Proceedings of the National Academy of Sciences 111.22 (2014): 8061-8066.

3.Luque, Antoni, et al. "Dynamic condensation of linker histone C-terminal domain regulates chromatin structure." Nucleic Acids Research (2014): gku491.

  1. Zhang, Qing, Daniel A. Beard, and Tamar Schlick. "Constructing irregular surfaces to enclose macromolecular complexes for mesoscale modeling using the discrete surface charge optimization (DiSCO) algorithm." Journal of computational chemistry 24.16 (2003): 2063-2074.

Notes on model/implementation:

This code models DNA+histone (nucleosome) complex as several (~300) beads which have been optimized for charge and density on the surface of the nucleosome (called DISCO, see ref 4 above) while linker DNA is modeled as beads while tails and Linker Histone (H1) are also modeled as coarse grain beads. The Monte Carlo algorithm is comprised of 5 basic moves, utilizing instantaneous potential energy as the cost function/acceptance criteria. The five moves are: Global,Translational,Rotational,Linker Histone,and Rosenbluth Regrow (tails). Global moves take a subset of atoms and move them as a unit, exploring the global conformations accessible to the fiber, while rotational and translational move or rotate individual DNA linker beads. LH moves are translated within the dyad axis near the nucleosome, and the Rosenbluth Regrow routine randomly chooses a tail, erases it, and inserts each bead then one at a time within a distance (with gaussian probability) near the insertion point. Each step repeats each bead insertion "Ntrials" number of times, and the optimal bead is then chosen via an energy acceptance criteria similar to the main metropolis monte carlo scheme.

COMPILING:

NYU HPC Mercer:

  1. Navigate to the chromatin folder (where this README should be residing) and type:

module load intel make

This should produce an executable in the same folder named "chrom_vNRL.x". NOTE: This is a parallel program, there are older versions of the code which run serially and have a different workflow, Although this implementation is based on the older implementations. The other option is to run this version with the current build (with mpi module loaded) normally.

Additionally, if you need to alter variables in the module file "modglob.f" you'll need to compile the module separately and then recompiled the main code with modglob.0 as a make file (as is currently implemented in the makefile). There are commented lines in the makefile which can be used to do this.

NOTE: Running on LANL machines use:

module load intel mvapich2/1.9

and you don't need to point to the mpi wrappers, just use: mpif90 (build lines)

*copy executable to working directory for command lines below to work

USAGE:

  1. Once compiled, you may want to open some interactive nodes for parameter tweaking or debugging. On MUSTANG use:

    msub -I -l procs=4 -l walltime=02:00:00

  2. Once job has started navigate to the run directory. You will need the following files to be present within this folder: a. input file (in this doc its named input.run) b. dim.in (file containing the dimensions of the system) c. LH.in (file containing the Linker Histone parameters, refined models are labelled by number of beads per protein domain, Nterminal, Globular Head, C Terminal) d. LH_equil.in (file containing the equilibrium values for LH parameters) e. Core file (as generated by the DiSCO method, generally named core_data.reg.150mM) f. Tail file (a PDB style file containing the tail parameters and labels, named tail_data.mod.200mM) g. 1.dat (an ascii file containing the coordinates of the DNA bead vectors + nucleosomes) h. grid_data.9deg (file containing occupancy matrix for nucleosome volume)

  3. To run (on LANL-Mustang w 300 processors):

    module load intel mvapich2/1.9 srun -n 300 chrom_vNRL.x dim.in input.run LH_N0G6C22.in LH_N0G6C22_equil.in

    where the np flag denotes how many processors you would like to use. NOTE: The performance on multiple processors is highly dependent on system size and computer architecture. See section benchmarks/benchmarking. Additionally, you must pass all of the appropriate files to the executable on command line, or the program will complain and die.

3.1 To Restart: NOTE: In order to restart successfully, you will have to make sure that fort.15 is present and correctly formatted, with the correct laststep # in the opening line 1. Change restart flag to 1, change total steps if necessary 2. Make sure LHbound.0.in is available and in your working directory, if needs be copy it from LHbound.0.out (same file, same formatting, just time when written) *NOTE this is only necessary if using variable [LH] 3. Choose Mode 1 for modeLHc in input file. This will render [LH] useless but make sure that LHnum is correctly matched with the LHbound.0.in for allocation purposes 4. Check and make sure the energy of the first step (it will be marked as zero) is the same as the energy marked as the last step in the previous log file. If not, you probably have file corruption, erroneous LH inputs, or you've changed t_bcr, or some other structural parameters that will give you problems.

5. Make sure the appropriate fort.* files are present- they will be appended from the restart. If they aren't present the code will complain and start a new run
6. Make sure gen twist is set to 0 and twist is specified to the previous twist (THIS NEEDS FIXING)    
  1. Example Output (NOTE THIS IS SLIGHTLY OUTDATED:

    The following is an example of output directed to the screen(for a 12-nucleosome 6 bead system using standard inputs on 4 processors for 10 steps with freq 10) which can be easily redirected as a log file with the > key at the command line:

SIMULATING CHROMATIN MESOSCOPICALLY VIA TEMP CONTROLLED METROPOLIS MONTE CARLO DIMENSION FILE BEING READ BY 2 by SCHLICK LAB NYU 2014 DIMENSION FILE BEING READ BY 1 DIMENSION FILE BEING READ BY 0 DIMENSION FILE BEING READ BY 3 INPUT-FILE BEING READ BY 0 INPUT-FILE BEING READ BY 3 INPUT-FILE BEING READ BY 2 INPUT-FILE BEING READ BY 1 Simulation with LHs LH INPUT FILE BEING READ BY 0 LH INPUT FILE BEING READ BY 3 LH INPUT FILE BEING READ BY 2 LH INPUT FILE BEING READ BY 1 ************** SIMULATING WITH Mg2+ ************** n_c = 12 n = 84 N_unit = 7 nptc = 50 T = 0.2931500E+03 K Cs = 0.1500000E+00 M debye = 0.1285291E+01 nm^-1 q_l = -0.2408798E+02 e t_bcr = 0.1120000E+01
kbt = 0.5825868E+00 kcal/mol k_ex = 0.5825868E-03 kcal/mol h = 0.6473186E+01 kcal/mol/nm^2 g = 0.5825868E+01 kcal/mol s = 0.1439000E+02 kcal/mol h_tc = 0.6473186E+01 kcal/mol/nm^2 phi_o = 0.3300000E-02 rad Rcut = 0.7000000E+01 nm vdw_cut = 0.4000000E+01 nm t_exv_d = 0.1800000E+01 nm t_exv_e = 0.1000000E+00 (kcal/mol) evd_tl = 0.2700000E+01 nm evd_tc = 0.1800000E+01 nm evd_cc = 0.1200000E+01 nm evd_cl = 0.2400000E+01 nm evd_ll = 0.3600000E+01 nm evd_hcG = 0.2200000E+01 nm evd_hcC = 0.2400000E+01 nm evd_hlG = 0.3400000E+01 nm evd_hlC = 0.3600000E+01 nm steps = 10 restart = 0 freq = 10 corefile = core_data.reg.150mM
tailfile = tail_data.mod.200mM
iseed = 5678 withlink = T q_h1G = 0.1388000E+02 e q_h1C1 = 0.2562000E+02 e q_h1C2 = 0.2562000E+02 e HELLO FROM PROCESS 0! HELLO FROM PROCESS 2! HELLO FROM PROCESS 3! HELLO FROM PROCESS 1! COORD FILE BEING READ BY 3 COORD FILE BEING READ BY 0 COORD FILE BEING READ BY 1 COORD FILE BEING READ BY 2 COREFILE BEING READ BY 0 COREFILE BEING READ BY 3 COREFILE BEING READ BY 2 COREFILE BEING READ BY 1 TAIL FILE BEING READ BY 0 TAIL FILE BEING READ BY 3 TAIL FILE BEING READ BY 2 TAIL FILE BEING READ BY 1 LH.out written!

====== Step, myid: 0 0 ====== Es Eb Et
0.8270440E+03 0.8463352E+02 0.1900578E+03 Ec Ev TOTAL
-0.7939587E+03 0.1655635E+08 0.1655715E+08 tEc tEv tEb
0.4034159E+02 0.5816352E+00 0.3660650E+03 Eb_tc tEa
0.1223229E+03 0.0000000E+00 **** Consider increasing Ntrials **** step/trialID 10 1 1 23

====== Step, myid: 10 0 ====== Es Eb Et
0.8270440E+03 0.8463405E+02 0.1900217E+03 Ec Ev TOTAL
-0.7866014E+03 0.1650805E+08 0.1650885E+08 tEc tEv tEb
0.3942342E+02 0.5765955E+00 0.3630393E+03 Eb_tc tEa
0.1218703E+03 0.1722757E+01 ================== 10 0 ============== Global Trans Rotate Regrow Total Moves att.(mil): 0.000 0.000 0.000 0.000 0.000 Moves acc.(mil): 0.000 0.000 0.000 0.000 0.000

LH:acc,rat,del 2 0.500000000000000 0.700000000000000


  • PROGRAM COMPLETED NORMALLY *

CPU TIME ELAPSED: 2.0413566E-02 MINS

 NOTES ON OUTPUT:
 a.  The integers following each 'FILE BEING READ BY' outputs refer to the 
     processor id which has been able to access that file.  All processes should have 
     access to all files before the first energy call is made.
     
 b.  The following is a legend for the Energy variables displayed (see ref 2 Supporting Info):
 
     Es :Energy of DNA bead stretching.  Computed by a harmonic potential (spring)
     Eb :Energy of DNA bead bending. Computed by a harmonic angle potential (spring)
     Et :Energy of DNA bead twisting. Computed by a harmonic around the dihedral (spring)
     Ec :Energy of DNA electrostatics.  Computed as Debye-Huckel pairwise
     Ev :Energy of DNA excluded volume.  Computed by Lennerd-Jones 12-6 pairwise
     TOTAL :The total Energy of the system
     tEc :Tail electrostatics (Debye-Huckel)
     tEv :Tail excluded volume (Debye-Huckel)
     tEb :Tail bending energy (harmonic)
     Eb_tc : Energy of the Tail-Core bond bending
     tEa : Energy of the Tail-Core bond twisting
     (this output is piped directly to fort.9, along with other variables.  the columns are listed below)
     
 c.  The ***Consider Increasing Ntrials*** warning refers to a warning generated during the Rosenbluth
     regrow subroutine meaning that an adequate sampling for tail regrowth decision could
     not be found.  If you see a lot of this message, you likely have an error with your
     tail file, starting coordinates, or you Ntrial parameter is too low. (although it is normal to see a lot of it during early runs
     or runs being started from high energy starting conformers

d.   Moves att/acc lines refer to the number of each move type has been accepted or attempted,
     in the unit of Millions.  A good monte carlo simulation should have a an acc/att ratio
     of somewhere between 0.3 and 0.5, even if these ratios emerge only at multiples of millions 
     of steps.
     
e.  LH:acc returns the number of dynamic linker histone moves which have been accepted,
    rat is the acc/attempted ratio (which again should be between 0.3 and 0.5), and del
    refers to the average displacement of each LH bead, which is set in the input files. 
  1. Input parameters (found in the input file (THIS IS UP TO DATE)):

    1. The following input file was used to generate the above output:

T Cs steps restart freq Pglob Ptran Prota Ptgrw 293.15 0.150 60E6 1 10000 .1 .05 .05 .75 t_bcr deltas 1/2/3 ntrials 1.12 0.015 0.6 0.6 10 corefile tailfile core_data.reg.150mM tail_data.mod.200mM gen seed? seed T 1 gen twist? DNA_twist(phi_o) T 0.0033 Magnesium flag 0 Non-parental LH 1 ProbLH deltaLH .05 0.6 pre(Ev LH/lDNA) 1 [LH] mode LHnum 1 0 47 Verbose Output F Sim Annealing (0=none,1=linear anneal,2=trill) 0 0 0

2. Following is a legend explaining usage of each parameter.
	a. T 				= the temperature of the system
	b. Cs 				= concentration of NaCl used to parameterize the system.  If changing this, be sure to use the appropriate corefiles, tailfiles etc
	c. steps			= the total number of steps for the simulation
	d. restart			= restart flag, 1= restart, 0= start from scratch
	e. freq 			= how often to print energy to screen and output files
	f. Pglob			= percent of total moves which will be GLOBAL
	e. Ptran			= percent of total moves which will be TRANSLATIONAL
	f. Ptran			= percent of moves which will be ROTATIONAL 
	g. Ptgrw			= percent of moves which will be ROSENBLUTH REGROW moves
	h. t_bcr			= tail bead charge coefficient (multiplied to tail_file charges, used to mimic acetylation in past) 
	i. delta 1			= displacement(ave) for global moves 
	j. delta 2			= displacement(ave) for translational moves 
	k. delta 3			= displacement(ave) for rotational moves 
	l. ntrials			= number of trials to be performed during ROSENBLUTH REGROW
	m. corefile 		= path to corefile
	n. tailfile 		= path to tailfile
	o. seed				= seed for random number generator (this puts the small digit numbers used here and generates a larger one, see rand.f)
	p. withlink 		= THIS IS NOW HARDWIRED INTO THE CODE, for โ€“[LH] use [LH]=0 
	*. gen seed? 		= this will generate a seed based on wallclock ticks, if F uses the following number as seed
	*. gen twist?       = same as above except will generate equal amounts of -20deg, 0deg, and +20deg twist values, unless false, then twist is specified.  only active if not restarting
	r. DNA_twist(phi_o)	= equilibrium value for DNA twist
	s. Magnesium flag	= flag for presence of magnesium, 1=Mg2+,0=No Mg2+
	t. Non-parental LH	= flag to turn off LH interactions with non-parental nucleosomes (used for debugging frozen starting conformations)
	u. ProbLH			= total percentage of the steps which will be LH moves
	v. deltaLH			= average displacement during LH steps
	w. pre(Ev LH/DNA)	= now obsolete parameter
	x. [LH]				= Linker Histone concentration (only used if following mode is 0)
	y. mode				= mode of disbursing LHs: 
							0 = Random distribution based on [LH] (computed as ratio)
							1 = read distribution from file (named LHbound.0.in)
							2 = randomly distributed based on LHnum
	z.LHnum				= number of LHs to be simulated.  Needs to be consistent with LHbound.0.in or concentration if used
   aa.Verbose Output	= Logical Flag to print extra output (used for debugging). Accepts 
   			              T,F,.TRUE.,.FALSE.
   			              
3. The dim.in file should look like this:

#num of cores
12

beads LH-domains: N-term Glob-head C-term

0 6 22

linker dna beads per core (1 line per core)

4 4 4 4 4 4 4 4 4 4 4 4 4

	Where the first line denotes 12 nucleosomes,
	And second line denotes how many beads are to be used per LH domain (Nterm, Gterm, Cterm)
	and the following lines denote the number of LH beads per core (NOTE 1bead=9bp).
	
4.  OUTPUT FILES:

The following output files are generated (the naming has been kept as fort.* for various reasons, 
mostly compatibility with matlab script archives):

fort.8 : distance between the first two dimers, indexed by MC step, written every 'freq' step 
fort.9 : main energy terms, not indexed, each column corresponds to those written to screen (in the same order, left to right, top to bottom.  written every 'freq' step. 
fort.10 : all the coordinates of the system, dumped at beginning and every 'freq' steps
fort.11 : restart file written at end of simulation, identical to fort.15
fort.13 : simulation parameters, identical to screen output at beginning of simulation 
fort.14 : Electrostatics File.  Prints the individual contributions of bead type to the electrostatics every 'freq' step
fort.15 : Restart file written every 5,000 steps or final step
fort.16 : output of snapshot subroutine (gives coors with labels)
fort.17 : excluded volume file: Prints the bead type contributions to the excluded volume
fort.101 : 
fort.121 : Initial snapshot of the system (check for file corruption or read errors)
fort.122 : Final snapshot of the system


********************FORT.9***********************************************************
by Column
1.  DNA Stretch
2.  DNA Bend
3.  DNA Twist
4.  DNA Electrostatics (Debye-Huckel Nonbonded Terms)
5.  DNA Excluded Volume (Lennard Jones 12-6 terms)
6.  TOTAL ENERGY
7.  Tail Electrostatics (Debye-Huckel)
8.  Tail Excluded Volume (Lennard Jones 12-6)
9.  Tail Stretch
10. Tail Bend
11. Tail-Core Stretch
12. Linker Histone Stretch
13. Linker Histone Bend
14. Linker Histone Twist
********************FORT.9***********************************************************

********************FORT.14 Electrostatics (DEBYE HUCKEL by bead type)**************
by Column

1.  DNA-DNA
2.  Core-Core
3.  Tail-Tail
4.  DNA-Core
5.  Core-Tail
6.  Tail-DNA 
7.  Tail-Tail across tails
8.  Tail-Tail same tail
9.  Core-Tail across
10. Core-Tail same
11. LH-LH
12. LH-Tail
13. LH-DNA1
14. LH-DNA2
15. LH-Core
*************************************************************************************

********************FORT.17 Excluded Volume (Lennard-Jones 12-6 by bead type)********
by Column

1.  DNA-DNA
2.  Core-Core
3.  Tail-Tail
4.  DNA-Core
5.  Core-Tail
6.  Tail-DNA 
7.  Tail-Tail across tails
8.  Tail-Tail same tail
9.  Core-Tail across
10. Core-Tail same
11. LH-LH
12. LH-Tail
13. LH-DNA1
14. LH-DNA2
15. LH-Core
*************************************************************************************

5.Input CORE and TAIL files, with explanation of variables:

COREFILE:
no index, no labels, just x,y,z columns of 300 point beads, with charges to follow in single column.

TAILFILE (PDB style)
Column:

1. residue name  
2. tail group	
3. fixed to core flg (1 fixed / 0 notfixed)
4. bead charge
5. bead LJ radius     
6. tail mass	
7. x (relative to core)  
8. y (relative to core)	
9. z (relative to core)
10. eq bond_value
11. bond constant   	
12. eq angle value
13. angle_constant  

chrom_mcsa's People

Contributors

gavinbas avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.