
difffitviewer's Introduction

DiffFit: Visually-Guided Differentiable Fitting of Molecule Structures to Cryo-EM Map

IEEE VIS 2024 Submission | arXiv preprint | Video | OSF repo

YouTube tutorial videos (coming soon)

  1. Install
  2. Demo Usage Scenario 1: Fit a single structure
  3. Demo Usage Scenario 2: Composite multiple structures
  4. Demo Usage Scenario 3: Identify unknown densities

Install

  1. Download the repository and unzip it to a path. The following guide uses D:\GIT\DiffFitViewer as this path.
  2. Run the ChimeraX command devel build D:\GIT\DiffFitViewer; devel install D:\GIT\DiffFitViewer
  3. Open the system command-line shell and install PyTorch, Biopython, mrcfile, and scikit-learn into ChimeraX's Python
    1. Find ChimeraX's Python executable; you may find this guide useful. The following commands use C:\Users\luod\AppData\Local\ChimeraX\bin\python.exe.
    2. Install PyTorch via the following command, or according to its official documentation:
      C:\Users\luod\AppData\Local\ChimeraX\bin\python.exe -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
      
    3. Install Biopython, mrcfile, and scikit-learn via the following command:
      C:\Users\luod\AppData\Local\ChimeraX\bin\python.exe -m pip install biopython mrcfile scikit-learn
      

Now, DiffFit should be fully installed. Launch it via Tools > Volume Data > DiffFit.
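If the DiffFit panel fails to start, one quick check is whether step 3 really installed the packages into ChimeraX's Python rather than into another Python on the system. The sketch below is a suggestion, not part of DiffFit: save it under a name of your choice (check_deps.py is a hypothetical name) and run it with the interpreter used above, e.g. C:\Users\luod\AppData\Local\ChimeraX\bin\python.exe check_deps.py. The import names (Bio for Biopython, sklearn for scikit-learn) are the packages' standard ones.

```python
import importlib.util

# pip package name -> module name to import (for the packages installed in step 3)
REQUIRED_MODULES = {
    "torch": "torch",
    "biopython": "Bio",
    "mrcfile": "mrcfile",
    "scikit-learn": "sklearn",
}


def check_dependencies(modules=REQUIRED_MODULES):
    """Return {pip_name: True/False} telling whether each module can be found."""
    return {
        pip_name: importlib.util.find_spec(import_name) is not None
        for pip_name, import_name in modules.items()
    }


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name:14s} {'found' if ok else 'MISSING'}")
```

Any line reporting MISSING means the corresponding pip command should be rerun with ChimeraX's python.exe.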


Right-click in the panel to access DiffFit's help page.


Demo usage scenarios

Scenario 1: Fit a single structure

  1. Download PDB-8JGF and EMD-36232
    1. Note the resolution (2.7 Å) from the webpage
    2. Extract the map
    3. Put the files (8jgf.cif and emd_36232.map) under, for example, D:\GIT\DiffFitViewer\run\input\8JGF
  2. Drop both files into ChimeraX.
    1. Note the pixel value in the log; it is the grid spacing of this volume, 1.04 Å in this case
    2. Move and rotate the molecule, then save it (select it, choose "Save selected atoms only", uncheck "Use untransformed coordinates") as 8JGF_transformed.cif. This step is only for demo purposes and is not necessary in real use cases
  3. Put 8JGF_transformed.cif under D:\GIT\DiffFitViewer\run\input\8JGF\subunits_cif
  4. Simulate a map for the molecule
    1. Create two folders, subunits_mrc and subunits_npy, under D:\GIT\DiffFitViewer\run\input\8JGF\
    2. Open a new ChimeraX session and run runscript "D:\GIT\DiffFitViewer\src\convert2mrc_npy.py" "D:\GIT\DiffFitViewer\run\input\8JGF\subunits_cif" "D:\GIT\DiffFitViewer\run\input\8JGF\subunits_mrc" "D:\GIT\DiffFitViewer\run\input\8JGF\subunits_npy" 2.7 1.04
  5. Run DiffFit. Set the parameters as follows and hit Run!
    1. Target volume: D:\GIT\DiffFitViewer\run\input\8JGF\emd_36232.map
    2. Structures folder: D:\GIT\DiffFitViewer\run\input\8JGF\subunits_cif
    3. Structures sim-map folder: D:\GIT\DiffFitViewer\run\input\8JGF\subunits_mrc
    4. Output folder: D:\GIT\DiffFitViewer\run\output\8JGF
    5. Experiment name: fit_single_demo
    6. Target surface threshold: 0.20, or the author-recommended contour level 0.162. DiffFit is very robust to this parameter; any value between 0.02 and 0.4 works in this case.
    7. Leave the rest as default and hit Run!
  6. ChimeraX will freeze for a few seconds (less than 15 seconds on one RTX 4090) and then become responsive again. Click the View tab to examine the results.
    1. Save the molecule if desired
    2. You may take a look at the optimization steps
  7. If you want to change the cluster tolerance, if you ran Compute on a cluster, or if you accidentally closed ChimeraX after a Compute run, you can view the results with the following parameter settings:
    1. Target volume: D:\GIT\DiffFitViewer\run\input\8JGF\emd_36232.map
    2. Structures folder: D:\GIT\DiffFitViewer\run\input\8JGF\subunits_cif
    3. Data folder: D:\GIT\DiffFitViewer\run\output\8JGF\fit_single_demo
    4. Clustering - Shift Tolerance: 0.5 or the value you desire
    5. Clustering - Angle Tolerance: 0.5 or the value you desire
    6. Hit Load
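DiffFit's own clustering code is not shown in this document. As a rough illustration of what the two tolerances control, the sketch below greedily groups fits whose translations differ by at most the shift tolerance and whose rotations differ by at most the angle tolerance. It assumes shifts are in Å, angles in degrees, and rotations stored as unit quaternions (w, x, y, z); `Fit` and `cluster_fits` are hypothetical names, not DiffFit's API.

```python
import math
from dataclasses import dataclass


@dataclass
class Fit:
    """A hypothetical fit result: translation in Å, unit quaternion (w, x, y, z), score."""
    shift: tuple
    quat: tuple
    score: float


def rotation_angle_deg(q1, q2):
    """Angle in degrees of the relative rotation between two unit quaternions."""
    dot = abs(sum(a * b for a, b in zip(q1, q2)))  # abs(): q and -q are the same rotation
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return math.degrees(2.0 * math.acos(dot))


def cluster_fits(fits, shift_tol, angle_tol):
    """Greedy clustering: the best-scoring fits seed clusters, the rest join the first close seed."""
    clusters = []  # each cluster is a list whose first element is its seed
    for fit in sorted(fits, key=lambda f: f.score, reverse=True):
        for cluster in clusters:
            seed = cluster[0]
            if (math.dist(fit.shift, seed.shift) <= shift_tol
                    and rotation_angle_deg(fit.quat, seed.quat) <= angle_tol):
                cluster.append(fit)
                break
        else:  # no existing cluster is close enough
            clusters.append([fit])
    return clusters
```

Larger tolerances (such as 6 and 15 in Scenario 2) merge more near-duplicate fits into one cluster, so fewer distinct placements are listed.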

Scenario 2: Composite multiple structures

  1. Download PDB-8SMK and EMD-40589
    1. Note the resolution (3.5 Å) from the webpage
    2. Extract the map
    3. Put the files (8smk.cif and emd_40589.map) under, for example, D:\GIT\DiffFitViewer\run\input\8SMK
  2. Drop both files into ChimeraX.
    1. Note the pixel value in the log; it is the grid spacing of this volume, 0.835 Å in this case
    2. Move and rotate the molecule, then save it (select it, choose "Save selected atoms only", uncheck "Use untransformed coordinates") as 8SMK_transformed.cif. This step is only for demo purposes and is not necessary in real use cases
  3. Create a folder subunits under D:\GIT\DiffFitViewer\run\input\8SMK
  4. Split the chains into individual .cif files and simulate a map for each chain
    1. Open a new ChimeraX session and run runscript "D:\GIT\DiffFitViewer\src\split_chains.py" "D:\GIT\DiffFitViewer\run\input\8SMK\8SMK_transformed.cif" "D:\GIT\DiffFitViewer\run\input\8SMK\subunits" 3.5 0.835
    2. Put all generated .cif files under D:\GIT\DiffFitViewer\run\input\8SMK\subunits_cif
    3. Put all generated .mrc files under D:\GIT\DiffFitViewer\run\input\8SMK\subunits_mrc
    4. Delete all generated .npy files, or put them under D:\GIT\DiffFitViewer\run\input\8SMK\subunits_npy
    5. Keep only the unique chains (A, B, C) in subunits_cif and subunits_mrc
  5. Run DiffFit. Set the parameters as follows and hit Run!
    1. Target volume: D:\GIT\DiffFitViewer\run\input\8SMK\emd_40589.map
    2. Structures folder: D:\GIT\DiffFitViewer\run\input\8SMK\subunits_cif
    3. Structures sim-map folder: D:\GIT\DiffFitViewer\run\input\8SMK\subunits_mrc
    4. Output folder: D:\GIT\DiffFitViewer\run\output\8SMK
    5. Experiment name: round1
    6. Target surface threshold: 0.8, or the author-recommended contour level 5.0. DiffFit is very robust to this parameter; any value between 0.1 and 5.0 works in this case.
    7. # shifts: 30
    8. # quaternions: 300
    9. Leave the rest as default and hit Run!
  6. ChimeraX will freeze for a few seconds (less than 30 seconds on one RTX 4090) and then become responsive again. Click the View tab to examine the results.
    1. Examine the fits; sort by a different metric if needed
    2. If you want to change the cluster tolerance, if you ran Compute on a cluster, or if you accidentally closed ChimeraX after a Compute run, you can view the results with the following parameter settings:
      1. Target volume: D:\GIT\DiffFitViewer\run\input\8SMK\emd_40589.map
      2. Structures folder: D:\GIT\DiffFitViewer\run\input\8SMK\subunits_cif
      3. Data folder: D:\GIT\DiffFitViewer\run\output\8SMK\round1
      4. Clustering - Shift Tolerance: 6 or the value you desire
      5. Clustering - Angle Tolerance: 15 or the value you desire
      6. Hit Load
    3. Save a molecule if desired
    4. Set the Resolution to 3.5 and click Simulate volume
    5. Change the surface level threshold of the simulated volume if necessary
    6. Click Zero density
    7. Repeat the last four steps (Save, Simulate, adjust the threshold, Zero density) for the same Mol Id at a different place, or for a different Mol Id, until no good fit is left
    8. Save the last working volume via File > Save, with Files of type set to MRC and Map set to the working volume, under a new name, for example, emd_40589_round_1.mrc
  7. Repeat Steps 5-6 until you are satisfied with the whole composite
    1. Change the Target volume to: D:\GIT\DiffFitViewer\run\input\8SMK\emd_40589_round_1.mrc
    2. If needed, take out the already fitted chains from subunits_cif and subunits_mrc
    3. Give a new Experiment name: round2
    4. You may lower the # shifts, for example, to 10, and the # quaternions to 100
    5. Hit Run!
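The Simulate volume / Zero density loop in step 6 can be pictured with a small NumPy sketch. It assumes the simulated map has already been resampled onto the same grid as the working target map, and that "zeroing" means setting target voxels to 0 wherever the simulated map is at or above its surface level. The function name is hypothetical and this is not DiffFit's implementation.

```python
import numpy as np


def zero_density(target, simulated, sim_level):
    """Return a copy of `target` with the voxels covered by the simulated map set to 0.

    Assumes `target` and `simulated` are on the same grid (same shape, origin,
    and spacing); `sim_level` is the surface threshold of `simulated`.
    """
    if target.shape != simulated.shape:
        raise ValueError("maps must be on the same grid")
    result = target.copy()  # leave the input map untouched
    result[simulated >= sim_level] = 0.0
    return result


# Toy example: a fitted chain that occupies the first half of a 4x4x4 map.
target = np.ones((4, 4, 4), dtype=np.float32)
sim = np.zeros_like(target)
sim[:2] = 1.0
round_1 = zero_density(target, sim, sim_level=0.5)
print(round_1.sum())  # 32.0
```

Each round applies this for every accepted fit, and the result becomes the next round's target (emd_40589_round_1.mrc, and so on), so already explained density no longer attracts new fits.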

Scenario 3: Identify unknown densities

The whole procedure is the same as in Scenario 1: Fit a single structure, except that there are multiple structures under subunits_cif.

There is a demo data set with one volume map and three structures to search against. If you have put DiffFit under D:\GIT\DiffFitViewer, you can just hit Run! in the Compute tab and then go to the View tab. Otherwise, just change the paths for the input and output data.

If you want to search against the whole candidate library for this case from DomainFit, you can either follow Steps 1-3 of its documentation to generate the PDB files for the domains, or download the ones we generated from this Google Drive link. Note that by following DomainFit's Steps 1-3 we obtained 359 PDB files, instead of the 344 files mentioned there.

The computing time for searching the whole candidate library on one RTX 4090 is about 10 minutes.

difffitviewer's People

Contributors

rodenluo, ondrejstrnad


difffitviewer's Issues

New translation initialization

First, expand the active region by a user-controlled amount of grids on each axes, and then perform uniformly spaced sampling on the grid indices in the hope of getting more or less uniformly spaced sampling in the Angstrom unit.
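A NumPy sketch of this initialization, under assumptions that are not stated in the issue: the "active region" is the set of voxels at or above the surface threshold, origin and spacing are given per axis in the array's own axis order, and samples are fractional grid indices (round with np.rint to snap them to grid points). The function name is hypothetical.

```python
import numpy as np


def grid_translation_init(density, threshold, pad, n_per_axis, origin, spacing):
    """Uniformly spaced candidate translations over an expanded active region.

    density:    3D array of map values.
    threshold:  voxels with density >= threshold count as active.
    pad:        number of grid points to expand the active box by on each side.
    n_per_axis: number of samples along each axis.
    origin, spacing: per-axis origin and grid spacing in Å, in the array's axis order.
    Returns an (n_per_axis**3, 3) array of positions in Å.
    """
    active = np.argwhere(density >= threshold)
    if active.size == 0:
        raise ValueError("no voxels at or above the threshold")
    # Bounding box of the active voxels, expanded by `pad` and clipped to the map.
    lo = np.maximum(active.min(axis=0) - pad, 0)
    hi = np.minimum(active.max(axis=0) + pad, np.array(density.shape) - 1)
    # Evenly spaced (fractional) grid indices along each axis of the expanded box.
    axes = [np.linspace(lo[i], hi[i], n_per_axis) for i in range(3)]
    idx = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    # Convert grid indices to Å.
    return np.asarray(origin, dtype=float) + idx * np.asarray(spacing, dtype=float)
```

With equal spacing on all axes, evenly spaced indices are also evenly spaced in Å; with unequal spacing, the sample counts per axis would need to differ to keep the Å spacing uniform.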

Random placement initializations

If I log tf.translation() for the tf from this line, https://github.com/RBVI/ChimeraX/blob/7ac0f1f53131e3e2914484d75d3d5df60221983a/src/bundles/map_fit/src/search.py#L78, and then plot the points, the results for fitmap of PDB-8GAM in EMDB-29900 with search 100 are attached below, for both thresholds (0.14 and 0.209). They surprised me, to be honest. I thought random_translation(bounds), 4 lines above, would generate points within the bounding box of the volume. I am not sure whether I misunderstood anything.

[Plot: logged tf.translation() points for thresholds 0.14 and 0.209]

By adaptive sampling, I meant instead of sampling from the volume box or the bounding box of active voxels, just sampling from the active voxels themselves only.
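A minimal NumPy sketch of sampling from the active voxels themselves, assuming "active" means density at or above the surface threshold and that origin and spacing are given in the array's axis order; the function name is hypothetical. Unlike the grid approach, this is random, so nearby picks can still cluster.

```python
import numpy as np


def sample_active_voxels(density, threshold, n, origin, spacing, rng=None):
    """Draw n translations (in Å) uniformly from voxels with density >= threshold."""
    rng = np.random.default_rng(rng)
    active = np.argwhere(density >= threshold)  # (k, 3) grid indices
    if len(active) == 0:
        raise ValueError("no voxels at or above the threshold")
    # Sample without replacement when possible so that no voxel is picked twice.
    replace = n > len(active)
    picks = active[rng.choice(len(active), size=n, replace=replace)]
    return np.asarray(origin, dtype=float) + picks * np.asarray(spacing, dtype=float)
```

A hybrid is also possible: generate the uniform grid first and keep only the grid points that fall on active voxels.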

I came up with all the random initialization heuristics after seeing fitting processes similar to this one: https://i.gyazo.com/135a64c79862a191a433420b85a08e93.mp4. In that case, the initial orientation is close to the final orientation but the translation is far off, and the structure still fits in. My current conclusion is that the orientation is much more important than the translation. I probably should have said uniform grid rather than uniform random. This echoes your idea of "choosing the placements to be as far as possible from all previously tried placements." Initializing translations at uniform grid intersection points gives much better results than randomly generated points (even though I kind of forgot this later, once the adaptive initialization was in place; we were really rushed). I expect similar behavior for the orientation. Together with ChatGPT, I came up with the following function for orientation:

import numpy as np


def generate_random_quaternions(n):
    """
    Generate n random quaternion vectors that evenly sample directions.

    Parameters:
    - n: The number of quaternion vectors to generate.

    Returns:
    - quaternions: An array of shape (n, 4) containing n quaternions.

    Example usage:
    n = 5  # Number of quaternions to generate
    quaternions = generate_random_quaternions(n)
    print(quaternions)
    """
    # Randomly sample the angle
    angles = 2 * np.pi * np.random.rand(n)
    # Randomly sample the square root of the distribution for the axis
    u = np.random.rand(n, 3)
    sqrt_u = np.sqrt(u)
    # Generate the axis components
    axis = np.zeros((n, 3))
    axis[:, 0] = np.sin(angles) * sqrt_u[:, 0]
    axis[:, 1] = np.cos(angles) * sqrt_u[:, 1]
    axis[:, 2] = np.sin(angles) * sqrt_u[:, 2]
    # Normalize the axis to ensure it's a unit vector
    norm = np.linalg.norm(axis, axis=1)
    axis = axis / norm[:, None]
    # Sample the cos(theta/2) for the quaternion's real part
    cos_theta_2 = np.cos(np.random.rand(n) * np.pi)
    # Combine to form quaternions: q = [cos(theta/2), sin(theta/2)*axis]
    quaternions = np.zeros((n, 4))
    quaternions[:, 0] = cos_theta_2
    quaternions[:, 1:] = axis * np.sin(np.arccos(cos_theta_2))[:, None]
    return quaternions

Per my visual check, it achieves "grid/even" sampling of the orientation. But I have not yet thought carefully about the quaternion-related math. So there are still two TODOs for me in this regard: one is to get adaptive grid sampling for the translation; the other is to verify that the orientation is indeed sampled evenly, as on a grid.
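For the second TODO, one simple numerical check (a suggestion, not part of the original discussion): for rotations drawn uniformly at random, each of the four components of the unit quaternion has mean square 1/4. In generate_random_quaternions above, the real part is the cosine of an angle uniform on [0, π], whose mean square is 1/2, which suggests that near-identity rotations are over-sampled. For comparison, the sketch below uses Shoemake's subgroup algorithm (Graphics Gems III, 1992), a standard way to draw uniformly distributed random rotations. It is uniform in the statistical sense, not grid-like. The function names are hypothetical.

```python
import numpy as np


def uniform_random_quaternions(n, rng=None):
    """n unit quaternions (w, x, y, z) drawn uniformly over rotations (Shoemake's method)."""
    rng = np.random.default_rng(rng)
    u1, u2, u3 = rng.random((3, n))
    a, b = np.sqrt(1.0 - u1), np.sqrt(u1)  # a**2 + b**2 == 1, so each row has unit norm
    return np.stack([
        b * np.cos(2 * np.pi * u3),  # w (real part)
        a * np.sin(2 * np.pi * u2),  # x
        a * np.cos(2 * np.pi * u2),  # y
        b * np.sin(2 * np.pi * u3),  # z
    ], axis=1)


def mean_w_squared(quaternions):
    """Mean of the squared real part; about 0.25 for uniformly distributed rotations."""
    return float(np.mean(quaternions[:, 0] ** 2))
```

Calling mean_w_squared on a large sample (say 100,000 quaternions) from each generator gives a quick comparison: about 0.25 here, and about 0.5 for generate_random_quaternions.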

Get ChimeraX data structures and Gaussian smoothing to DiffFit

Playground code

from chimerax.core.commands import run
v_ = run(session, "volume gaussian #1 sdev 2")
v_.id

Using "volume gaussian #1 sdev 2" reduces the 3.6 Å resolution to maybe 8 Å.
You have to smooth to a resolution much coarser than atomic (say 10-20 Å).

I have also heard from YouTube tutorial videos that one should smooth the map to the point where secondary structures are visible but side chains are not. For this part, you might have already noticed that DiffFit generates a multi-resolution array for the input volume and then samples from the whole array. What I would like to understand is the relationship between the original resolution, the sDev, and the resulting resolution (applying sDev 2 to 3.6 Å gives maybe 8 Å). And which resolutions would you recommend for the array (such as 8, 10, 12, 14, 16, 18, 20, 22; or 8, 14, 20, 26)? Which makes more sense: smoothing the original volume with different sDevs, or iteratively smoothing with the same sDev (using each output as the input for the next round)? Does it make sense to find coarse fits on the smoothed volumes first and then refine them against the original, higher-resolution map?
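A back-of-the-envelope model for the first questions, which is an assumption and not something stated in this thread: treat a map at resolution R0 as blurred by a Gaussian with σ0 = k·R0, and treat "volume gaussian ... sdev σ" as convolution with a Gaussian of standard deviation σ (in Å). Convolving Gaussians adds variances, so σ_total = sqrt(σ0² + σ²), i.e. R = sqrt(R0² + (σ/k)²). With k ≈ 0.225 (the value at which a Gaussian's Fourier transform falls to 1/e at spatial frequency 1/R), 3.6 Å smoothed with sdev 2 comes out near 9.6 Å, the same ballpark as the "maybe 8 Å" estimate. The same variance rule means that n passes with one sdev equal a single pass with sdev·√n, so the two smoothing strategies in the question are interchangeable up to the choice of σ. All function names are hypothetical.

```python
import math

K = 0.225  # assumed sigma/resolution ratio (Gaussian FT falls to 1/e at frequency 1/R)


def smoothed_resolution(r0, sdev, k=K):
    """Approximate resolution (Å) after Gaussian smoothing with `sdev` (Å) of a map at r0 Å."""
    return math.sqrt(r0 ** 2 + (sdev / k) ** 2)


def sdev_for_resolution(r0, r_target, k=K):
    """Gaussian sdev (Å) that takes a map from r0 to r_target under the same model."""
    if r_target < r0:
        raise ValueError("smoothing cannot improve resolution")
    return k * math.sqrt(r_target ** 2 - r0 ** 2)


def iterated_sdev(sdev, n):
    """n passes with the same sdev equal one pass with sdev * sqrt(n) (variances add)."""
    return sdev * math.sqrt(n)


print(round(smoothed_resolution(3.6, 2.0), 1))  # 9.6
for r in (8, 14, 20, 26):  # one of the candidate resolution ladders above
    print(r, round(sdev_for_resolution(3.6, r), 2))
```

The constant k is the main uncertainty; other resolution conventions (for example, half-maximum criteria) change the absolute numbers but not the square-root form of the relationship.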

`v.writable_copy().matrix().sum()` is different from `v.matrix().sum()`

Hi @tomgoddard,

With EMD-29900 open in ChimeraX, running the following script shows that the matrix sums differ between the original map and its writable copy. Am I doing anything wrong? Thanks.

from chimerax.map.volume import volume_list
v = volume_list(session)[0]
v.matrix().sum()
Out[5]: -3153.0344

volume_copy = v.writable_copy()
volume_copy.matrix().sum()
Out[6]: -25224.293

And the surface looks different at the same threshold level (screenshots of the original volume and of the writable copy were attached to the issue).
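One observation that may help narrow this down: the two sums differ by a factor of almost exactly 8 (-25224.293 / -3153.0344 ≈ 8.000). That is the factor expected if one of the two matrices is a step-2 subsample (every second voxel along each of the three axes) of the other, for example because ChimeraX may display large maps at a coarser step. This is a hypothesis worth checking (compare the shapes of the two matrices), not a confirmed diagnosis. The NumPy sketch below only illustrates the arithmetic on a synthetic smooth map.

```python
import numpy as np

# Synthetic smooth "map": a separable 3D Gaussian blob on a 64^3 grid.
x = np.arange(64)
g = np.exp(-((x - 31.5) ** 2) / (2 * 8.0 ** 2))
full = g[:, None, None] * g[None, :, None] * g[None, None, :]

step2 = full[::2, ::2, ::2]  # keep every second voxel along each axis
ratio = full.sum() / step2.sum()
print(full.shape, step2.shape, round(float(ratio), 3))  # (64, 64, 64) (32, 32, 32) 8.0
```

For a smooth map, dropping every second voxel along three axes keeps about 1/8 of the total, which matches the observed ratio.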

Volume matrix accessing and modifying

Running the following code in Tools > General > Shell, I can access the numpy matrix of a volume opened in ChimeraX; how can I access the filtered (by the "Level") matrix?

from chimerax.map.volume import volume_list
v = volume_list(session)[0]
v.matrix()

If I change v.matrix() interactively through the Shell, how can I get the drawing to update? I tried the following, but it does not work.

v.matrix().sum()
Out[62]: -3153.0344
v.matrix()[v.matrix() < 0.1] = 0
v.matrix().sum()
Out[64]: 4556.745
v.update_drawings()

Is it possible to access the matrix after dust removal, as in the following command?

surface dust #2 size 8.1
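On the first question, a minimal NumPy sketch, assuming "filtered by the Level" means keeping only voxels at or above the surface level. It works on a copy rather than writing into the array returned by v.matrix() in place, as the experiment above does. How to make ChimeraX redraw after a change, and how to reproduce surface dust on the matrix, are left open here. The function names are hypothetical.

```python
import numpy as np


def level_filtered(matrix, level):
    """Copy of `matrix` with every voxel below `level` set to 0 (the input is left unchanged)."""
    return np.where(matrix >= level, matrix, 0).astype(matrix.dtype, copy=False)


def level_mask(matrix, level):
    """Boolean mask of the voxels inside the surface drawn at `level`."""
    return matrix >= level


m = np.array([[[0.05, 0.2], [0.15, -0.3]]], dtype=np.float32)
filtered = level_filtered(m, 0.1)
print(level_mask(m, 0.1).sum())  # 2
```

In the ChimeraX Shell, the same calls would take v.matrix() as input, e.g. level_filtered(v.matrix(), 0.1).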

Buttons for DiffFit Viewer

Run in ChimeraX command line

runscript "D:\Research\IPM\PoseEstimation\fit_search\fit_in_chimerax.py" "D:\Research\IPM\PoseEstimation\fit_search\DomainFitExample1\single_domains\I7MLV6_D3.pdb" "D:\Research\IPM\PoseEstimation\fit_search\DomainFitExample1\solutions" "D:\Research\IPM\PoseEstimation\fit_search\DomainFitExample1\density2.mrc" 0.886 4 10

Place Copy and Save PDB (CIF) buttons are needed. (Change to CIF: always use CIF when saving; CIF is to be the go-to format.)


Install DiffFit to ChimeraX

To build and install the plugin

devel build D:\GIT\DiffFitViewer; devel install D:\GIT\DiffFitViewer

Commands to be typed in cmd (change the path to your ChimeraX Python):

Install PyTorch via

C:\Users\luod\AppData\Local\ChimeraX\bin\python.exe -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Install Biopython, mrcfile, and scikit-learn

C:\Users\luod\AppData\Local\ChimeraX\bin\python.exe -m pip install biopython mrcfile scikit-learn

Now, DiffFit is fully installed.

Use these commands to prepare the simulated volumes in a batch processing mode when necessary.

runscript "D:\GIT\DiffFitViewer\src\split_chains.py" "D:\GIT\DiffFitViewer\dev_data\input\8SMK_all_subunits_shifted\8smk_shifted.cif" "D:\GIT\DiffFitViewer\dev_data\input\8SMK_all_subunits_shifted\subunits" 3.5 0.835
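In Scenario 2 (step 4), the files written by split_chains.py are then sorted by extension into subunits_cif, subunits_mrc, and subunits_npy. A standard-library sketch of that sorting step, assuming all generated files sit directly in the output folder passed to split_chains.py and that the three destination folders are its siblings (as in the Scenario 2 layout). The function name is hypothetical, and keeping only the unique chains remains a manual step.

```python
import shutil
from pathlib import Path

# File extension -> destination folder (a sibling of the split_chains output folder).
DESTINATIONS = {".cif": "subunits_cif", ".mrc": "subunits_mrc", ".npy": "subunits_npy"}


def sort_split_outputs(split_dir):
    """Move .cif/.mrc/.npy files from split_dir into sibling subunits_* folders.

    Returns a dict mapping each destination folder name to the file names moved there.
    """
    split_dir = Path(split_dir)
    moved = {name: [] for name in DESTINATIONS.values()}
    for path in sorted(split_dir.iterdir()):  # materialize the listing before moving files
        dest_name = DESTINATIONS.get(path.suffix.lower())
        if dest_name is None or not path.is_file():
            continue  # leave anything else where it is
        dest_dir = split_dir.parent / dest_name
        dest_dir.mkdir(exist_ok=True)
        shutil.move(str(path), str(dest_dir / path.name))
        moved[dest_name].append(path.name)
    return moved
```

For the command above, the argument would be the "…\8SMK_all_subunits_shifted\subunits" folder.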

runscript "D:\GIT\DiffFitViewer\src\convert2mrc_npy.py" "D:\Research\IPM\PoseEstimation\DiffComp3D\data3D\dev_comp\domain_fit_demo_all\subunits_cif" "D:\Research\IPM\PoseEstimation\DiffComp3D\data3D\dev_comp\domain_fit_demo_all\subunits_mrc" "D:\Research\IPM\PoseEstimation\DiffComp3D\data3D\dev_comp\domain_fit_demo_all\subunits_npy" 4.0 1.37
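The last two numbers passed to convert2mrc_npy.py are the resolution and the grid spacing (2.7 and 1.04 for EMD-36232 in Scenario 1). A standard-library sketch that reads the grid spacing from the target map's header instead of copying it from the log, then assembles the same runscript line. It assumes a standard MRC2014/CCP4 header (sampling MX/MY/MZ in words 8-10, cell dimensions in words 11-13, machine stamp at byte 212); the helper names are hypothetical. With mrcfile installed, mrcfile.open(path).voxel_size gives the same spacing.

```python
import struct
from pathlib import Path


def mrc_voxel_size(path):
    """Return the (x, y, z) grid spacing in Å stored in an MRC/CCP4 map header."""
    with open(path, "rb") as f:
        header = f.read(1024)
    # Machine stamp at byte 212: 0x44 -> little-endian, 0x11 -> big-endian.
    endian = ">" if header[212] == 0x11 else "<"
    mx, my, mz = struct.unpack_from(endian + "3i", header, 28)  # words 8-10: sampling
    cx, cy, cz = struct.unpack_from(endian + "3f", header, 40)  # words 11-13: cell (Å)
    return cx / mx, cy / my, cz / mz


def convert_command(script, base_dir, resolution, spacing):
    """Build a convert2mrc_npy runscript line for a <base>/subunits_{cif,mrc,npy} layout."""
    base = Path(base_dir)
    folders = [base / f"subunits_{kind}" for kind in ("cif", "mrc", "npy")]
    args = " ".join(f'"{p}"' for p in [Path(script), *folders])
    return f"runscript {args} {resolution:g} {spacing:g}"
```

For example, convert_command(r"D:\GIT\DiffFitViewer\src\convert2mrc_npy.py", r"D:\GIT\DiffFitViewer\run\input\8JGF", 2.7, mrc_voxel_size(r"D:\GIT\DiffFitViewer\run\input\8JGF\emd_36232.map")[0]) should reproduce the Scenario 1 command on Windows, assuming the map's spacing is isotropic.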

The following replies are not necessary. They were used in early development.

Asterisk unpacking in numpy array indexing requires ChimeraX 1.7, which uses Python 3.11

With ChimeraX 1.6.1, the following error shows up. The line causing the problem is probably no longer in use. I need to check further whether we can remove or change it so that DiffFit runs on v1.6.

Traceback (most recent call last):
  File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\core\toolshed\info.py", line 560, in start_tool
    ti = api._api_caller.start_tool(api, session, self, tool_info)
  File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\core\toolshed\__init__.py", line 1328, in start_tool
    return cls._get_func(api, "start_tool")(session, bi, ti)
  File "C:\Users\jiad\AppData\Local\UCSF\ChimeraX\1.6\site-packages\chimerax\difffit\__init__.py", line 29, in start_tool
    from . import tool
  File "C:\Users\jiad\AppData\Local\UCSF\ChimeraX\1.6\site-packages\chimerax\difffit\tool.py", line 26, in <module>
    from .parse_log import look_at_record, look_at_cluster, look_at_MQS_idx, animate_MQS, animate_MQS_2
  File "C:\Users\jiad\AppData\Local\UCSF\ChimeraX\1.6\site-packages\chimerax\difffit\parse_log.py", line 202
    shift = e_sqd_log[*MQS, iter_idx][0:3]
                          ^
SyntaxError: invalid syntax

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\ui\gui.py", line 1656, in <lambda>
    run(ses, "ui tool show %s" % StringArg.unparse(tool_name)))
  File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\core\commands\run.py", line 38, in run
    results = command.run(text, log=log, return_json=return_json)
  File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\core\commands\cli.py", line 2897, in run
    result = ci.function(session, **kw_args)
  File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\ui\cmd.py", line 219, in ui_tool_show
    bi.start_tool(session, name)
  File "C:\Program Files\ChimeraX\bin\lib\site-packages\chimerax\core\toolshed\info.py", line 567, in start_tool
    raise ToolshedError(
chimerax.core.toolshed.ToolshedError: start_tool() failed for tool DiffFit in bundle DiffFitViewerTool:
invalid syntax (parse_log.py, line 202)


See log for complete Python traceback.
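The failing line, e_sqd_log[*MQS, iter_idx], uses star-unpacking directly inside a subscript, which Python accepts only from 3.11 (PEP 646). Building the index tuple explicitly gives the same indexing on older Pythons, such as the one bundled with ChimeraX 1.6. A sketch with a made-up array; the real shape and axis meaning of e_sqd_log are not shown in the traceback and are assumed here.

```python
import numpy as np

# Hypothetical stand-in for e_sqd_log; the axis layout is assumed for illustration.
e_sqd_log = np.arange(2 * 3 * 4 * 5 * 7, dtype=float).reshape(2, 3, 4, 5, 7)
MQS = [1, 2, 3]  # three leading indices, as a list or tuple
iter_idx = 4

# Python >= 3.11 only:  shift = e_sqd_log[*MQS, iter_idx][0:3]
# Works on older Pythons too: build the index tuple explicitly.
shift = e_sqd_log[(*MQS, iter_idx)][0:3]
# Equivalent, without any star-unpacking:
shift_alt = e_sqd_log[tuple(MQS) + (iter_idx,)][0:3]
```

Either form would be a drop-in replacement for line 202 of parse_log.py, if that line turns out to still be needed.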
