
covid's People

Contributors

anabiman, apayne97, asfo1979, binikarki, calintat, carlos-a-ramos-g, chig, cmanci, danielschemmel, djhuggins, dmmorozo, egoldber, geemi725, ghorbanimahdi73, gorangiud, hannahbrucemacdonald, hdokainish, jamesmkrieger, jcgumbart, jchodera, lcasalino, lnaden, ozlemd78, pabloggaray, pkoukos, rafwiewiora, rebeccawalters95, sjayellis, tsztain, zhang-ivy


covid's Issues

Distinguish between unsolvated and solvated models

@Lnaden:

It would be useful to distinguish between unsolvated and solvated models in the "Models" section that is associated with each Structure, as well as on the Models page.
The explicitly solvated snapshot may be useful for MD simulations, but is more likely important for provenance tracking purposes with the associated simulation dataset that contains trajectories since very few modeling applications can use the solvated snapshot.

By contrast, the unsolvated protein models (into which missing loops have been built, structural modeling errors have been corrected, misperceived structural ions have been corrected, etc.) can be used in essentially all modeling workflows.

I would suggest we move the solvated snapshots into "Simulations", where they can be associated with their relevant simulation trajectories, and reserve "Models" for pre-solvated models that have corrected issues with the original structural data.

Suggestions to ingest data automatically

@apayne97, @henriberger and I have been talking about solutions to incorporate information from the Thorne Lab in a more automated way. We have come up with this "ideal" pipeline:

Tier 1) Create a script that can diff their PDB IDs with our PDB IDs. Report the set difference for a human to review which new ones are worth adding.

Tier 2) Create a GitHub Actions pipeline that does this automatically either with an hourly cronjob or, if technically possible, after every push to the Thorne Lab repo

Tier 3) Add bot features to GHA to submit the PRs needed for each new candidate PDB ID. A human reviews each PR, editing the information as needed, and merges or rejects it. The closed PRs serve as a history of what we have tried so we don't resubmit the same ID.

Let us know if you have feedback!
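As a rough illustration of Tier 1, the set difference could be computed along these lines. This is only a sketch: the function names and the sample IDs are placeholders, and how each repository actually stores its PDB IDs is not specified above, so the two lists are assumed to have been read already.

```python
def normalize(pdb_id):
    """PDB IDs are case-insensitive, so compare them in upper case."""
    return pdb_id.strip().upper()


def new_pdb_ids(theirs, ours):
    """Return IDs present in `theirs` but not in `ours`, sorted for stable review."""
    ours_set = {normalize(i) for i in ours}
    theirs_set = {normalize(i) for i in theirs if i.strip()}
    return sorted(theirs_set - ours_set)


if __name__ == "__main__":
    # Placeholder IDs for illustration; real lists would come from the two repos.
    thorne_lab_ids = ["6lu7", "6VXX", "7BV2 "]
    hub_ids = ["6LU7"]
    for pdb_id in new_pdb_ids(thorne_lab_ids, hub_ids):
        print(pdb_id)
```

The sorted output keeps the report stable between runs, which should make it easier for a human reviewer (or a later Tier 2 job) to spot what changed.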

Improving internal and external links

Several times now I have been looking for something I know is on the website but just can't seem to find it - for instance, I just wanted to download a video showing a DESRES trajectory but just kept going in circles clicking on links that led nowhere.

One concrete suggestion is to have a constantly visible legend for the different labels telling me where I will go when I click on one (e.g. a "proteins" page, an external link, etc.).

Additionally, it would help if there were some way to orient the user in "space" within the website, e.g. by highlighting the panel you're in and showing how far down the page you are (maybe not possible with a static site? idk)

Remove stoplights from structural data

We need to remove the stoplights from structural data. They are entirely misleading as to the quality of the structures and their utility for different purposes.

Publication status has no impact on structure quality. If we want to communicate publication status, i.e. whether a preprint or a published version exists, we should simply come up with icons for "preprint" and "published" that are shown when these are available and absent when they are not.
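A minimal sketch of that presence-only logic, assuming each structure entry is a dict with optional `preprint` and `publication` fields (those field names are an assumption, and `publication_badges` is a hypothetical helper, not part of the site's code):

```python
def publication_badges(entry):
    """Return the badges to display for an entry; no field means no badge."""
    badges = []
    if entry.get("preprint"):       # assumed field name
        badges.append("preprint")
    if entry.get("publication"):    # assumed field name
        badges.append("published")
    return badges
```

Nothing here encodes a quality judgment: an entry with neither field simply gets no badge.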

This is going to cause active harm to the community if we keep these.

The appropriate annotation data should instead be pulled from the Coronavirus Structural Task Force, but we shouldn't wait for the implementation of that to strip out the stoplight nonsense.

Request for data addition/refinement for variants

Please briefly describe your suggestion.
It would be good to have a field about variants so they are easily searchable rather than just being plain text.

Please provide the schema for the new/refined data class of interest.
List below all the keywords/values you would like to modify or add.

20E (EU1; D614G+A222V)
Alpha (B.1.1.7)
Beta (B.1.351)
Delta (B.1.617.2)
Epsilon (B.1.427 and B.1.429)
Omicron (B.1.1.529)

Additional context
Modelling variants and their mutations is important to understand what they are doing and this area will likely continue to grow. I have recently worked on this in both my previous postdoc in the Bahar lab (https://dx.doi.org/10.2139/ssrn.3907841) and my current Marie Curie fellowship in the Carazo/Sorzano lab (https://www.biorxiv.org/content/10.1101/2021.12.05.471263v2).
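To make the request concrete, a searchable variants field could be backed by a small lookup like the one below. This is a sketch only: the `variants` field name, the `VARIANT_ALIASES` table, and both functions are hypothetical, and the labels and aliases are just the values listed in this issue.

```python
# Canonical label -> aliases, taken from the keyword list in this issue.
VARIANT_ALIASES = {
    "20E": ["EU1", "D614G+A222V"],
    "Alpha": ["B.1.1.7"],
    "Beta": ["B.1.351"],
    "Delta": ["B.1.617.2"],
    "Epsilon": ["B.1.427", "B.1.429"],
    "Omicron": ["B.1.1.529"],
}


def canonical_variant(term):
    """Map a label or alias (case-insensitive) to its canonical label, or None."""
    wanted = term.strip().lower()
    for label, aliases in VARIANT_ALIASES.items():
        if wanted == label.lower() or wanted in (a.lower() for a in aliases):
            return label
    return None


def entries_for_variant(entries, term):
    """Return entries whose assumed `variants` list contains the matching label."""
    label = canonical_variant(term)
    if label is None:
        return []
    return [e for e in entries if label in e.get("variants", [])]
```

With a field like this, a search for "B.1.617.2" and a search for "Delta" would return the same entries instead of depending on how the free text was written.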

How do data analysis offers get handled?

How will this hub take in potential data analysis contributions? We have taken the DESRES 3CLpro trajectories and extracted a small set (34, but adjustable) of diverse conformations of the catalytic domain from the 100,000 snapshots, something we think could be useful to those planning docking studies. "Analyses" are not a current data type, so I guess a new schema is needed. In our case the input data (in addition to the DESRES trajectories) are a Jupyter notebook and one PDB format file; the output is the 34 selected structures, again as PDB files. As a starter:

type: one of [Jupyter notebook, bash script....]
title: (required)
description: (required)
creator: (required)
organization: (optional)
lab: (optional)
institute: (optional)
models: (optional) must point to model in models dir
- modelname_1
- ...
proteins: (required) Must be a valid protein (see proteins dir)
- protein 1
- ...
structures: (optional) must point to structure which could be in structure dir
- structure 1
- ...
simulations: (optional) must point to simulation which could be in simulation dir
- simulation 1
- ...
rating: (optional) int on domain [1,5], 5 is better
files: (required) URLs to input and supporting files.
- file 1
- ...
references: (optional) List of references associated with the programs and methods you want to mention. For publications tied to this exact analysis, use the publication and preprint categories
- ref1
- ref2
publication: (optional) URL of the publication which includes THIS analysis
preprint: (optional) URL of the preprint for the publication. Can also be used to note if submitted to a peer reviewed journal by the exact word "Submitted"
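A first-pass check of an entry against this starter schema might look like the following. It is a sketch under assumptions: the entry is taken to be an already-parsed dict, `validate_analysis` is a hypothetical name, and only the fields marked "(required)" above plus the rating range are checked. It does not verify that proteins, models, structures, or simulations point at existing entries.

```python
REQUIRED_FIELDS = ("title", "description", "creator", "proteins", "files")


def validate_analysis(entry):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not entry.get(field):
            problems.append(f"missing required field: {field}")
    rating = entry.get("rating")
    if rating is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        is_int = isinstance(rating, int) and not isinstance(rating, bool)
        if not is_int or not 1 <= rating <= 5:
            problems.append("rating must be an integer from 1 to 5")
    return problems
```

Returning a list of messages rather than raising on the first failure would let a reviewer see every problem with a submission at once.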

Collaborating -> Contributing?

It seems like it might be helpful to change "About > Collaborating" to "About > Contributing" to lower the perceptual barrier for others to get involved.
As currently worded, it looks like we have to go through a significant process to decide whether someone is allowed to be a "collaborator" for a monolithic site that is intended to be a community hub.

Instead, it may make sense to list contributors so far (which can be automatically pulled from YAML files, which we could add contributor: fields to) and describe several ways in which folks can get involved by

  • contributing a PR about a new potential target, structure, model, or molecule (dataset)
  • joining a data/contribution review team
  • helping with web development
  • collaborating in a more substantial way (for orgs like BioExcel/JEDI)
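For the automatically generated contributor list, one possible sketch is below. It assumes the YAML files have already been parsed into dicts and that the proposed `contributor:` field holds either a single name or a list of names; `list_contributors` is a hypothetical helper.

```python
def list_contributors(entries):
    """Collect unique contributor names from parsed entries, sorted case-insensitively."""
    names = set()
    for entry in entries:
        value = entry.get("contributor")  # proposed field; may be absent
        if isinstance(value, str):
            value = [value]
        for name in value or []:
            if name.strip():
                names.add(name.strip())
    return sorted(names, key=str.lower)
```

Entries without the field are skipped, so the field could be added to existing files gradually.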

How to present drug discovery efforts against 3CLpro, PLpro, RdRP, etc

Not a bug, I just didn't see a format that looked right
There are a WHOLE bunch of 3CLpro (Mpro, Main Protease, nsp5) structures. And potentially a WHOLE BUNCH of molecules that will target it. I think it's worth thinking about the best way to curate and share this data.
My current idea would be to just:

  1. identify useful key classes of small molecules
  2. curate just a few structures / specific examples of those classes, and display those directly
  3. have a separate page for linking to other repositories for more of this info.

This could be expanded to PLpro (nsp3) and RdRP (nsp12) in a similar fashion.

Add DOI for RIKEN trajectories

I believe the citation for RIKEN trajectories like https://covid.molssi.org/simulations/#riken-cpr-tms-tmd1_toup-trajectory should be:

Takaharu Mori, Jaewoon Jung, Chigusa Kobayashi, Hisham M Dokainish, Suyong Re, Yuji Sugita (2021):
Elucidation of interactions regulating conformational stability and dynamics of SARS-CoV-2 S-protein.
Biophysical Journal 120(6)
https://doi.org/10.1016/j.bpj.2021.01.012

Note that there are several deposits but I have not checked all of them.

"New data entries" and "Tracking issue" issue tracker buttons don't open an issue

Describe the bug
The "New data entries" and "Tracking issue" issue tracker buttons take you nowhere useful.

To Reproduce
Click on the "New data entries" and "Tracking issue" issue tracker buttons.

Expected behavior
These buttons should take you to where they claim to.


Set up a review process for new data with goal to have review pre-merge

A review process needs to be established for data.

One process, proposed at the onset of the project, was to just merge all data as it came in and use a color-coded system to indicate its review status. It has been pointed out that this will likely lead to chaos and be hard to maintain.

Another process suggested was to have data be reviewed before it ever gets merged and I think this is what should be done. The pipeline would be like this:

  • Contributing person accumulates all data they wish to submit and opens a PR with their own assessment of the quality of the data.
  • That assessment is reviewed by the curation team for that type of data
  • Adjustments are made / discussed in PR
  • Quality is assessed and noted in the PR
  • Merged

cc @Andrew-AbiMansour @sjayellis @jchodera

Simulation data descriptions do not render Markdown properly

Describe the bug
The description field in simulations does not render Markdown correctly.
We'll need this for our incoming Folding@home data sharing PRs.
Fortunately, this seems easy to fix; I will create a PR momentarily.

To Reproduce
Example: https://covid.molssi.org//simulations/#sars-cov-2-spike-s-glycoprotein

Expected behavior
Markdown should render correctly to allow inclusion of links to simulation data sources and inline shell examples of how to download the data


Update targets for Folding@home dataset?

Some of the Folding@home simulation datasets end up under "No Targets Recorded" even though they have defined targets:

  • nsp7 appears under "No Targets Recorded", rather than "Inhibition of viral polymerases"
  • nsp8 appears under "No Targets Recorded", rather than "Inhibition of viral polymerases"
  • nsp12 (RdRP) appears under "No Targets Recorded", rather than "Inhibition of viral polymerases"
  • nsp13 appears under "No Targets Recorded", rather than "Inhibition of nsp13 helicase activity"

Any idea how we fix this?
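One possible direction, sketched here without knowing how the site currently assigns targets, is to derive each dataset's target from its proteins and fall back to "No Targets Recorded" only when nothing matches. The `PROTEIN_TARGETS` table, the `proteins` and `title` field names, and `group_by_target` are all assumptions for illustration; the mapping itself uses only the pairs listed above.

```python
PROTEIN_TARGETS = {
    "nsp7": "Inhibition of viral polymerases",
    "nsp8": "Inhibition of viral polymerases",
    "nsp12": "Inhibition of viral polymerases",
    "nsp13": "Inhibition of nsp13 helicase activity",
}
NO_TARGET = "No Targets Recorded"


def group_by_target(simulations):
    """Group simulation titles by target, using the fallback only when no protein matches."""
    groups = {}
    for sim in simulations:
        proteins = sim.get("proteins", [])
        targets = sorted({PROTEIN_TARGETS[p] for p in proteins if p in PROTEIN_TARGETS})
        for target in targets or [NO_TARGET]:
            groups.setdefault(target, []).append(sim["title"])
    return groups
```

If the real cause is a naming mismatch between the dataset's protein entries and the target definitions, a lookup like this would at least make the missing keys easy to spot.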
