
ampliconrepository's People

Contributors

ahujar27, edwin5588, forrestkim, ginop123, jluebeck, liefeld


ampliconrepository's Issues

Clicking a row in a sample table will display the amplicon png image on the sample page

Vineet and I were exploring some data from a collaborator, and found that we very frequently did the following steps while interacting with the site:

  1. Go to a sample page
  2. Search the sample table for "ecDNA" (or some term)
  3. Attempt to click on various rows to display the amplicon image in the page (either above the CN plots, or as an on-click hover-type element).

Currently users need to open a separate tab with the image link to view the amplicon for some row in the table, and the constant tab-switching for each amplicon was a bit of a burden for us. We expect that most users will want to click a row in the table and see the amplicon image displayed somewhere on the page (as opposed to opening a tab or hunting for the ID in the CN plotly element).


Adding support for GRCh37

Currently, AmpliconRepository will only work properly with data from GRCh38 (hg38). However, many large databases of cancer samples are aligned to GRCh37, so we will receive a good number of GRCh37 samples. In the future we will want all the AA reference genomes supported (GRCh38, GRCh37, hg19, mm10 and GRCh38_viral).

Let us first start this process by adding a project-specific variable indicating the reference used, to allow a project to report if hg38 or GRCh37 is used, and hopefully we can generalize to additional references from there.
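As a rough sketch of what such a project-level variable could look like, the check below validates a reference build against the list of AA references named above. The key name `reference_genome` and the function name are hypothetical and not taken from the AmpliconRepository code.

```python
# Reference builds named in this issue as the ones AA supports.
SUPPORTED_REFERENCES = {"GRCh38", "GRCh37", "hg19", "mm10", "GRCh38_viral"}


def validate_reference(project: dict) -> str:
    """Return the project's reference build, or raise if it is missing or unsupported.

    'reference_genome' is a hypothetical key name used for illustration only.
    """
    ref = project.get("reference_genome")
    if ref not in SUPPORTED_REFERENCES:
        raise ValueError(f"Unsupported or missing reference genome: {ref!r}")
    return ref
```

Starting with only GRCh38 and GRCh37 enabled would just mean shrinking the set until the other builds have been tested.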

Here are some instructions for @edwin5588 to create a single-sample testing dataset for GRCh37. He can then share the results in this thread for others to test with:

  • Run AmpliconSuite on the FF-17 fastqs with GRCh37 as the reference genome.
  • Run this one sample through the Aggregator.
  • Share results here

Adding support for GRCh37 should resolve the issue in #22.

IGV viewer still displays "all" view on many amplicons

Using the prod version of AmpRepo I still get the 'all' view on many amplicons. E.g., 59M_OVARY amplicon2 gives the all view. Clicking through a few samples seemed to give either
a) single locus view (correct)
b) multiple highlighted regions for a single chromosome (not wrong, but perhaps not what we wanted?)
c) the "all" view (bug)

SW579_THYROID is an example that has amplicons with all three views above.

I couldn't figure out what properties seemed to trigger each of these three cases, but perhaps samples with multiple intervals on a chromosome were more likely to give b).

Issue when loading samples derived from intermediate AmpliconSuite entry-points

When samples are run starting from an intermediate point (e.g. one stage for CN, one stage for AA), the copy number file will reflect only a small subset of the genome. While the complete copy number file may exist in the collection of files, the one that is referenced in sample_data is the partial data used as a jumping-off point for the second stage of a run.

This causes sample_plot.py to crash, as it expects complete copy number data.

This is an issue we should handle during sample aggregation, by searching in the expected location for the full copy number calls and linking those.

I am documenting this issue on the AmpliconRepository page so that others know its cause if they encounter it with additional testing data that will arrive. I will close the issue once we have addressed the root cause in the aggregation module.

We will also need to update sample_plot.py to handle the case where CN data is missing or incomplete, as this same issue will occur. I will create a separate issue, or just issue a bugfix outright.


"Project already exists" message after deleting project

After deleting a project, I attempted to create a new project with the same name (to replace the input data while testing) and received the "Project already exists" message.

The behavior I was expecting was that after deleting a project, I could create a new project of the same project name.

For now we can require that project names be unique. But once deleted, the project name should be unreserved and available for use.

Project URL based on a UID instead of project title

Hi Forrest,

I noticed that the project page URLs are based on the project name, e.g.
https://ampliconrepository.org/project/Contino%20et%20al.%20EAC%20cell%20lines

I am wondering if instead we can assign a UID to the project (e.g. a numeric ID) that is perhaps just based on a counter of the total number of public and private projects added to the site (1, 2, 3... etc).

This would help provide more succinct URLs for sharing the project to others, and possibly help prevent issues related to collisions between project names. Thoughts?

Project members required to be specified on new project

On the (create) New Project page, the user must provide an email address for members of the project in order to complete the form. Is it assumed that the creating user is already in this list? If not, they probably should be. In either case, if a user is making a project editable only by themselves, it may not make sense to require them to either write their own email address in the form or place some dummy text in the box in order to submit.

Display sample metadata on sample pages

We would like to add the following to each sample page, between the title and the CN plots (picture below). The element should take some space (maybe 200px), but hopefully not bump the CN plots so far down the page that the user doesn't see them at load.

  1. Reference genome build of the sample
  2. A link to download the sample's run metadata JSON (if provided)
  3. The contents of the sample metadata JSON file (if provided)

image

UnboundLocalError at /create-project/

Instructions for reproducing this issue:

  1. Access AmpliconRepository.org using Safari.
  2. Sign in via Google.
  3. Select "New Project".
  4. Enter the required fields, check "private project", and enter my account email as the project member.
  5. For the file upload, select the .tar.gz file available here, which I had downloaded locally.

Expected Behavior:

Successful upload of the aggregated AA dataset (aggregated by the AmpliconAggregator module), and creation of a new project, followed by a page redirect to that new project page.

Observed Behavior/Error Message:

UnboundLocalError at /create-project/

local variable 'cnv_file_id' referenced before assignment
Request Method: POST
Request URL: http://ampliconrepository.org/create-project/
Django Version: 4.0.6
Exception Type: UnboundLocalError
Exception Value: local variable 'cnv_file_id' referenced before assignment
Exception Location: /srv/caper/caper/views.py, line 655, in create_project
Python Executable: /opt/venv/bin/python
Python Version: 3.8.10
Python Path: ['/srv/caper', '/usr/lib/python38.zip', '/usr/lib/python3.8', '/usr/lib/python3.8/lib-dynload', '/opt/venv/lib/python3.8/site-packages']
Server time: Tue, 14 Feb 2023 18:10:53 +0000

InvalidID error when downloading CCLE test dataset

AmpliconRepository produces the following error when clicking the "Download All Project Data" on the following page:
https://ampliconrepository.org/project/CCLE

InvalidId at /project/CCLE/download

'Not Provided' is not a valid ObjectId, it must be a 12-byte input or a 24-character hex string
Request Method: GET
Request URL: http://ampliconrepository.org/project/CCLE/download
Django Version: 4.0.6
Exception Type: InvalidId
Exception Value: 'Not Provided' is not a valid ObjectId, it must be a 12-byte input or a 24-character hex string
Exception Location: /opt/venv/lib/python3.8/site-packages/bson/objectid.py, line 38, in _raise_invalid_id
Python Executable: /opt/venv/bin/python
Python Version: 3.8.10
Python Path: ['/srv/caper', '/usr/lib/python38.zip', '/usr/lib/python3.8', '/usr/lib/python3.8/lib-dynload', '/opt/venv/lib/python3.8/site-packages']
Server time: Tue, 21 Feb 2023 18:18:13 +0000
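The traceback suggests the placeholder string 'Not Provided' is reaching the ObjectId constructor. A minimal sketch of a guard, assuming file IDs arrive as strings and that skipping placeholders is acceptable for a batch download; the function names are hypothetical, and the check only covers the 24-character hex form named in the error message. (The bson package also provides `ObjectId.is_valid`, which could be used instead.)

```python
import re

# A 24-character hex string, one of the two forms the error message accepts.
_HEX24 = re.compile(r"[0-9a-fA-F]{24}")


def looks_like_object_id(value) -> bool:
    """True if value is a 24-character hex string."""
    return isinstance(value, str) and _HEX24.fullmatch(value) is not None


def downloadable_ids(file_ids):
    """Keep only IDs that can be converted to an ObjectId; drop placeholders."""
    return [fid for fid in file_ids if looks_like_object_id(fid)]
```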

Sample download should include the AA outputs for the sample

The updated project and sample downloads are working very nicely. However, we would like to include the AA output files for the sample itself in the download for a sample.

This may be a bit more challenging since an explicit reference to the location of the relevant files is not stored in sample_data.

In sample 59M_OVARY the files we want to include can be found in the "results/AA_outputs/59M_OVARY" directory of the uploaded data. It should be sufficient to match the sample name to the directory name in results/AA_outputs/ and pull that directory into the collection of downloads for a sample.
image

However, this may not work if the directory name doesn't match the sample name (uncommon). In that case, a better, more general solution may be to take the path stored in sample_data['AA_summary_file'] and extract a prefix of it to locate the parent directory of the relevant files.
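The two lookup strategies described in this issue could be combined roughly as follows. This is only a sketch: `find_aa_output_dir` and its arguments are hypothetical, and it assumes the AA summary file sits directly inside the sample's AA output directory, which has not been verified here.

```python
from pathlib import Path
from typing import Optional


def find_aa_output_dir(project_root: str, sample_name: str,
                       aa_summary_file: Optional[str] = None) -> Optional[Path]:
    """Locate the AA output directory for one sample inside an extracted project."""
    # Strategy 1: match the sample name to a directory under results/AA_outputs/.
    candidate = Path(project_root) / "results" / "AA_outputs" / sample_name
    if candidate.is_dir():
        return candidate
    # Strategy 2 (assumption): fall back to the parent directory of the summary file.
    if aa_summary_file:
        parent = Path(project_root) / Path(aa_summary_file).parent
        if parent.is_dir():
            return parent
    return None
```

Returning None rather than raising leaves it to the caller to decide whether a sample download without AA outputs is still acceptable.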

New projects should be private by default

Instead of an opt-in checkbox to create a private project, can we change it so that private is the default for a new project, and users can instead opt in via checkbox to making their project public?

For most real-world use cases, people will first upload unpublished data for their own internal exploration, then choose to share it with the rest of the world when they are ready.

igv.js returning "all" instead of specific location


When pressing on the plotly visualizer on the sample page to see the location visualized in igv.js, I am seeing the location as 'all' instead of the specific location chosen. Not sure where this bug came from.

Thanks,
Forrest

Updating CCLE samples

I will update the CCLE dataset to include no-amp samples and generally be more similar in format to the Contino data.

Broken AA amplicon images on Sample pages

It appears that some of the AA amplicon images are broken in various sample view pages. For example, if a user navigates to:
https://ampliconrepository.org/project/CCLE/sample/59M_OVARY
and clicks on "Amplicon 5"

Clicking on the image itself yields a 400 Bad Request error: http://ampliconrepository.org/project/CCLE/sample/59M_OVARY/feature/59M_OVARY_amplicon5_Linear%amplification_1/download/png/63e551078e48210577692f0b

I have double-checked and the image for AA amplicon 5 does exist in the source data, so I am wondering if this is a pathing-related issue.
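One thing that may be worth checking, though it is not confirmed in this thread, is the bare `%` in `Linear%amplification_1`: if the feature name actually contains a space, the URL segment would normally be percent-encoded as `%20`. The snippet below only shows what standard encoding of such a name looks like; the feature name with a space is an assumption.

```python
from urllib.parse import quote

# Assumed feature name, with a space where the broken URL shows a bare '%'.
feature_name = "59M_OVARY_amplicon5_Linear amplification_1"

# Percent-encode the name for use as a single URL path segment.
encoded = quote(feature_name, safe="")
print(encoded)
```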

Make private/Make public functionality for projects

We would like AmpliconRepository to support functionality for a project owner to

  • Make their private project public
  • Make their public project private

We envision that this function would be captured in a single button on the project page, visible only to project owner(s). When clicked, the button explains the consequences of either option and asks the user to confirm before proceeding.

Create default 'testuser' account in default GB/Django settings

To make development easier it would be nice to have a default user stored in the Mongo DB (or Django DB/config, depending on where the user accounts are stored). Then developers could skip the step of having to log in or create a user each time they start up.

File Not Found Error in Sample View

I'm facing a file not found error when trying to click into any of the sample view pages after creating a project locally.

69251541730__418019B5-07ED-49C4-A015-04D2E278E921

I believe the file that it is trying to find is located in my directory at this location but whatever process is trying to find it can't.

PNG image

Clear all local projects?

Is there an easy way to clear/delete all projects from a local deployment of the site, without individually deleting each?

Toggling amplicons from sample page plotly view hides only the first interval

When clicking the "Amplicon X" text to toggle the presence of an amplicon in the sample page plotly view, only the first interval from the amplicon is hidden. If there are other/multiple intervals, they remain in the plot.

Perhaps this is an issue with the plotly library itself, in which case we may need to take the issue to them. Alternatively we may need to explore changing the way we add visual elements to the plot.
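If each interval is drawn as its own trace, one Plotly feature that may help is `legendgroup`: traces sharing a legend group are shown and hidden together when a legend item is toggled. The sketch below builds plain trace dictionaries to illustrate the idea. How the sample page actually constructs its traces is not shown in this thread, so the trace layout here is an assumption.

```python
def amplicon_traces(amplicon_name, intervals):
    """Build one scatter-trace dict per interval, all sharing a legend group."""
    traces = []
    for i, (start, end) in enumerate(intervals):
        traces.append({
            "type": "scatter",
            "mode": "lines",
            "x": [start, end],
            "y": [0, 0],
            "name": amplicon_name,
            # Same group for every interval, so a legend click toggles them all.
            "legendgroup": amplicon_name,
            # Only the first interval gets a legend entry, to avoid duplicates.
            "showlegend": i == 0,
        })
    return traces
```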

Handling of no-amp samples

We recently introduced representation of no-amp samples into the AmpliconClassifier outputs. Previously, we did not report any table entries for samples without focal amplifications. The newer versions of AmpliconSuite add no-amp samples to the feature table with dummy values in the relevant fields (e.g. an empty list for genes, empty list for gene regions, "NA" for feature bed file).

I have created a public project for this latest version of the Contino data here: https://ampliconrepository.org/project/Contino%20et%20al.%20EAC%20cell%20lines

Sample CP-D is the one and only no-amp sample in this project. It is not reported in the previously uploaded version of this dataset. Currently, the website doesn't know how to handle the empty sample, and crashes when clicking on its sample page, here: https://ampliconrepository.org/project/Contino%20et%20al.%20EAC%20cell%20lines/sample/CP-D

The aggregated source data used to create the project:
https://drive.google.com/file/d/12sHExxNJBnYnRU767atVv6B778BlxUBR/view?usp=sharing

Adding a "Featured" tab to main page projects list

In addition to the public and private tabs, as we put more datasets on the site it would be good to have a tab for 'Featured' public datasets (e.g. CCLE). For instance, someone might analyze a few CCLE samples on their own and push a public project called CCLE to the site. In the list of all public projects, it might be good if we can show a few curated ones in a separate tab.

I imagine this would involve adding an admin feature, for a site admin to mark a public project as 'featured'.

Local deployment code 400 when clicking images

It seems both Edwin and I are getting the following console error when clicking an amplicon image from the sample page (and no image appears on the site)
image

This seems to have appeared with the latest updates to the project_view_plotly branch.

Project load/display must handle missing images

For example, in CCLE we see these two samples/features that lack images even though they do have bed files:

Sample: sample_231 Feature: RMGI_OVARY_amplicon10_Complex non-cyclic_1 BED_ID is -> Not Provided <-
Sample: sample_275 Feature: ZR751_BREAST_amplicon2_Complex non-cyclic_1 BED_ID is -> Not Provided <-

In the case of RMGI_OVARY amplicon 10 (I haven't looked at the other) there actually is a bed file, but it is not linked into the project. There are no images (due to a numpy bug per Jens) but there ARE links to images for different/wrong amplicons.

Fixing the sample page display will be a different ticket. This ticket covers the project load, which should be able to load the bed file but no images for RMGI amplicon 10 and have the right information available when the sample page is displayed.

Easier local deployment

As I work on fixing the Docker container deployment, I wanted to make local deployment easier, since we are all using that as a baseline for testing. I will add all the necessary config settings into a new config file that should allow us to run the local deployment by using something like source config-local.sh and then running the server.

Thanks,
Forrest

Unable to edit members of private project

I created a new project with the following initial users:
image

I then attempted to add or remove users from the list and faced the following errors:

Add a user (e.g. my gmail address)

image
image

Remove a user (e.g. Michael)

image
image

KeyError when clicking on samples

When testing a dataset received from a collaborator which I uploaded to the site as a private project, I encounter the following error when clicking on samples.

I will attempt to debug locally as we cannot yet share this dataset outside our lab. Any existing insights or prior experience with this issue would be appreciated. I'm tagging @ahujar27 @GinoP123 here as well, since this occurs in the sample_plot.py script.

image

Add "Genome" column to project table

To inform users of which reference genome a project uses, a column labeled "Genome" should be present in the project tables. It can pull the genome build info from an arbitrary sample_data entry.

More robustly, instead of pulling from an arbitrary sample, the solution could iterate over all samples and collect the reference genome builds as a set. Usually there should only be one in the set, but if multiple are found this is bad (unsupported by the website) and the column should perhaps print "MULTIPLE" or something informative about the issue.
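A minimal sketch of the set-based approach described above. The key name `reference_genome` is hypothetical, and the "Unknown" fallback for projects with no recorded build is an addition not discussed in the issue.

```python
def project_genome_label(samples):
    """Summarize the reference builds across a project's sample_data entries."""
    # 'reference_genome' is a hypothetical key name used for illustration.
    builds = {s["reference_genome"] for s in samples if s.get("reference_genome")}
    if not builds:
        return "Unknown"   # assumption: no sample reports a build
    if len(builds) > 1:
        return "MULTIPLE"  # mixed builds are unsupported by the website
    return next(iter(builds))
```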

UI improvements

In general, the UI should

  1. Not show links that cannot be used
  2. Make a clear "affordance" to show when things can be clicked on

To that end, the "create project" link does not belong in the header (logged in or not), and it is not obvious that clicking the username at the top right opens a profile page. The following change is suggested and was approved by Jens 3/28/23 ~1:30pm PST.

  1. Add an icon next to the username (either gear or downward triangle) to indicate something happens when you click the username.

  2. When you click the username, open a menu with the following options:

    Profile
    Create new project
    Logout
    
  3. These links behave the same as the current links for the profile page, logout, and create project, but are relocated to the menu. The menu (and gear icon) is not displayed to users who are not logged in.

IGV receives wrong coordinates for multi-chromosomal amplicons

The current code that sets the genome coordinates for the IGV display (the "locus" in IGV terminology) appears to work by taking a list of genome-coordinate strings, extracting the minimum coordinate from the first string and the maximum coordinate from the last string (views.py: 258). The chromosome is set to that of the first element in the list.

However, the list of genome positions of the feature can cross multiple chromosomes, so it will often end up using the coordinate from a different chromosome as the endpoint of the visualized interval.

Ideally, a fix would be to create a Multi-locus view IGV plot that creates a section for each coordinate range in the list of coordinates.

Perhaps we should disable the IGV toggle until this is addressed, since the current method only displays the relevant regions of the genome correctly when the list contains a single interval.
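As a sketch of the per-interval approach, the helper below turns each coordinate string into its own locus rather than combining endpoints from different chromosomes. It assumes the strings look like `chr1:100-200`; the real format used in views.py is not shown in this thread. As I understand the igv.js configuration, passing several loci produces a multi-locus view, but that should be checked against the igv.js documentation.

```python
import re

# Assumed coordinate format: <chromosome>:<start>-<end>
_LOCUS = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def to_igv_loci(coords):
    """Return one locus string per interval, never mixing chromosomes."""
    loci = []
    for coord in coords:
        match = _LOCUS.match(coord.strip())
        if match is None:
            raise ValueError(f"Unrecognized coordinate string: {coord!r}")
        start, end = int(match["start"]), int(match["end"])
        if start > end:  # tolerate reversed endpoints
            start, end = end, start
        loci.append(f"{match['chrom']}:{start}-{end}")
    return loci
```

For example, `to_igv_loci(["chr1:100-200", "chr8:500-900"])` keeps the two intervals separate instead of producing a single span from chr1:100 to position 900.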

Sample and project download revisions

We would like to include a number of additional files in the batch downloads for the Sample and Project page downloads.

First, and hopefully simplest, on the project page, please add a button to download the original aggregated .tar.gz file that the project was created from.

Additional proposed changes go as follows:

At the top level when downloading a project:

image

At the sample-level (these changes happen both when downloading a project and downloading a sample):

image

Because the CNV bed will be presented once at the top level of a folder, the CNV.bed can be removed from each individual feature's folder:

image

Set an empty table on sample page when no focal amps present

Thank you @ahujar27 and others for updating the sample plotting to handle no amps!

We would like to update the table on the sample page for samples with no focal amps. Currently it shows as follows (project_view_plotly branch):
image

If instead it could be either empty, or just state "No focal amps detected", that would be preferred. Currently, it provides some links for downloads which of course cause an error if clicked, and we don't need those links for empty cases.

You can catch these cases if the number of features detected is one and the feature name ends with "_NA".
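That rule could be expressed as a small predicate like the one below. The key `Feature_ID` is a hypothetical name for wherever the feature name is stored.

```python
def is_no_amp_sample(features):
    """True if the sample has exactly one feature and its name ends with '_NA'."""
    # 'Feature_ID' is a hypothetical key name used for illustration.
    return len(features) == 1 and str(features[0].get("Feature_ID", "")).endswith("_NA")
```

The sample page could then render "No focal amps detected" in place of the table whenever this returns True.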

Project deletion doesn't work the second time around

I ran into the following issue on prod and in local deployment where I couldn't delete a project of the same name as a previously deleted project.

Steps I did to encounter this issue:

  1. Create a project, 'A'
  2. Delete that project
  3. Create a project with the same name as the previous one ('A')
  4. Click delete a second time; it has no effect.

Timestamp formatting

Current formatting of timestamp for projects is a little clunky:
2023-03-29T18:45:50.886433

Can we do something like
2023-03-29 18:45:50 PST
Or
March 29th, 2023 18:45:50 PST

Where the timezone is set automatically to either the user's timezone or GMT?
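A minimal sketch of the first proposed format on the server side, using only the standard library. It assumes the stored timestamps are naive and represent UTC, which has not been confirmed here; converting to the viewer's own timezone would more likely be done in the browser.

```python
from datetime import datetime, timezone


def format_timestamp(iso_string: str) -> str:
    """Reformat an ISO 8601 timestamp as 'YYYY-MM-DD HH:MM:SS UTC'."""
    dt = datetime.fromisoformat(iso_string)
    if dt.tzinfo is None:
        # Assumption: naive timestamps are stored in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S %Z")


print(format_timestamp("2023-03-29T18:45:50.886433"))  # 2023-03-29 18:45:50 UTC
```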

Sample display must handle missing images

When an image (or bed file) is not present, the links to it should not appear on the sample page (currently they are present but broken). They should be replaced with non-link text stating

"BED file unavailable"

or

"Image unavailable"

instead
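A sketch of how that choice could be made when building the page. The helper name is hypothetical, and it assumes a missing file is recorded either as an empty value or as the 'Not Provided' placeholder seen in the project load output.

```python
from html import escape


def link_or_placeholder(file_id, url, kind):
    """Return an HTML link, or '<kind> unavailable' when the file is missing.

    kind is the display label, e.g. 'BED file' or 'Image'.
    """
    # Assumption: missing files are stored as empty values or 'Not Provided'.
    if not file_id or file_id == "Not Provided":
        return f"{kind} unavailable"
    return f'<a href="{escape(url, quote=True)}">{escape(kind)}</a>'
```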

Parsing and tokenizing email addresses in project members form

After some experimentation I found that project member emails must be formatted as
"email1, email2"
with a comma-space between each address.

I initially tried to input emails to the form using a space-separated list, which it accepted without error, but it did not share the project to other people. I realize now that it treated that space-separated list as one very long email address.

Can we please add functionality to split the email address list on any of the following delimiters:

  • space
  • comma
  • tab
  • semicolon

We view this as a high priority improvement.
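The splitting itself can be done with a single regular expression. This is only a sketch with a hypothetical function name; it does not validate that each token is a well-formed email address.

```python
import re


def split_emails(raw: str):
    """Split a member list on spaces, tabs, commas, and semicolons."""
    # \s covers spaces and tabs; runs of mixed delimiters count as one separator.
    return [token for token in re.split(r"[\s,;]+", raw) if token]
```

With this, "email1, email2", "email1 email2", and "email1;email2" all produce the same two-element list.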

As a secondary goal, if we could tokenize the email addresses after adding each that would be a very nice UI improvement, e.g. how gmail tokenizes email addresses -
image

Newly created private projects not showing up in private projects tab

Using a local deployment of main, when I create a new private project, it does not appear in the "private projects" tab of the home page. It does however appear in the table shown on the 'Profile' page, as expected. Is anyone else able to recreate this issue locally? Not sure if coincidental, but it seems to have started after I first tried the purge-local-db.py script. That came out around the same time as some other changes to main, so I would be interested to know what happens for others when they create a local private project.

Enabling all-chromosome view toggle on sample pages

For the chromosome visualization plot shown on the sample pages, a subset of relevant chromosomes are shown. Gino and Rohil had previously implemented a toggle button to enable showing all chromosomes. The code for this toggle has been commented out on sample pages in AmpliconRepository.org:

<!-- <span style='margin-left: 1rem'> Display All Chromosomes </span>
	<input id="toggle-igv" type="checkbox" class="form-check-input" data-toggle="toggle"> -->

We would like to re-enable this functionality, so that both the IGV toggle and the Display All Chromosomes toggle can exist side-by-side.

Thanks,
Jens

No table column-sorting on Sample page table

While the project page allows sorting of the samples table by its various columns, sorting is not available for the similar table on the sample pages (table of amplicons). Could we please enable that same sorting feature on the sample pages?
