microsalt's Introduction

Microbial Sequence Analysis and Loci-based Typing pipeline

The microbial sequence analysis and loci-based typing pipeline (microSALT) is used to analyse microbial samples. It performs quality control of each sample and determines its organism-specific sequence type and resistance pattern. microSALT also provides database storage and report generation for these results.

microSALT uses a combination of Python, SQLite and Flask. Python provides most of the functionality, SQLite handles the database, and Flask handles the front-end. All analysis activity by microSALT requires a SLURM cluster.

Quick installation

  • yes | bash <(curl https://raw.githubusercontent.com/Clinical-Genomics/microSALT/master/install.sh)
  • cp configExample.json $HOME/.microSALT/config.json
  • vim $HOME/.microSALT/config.json

Configuration

Copy the configuration file to microSALT's hidden home directory, or copy it anywhere and point the environment variable MICROSALT_CONFIG to it. For example:

cp configExample.json $HOME/.microSALT/config.json

or

cp configExample.json /MY/FAV/FOLDER/config.json
export MICROSALT_CONFIG=/MY/FAV/FOLDER/config.json

Then edit the fields to match your environment.
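
The lookup order described above can be sketched in Python as follows. This is only an illustration, not microSALT's actual loader; the function names `resolve_config_path` and `load_config` are hypothetical.

```python
import json
import os
from pathlib import Path


def resolve_config_path(env=None):
    """Return MICROSALT_CONFIG if it is set, else the default hidden-directory path."""
    env = os.environ if env is None else env
    override = env.get("MICROSALT_CONFIG")
    if override:
        return Path(override)
    # Default location described above: $HOME/.microSALT/config.json
    return Path(env.get("HOME", str(Path.home()))) / ".microSALT" / "config.json"


def load_config(env=None):
    """Load the JSON configuration from the resolved path (hypothetical helper)."""
    with open(resolve_config_path(env)) as handle:
        return json.load(handle)
```

Passing a plain dictionary as `env` makes it easy to check both branches without touching the real environment.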

Usage

  • microSALT analyse contains functions to start sbatch job(s) and produce output in folders['results']. Afterwards the parsed results are uploaded to the SQL back-end and HTML reports are produced, which are then automatically e-mailed to the user.
  • microSALT utils contains various functionality, including generating the sample description JSON, manually adding new reference organisms and re-generating reports.

Databases

MLST Definitions

microSALT will automatically download & use the MLST definitions for any organism on pubMLST (https://pubmlst.org/databases/). Other definitions may be used, as long as they retain the same format.

Resistance genes

microSALT will automatically download & use the resistance genes of resFinder (https://cge.cbs.dtu.dk/services/data.php). Any definitions will work, as long as they retain the same formatting.

Requirements

Hardware

  • A SLURM-enabled HPC cluster
  • A (clarity) LIMS server

Software

Contributing to this repo

This repository follows the GitHub flow approach to adding updates. For more information, see https://guides.github.com/introduction/flow/

Credits

  • Isak Sylvin - Lead developer
  • Emma Sernstad - Accreditation ready reports
  • Tanja Normark - Various issues
  • Maya Brandi - Various issues

microsalt's People

Contributors

b4ckm4n, emmser, henningonsbring, ingkebil, mayabrandi, mropat, pbiology, seallard, sylvinite, talnor, vince-janv

microsalt's Issues

Jobs cannot be tracked for microsalt

MicroSALT uses a timestamp in the output directory name. This means that the path to it cannot be passed to trailblazer when a pending analysis is created. In turn, this means that jobs cannot be tracked for microsalt cases in trailblazer.

We could apply some workaround for the microsalt pipeline, but it makes more sense to change the name of the output dir to follow the structure the rest of the pipelines do. Basically, it should just be the case name (instead of lims project id + random timestamp).
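
The difference between the two naming schemes can be sketched as below. Both function names are hypothetical, and the timestamp layout is inferred from directory names such as `ACC8133A95_2021.7.2_15.4.9` that appear in the logs of other issues; this is not taken from microSALT's code.

```python
from datetime import datetime
from pathlib import Path


def timestamped_dir(results_root, project_id, now):
    """Current style: the path depends on when the run started, so it cannot be predicted."""
    stamp = f"{now.year}.{now.month}.{now.day}_{now.hour}.{now.minute}.{now.second}"
    return Path(results_root) / f"{project_id}_{stamp}"


def case_dir(results_root, case_name):
    """Proposed style: the path depends only on the case name and is known in advance."""
    return Path(results_root) / case_name
```

With the second form, the output path can be computed before the analysis starts and handed to trailblazer when the pending analysis is created.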

Add prefix to all output files

Moving sample files into housekeeper removes all intermediate folders. This means files like "contigs.fasta" need a sample prefix.
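
A minimal sketch of such a prefixing step, assuming the sample ID is joined to the file name with an underscore (the separator and the function name `prefixed_name` are assumptions, not an agreed convention):

```python
from pathlib import Path


def prefixed_name(sample_id, file_path):
    """Return the bare file name with the sample ID prepended, unless already present."""
    name = Path(file_path).name
    if name.startswith(f"{sample_id}_"):
        return name
    return f"{sample_id}_{name}"
```

Because the intermediate folders are dropped, only the bare file name is kept; applying the function twice gives the same result as applying it once.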

Blast db creation bug

Describe the bug

Running microSALT manually, the following error message appears during the reference updates:
BLAST Database creation error: Multi-letters chain PDB id is not supported in v4 BLAST DB

To Reproduce
Steps to reproduce the behavior:

  1. Start any case with microSALT manually

Expected behavior
Update should finish without the error message.

Screenshots

microSALT analyse /home/proj/production/microbial/queries/ACC8133A95.json --input /home/proj/production/microbial/fastq/usablegoblin/ACC8133A95/
INFO - Checking versions of references..
WARNING - Unable to find requested organism 'SARS-CoV-2' in pubMLST database
INFO - pubMLST reference for Enterobacter cloacae updated to 2021-06-30 from 2021-06-23
INFO - Re-indexed contents of /home/proj/production/microbial/references/ST_loci/enterobacter_cloacae
INFO - pubMLST reference for Enterococcus faecium updated to 2021-06-29 from 2021-06-23
INFO - Re-indexed contents of /home/proj/production/microbial/references/ST_loci/enterococcus_faecium
INFO - pubMLST reference for Glaesserella parasuis updated to 2021-06-29 from 2021-03-25
BLAST Database creation error: Multi-letters chain PDB id is not supported in v4 BLAST DB
INFO - Re-indexed contents of /home/proj/production/microbial/references/ST_loci/glaesserella_parasuis
INFO - pubMLST reference for Pseudomonas aeruginosa updated to 2021-07-01 from 2021-06-23
INFO - Re-indexed contents of /home/proj/production/microbial/references/ST_loci/pseudomonas_aeruginosa
INFO - pubMLST reference for Salmonella spp. updated to 2021-07-01 from 2021-06-25
INFO - Re-indexed contents of /home/proj/production/microbial/references/ST_loci/salmonella_spp.
INFO - pubMLST reference for Staphylococcus aureus updated to 2021-06-29 from 2021-06-25
INFO - Re-indexed contents of /home/proj/production/microbial/references/ST_loci/staphylococcus_aureus
INFO - pubMLST reference for Staphylococcus epidermidis updated to 2021-06-28 from 2021-06-26
INFO - Re-indexed contents of /home/proj/production/microbial/references/ST_loci/staphylococcus_epidermidis
INFO - pubMLST reference for Streptococcus pneumoniae updated to 2021-06-29 from 2021-06-09
INFO - Re-indexed contents of /home/proj/production/microbial/references/ST_loci/streptococcus_pneumoniae
INFO - Downloading new MLST profiles for Escherichia coli#1
INFO - Re-indexed contents of /home/proj/production/microbial/references/ST_loci/escherichia_coli
WARNING - Unable to update pubMLST external data: HTTP Error 503: Service Unavailable
INFO - Cached resFinder database identical to remote.
INFO - Version check done. Creating sbatch jobs
INFO - Created runfile for sample ACC8133A95 in folder /home/proj/production/microbial/results//ACC8133A95_2021.7.2_15.4.9
INFO - Saved Trailblazer slurm report file to /home/proj/production/microbial/results/reports/trailblazer/ACC8133A95_slurm_ids.yaml and /home/proj/production/microbial/results/ACC8133A95_2021.7.2_15.4.9/ACC8133A95_slurm_ids.yaml
INFO - Execution finished!

Additional context
The error appears to be species-specific. It might be affecting Glaesserella parasuis analyses.

Bug in external pubMLST reference updates

Describe the bug
Error messages for external pubMLST references.

To Reproduce

  1. Run any case manually.

Expected behavior
The following error message appears when updating references:
WARNING - Unable to update pubMLST external data: HTTP Error 503: Service Unavailable

Screenshots

microSALT analyse /home/proj/production/microbial/queries/ACC8133A95.json --input /home/proj/production/microbial/fastq/usablegoblin/ACC8133A95/
INFO - Checking versions of references..
WARNING - Unable to find requested organism 'SARS-CoV-2' in pubMLST database
INFO - pubMLST reference for Enterobacter cloacae updated to 2021-06-30 from 2021-06-23
INFO - Re-indexed contents of /home/proj/production/microbial/references/ST_loci/enterobacter_cloacae
INFO - pubMLST reference for Enterococcus faecium updated to 2021-06-29 from 2021-06-23
INFO - Re-indexed contents of /home/proj/production/microbial/references/ST_loci/enterococcus_faecium
INFO - pubMLST reference for Glaesserella parasuis updated to 2021-06-29 from 2021-03-25
BLAST Database creation error: Multi-letters chain PDB id is not supported in v4 BLAST DB
INFO - Re-indexed contents of /home/proj/production/microbial/references/ST_loci/glaesserella_parasuis
INFO - pubMLST reference for Pseudomonas aeruginosa updated to 2021-07-01 from 2021-06-23
INFO - Re-indexed contents of /home/proj/production/microbial/references/ST_loci/pseudomonas_aeruginosa
INFO - pubMLST reference for Salmonella spp. updated to 2021-07-01 from 2021-06-25
INFO - Re-indexed contents of /home/proj/production/microbial/references/ST_loci/salmonella_spp.
INFO - pubMLST reference for Staphylococcus aureus updated to 2021-06-29 from 2021-06-25
INFO - Re-indexed contents of /home/proj/production/microbial/references/ST_loci/staphylococcus_aureus
INFO - pubMLST reference for Staphylococcus epidermidis updated to 2021-06-28 from 2021-06-26
INFO - Re-indexed contents of /home/proj/production/microbial/references/ST_loci/staphylococcus_epidermidis
INFO - pubMLST reference for Streptococcus pneumoniae updated to 2021-06-29 from 2021-06-09
INFO - Re-indexed contents of /home/proj/production/microbial/references/ST_loci/streptococcus_pneumoniae
INFO - Downloading new MLST profiles for Escherichia coli#1
INFO - Re-indexed contents of /home/proj/production/microbial/references/ST_loci/escherichia_coli
WARNING - Unable to update pubMLST external data: HTTP Error 503: Service Unavailable
INFO - Cached resFinder database identical to remote.
INFO - Version check done. Creating sbatch jobs
INFO - Created runfile for sample ACC8133A95 in folder /home/proj/production/microbial/results//ACC8133A95_2021.7.2_15.4.9
INFO - Saved Trailblazer slurm report file to /home/proj/production/microbial/results/reports/trailblazer/ACC8133A95_slurm_ids.yaml and /home/proj/production/microbial/results/ACC8133A95_2021.7.2_15.4.9/ACC8133A95_slurm_ids.yaml
INFO - Execution finished!

Additional context
Updates at pubMLST. Seems to be affecting the updates of external pubMLST references.
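
If the 503 responses turn out to be transient, one possible mitigation is to retry the download a few times before giving up. The sketch below is not microSALT's code: `fetch_with_retry`, its parameters and the doubling delay are all assumptions, and the `opener` and `sleep` arguments exist only so the behaviour can be exercised without network access.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, attempts=3, delay=1.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch url, retrying on HTTP 503 with a doubling delay between attempts."""
    for attempt in range(1, attempts + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as error:
            # Only 503 is treated as transient; any other error propagates at once.
            if error.code != 503 or attempt == attempts:
                raise
            sleep(delay)
            delay *= 2
```

If every attempt fails, the last `HTTPError` is re-raised, so the caller can still log the warning shown above.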

Revise deliverables file

Deliverables file incorrectly displays '~' instead of ~ for index path.
The JSON dump file is referred to by ticket ID instead of LIMS ID.

Add automatic blacking

Is your feature request related to a problem? Please describe.
Automatically run black to reformat the structure of the code in new PRs.

Describe the solution you'd like

  • Figure out how automatic blacking works
  • Implement this in microSALT

Describe alternatives you've considered
None

Additional context
None

Issue with reporting resistance OXA-48

Describe the bug
Perfect matches to OXA-48 are no longer reported in microSALT; instead microSALT reports OXA-566 resistance with a lower %identity.

Message from KS in Ticket 722237:

...regarding OXA-48 having started to be reported as OXA-566 (identity 99.88%). When I run the fastq files from you in Resfinder, it reports OXA-48 (100% match). I was asked to indicate roughly when this change occurred, and as far as I can see, sample 20ET200231 (Ticket 112544, week 44 2020) is reported as OXA-48 while sample 20ET500236 (Ticket 935485, week 47 2020) is reported as OXA-566. Has anything changed in the assembly during this period?

To Reproduce
Steps to reproduce the behavior:

  1. Run sample 20ET200231 or 20ET500236 with microSALT > 3.1.3 and microSALT 3.1.0

Expected behavior
OXA-48 should be reported if this resistance has the higher %identity.

Additional context
Is OXA-48 in the raw blast results?

Adding some of the commits made during the period below. Has any affected OXA-48 reporting? Scrape_blast changes likely.

HTML reports unnecessarily pulls content from the web - with potential privacy and security concerns

Describe the bug
The stylesheet files and images included in the microSALT reports are gathered from various places on the web, such as GitHub and some content delivery networks (CDNs).

While this is practical for many reasons, it also means that all the accesses of microSALT reports are logged by various commercial and/or state organizations outside of Sweden, such that they can see IP-addresses and a lot of browser information about the one opening the reports, which is probably not desirable.

The biggest problem is perhaps if this information is used by evil actors to identify IP addresses where sensitive information is stored, and thus draws attention to those.

Also sometimes the viewing of a report can stall on "Establishing a TLS handshake with CDN ..." as seen in one of the screenshots below.

To Reproduce
Steps to reproduce the behavior:

  1. Open up a microSALT report in e.g. Firefox.
  2. Press Ctrl+U to view the source code of the report
  3. Search for "<img" or "stylesheet".
  4. Notice that the source of these are addresses on the web.

Expected behavior

I think it would be desirable for stylesheets and images to be either linked to local files or embedded in the HTML (which is possible even for PNG images, using base64 encoding).
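
Embedding an image with base64 can be sketched as follows. This is an illustration using only the Python standard library; the helper names `data_uri` and `img_tag` are hypothetical and not part of microSALT.

```python
import base64
import mimetypes


def data_uri(payload, filename):
    """Encode raw bytes as a data: URI, guessing the MIME type from the file name."""
    mime, _ = mimetypes.guess_type(filename)
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{encoded}"


def img_tag(payload, filename, alt=""):
    """Build an <img> element whose source is embedded rather than fetched from the web."""
    return f'<img src="{data_uri(payload, filename)}" alt="{alt}">'
```

A report built this way is larger on disk, but opening it triggers no requests to external hosts.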

Screenshots

image

image

Software version (please complete the following information):

  • microSALT 3.3.5

Resistance call debug info

End-users frequently use resistance hits as a way to verify that the correct organism was called.
As such, hits below the current thresholds are relevant to display in a separate section for debugging purposes.
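
One way to picture the request is a function that partitions hits instead of discarding those below the thresholds. The field names `identity` and `coverage`, the function name and the threshold arguments are assumptions made for illustration; microSALT's real data model may differ.

```python
def split_hits(hits, min_identity, min_coverage):
    """Partition hits into (reported, debug) lists using identity and coverage thresholds."""
    reported, debug = [], []
    for hit in hits:
        passed = hit["identity"] >= min_identity and hit["coverage"] >= min_coverage
        (reported if passed else debug).append(hit)
    return reported, debug
```

The `debug` list would then feed the separate section, leaving the main resistance table unchanged.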

Deliverables file generation on old runs

Use microSALT to initialize analysis on old runs. This will produce deliverables files, which will be useful for visualising trends of old data.

Also rename ticket.json to limsid.json.

microSALT cannot be installed on hasta

Describe the bug
Old conda version on hasta means microSALT cannot currently be installed on hasta with the installation script in use. The installation gets stuck on "Solving environment" for the following step:
conda install -y -c bioconda -c conda-forge blast=2.12.0 bwa=0.7.17 picard=2.20.3 pigz=2.4 quast=5.0.2 samtools=1.13 spades=3.13.1 trimmomatic=0.39=1 r-base=4.1.1

To Reproduce
bash update-microsalt-stage.sh master

Expected behavior
All microSALT dependencies are installed in S_microSALT and P_microSALT resp. by running the command above.

Additional context

  • Change versions?
  • Use more specific versions (as with trimmomatic above)?
  • Install sequentially?
  • Stop using conda? Options?

Add fields in the report for important resistances

Is your feature request related to a problem? Please describe.

Additional fields for resistances are requested by KS.

  • New columns in the summary of the typing report: ”Anmälningspliktig resistens” (Notifiable resistance), ”Genkategori” (Gene category)
  • New column in the sample-level resistance table: ”Kategori” (Category)

The fields should more clearly mark out samples with resistances of high concern.

Describe the solution you'd like

Describe alternatives you've considered

Additional context

We have been working on a project aimed at making it easier for the staff who report results to interpret the resistance genes in the microSALT report, which we have discussed at several epityping meetings during the spring.
The aim is that one should be able to see already in the report overview whether the isolate is ESBL-CARBA, ESBL, MRSA or VRE. This is so that we can react and troubleshoot if we have typed an isolate that we believe carries a notifiable resistance, but it turns out that it does not.

We have coded the resistance genes in resFinder in the attached file and would like this information to be shown both in the overview and in the detailed information for each gene (the categories given differ slightly between the two parts).
We have tried to clarify how we envisage the categories being displayed in the attached ppt.

There are three different columns we would like (suggested headings, but if you come up with better and shorter headings, please suggest them):
Overview: ”Anmälningspliktig resistens” (Notifiable resistance) and ”Genkategori” (Gene category)
Detailed information for each isolate: ”Kategori” (Category)

Do you think something along these lines is possible to create?
Please get back to us with questions and suggestions!

Best regards,
Inga

Typing of Mycobacterium abscessus fail

Describe the bug
Files added to:
/home/proj/production/microbial/references/ST_loci/mycobacterium_abscessus
and
/home/proj/production/microbial/references/ST_profiles/mycobacterium_abscessus

Although the files were (seemingly) added in the same way as files have been added to the database for the other organisms, the typing still fails (see ticket 715235 for example).

CG doesn't find mandatory files

Describe the bug
CG is complaining about mandatory files not existing. See /home/proj/production/logs/cg.workflow-microsalt-store-completed.log

"Total reads" header

I suggest that when "Total reads" is used to mean "Total reads mapped", this should be made clearer. Currently the "Total reads" header used in some tables shows 0 total reads in some cases, which can be confusing.

For example "Total reads" is used in the QC report table.

/Henning

Add static report storage

In order to support interaction with other tools, the reports generated by microSALT need to be stored better.
Allocate a reports folder, then add practically all reports into subfolders of that folder.

Issues with downloading from PubMLST

Describe the bug
PubMLST references are not updated for some species. The following error is displayed when starting an analysis:
HTTP Error 404: Not Found

To Reproduce
Run microSALT for a sample.

Expected behavior
The ST_profile and ST_loci should be updated with the most recent data from PubMLST for the species and there should be no HTTP errors when running microSALT.

Make validation samples available

Describe the bug
The following projects are used to validate microSALT: MIC3109, MIC4107, MIC4109 & ACC5551.

  • Only ACC5551 can be found in statusDB/housekeeper.
  • ACC5551 has spring-compressed samples that cannot be decompressed with cg decompress. On closer inspection, the case is a mix of validation and routine samples. May be a good idea to have a separate case for the validation samples only.

Expected behavior
All cases and samples should be available for validations in cg.

Additional context

Updating compress api
Set dry run to False
Fetch latest version from bundle ACC5551A31
Found file /home/proj/production/demultiplexed-runs/190417_A00621_0064_AHJM7JDSXX/Unaligned-Y151I10I10Y151/Project_609108/Sample_ACC5551A31/HJM7JDSXX_609108_S313_L004.spring
Found file /home/proj/production/demultiplexed-runs/190417_A00621_0064_AHJM7JDSXX/Unaligned-Y151I10I10Y151/Project_609108/Sample_ACC5551A31/HJM7JDSXX_609108_S313_L002.spring
Found file /home/proj/production/demultiplexed-runs/190417_A00621_0064_AHJM7JDSXX/Unaligned-Y151I10I10Y151/Project_609108/Sample_ACC5551A31/HJM7JDSXX_609108_S313_L001.spring
Found file /home/proj/production/demultiplexed-runs/190417_A00621_0064_AHJM7JDSXX/Unaligned-Y151I10I10Y151/Project_609108/Sample_ACC5551A31/HJM7JDSXX_609108_S313_L003.spring
Check if pending compression file exists
/home/proj/production/demultiplexed-runs/190417_A00621_0064_AHJM7JDSXX/Unaligned-Y151I10I10Y151/Project_609108/Sample_ACC5551A31/HJM7JDSXX_609108_S313_L004.crunchy.pending.txt does not exist
Check if SPRING archive file exists
Check if FASTQ pair exists
FASTQ files already exists
SPRING to FASTQ decompression not possible for ACC5551A31
Skipping sample ACC5551A31

Duplication rate missing in json output

Describe the bug
The duplication_rate field is empty in the output json file when it should not be empty. The values are reported in the html output report.

Improve description of analytical limitations in reports

Is your feature request related to a problem? Please describe.

Analytical limitations are sparsely described in the microSALT report.

Current descriptions are included below.

The descriptions should e.g. include the following:

  • Analytical limitations of the bioinformatic analysis
  • Limits of detection
  • More thorough description of prep and sequencing method.

Describe the solution you'd like
Improve descriptions according to SWEDAC requirements.

  • Remove the text "Analysen kan enbart beställas av gruppen Klinisk Mikrobiologi." (The analysis can only be ordered by the Clinical Microbiology group.)
  • Decide what information should be added
  • Update text in reports

Additional context

Limitations in QC-report:

The reliability of the results requires that the information supplied by the customer is correct.

Limitations in typing report:

The analysis can only be ordered by the Clinical Microbiology group. The laboratory has not been responsible for the sampling stage or extraction; the results apply to the sample as it was received. The typing is limited by the information found, at the time of analysis, in the publicly available databases pubMLST and resFinder. The reliability of the results requires that the information supplied by the customer is correct, that the samples reach the predefined threshold values, and that the organisms analysed have previously been manually verified by staff at Clinical Genomics.

Technical Description

Microbial whole-genome sequencing of routine samples, requiring at least 3 million read pairs. Nextera library preparation.

Add all samples to report

Is your feature request related to a problem? Please describe.
The QC and typing report should always contain all samples in the ticket. Even if some of these samples have failed sequencing or not (yet) been sequenced.

Describe the solution you'd like

  • Add row to the summary columns in both reports
  • Fields == ""?
  • If sample failed sequencing this should be noted somewhere in the summary
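
The idea in the list above can be sketched as building the summary from the ticket's sample list rather than from the available results. Every name here (`summary_rows`, the `status` strings, the row layout) is hypothetical and only illustrates the approach.

```python
def summary_rows(ticket_samples, results):
    """Return one summary row per sample in the ticket, including samples without results."""
    rows = []
    for sample_id in ticket_samples:
        result = results.get(sample_id)
        if result is None:
            # No analysis result: keep the row, leave the fields empty and flag the status.
            rows.append({"sample": sample_id, "status": "failed or not sequenced", "fields": {}})
        else:
            rows.append({"sample": sample_id, "status": "analysed", "fields": result})
    return rows
```

Iterating over the ticket's samples guarantees that the report has as many rows as the order has samples, in the order they were listed.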

Describe alternatives you've considered
None

Additional context
None

What should we trend in vogue for microsalt?

Trending statistics from microsalt analyses in vogue can give an insight into the quality of the lab work performed by Clinical genomics. We need to decide which metrics to monitor.

The metrics that matter when we deliver data to the customers:

  • Number of reads
  • 10x coverage
  • If typing was successful

What I suggest we trend:

  • Fraction of samples that get the ordered number of reads
  • 10x coverage for MWR app tags
  • If typing was successful for MWR app tags

Data shown by vogue won't say much about the quality of our work at Clinical Genomics, since basically all metrics are affected by what the customer writes when placing the order. If a customer enters the wrong species, or submits a contaminated sample, coverage will be low and the typing will fail.

The only important metric I can think of that we fully control is whether or not we reach the ordered number of reads for the samples. The customers that order MWR usually order well-known pathogens that are supported for typing by the pubMLST database and are also easy to type with MALDI-TOF. Therefore, the reference tends to be correct from the start and varies less for MWR orders. For the research app tags, very little effort goes into guessing which species is submitted. Because of this I suggest that 10x coverage and successful typing are only tracked for MWR app tags.

Is it possible to only track metrics for a certain app tag? What do you think about my suggestion regarding metrics to pay attention to? @moahaegglund @talnor @karlnyr @Vince-janv @Karl-Svard @keyvanelhami

This issue can be closed when we have agreed on which metrics we should pay extra attention to when monitoring the quality of microsalt orders over time.

ERROR - Reference update function failed prematurely. Review immediately

Describe the bug
Cronic detected failure for the command:
/home/proj/production/servers/resources/hasta.scilifelab.se/crontabs/microsalt-start.sh

ERROR - Reference update function failed prematurely. Review immediately
Traceback (most recent call last):
File "/home/proj/bin/conda/envs/P_microSALT/bin/microSALT", line 8, in <module>
sys.exit(root())
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/microSALT/cli.py", line 396, in autobatch
ext_refs.update_refs()
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/microSALT/utils/referencer.py", line 83, in update_refs
self.fetch_external(self.force)
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/microSALT/utils/referencer.py", line 160, in fetch_external
profile_no = re.search(r"\d+", sample[2]).group(0)
AttributeError: 'NoneType' object has no attribute 'group'

RESULT CODE: 1
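
The traceback ends in `re.search(r"\d+", sample[2]).group(0)`, which raises `AttributeError` whenever the text contains no digits, because `re.search` then returns `None`. A guarded version could look like the sketch below; the function name `extract_profile_no` is hypothetical, and how the caller should treat a missing number is left open.

```python
import re


def extract_profile_no(text):
    """Return the first run of digits in text, or None when there is none."""
    match = re.search(r"\d+", text or "")
    return match.group(0) if match else None
```

The caller can then skip or log entries for which `None` is returned instead of aborting the whole reference update.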

To Reproduce
My guess on Steps to reproduce the behavior:

  1. Become hiseq.clinical
  2. Bash /home/proj/production/servers/resources/hasta.scilifelab.se/crontabs/microsalt-start.sh

Expected behavior
No errors, all waiting microbial projects should have been started

Should we report the number of non-trimmed reads?

Is your feature request related to a problem? Please describe.
A situation we have ended up in: a sample gets too few reads, so more sequencing data is generated for that sample. Once as many as 5 million reads have been generated (only 3 million are needed), we have produced as many reads as we promised. However, more than 2 million reads can be discarded by trimmomatic during trimming, so that fewer than 3 million reads are displayed in the QC report.

We have then done what we promised, but it still looks as if we have not when we deliver to the customer. Should we add a column for the total number of non-trimmed reads? Such a column would give the customer a better understanding of the data.

Describe the solution you'd like

  • Total reads should display the raw number of reads
  • Add a new column to the report for the number of trimmed reads

Describe alternatives you've considered
None

Additional context
None

Update housekeeper deliverables

Is your feature request related to a problem? Please describe.
Deliverables could be updated. Some files are unnecessarily stored. Storage tag should correspond to files that are stored indefinitely?

See attached file for more info.

microSALT_tags_to_store_in_hk.xlsx

Chronic: microSALT attempts to write to a read only database

Cron hiseq.clinical@hasta $CRONIC ${PRODUCTION_HOME}/servers/resources/hasta.scilifelab.se/crontabs/microsalt-start.sh >> ${LOG_BASE}/microsalt.analysis.log 2> >(tee -a ${LOG_BASE}/microsalt.analysis.log >&2)

Cronic detected failure for the command:
/home/proj/production/servers/resources/hasta.scilifelab.se/crontabs/microsalt-start.sh

touch: cannot touch ‘/home/proj/production/microbial/meta/microsalt.db’: Permission denied
ERROR - Reference update function failed prematurely. Review immediately
INFO - pubMLST reference for Staphylococcus aureus updated to 2020-07-13 from 2020-07-10
INFO - Re-indexed contents of /home/proj/production/microbial/references/ST_loci/staphylococcus_aureus
Traceback (most recent call last):
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1284, in _execute_context
cursor, statement, parameters, context
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute
cursor.execute(statement, parameters)
sqlite3.OperationalError: attempt to write a readonly database

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/proj/bin/conda/envs/P_microSALT/bin/microSALT", line 8, in
sys.exit(root())
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/click/core.py", line 764, in call
return self.main(*args, **kwargs)
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/microSALT/cli.py", line 396, in autobatch
ext_refs.update_refs()
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/microSALT/utils/referencer.py", line 82, in update_refs
self.fetch_pubmlst(self.force)
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/microSALT/utils/referencer.py", line 484, in fetch_pubmlst
{"version": external_ver},
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/microSALT/store/db_manipulator.py", line 160, in upd_rec
eval(megastring + ".update(upd_dict)")
File "", line 1, in
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 4009, in update
update_op.exec_()
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
self._do_exec()
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 1893, in _do_exec
self._execute_stmt(update_stmt)
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
self.result = self.query._execute_crud(stmt, self.mapper)
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 3514, in _execute_crud
return conn.execute(stmt, self._params)
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1020, in execute
return meth(self, multiparams, params)
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_clauseelement
distilled_params,
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1324, in _execute_context
e, statement, parameters, cursor, context
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1518, in _handle_dbapi_exception
sqlalchemy_exception, with_traceback=exc_info[2], from_=e
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
raise exception
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1284, in _execute_context
cursor, statement, parameters, context
File "/home/proj/bin/conda/envs/P_microSALT/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) attempt to write a readonly database
[SQL: UPDATE versions SET version=? WHERE versions.name = ?]
[parameters: ('2020-07-13', 'profile_staphylococcus_aureus')]
(Background on this error at: http://sqlalche.me/e/e3q8)

RESULT CODE: 1
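
The final error, `attempt to write a readonly database`, is what SQLite typically reports when the current user cannot write to the database file or to the directory that holds it. A minimal pre-flight check could look like the sketch below. The function name `sqlite_write_problems` is hypothetical and not part of microSALT, and the database path is assumed to be known from the configuration.

```python
import os


def sqlite_write_problems(db_path):
    """Return a list of reasons why writing to the SQLite file may fail.

    Hypothetical helper for illustration; not part of microSALT.
    """
    problems = []
    db_dir = os.path.dirname(os.path.abspath(db_path))
    if not os.path.exists(db_path):
        problems.append("database file does not exist")
    elif not os.access(db_path, os.W_OK):
        problems.append("database file is not writable by this user")
    # SQLite creates journal files next to the database, so the
    # directory has to be writable as well.
    if not os.access(db_dir, os.W_OK):
        problems.append("database directory is not writable by this user")
    return problems
```

An empty list means that neither permission problem was found; it does not rule out other causes such as a read-only file system mount.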

Automatic reference downloader doesn't kick in

Describe the bug
The automatic reference downloader doesn't kick in.

To Reproduce
Remove an existing reference or add a new one. It won't be downloaded by microSALT.

Expected behavior
The reference is expected to be downloaded.

Additional context
It could be due to changes in NCBI's API, a timeout, or the reference name being parsed incorrectly.
Identified through sample ACC7248A23

Trimmomatic randomly fails

Describe the bug
Trimming with Trimmomatic sometimes fails, seemingly at random. Runs that fail may complete without errors when the same command is run again. This has been occurring since the microSALT installation issues were fixed in PR #142.

Expected behavior
Trimmomatic should complete without errors.

Screenshots

# A fatal error has been detected by the Java Runtime Environment:  
#  
# SIGILL (0x4) at pc=0x00007f51c8695367, pid=5762, tid=5819  
#  
# JRE version: OpenJDK Runtime Environment (11.0.2+7) (build 11.0.2+7)  
# Java VM: OpenJDK 64-Bit Server VM (11.0.2+7, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)  
# Problematic frame:  
# J 784 c2 org.usadellab.trimmomatic.trim.IlluminaClippingTrimmer$IlluminaLongClippingSeq.readsSeqCompare(Lorg/usadellab/trimmomatic/fastq/FastqRecord;)Ljava/lang/Integer; (306 bytes) @ 0x00007f51c8695367 [0x00007f51c8693e40+0x0000000000001527]  
#  
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again  
#  
# If you would like to submit a bug report, please visit:  
# https://github.com/AdoptOpenJDK/openjdk-build/issues  
#

Software version:

  • Trimmomatic version 0.39
  • openjdk version 11.0.1
  • microSALT version 3.3.4 (4d369b0)

Slurm id file naming

The naming of the slurm ids yaml file differs depending on the number of samples in the project. This serves no purpose and makes things unnecessarily complicated. Change it to always use the project id.

Replace microSALT LIMS dependency

Currently microSALT depends on a Genologics LIMS instance.
Remove this LIMS dependency and instead read sample info from a JSON file supplied through the CLI.
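
As a rough illustration of the JSON-based input, the sketch below loads sample info from a file and checks that a few fields are present. The field names (`sample_id`, `project_id`, `organism`) and the function name are assumptions made for this example; the actual schema would have to be defined by microSALT.

```python
import json

# Hypothetical field names; the real schema is not specified in this issue.
REQUIRED_FIELDS = ("sample_id", "project_id", "organism")


def load_sample_info(json_path):
    """Load sample info from a JSON file and validate required fields."""
    with open(json_path) as handle:
        samples = json.load(handle)
    # Accept a single sample object as well as a list of samples.
    if isinstance(samples, dict):
        samples = [samples]
    for sample in samples:
        missing = [field for field in REQUIRED_FIELDS if field not in sample]
        if missing:
            raise ValueError(
                "sample entry is missing fields: {}".format(", ".join(missing))
            )
    return samples
```

Failing early on missing fields keeps incomplete sample descriptions from reaching the sbatch jobs.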

Add catch for samples mixed with same species

Is your feature request related to a problem? Please describe.
We can expect samples to be contaminated on rare occasions. If a sample is contaminated by another sample of the same species, the typing results might be affected. It would be good to have a way of catching these cases and ensuring they are not marked as passed.

There are two possible outcomes affecting the typing of such a sample:

  1. One or more loci lack a contig that completely covers the region (currently marked as failed)
  2. One or more loci have several contigs that completely cover the region

Describe the solution you'd like
Implement a check for outcome 2 and mark such samples in the typing report.
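
A minimal sketch of such a check for outcome 2 is shown below. It assumes that the alignment hits are available as dictionaries with the keys `locus`, `contig` and `span` (fraction of the locus covered); these names and the function name are hypothetical and do not reflect microSALT's actual data model.

```python
from collections import defaultdict


def loci_with_multiple_full_hits(hits, min_span=1.0):
    """Return loci that are completely covered by more than one contig.

    `hits` is an iterable of dicts with the (assumed) keys
    'locus', 'contig' and 'span', where span is a fraction in [0, 1].
    """
    contigs_per_locus = defaultdict(set)
    for hit in hits:
        if hit["span"] >= min_span:
            contigs_per_locus[hit["locus"]].add(hit["contig"])
    return sorted(
        locus for locus, contigs in contigs_per_locus.items() if len(contigs) > 1
    )
```

Any locus returned by this function would be a candidate for flagging in the typing report.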

Change report to follow accreditation guidelines

Add the items required by accreditation to the report.
7.8.2.1

  • Title
  • Laboratory name and address (Clinical Genomics)
  • End of report
    (- [ ] Page numbers - skipping that for now)
  • Customer name and contact information - INSTEAD use CustID (statusDB)
  • Application tag (LIMS)
  • cust-id (LIMS) (to be able to show customer information in AM/statusDB)
  • Version of uSALT (from where?)
  • Description of the sample-object, = sampleID
  • Dates for reception, library prep, sequencing and analysis (LIMS)
  • Method document and version (Sequencing and Library prep)
  • Date for printing of report
  • The result
  • Deviations from the method
  • Identification of the person(s) approving the report
  • Technical description of the analysis (currently hard-coded text; in future from statusDB)
  • Limitations of the analysis (currently hard-coded text; in future from statusDB)
  • Translate to Swedish (problems with the characters å, ä and ö)

7.8.2.2

  • Separate table holding information from customer/star for customer information
  • Disclaimer: "The results apply to the sample as provided by the customer"

Emails with reports from slurm not received

Describe the bug
It seems that emails sent from slurm are not received, i.e. no email arrives when an analysis run finishes. Reports are, however, sent when running microSALT utils report manually. This makes it difficult to keep track of when microbial analyses are done.

To Reproduce
Steps to reproduce the behavior:
Run the following as bash and sbatch:

  • source activate P_microSALT
  • microSALT utils report /home/proj/production/microbial/queries/934197.json --email <>@scilifelab.se

Assembly issues in the arcC locus for Staphylococcus aureus

Describe the bug
Since the switch from NovaSeq 6000 to NovaSeq X, Staphylococcus aureus samples have started failing in the analysis. The issue is systematic and affects a relatively large percentage of the Staphylococcus aureus samples in a similar way. The contig covering the arcC locus is split in the middle of the region, so no MLST type can be reliably assigned to the sample because none of the spanning contigs covers enough of the locus. See more info in the deviation here.

The issue needs to be fixed so that these samples can be typed in microSALT.

To Reproduce
Steps to reproduce the behavior:

  1. Run microSALT on a Staphylococcus aureus sample sequenced on the NovaSeq X
  2. Check the loci results in the "MLST" table for the sample
     • The field Längd (HSP) % (length (HSP) %) will show a low span of around 70%.
  3. Check position 2631741 in AP017922.1 coordinates for an A->G minority SNP.

Expected behavior
To circumvent the issues discussed there are a number of options:

  • A. Implement the --uncareful flag in cg and rerun samples that fail. Notify customer in ticket.
  • B. Update microSALT to not run "careful" assembly for any S. aureus sample.
  • C. Update microSALT to remove the "careful" assembly for all species by default. [preferred option]

With the data we have to work with, we think it is better to skip the spades --careful flag. Given that we get the same results as before with option C, this can be done for all samples to ensure that it is clear how the analysis is performed and to enable easier handling of microbial samples.
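
To illustrate option C, the sketch below builds a SPAdes command line in which `--careful` is off by default and can be switched on explicitly. `spades.py`, `-1`, `-2`, `-o` and `--careful` are standard SPAdes options; the function name and the exact way microSALT assembles its command are assumptions made for this example.

```python
def build_spades_command(forward, reverse, outdir, careful=False):
    """Return a SPAdes command as an argument list.

    Hypothetical helper; microSALT's real command construction may differ.
    """
    cmd = ["spades.py", "-1", forward, "-2", reverse, "-o", outdir]
    if careful:
        # Only add the mismatch-correction mode when explicitly requested.
        cmd.append("--careful")
    return cmd
```

For example, `build_spades_command("R1.fastq.gz", "R2.fastq.gz", "assembly")` yields a command without `--careful`, while passing `careful=True` restores the previous behaviour.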

Test with e.g. ticket 121778.

Software version

  • MicroSALT version 3.3.5

Additional context

As a side note, microSALT still gives an estimate of the allele at each locus for samples that fail typing QC, but because of the limited data, this estimate can be expected to vary when resequencing the sample.

microSALT report - text for MLST thresholds above table is not correct

Is your feature request related to a problem? Please describe.
The text above the MLST table in the typing report doesn't correspond to the criteria in the method document for microSALT. This has led to misunderstandings between the prodbioinfo team and customers when delivering samples.

In the report it says:
Identitetströskel 100%, Längdtröskel 90%, Novel misstankströskel 99.5% (Identity threshold 100%, length threshold 90%, novel suspicion threshold 99.5%)

In the document:
At least one allele number for each locus is determined with 99.5% identity, and 100% span

Describe the solution you'd like
I want the correct criteria written in the report and they should be the same as in the method document.
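
One way to keep the report text and the applied criteria from drifting apart is to generate both from the same values. The sketch below assumes that the method document's criteria (99.5% identity, 100% span) are the ones to use; the names `MLST_THRESHOLDS`, `passes_mlst_thresholds` and `threshold_text` are hypothetical and not taken from microSALT.

```python
# Assumed values, taken from the method document quoted in this issue.
MLST_THRESHOLDS = {"identity": 99.5, "span": 100.0}


def passes_mlst_thresholds(identity, span, thresholds=MLST_THRESHOLDS):
    """Return True if a locus hit meets both thresholds (in percent)."""
    return identity >= thresholds["identity"] and span >= thresholds["span"]


def threshold_text(thresholds=MLST_THRESHOLDS):
    """Render the text shown above the MLST table from the same values."""
    return "Identity threshold {identity:g}%, span threshold {span:g}%".format(
        **thresholds
    )
```

With this arrangement a change to the thresholds is automatically reflected in the report heading.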

Better support reruns

It should be possible to re-analyse a single sample without trashing the entire previous analysis.

Also review whether it is necessary for the finish and report commands to contain a config.json.

Changing allele information in typing report for novel alleles

This request was made by cust056 in ticket 533193.

Is your feature request related to a problem? Please describe.
The customer thinks that assigning the closest allele to a locus when the identity is not 100% is misleading.

Describe the solution you'd like
The customer suggests showing one of the following in the field instead:

  • Novel
  • Novel (most similar allele number), e.g. Novel (3)
  • (most similar allele number), e.g. (3) or 3*

Update 2022-01-19:
cust056 has also requested that the sequence type (sekvenstyp) in the summary table be set to unknown (okänd) for samples that fail the thresholds.
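
The suggested display variants can be sketched as a small formatting function. The function name, the `style` values and the rule that exactly 100% identity counts as a known allele are assumptions for illustration only.

```python
def allele_label(closest_allele, identity, style="novel_with_number"):
    """Format the allele field for the typing report.

    Hypothetical helper: alleles with 100% identity are shown as-is,
    anything lower is rendered according to `style`.
    """
    if identity >= 100.0:
        return str(closest_allele)
    if style == "novel":
        return "Novel"
    if style == "novel_with_number":
        return "Novel ({})".format(closest_allele)
    if style == "parenthesised":
        return "({})".format(closest_allele)
    if style == "starred":
        return "{}*".format(closest_allele)
    raise ValueError("unknown style: {}".format(style))
```

Which variant to use is a presentation decision for the report; the function only shows that all of them can be derived from the closest allele number and the identity.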

Additional context
Screenshot of the field in question is shown below:
Screenshot 2021-09-20 at 13 11 26 (1)

Screenshot of the field in the summary table:
resultatsammanstallning_ex
