lrsday's Introduction

LRSDAY

LRSDAY: Long-read Sequencing Data Analysis for Yeasts

A highly transparent, automated and powerful computational framework for high-quality genome assembly and annotation.

License: MIT

Description

Long-read sequencing technologies have become increasingly popular in genome projects due to their strengths in resolving complex genomic regions. As a leading model organism with a small genome and great biotechnological importance, the budding yeast, Saccharomyces cerevisiae, has many isolates currently being sequenced with long reads. However, analyzing long-read sequencing data to produce high-quality genome assembly and annotation remains challenging. Here we present LRSDAY, the first one-stop solution to streamline this process. LRSDAY can produce chromosome-level end-to-end genome assembly and comprehensive annotations for various genomic features (including centromeres, protein-coding genes, tRNAs, transposable elements and telomere-associated elements) that are ready for downstream analysis. Although tailored for S. cerevisiae, we designed LRSDAY to be highly modular and customizable, making it adaptable for virtually any eukaryotic organism.

Citations

Jia-Xing Yue & Gianni Liti. (2018) Long-read sequencing data analysis for yeasts. Nature Protocols, 13:1213–1231.

Jia-Xing Yue, Jing Li, Louise Aigrain, Johan Hallin, Karl Persson, Karen Oliver, Anders Bergström, Paul Coupland, Jonas Warringer, Marco Cosentino Lagomarsino, Gilles Fischer, Richard Durbin, Gianni Liti. (2017) Contrasting evolutionary genome dynamics between domesticated and wild yeasts. Nature Genetics, 49:913-924.

Release history

  • v1.7.2 Released on 2023/03/20
  • v1.7.1 Released on 2023/01/28
  • v1.7.0 Released on 2022/12/31
  • v1.6.0 Released on 2019/10/03
  • v1.5.0 Released on 2019/05/13
  • v1.4.0 Released on 2019/03/21
  • v1.3.1 Released on 2019/01/22
  • v1.3.0 Released on 2018/11/13
  • v1.2.0 Released on 2018/10/15
  • v1.1.0 Released on 2018/07/11
  • v1.0.0 Released on 2018/02/04

License

LRSDAY itself is distributed under the MIT license. A number of LRSDAY's dependencies (e.g. CAP3, MAKER, GATK, blat, RepBase, etc.) are under more restrictive licenses, for which commercial use of the software needs to be discussed with the corresponding developers.

Requirements

Hardware, operating system and network

This protocol is designed for a desktop or computing server running an x86-64 Linux operating system. Multithreaded processors are preferred since many steps can be configured to use multiple threads in parallel. For assembling and analyzing budding yeast genomes (genome size = ~12.5 Mb), at least 16 GB of RAM and 100 GB of free disk space are recommended. More disk space is needed if you want to use LRSDAY for Nanopore basecalling. When adapted for other eukaryotic organisms with larger genomes, RAM and disk space consumption will scale up, mainly during de novo genome assembly (performed by Canu by default). Please refer to Canu's manual for the suggested RAM and disk space for assembling large genomes. A stable Internet connection is required for the installation and configuration of LRSDAY as well as for retrieving the test data.
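
The recommendations above can be checked before starting an installation. The following is a minimal sketch, not part of LRSDAY: the function names are hypothetical, the thresholds simply restate the yeast-scale figures given above, and it assumes a Linux system that provides /proc/meminfo, df and nproc.

```shell
#!/bin/bash
# Pre-flight resource check (hypothetical helper, not shipped with LRSDAY).

# meets_minimum VALUE MINIMUM: succeed if VALUE >= MINIMUM (integers).
meets_minimum() {
    [ "$1" -ge "$2" ]
}

# Total RAM in GB, rounded to the nearest integer (Linux only).
total_ram_gb() {
    awk '/^MemTotal:/ {print int($2 / 1024 / 1024 + 0.5)}' /proc/meminfo
}

# Free disk space in GB (rounded down) on the file system holding directory $1.
free_disk_gb() {
    df -Pk "$1" | awk 'NR == 2 {print int($4 / 1024 / 1024)}'
}

# report LABEL MEASURED MINIMUM: print one advisory line.
report() {
    if meets_minimum "$2" "$3"; then
        echo "$1: $2 (OK, recommended >= $3)"
    else
        echo "$1: $2 (below recommended $3)"
    fi
}

echo "Architecture: $(uname -m) (x86_64 expected)"
echo "CPU threads: $(nproc)"
report "RAM (GB)" "$(total_ram_gb)" 16
report "Free disk (GB)" "$(free_disk_gb .)" 100
```

The output is advisory only; larger genomes need more than these minimums, as noted above.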

Software or library requirements

lrsday's People

Contributors

fomightez, jiblits, nicolo-tellini, yjx1217

lrsday's Issues

logs and err - for the installer

Hello JX,

The installer runs several pieces of software.

What about redirecting each tool's log and error output into text files named after the software, and storing them in a dedicated folder?

For example:

What if the main installer called several .sh scripts, each one installing a specific piece of software (for example, guppy.sh)?

Let's say guppy.sh contains:

#!/bin/bash

# common vars
LRSDAY_HOME=$(pwd)
build_dir=$LRSDAY_HOME"/build"

# specific vars
GUPPY_VERSION="4.0.15" # released on 2020.08.30
GUPPY_DOWNLOAD_URL="https://mirror.oxfordnanoportal.com/software/analysis/ont-guppy-cpu_${GUPPY_VERSION}_linux64.tar.gz"

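# software dir
guppy_dir="$build_dir/ont-guppy-cpu/bin"

# check_installed, download_and_extract and note_installed are assumed to be
# helper functions provided by the main installer (sourced or exported).
if [ -z "$(check_installed "$guppy_dir")" ]; then
   cd "$build_dir"
   echo "Download Guppy-v${GUPPY_VERSION}"
   download_and_extract "$GUPPY_DOWNLOAD_URL" "ont-guppy-cpu_${GUPPY_VERSION}_linux64.tar.gz"
   note_installed "$guppy_dir"
fi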

and the main installation script runs:

/usr/bin/time -v bash guppy.sh > "$build_dir/logs/guppy.log" 2> "$build_dir/errs/guppy.err"

At the end of the installation, the .err file will contain, for example:

	Command being timed: "bash  guppy.sh"
	User time (seconds): 53314.37
	System time (seconds): 43012.15
	Percent of CPU this job got: 2431%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 1:06:00
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 15448
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 2
	Minor (reclaiming a frame) page faults: 5286729714
	Voluntary context switches: 70514711
	Involuntary context switches: 1405776
	Swaps: 0
	File system inputs: 16
	File system outputs: 488704
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

After bash guppy.sh finishes, the installation can be checked for errors as follows:

# read the exit status from the same .err file written above
check=$(grep "Exit status" "${build_dir}/errs/guppy.err" | cut -d":" -f2 | tr -d "[:blank:]")
# any non-zero status counts as a failure
if [ "$check" != "0" ]
then
    echo "Exit at guppy.sh"
    exit 1
else
    echo "guppy.sh ... OK"
fi
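
The exit status can also be taken directly from the shell instead of being parsed from the /usr/bin/time report. The sketch below illustrates that variant; run_step and the logs/ and errs/ layout are hypothetical names used for illustration and are not part of the current installer.

```shell
#!/bin/bash
# Hypothetical wrapper: run one installation step, keep its stdout and
# stderr in per-step files, and report success or failure.

build_dir="${build_dir:-$(pwd)/build}"

# run_step NAME COMMAND [ARGS...]
run_step() {
    local name="$1"
    shift
    mkdir -p "$build_dir/logs" "$build_dir/errs"
    if "$@" > "$build_dir/logs/$name.log" 2> "$build_dir/errs/$name.err"; then
        echo "$name ... OK"
    else
        # In the else branch, $? still holds the exit status of the command.
        local status=$?
        echo "$name failed with exit status $status (see $build_dir/errs/$name.err)"
        return "$status"
    fi
}

# Example call (not executed here):
#   run_step guppy /usr/bin/time -v bash guppy.sh || exit 1
```

Because the status comes from the command itself, the check keeps working even if the timing report changes format or /usr/bin/time is not installed.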

This is a lot of reorganization, but it can make the installation process easier to debug.

best,

Nic

Installing PerlIO::gzip failed

I'm installing LRSDAY-v1.7.1 on Ubuntu 22.04.2 LTS. Running install_dependencies.sh raises the error "! Installing PerlIO::gzip failed." I've tried passing the --force flag to misc/cpanm, but I still get the same error.

Here's the build.log. Any input for troubleshooting would be greatly appreciated. Thanks!

Interest in a refactor of the installation script?

I found getting the install script to work in one shot a bit tricky, so I've been refactoring it to reduce duplication and make it more straightforward to install on a fresh system.

My deployment scenario is on-demand cloud services. Basically, the script runs and installs all the system requirements, followed by all the other LRSDAY dependencies. The script also marks what has been installed, to avoid rebuilding programs that have already been built successfully.
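
The "mark what has been installed" idea can be sketched with marker files. The helpers below are hypothetical: they reuse the names check_installed and note_installed that appear in the installer snippets quoted in other issues, but the marker file name and the install_once wrapper are assumptions, and the real installer may implement this differently.

```shell
#!/bin/bash
# Hypothetical marker-file helpers for an idempotent installer.

# Print "installed" if the marker file exists in directory $1, else nothing.
check_installed() {
    if [ -e "$1/installed" ]; then
        echo "installed"
    fi
}

# Record a successful installation by creating the marker file in $1.
note_installed() {
    mkdir -p "$1" && touch "$1/installed"
}

# install_once DIR COMMAND [ARGS...]: run COMMAND only if DIR is not yet
# marked, and mark DIR only if COMMAND succeeds.
install_once() {
    local dir="$1"
    shift
    if [ -n "$(check_installed "$dir")" ]; then
        echo "skip: $dir already installed"
        return 0
    fi
    "$@" && note_installed "$dir"
}
```

With this layout, rerunning the installer after a failure repeats only the steps whose marker is missing, which suits a one-shot deployment on a fresh cloud instance.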

How to fix broken links and dependencies - LRSDAY

--- POST UPDATE---

Last update: 18/08/2022

--- END ---

The steps below are the result of extensive debugging on the following OS:

Linux Ubuntu 18.04.5 LTS 5.4.0-122-generic #138~18.04.1-Ubuntu SMP x86_64 GNU/Linux

I think that, as long as the OS is Debian or a Debian-based Linux distribution (Ubuntu, Mint, Kali, Raspberry Pi OS, etc.), the following points might help you succeed with the installation and complete the LRSDAY tutorial.

Please note that the tutorial makes use of PacBio reads. I have not yet tested the software for analyzing Nanopore reads.

Please install all the software by copy-pasting from the installer, as a new installer is not available yet.

You will need to make a few corrections along the way.

Note that the software may not appear here in the same order as in the installer, so before installing each tool, check whether it is on this list:

  • cpanm needs libpq-dev (a Debian package, see here; for SUSE the package should be this, while I did not find the corresponding package for RedHat). If you install on either SUSE or RedHat, consider testing whether cpanm works properly without installing additional packages.

  • nanopolish installation from the Ubuntu repo results in nanopolish v0.9.0 (I need to check whether it is compatible with the LRSDAY scripts; the same applies to all the software used to analyze MinION data);

  • CAP3 requires manual installation from Iowa State University;

  • GATK3 has been superseded by GATK4, so GATK3 needs manual installation from Google Cloud GATK. Please note that GATK3 requires Java 8 / OpenJDK v1.8.0_302; find here how to install it and here how to set it as the default;

  • MAKER 3.00.0 is outdated and no longer available; a newer version (v3.01.04) can be found at @Yandell-Lab;

  • pbalign v0.3.0 was installed from the Ubuntu repos because of conda conflicts with the versions of other packages. As a general rule, to find out whether a package is available from the Ubuntu repos:

nico@Thor: apt-cache policy pbalign
pbalign:
  Installed: 0.3.0-1
  Candidate: 0.3.0-1
  Version table:
 *** 0.3.0-1 500
        500 http://it.archive.ubuntu.com/ubuntu bionic/universe amd64 Packages
        500 http://it.archive.ubuntu.com/ubuntu bionic/universe i386 Packages
        100 /var/lib/dpkg/status

The command apt-cache policy also works for software that is not installed yet.
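
To run the same check for several packages at once, the "Candidate:" line can be extracted from the apt-cache policy output. This is a minimal sketch assuming the output layout shown above; candidate_version and report_candidates are hypothetical helper names, and the package names in the example call are only illustrations.

```shell
#!/bin/bash
# Read `apt-cache policy PACKAGE` output on stdin and print the candidate
# version, or nothing if no "Candidate:" line is present.
candidate_version() {
    awk '$1 == "Candidate:" {print $2; exit}'
}

# Report the candidate version of each package named on the command line.
# Requires apt-cache, i.e. a Debian-based system.
report_candidates() {
    local pkg version
    for pkg in "$@"; do
        version=$(apt-cache policy "$pkg" 2>/dev/null | candidate_version)
        echo "$pkg: ${version:-not found in configured repositories}"
    done
}

# Example call (not executed here):
#   report_candidates pbalign nanopolish
```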

  • bax2bam v0.0.9 and v0.0.11 provided by conda do not run properly because of a series of bugs never fixed by the developer.
    see bug v 0.0.11 issue 398,
    see bug v 0.0.9 issue 341.

The code below creates a fresh conda environment and fixes the problem.

pacbio_dir2="$build_dir/conda_pacbio_env2/bin"
if [ -z "$(check_installed "$pacbio_dir2")" ]; then
    cd "$build_dir"
    "$miniconda3_dir/conda" create -y -p "$build_dir/conda_pacbio_env2" python=2.7
    source "$miniconda3_dir/activate" "$build_dir/conda_pacbio_env2"

    "$miniconda3_dir/conda" install -y -c bioconda pb-assembly=0.0.3
    "$miniconda3_dir/conda" install -y -c bioconda pbbam=0.18.0
    "$miniconda3_dir/conda" install -y -c bioconda bax2bam=0.0.9

    source "$miniconda3_dir/deactivate"
fi
note_installed "$pacbio_dir2"

Note that, because of a wonderful bug, conda install -y -c bioconda bax2bam=0.0.9 will end up installing version 0.0.8, which works nicely!
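
Since the version conda installs can differ from the one requested, it may be worth confirming what actually landed in the environment. The sketch below parses the table printed by conda list; installed_version is a hypothetical helper name, and the paths in the example call reuse the variables from the snippet above.

```shell
#!/bin/bash
# Read `conda list` output on stdin and print the installed version of
# package $1 (second column), or nothing if the package is absent.
# Lines starting with "#" are the header that conda prints.
installed_version() {
    awk -v pkg="$1" '!/^#/ && $1 == pkg {print $2; exit}'
}

# Example call (not executed here):
#   "$miniconda3_dir/conda" list -p "$build_dir/conda_pacbio_env2" | installed_version bax2bam
```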

  • Because of a gnuplot error (Inappropriate ioctl for device), I installed mummer4.0.0beta5 and gnuplot 5.2 patchlevel 2 (the latter requires superuser privileges to run make install AND it will replace the gnuplot installed on your system, so keep that in mind!). With gnuplot 5.2 patchlevel 2, "Inappropriate ioctl for device" is a non-fatal warning that allows the script to terminate correctly. If you cannot reach the end of the run, please consider the solutions proposed in issue #36 by the users asdcid and/or gonzalezibeas.

  • Before configuring RepeatMasker, please check that the package libtext-soundex-perl is installed (here HOWTO).

  • variantCaller is no longer available in pb-assembly v0.0.8. A solution is to install pb-assembly v0.0.3 alongside pb-assembly v0.0.8. Please note that v0.0.3 has several conflicts with Python 3.7 packages, so it requires Python 2.7. The path to variantCaller needs to be changed accordingly. variantCaller is installed as part of pb-assembly v0.0.3, so you will not find separate installation commands for it in the installer.

  • BioPerl functions for MFANNOT, the organellar genome annotation program, can be installed with:

apt-get install libbio-perl-perl

  • Modify the script REannotate_longname at line 1004: replace my($SEQ,$FASTAentries_, my $MESSAGES) = @_; with my($SEQ,$FASTAentries_,$MESSAGES) = @_; as suggested by @matdechiara;

  • IMPORTANT: Every new software version will probably come with a different path to its binary files. These new paths must be correctly reported in env.sh and in the scripts in Project_Example.

  • PROTEINORTHO v6.0.25 installs proteinortho6.pl; the script LRSDAY.14.Gene_Orthology_Identification.sh has to be changed with a few replacements proposed in a new branch of LRSDAY.
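
Stale paths in env.sh, as warned about above, can be caught early with a small check. The sketch below assumes that env.sh defines its tool locations as shell variables whose names end in _dir (optionally prefixed with export); that naming convention and the check_env_dirs helper are assumptions made for illustration.

```shell
#!/bin/bash
# check_env_dirs FILE: source FILE in a subshell and list every variable
# defined in FILE whose name ends in "_dir" but whose value is not an
# existing directory. Returns non-zero if at least one path is missing.
check_env_dirs() {
    local env_file="$1"
    (
        . "$env_file" || exit 2
        status=0
        # Collect the names of the *_dir variables assigned in the file.
        names=$(grep -oE '^[[:space:]]*(export[[:space:]]+)?[A-Za-z_][A-Za-z0-9_]*_dir=' "$env_file" \
            | sed -E 's/^[[:space:]]*(export[[:space:]]+)?//; s/=$//')
        for name in $names; do
            # Names were validated by the pattern above, so eval is safe here.
            eval "value=\${$name}"
            if [ ! -d "$value" ]; then
                echo "missing: $name=$value"
                status=1
            fi
        done
        exit "$status"
    )
}

# Example call (not executed here; pass a path containing a slash):
#   check_env_dirs ./env.sh || echo "fix the paths listed above"
```

Running the check after every manual version bump shows which entries still point to the old installation directories.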

best,

Nic

MAKER version

Hi,
Thank you for this useful tool.
I have a question about the installation of MAKER.
The install_dependencies.sh script does not install the latest version (3.00.0-beta vs. the latest, 3.01.03).
I was wondering why this choice was made: are there installation issues with the latest version, or is its output not compatible with the LRSDAY pipeline?
Do you think the following steps would still work fine if I used the latest version of MAKER?
Kind regards,

Guillaume

MFANNOT mtDNA annotation

Hello @yjx1217 ,

I'm currently using LRSDAY v1.7.2 and encountering the following problem during the mtDNA annotation phase.
Perhaps you've encountered this issue before.

1) Translate (flip)...
2) Blast...
3) Parse Blast Results...
4) Select best proteins for Exonerate...
5) Annotate genes w/o introns...
6) Intron identification...
Cannot read result for 'IntronI'; maybe there were problems with the commands?
Cannot read result for 'IntronII'; maybe there were problems with the commands?
7) Annotate genes with introns...
8) Identify gene fusions...
9) Annotate mini exons...
10) Annotate RNA genes...
    'rns'...
    'rnl'...
    'tRNA'...
Cannot read result for 'tRNA'; maybe there were problems with the commands?
    'rnpB'...
Cannot read result for 'rnpB'; maybe there were problems with the commands?
    'rrn5'...
Cannot read result for 'rrn5'; maybe there were problems with the commands?
11) Adjust intron boundaries...
12) Identify start codons; identify gene fusions...
13) Find extra genes with HMM
14) ORF annotation...
15) Intron ORF annotation...

thanks

nic
