
GenomeHubs


About

GenomeHubs comprises a set of tools to parse, index, search, and display genomic metadata, assembly features, and sequencing status for projects under the Earth BioGenome Project umbrella, which aims to sequence all described eukaryotic species over a period of 10 years.

GenomeHubs builds on legacy code that supported taxon-oriented databases of butterflies & moths (lepbase.org), molluscs (molluscdb.org), mealybugs (mealybug.org) and more. GenomeHubs is now search-oriented and positioned to scale to the challenges of mining data across almost 2 million species.

The first output from the new search-oriented GenomeHubs is Genomes on a Tree (GoaT, goat.genomehubs.org), which has been published in: Challis et al. 2023. Genomes on a Tree (GoaT): A versatile, scalable search engine for genomic and sequencing project metadata across the eukaryotic tree of life. Wellcome Open Research, 8:24. doi:10.12688/wellcomeopenres.18658.1

The goat.genomehubs.org website is freely available with no logins or restrictions, and is being widely used by the academic community and especially by the Earth BioGenome Project to plan and coordinate efforts to sequence all described eukaryotic species.

The core GoaT/GenomeHubs components are available as a set of Docker containers:
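The docker run examples below attach containers to a user-defined Docker network (net-es) shared with an Elasticsearch backend. As a minimal sketch of that prerequisite, assuming a single-node Elasticsearch container named es1 (matching the GH_NODE=http://es1:9200 setting in the API example below; the image tag here is a placeholder, not part of the GenomeHubs documentation):

docker network create net-es

docker run -d --restart always \
    --net net-es --name es1 \
    -e discovery.type=single-node \
    docker.elastic.co/elasticsearch/elasticsearch:7.17.0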

GoaT UI Docker image

A bundled web server to run a GoaT-specific instance of the GenomeHubs UI, as used at goat.genomehubs.org.

Usage

docker pull genomehubs/goat:latest

docker run -d --restart always \
    --net net-es -p 8880:8880 \
    --user $UID:$GROUPS \
    -e GH_CLIENT_PORT=8880 \
    -e GH_API_URL=https://goat.genomehubs.org/api/v2 \
    -e GH_SUGGESTED_TERM=Canidae \
    --name goat-ui \
    genomehubs/goat:latest
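Once the container is running, a quick check that the UI is being served on the published port (8880 in this example):

curl -I http://localhost:8880/

docker logs goat-ui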

GenomeHubs UI Docker image

A bundled web server to run an instance of the GenomeHubs UI, such as goat.genomehubs.org.

Usage

docker pull genomehubs/genomehubs-ui:latest

docker run -d --restart always \
    --net net-es -p 8880:8880 \
    --user $UID:$GROUPS \
    -e GH_CLIENT_PORT=8880 \
    -e GH_API_URL=https://goat.genomehubs.org/api/v2 \
    -e GH_SUGGESTED_TERM=Canidae \
    --name gh-ui \
    genomehubs/genomehubs-ui:latest

GenomeHubs API Docker image

A bundled web server to run an instance of the GenomeHubs API. The GenomeHubs API underpins all search functionality for Genomes on a Tree (GoaT) goat.genomehubs.org. OpenAPI documentation for the GenomeHubs API instance used by GoaT is available at goat.genomehubs.org/api-docs.

Usage

docker pull genomehubs/genomehubs-api:latest

docker run -d \
    --restart always \
    --net net-es -p 3000:3000 \
    --user $UID:$GROUPS \
    -e GH_ORIGINS="https://goat.genomehubs.org null" \
    -e GH_HUBNAME=goat \
    -e GH_HUBPATH="/genomehubs/resources/" \
    -e GH_NODE="http://es1:9200" \
    -e GH_API_URL=https://goat.genomehubs.org/api/v2 \
    -e GH_RELEASE=$RELEASE \
    -e GH_SOURCE=https://github.com/genomehubs/goat-data \
    -e GH_ACCESS_LOG=/genomehubs/logs/access.log \
    -e GH_ERROR_LOG=/genomehubs/logs/error.log \
    -v /volumes/docker/logs/$RELEASE:/genomehubs/logs \
    -v /volumes/docker/resources:/genomehubs/resources \
    --name goat-api \
    genomehubs/genomehubs-api:latest
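As a sketch of querying the API once it is running, here is a request against the public GoaT instance; the query, result and taxonomy parameters follow the OpenAPI documentation at goat.genomehubs.org/api-docs and may vary between releases (swap in your own host and port for a local deployment):

curl -s "https://goat.genomehubs.org/api/v2/search?query=tax_tree(Canidae)&result=taxon&taxonomy=ncbi"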

GenomeHubs CLI Docker image

A command-line tool to process and index genomic metadata for GenomeHubs, used to build and update GenomeHubs instances such as Genomes on a Tree (GoaT, goat.genomehubs.org).

Usage

docker pull genomehubs/genomehubs:latest

Parse [NCBI datasets](https://www.ncbi.nlm.nih.gov/datasets/) genome assembly metadata:

docker run --rm --network=host \
    -v `pwd`/sources:/genomehubs/sources \
     genomehubs/genomehubs:latest bash -c \
        "genomehubs parse \
            --ncbi-datasets-genome sources/assembly-data \
            --outfile sources/assembly-data/ncbi_datasets_eukaryota.tsv.gz"
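The output is a gzipped TSV that can be spot-checked before indexing, for example by looking at the first few rows and columns (the column layout depends on the parser version):

zcat sources/assembly-data/ncbi_datasets_eukaryota.tsv.gz | head -n 5 | cut -f 1-5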

Initialise a set of Elasticsearch indexes with [NCBI taxonomy](https://www.ncbi.nlm.nih.gov/taxonomy/) data for all eukaryotes:

docker run --rm --network=host \
    -v `pwd`/sources:/genomehubs/sources \
     genomehubs/genomehubs:latest bash -c \
        "genomehubs init \
            --es-host http://es1:9200 \
            --taxonomy-source ncbi \
            --config-file sources/goat.yaml \
            --taxonomy-jsonl sources/ena-taxonomy/ena-taxonomy.extra.jsonl.gz \
            --taxonomy-ncbi-root 2759 \
            --taxon-preload"

Index assembly metadata:

docker run --rm --network=host \
    -v `pwd`/sources:/genomehubs/sources \
     genomehubs/genomehubs:latest bash -c \
        "genomehubs index \
            --es-host http://es1:9200 \
            --taxonomy-source ncbi \
            --config-file sources/goat.yaml \
            --assembly-dir sources/assembly-data"

Fill taxon attribute values across the tree of life:

docker run --rm --network=host \
    -v `pwd`/sources:/genomehubs/sources \
     genomehubs/genomehubs:latest bash -c \
        "genomehubs fill \
            --es-host http://es1:9200 \
            --taxonomy-source ncbi \
            --config-file sources/goat.yaml \
            --traverse-root 2759 \
            --traverse-infer-both"
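Once init, index and fill have completed, the resulting Elasticsearch indexes can be listed with the standard Elasticsearch cat API (the index names depend on the hub name, taxonomy and release configured in goat.yaml):

curl -s "http://es1:9200/_cat/indices?v"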

Related projects

Some GenomeHubs components are hosted in separate open source repositories (all under MIT licenses), including:

BlobToolKit

Interactive quality assessment of genome assemblies.

Explore analysed public assemblies at blobtoolkit.genomehubs.org/view

GoaT CLI

A command line interface for GoaT.

The GoaT CLI builds URLs to query the GoaT API, removing some of the complexity of the GoaT API for the end user.


Issues

Can't import local files

The import fails with errors like this:
cp: cannot stat '/genomehubs/Phytozome/PhytozomeV12_unrestricted/Ptrichocarpa/v3.0/annotation/Ptrichocarpa_210_v3.0.gene.gff3.gz': No such file or directory
ERROR: could not cp /genomehubs/Phytozome/PhytozomeV12_unrestricted/Ptrichocarpa/v3.0/annotation/Ptrichocarpa_210_v3.0.gene.gff3.gz to Ptrichocarpa_210_v3.0.gene.gff3.gz
preparing gff
cp: cannot stat '/genomehubs/Phytozome/PhytozomeV12_unrestricted/Ptrichocarpa/v3.0/annotation/Ptrichocarpa_210_v3.0.gene.gff3.gz': No such file or directory
ERROR: could not cp /genomehubs/Phytozome/PhytozomeV12_unrestricted/Ptrichocarpa/v3.0/annotation/Ptrichocarpa_210_v3.0.gene.gff3.gz to Ptrichocarpa_210_v3.0.gene.gff3.gz
importing gene models
cp: cannot stat '/genomehubs/Phytozome/PhytozomeV12_unrestricted/Ptrichocarpa/v3.0/annotation/Ptrichocarpa_210_v3.0.gene.gff3.gz': No such file or directory
ERROR: could not cp /genomehubs/Phytozome/PhytozomeV12_unrestricted/Ptrichocarpa/v3.0/annotation/Ptrichocarpa_210_v3.0.gene.gff3.gz to Ptrichocarpa_210_v3.0.gene.gff3.gz
verifying import

import.sh won't run

Distributor ID: Ubuntu
Description: Ubuntu 14.10
Release: 14.10
Codename: utopic

I followed the instructions here - http://genomehubs.org/documentation/installing-docker/ - and here - https://easy-import.readme.io/docs/quick-start-guide - with the same results as below.

Any thoughts? It seems to fail at step 3 on permissions.

bash demo/import.sh
Step 1. Set up mySQL container
b730d173423ec43c53b16c1edb03c680cad41365162e783df612681c775c29a2
Step 2. Set up template database using EasyMirror
Working on ftp://ftp.ensemblgenomes.org/pub/release-32/pan_ensembl/mysql//ncbi_taxonomy as ncbi_taxonomy
ncbi_taxonomy.ncbi_taxa_name: Records: 2230273 Deleted: 0 Skipped: 0 Warnings: 0
ncbi_taxonomy.ncbi_taxa_node: Records: 1473354 Deleted: 0 Skipped: 0 Warnings: 0
Working on ftp://ftp.ensemblgenomes.org/pub/release-32/metazoa/mysql//melitaea_cinxia_core_32_85_1 as melitaea_cinxia_core_32_85_1
melitaea_cinxia_core_32_85_1.alt_allele: Records: 0 Deleted: 0 Skipped: 0 Warnings: 0
CLIPPED
Step 3. Import sequences, prepare gff and import gene models
mkdir: cannot create directory 'operophtera_brumata_v1_core_32_85_1': Permission denied
/import/startup.sh: line 51: cd: operophtera_brumata_v1_core_32_85_1: No such file or directory
mkdir: cannot create directory 'log': Permission denied
importing sequences
tee: log/import_sequences.err: No such file or directory
keys on reference is experimental at /ensembl/easy-import/modules/EasyImport/Core.pm line 738.
keys on reference is experimental at /ensembl/easy-import/modules/EasyImport/Core.pm line 738.
Obru_genes.gff.gz: Permission denied
ERROR: could not wget http://download.lepbase.org/v4/provider/Obru_genes.gff.gz to Obru_genes.gff.gz
preparing gff
tee: log/prepare_gff.err: No such file or directory
keys on reference is experimental at /ensembl/easy-import/modules/EasyImport/Core.pm line 738.
keys on reference is experimental at /ensembl/easy-import/modules/EasyImport/Core.pm line 738.
Obru_genes.gff.gz: Permission denied
ERROR: could not wget http://download.lepbase.org/v4/provider/Obru_genes.gff.gz to Obru_genes.gff.gz
importing gene models
tee: log/import_gene_models.err: No such file or directory
keys on reference is experimental at /ensembl/easy-import/modules/EasyImport/Core.pm line 738.
keys on reference is experimental at /ensembl/easy-import/modules/EasyImport/Core.pm line 738.
Obru_genes.gff.gz: Permission denied
ERROR: could not wget http://download.lepbase.org/v4/provider/Obru_genes.gff.gz to Obru_genes.gff.gz
Step 4. Export sequences, export json and index database for imported Operophtera brumata
mkdir: cannot create directory 'operophtera_brumata_v1_core_32_85_1': Permission denied
/import/startup.sh: line 51: cd: operophtera_brumata_v1_core_32_85_1: No such file or directory
mkdir: cannot create directory 'log': Permission denied
exporting sequences
mkdir: cannot create directory '/import/download/sequence': Permission denied
tee: log/export_sequences.err: No such file or directory
keys on reference is experimental at /ensembl/easy-import/modules/EasyImport/Core.pm line 738.
keys on reference is experimental at /ensembl/easy-import/modules/EasyImport/Core.pm line 738.
DBD::mysql::st execute failed: Table 'operophtera_brumata_v1_core_32_85_1.meta' doesn't exist at /ensembl/ensembl/modules/Bio/EnsEMBL/DBSQL/BaseMetaContainer.pm line 140, line 94.
DBD::mysql::st execute failed: Table 'operophtera_brumata_v1_core_32_85_1.meta' doesn't exist at /ensembl/ensembl/modules/Bio/EnsEMBL/DBSQL/BaseMetaContainer.pm line 140, line 94.
/import/startup.sh: line 143: cd: exported: No such file or directory
ls: cannot access Operophtera_brumata_v1.scaffolds.fa: No such file or directory
ls: cannot access Operophtera_brumata_v1.cds.fa: No such file or directory
ls: cannot access Operophtera_brumata_v1.proteins.fa: No such file or directory

cp: cannot stat 'exported/Operophtera_brumata_v1.scaffolds.fa': No such file or directory
parallel: Warning: $SHELL not set. Using /bin/sh.
Can't do inplace edit: /import/blast/ is not a regular file.
Can't rename /import/blast/.scaffolds.fa /import/blast/_scaffolds.fa: No such file or directory
Can't rename /import/blast/.cds.fa /import/blast/_cds.fa: No such file or directory
Can't rename /import/blast/.proteins.fa /import/blast/_proteins.fa: No such file or directory
gzip: exported/*.fa: No such file or directory
mv: cannot stat 'exported/*.gz': No such file or directory
exporting json
mkdir: cannot create directory '/import/download/json': Permission denied
mkdir: cannot create directory '/import/download/json': Permission denied
mkdir: cannot create directory '/import/download/json': Permission denied
mkdir: cannot create directory '/import/download/json': Permission denied
tee: log/export_json.err: No such file or directory
keys on reference is experimental at /ensembl/easy-import/modules/EasyImport/Core.pm line 738.
keys on reference is experimental at /ensembl/easy-import/modules/EasyImport/Core.pm line 738.
DBD::mysql::st execute failed: Table 'operophtera_brumata_v1_core_32_85_1.meta' doesn't exist at /ensembl/ensembl/modules/Bio/EnsEMBL/DBSQL/BaseMetaContainer.pm line 140, line 94.
DBD::mysql::st execute failed: Table 'operophtera_brumata_v1_core_32_85_1.meta' doesn't exist at /ensembl/ensembl/modules/Bio/EnsEMBL/DBSQL/BaseMetaContainer.pm line 140, line 94.
done
mv: cannot stat 'web/*.codon-usage.json': No such file or directory
mv: cannot stat 'web/*.assembly-stats.json': No such file or directory
mv: cannot stat 'web/*.meta.json': No such file or directory
indexing database
tee: log/index_database.err: No such file or directory
keys on reference is experimental at /ensembl/easy-import/modules/EasyImport/Core.pm line 738.
keys on reference is experimental at /ensembl/easy-import/modules/EasyImport/Core.pm line 738.
DBD::mysql::st execute failed: Table 'operophtera_brumata_v1_core_32_85_1.meta' doesn't exist at /ensembl/easy-import/modules/EasyImport/Search.pm line 26, line 94.
DBD::mysql::st fetchrow_array failed: fetch() without execute() at /ensembl/easy-import/modules/EasyImport/Search.pm line 27, line 94.
DBD::mysql::st execute failed: Table 'operophtera_brumata_v1_core_32_85_1.seq_region' doesn't exist at /ensembl/easy-import/modules/EasyImport/Search.pm line 11, line 94.
DBD::mysql::st fetchrow_array failed: fetch() without execute() at /ensembl/easy-import/modules/EasyImport/Search.pm line 12, line 94.
DBD::mysql::st execute failed: Table 'operophtera_brumata_v1_core_32_85_1.gene' doesn't exist at /ensembl/easy-import/modules/EasyImport/Search.pm line 39, line 94.
DBD::mysql::st fetchrow_array failed: fetch() without execute() at /ensembl/easy-import/modules/EasyImport/Search.pm line 40, line 94.
Unable to set up GenomeHubs site, removing containers
genomehubs-mysql
genomehubs-mysql
Error response from daemon: no such id: genomehubs-ensembl
Error: failed to stop containers: [genomehubs-ensembl]
Error response from daemon: no such id: genomehubs-h5ai
Error: failed to stop containers: [genomehubs-h5ai]
Error response from daemon: no such id: genomehubs-sequenceserver
Error: failed to stop containers: [genomehubs-sequenceserver]

issues setting up the DEMO on a HEADLESS server

I have run into issues running the demo site on a VM as a demo for the Smithsonian Institution.

I have changed the following parameters inside /gh-ensembl-plugin/conf/ini-files/DEFAULTS.ini to get around the local loopback issues (127.0.0.1):

BLAST_URL = http://0.0.0.0:8083
DOWNLOAD_URL = http://0.0.0.0:8082
ASSEMBLY_STATS_URL = http://0.0.0.0:8082/html/assembly-stats/assembly-stats.html?path=/demo/json/assemblies/&
CODON_USAGE_URL = http://0.0.0.0:8082/html/codon-usage/codon-usage.html?path=/demo/json/annotations/&

The main Ensembl site comes up on my public IP on port 8081 and I'm able to browse each "module" on 8082 and 8083 individually. However, the main site on 8081 shows broken links (404) when clicking on "Blast" or "BioMart"; I'm not sure if other links are working correctly either.

The access logs on the container show:
[444/- - -/-=-] 160.111.254.17/- -/- -/- [17/Oct/2017:19:54:29 +0000] "GET /Multi/Tools/Blast?db=core HTTP/1.1" 404 3222 "http://34.193.222.50:8081/index.html" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36" "-" -/-

Waiting for site to load

The demo script seems to run through correctly for all steps up to step 9, which just sits at a loading screen and gives the error "This site can't be reached" when trying to access it in a browser.

Step 8. Startup GenomeHubs Ensembl mirror
c8f00fceb15df551521befcd312d1c60207c32fcf3ab7c89d975656792f0c008
Step 9. Waiting for site to load
........................................

Sorry, I'm not sure what more data I can give you to troubleshoot this... I still need to find a machine where I can run as UID 1000; maybe that is related?

download URLs are broken

The demo site's download page sample downloads (FASTA, MySQL, etc.) are broken; they point to:
ftp://ftp.ensemblgenomes.org/pub/release-demo/metazoa/fasta/melitaea_cinxia/dna/

However the "release-demo" folder doesn't exist on the ftp site.
ftp://ftp.ensemblgenomes.org/pub/release-demo

demo fails to start

Hi, I'm trying to set up a demo of GenomeHubs as described here, but startup is failing with a problem finding a config file:

awk: cannot open /ensembl/conf/database.ini (No such file or directory)

Here's the whole log for review. I'm assuming that the awk calls are getting made inside some Dockerfile or setup script in one of the many images that are involved here?

Thanks for any help!
