galaxyproject / galaxy Goto Github PK

Data intensive science for everyone.

License: Other

Shell 0.50% JavaScript 14.96% Python 59.49% Mako 2.20% CSS 0.93% Perl 0.26% HTML 0.13% Makefile 0.05% Lua 0.01% Jupyter Notebook 3.47% Vue 9.32% Dockerfile 0.03% SCSS 0.71% Smarty 0.02% R 0.10% TypeScript 7.81% Sass 0.01%

bioinformatics workflow genomics science sequencing ngs dna usegalaxy pipeline workflow-engine

galaxy's Introduction

The latest information about Galaxy can be found on the Galaxy Community Hub.

Community support is available at Galaxy Help.

Galaxy Quickstart

Galaxy requires Python 3.8 . To check your Python version, run:

$ python -V
Python 3.8.18

Start Galaxy:

$ sh run.sh

Once Galaxy completes startup, you should be able to view Galaxy in your browser at: http://localhost:8080

For more installation details please see: https://getgalaxy.org/

Documentation is available at: https://docs.galaxyproject.org/

Tutorials on how to use Galaxy, perform scientific analyses with it, develop Galaxy and its tools, and admin a Galaxy server are at: https://training.galaxyproject.org/

Tools

Tools can be either installed from the Tool Shed or added manually. For details please see the tutorial. Note that not all dependencies for the tools provided in the tool_conf.xml.sample are included. To install them please visit "Manage dependencies" in the admin interface.

Issues and Galaxy Development

Please see CONTRIBUTING.md .

galaxy's People

Contributors

Stargazers

Watchers

Forkers

natefoo dannon martenson nsoranzo jmchilton peterjc roalva1 guerler andrewjrobinson kushal124 bgruening blankenberg sbelluzzo wezen nturaga davebx jxtx sarah-peter hexylena yixf-self igortopcin intel-hss jlw-laputa gregvonkuster mdshw5 yhoogstrate gigascience fjfd cthgj kellrott tewaris abretaud vavrusa nekrut jencabral marc09 dpryan79 gvlproject remimarenco eterisokhoyan vimalkvn pkr2103 valentinapeona singing-scientist charz jgoecks ratzeni gitter-badger juvion iracooke jessevdam cool-aleuser01 boratonaj hrf92 christian-b richhogg moskalenko ericenns nebiolabs mvdbeek madduri elixir-oslo rusalkaguy liamvdp nicksto scholtalbers jhnsnlim uh-ci hewie7 jdimeng gabrielnicolasavellaneda bodleen mariamiskander qpxu007 chambm lparsons lijingbu nathan2wong tmehoke scitao dib-lab mr-c pjbriggs remyd1 rjohns03 malloryfreeberg asgercohr forrestzhang certus-tech common-workflow-lab fescudie michal-stuglik matthewralston leondutoit galaxyguardians openlangrid specimentracking markiskander sulakhe zagorskid

galaxy's Issues

TravisCI: check everything with flake8.

Presently the py27-lint and py26-lint tox environments for TravisCI execute bash .ci/flake8_wrapper.sh, which in turn uses flake8 to check:

a subset of Python files (listed in .ci/pep8_sources.txt) for all errors/warnings (as defined in setup.cfg)
most Python files for a small subset of errors.

The goal is to incrementally extend .ci/pep8_sources.txt to include all Python files, so that all future pull requests will be automatically checked with flake8, even the one adding new Python files.

Part of this task can be done at the GCC2015 Coding Hackathon.

Update 2016 by John - adding checklist of what remains.

show information about collections

It would be nice to be able to see information about collections in the history.
At least how many elements the list has.
Further maybe a summary of the content (if possible)

Cuffmerge fails to display indexed genomes on Main

Cuffmerge ->
Use sequence data :Yes ->
Choose the source for the reference list: Locally cashed ->
Using reference genome: No options available

In certain cases, the first genome indexed for this tool with show up for a second (hg19), then "No options available" will replace it.

In other cases, "No options available" will display with a spinner at the end of the selection line.

To replicate, just click on the tool and use the path above. If you want to test it with real data, import any RNA-seq history based on hg19 from "Saved Histories" to provide the tool with input. Other histories will work, since hg19 is not the only indexed genome, but this is a simple test.

Upgrading tools within workflows

An issue that I run into quite often is the need to update workflows to use new versions of tools. Without the ability to update the version of a tool, this is very tedious as I have to add the new tool and manually "move" over all of the parameters, output renaming, annotations, etc. which is very tedious and error prone.

related to #523 #557

from @lparsons on trello

Potentially delaying 15.07 Freeze Date until after GCC?

Today was intended to be the freeze date for the 15.07 release. I'd like to open this issue to discuss freezing after GCC instead, to allow for contributions from the Hackathon and other things that may arise in the next couple weeks. My guess is also that people will have less time than ever in the next three weeks to actually test and fix this release, so delaying would benefit that effort as well.

update docker fork to latest version

Are there any plans to update the fork that is used for galaxy-in-docker to version 15.07?
We are planning to deploy Galaxy in the coming weeks and would like to start of with the latest release.

Thanks!
Matthias

Enable Read The Docs webhook

From http://read-the-docs.readthedocs.org/en/latest/webhooks.html :

"We have support for hitting a URL whenever you commit to your project and we will try and rebuild your docs. This only rebuilds them if something has changed, so it is cheap on the server side. As anyone who has worked with push knows, pushing a doc update to your repo and watching it get updated within seconds is an awesome feeling."

Instructions on how to enable the GitHub webhook are on the same page.

HTML tags shown in ToolShed banner messages

Using the main ToolShed or the TestToolShed (in Safari on a Mac), after creating a new repository, a green banner message was shown with the raw HTML bold tags visible:

Repository <b>package_biopython_1_65</b> has been created.

After uploading a file, the same problem was observed.

From @peterjc on trello

@martenson is this still an issue?

Creating a custom build using a .len file triggers an error

Description is in this biostars post. Replicated on Main by jen. To replicate as well, just use the example data. Fails for both a browsed/loaded len file or for one that is pasted in.

https://biostar.usegalaxy.org/p/12710/#12740

Selection of tool version in workflow editor

After installing multiple versions of a tool, it is not possible to select old versions when adding it as a step in the workflow editor

From @nsoranzo on trello

Collaboration Improvements

This issue encompasses quite a few subproblems to be handled:

Collaborative histories
- My users often desire that multiple people be able to truly share a history. This includes:
  - working on it simultaneously
  - all users able to edit who the history is shared with
  - generally N users being treated like owners of a history.
  - Notification that a workflow/history/etc has been shared with you specifically
More collaborative workflows
- Workflows could stand to be more collaborative. They could stand to adopt a more VCS like model with merging of "forked" workflows.
- Workflows would benefit from versioning/changelogs that a VCS/VCS like system could provide
- In some cases, workflows could stand to be less collaborative.
  - Currently, when a workflow is "shared with a user", they get a read-only version of the workflow, as opposed to a "shared workflow" that's imported as an editable copy.
  - Roughly 95% of the time with my users, they want a read-only pointer to the latest version. They don't want to change anything, they just want to use the latest, best, version of a workflow I'm developing. (The other 5% is for those workflow newbies who need a jump start on how to build their workflow, and then are interested in taking it from there)
  - I believe users would gain more utility from "importing a shared workflow" and "sharing a workflow with a user" BOTH being read-only link to a workflow, and then providing an edit action on top of that. Additionally we'd cut down on the unnecessary duplication of workflows and old data that users have.

Custom galaxy tool greyed out in workflow editor

Hello everyone. I have a situation where I have a valid tool that works fine in Galaxy otherwise. However, in the workflow creator GUI, the tool appeared greyed out. Similar tools that I have created are fine. Can someone point me in the right direction?

plug in openSNP as data source

https://opensnp.org/
~2000 individuals genotype+fenotype totally opensource and CC licensed

Failure to parse job_conf.xml if id and tags attributes are identical

The following XML code block is valid but causes a parsing error:

<destination id="trackster" runner="torque" tags="trackster">
    <param id="destination">trackster</param>
</destination>

The exception doesn't mention what is wrong and reports:

Exception: Problem parsing the XML in file /galaxy_dist/config/job_conf.xml, please correct the indicated portion of the file and restart Galaxy.'tuple' object has no attribute 'append'

The problem is fixed if id= and tags= are set to different values.

From @unode on trello.

@erasche confirmed this bug today, simply adding tags="local" to the sample_basic reproduces it.

Enact procedure to backout of quickly merged PRs

Merging PRs quickly keeps us moving forward fast - but could potentially be abused. Rather than imposing a minimum wait time @jxtx had the idea that a -1 could force a backout and review after the fact.

What is a reasonable amount of time for that - 72 hours?
Is anyone going to object to this?
Anything else to think about before composing this procedures PR.

[15.05] Regression: not possible for admins to see dataset file_name through API

Since release_15.05 it is not more possible for (only) admins to see the file_name in the dataset dict returned by the API calls /api/datasets/<dataset_id> and /api/histories/<history_id>/contents/<dataset_id>.
Setting expose_dataset_path = True in config/galaxy.ini would still allow all users to see the file paths.

Relevant code is at: https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/managers/datasets.py#L176

Ping @carlfeberhard .

automatically install latest version when installing tools through the API

When using BioBlend to install a tool from the Tool Shed, the API should allow to automatically install the latest version of said tool, when the toolset_revision argument is left empty of tagged as "latest".

[Feature Request] Explicit production of list:paired datasets

Hello everyone, I have a few tools where i'd like to explicitly create a list:paired collection, rather than a paired collection several times over a list:paired input. In such cases, the implicit list:paired collection generation does not apply and I find it difficult to configure tools in this way. For example, given a script which, when given an SRA accession, retrieves sample metadata and ftp locations for fastq files it would be interesting to create a tool which downloads all fastq.gz files from these ftp locations and loads them into galaxy into a list:paired collection.

Change virtual directory from `.venv`

Would it be possible to add an option to the config file to specify the directory for the python virtual environment that galaxy searches for? I am using conda to manage virtual environments, but unlike virtualenv, conda doesn't allow the creation of virtual environments starting with '.' (dot/period).

Cheers

Need API call to delete data library folders

My tool automatically manages certain folders of a library to reflect data contents elsewhere on server (available versions of data). I'd like to see the API extended with a library_delete_folder() call; (it currently has library_delete_dataset() call, which works fine!). Nice if bioblend picks this up too!

from @ddooley on trello

(Judging by my reading of library api and folder api this looks to still be a bug)

Upgrading `sicer` tool results in an error.

Traceback:

Error - <type 'exceptions.TypeError'>: object of type 'bool' has no len()
URL: http://galaxy.bi.uni-freiburg.de/admin_toolshed/update_to_changeset_revision?tool_shed_url=https://toolshed.g2.bx.psu.edu/&name=sicer&owner=devteam&changeset_revision=82a8234e03f2&latest_c
hangeset_revision=4a14714649b4&latest_ctx_rev=1
File '/usr/local/galaxy/galaxy-dist/lib/galaxy/web/framework/middleware/error.py', line 149 in __call__
  app_iter = self.application(environ, sr_checker)
File '/usr/local/galaxy/galaxy-dist/eggs/Paste-1.7.5.1-py2.7.egg/paste/recursive.py', line 84 in __call__
  return self.application(environ, start_response)
File '/usr/local/galaxy/galaxy-dist/eggs/Paste-1.7.5.1-py2.7.egg/paste/httpexceptions.py', line 633 in __call__
  return self.application(environ, start_response)
File '/usr/local/galaxy/galaxy-dist/lib/galaxy/web/framework/base.py', line 133 in __call__
  return self.handle_request( environ, start_response )
File '/usr/local/galaxy/galaxy-dist/lib/galaxy/web/framework/base.py', line 191 in handle_request
  body = method( trans, **kwargs )
File '/usr/local/galaxy/galaxy-dist/lib/galaxy/web/framework/decorators.py', line 87 in decorator
  return func( self, trans, *args, **kwargs )
File '/usr/local/galaxy/galaxy-dist/lib/galaxy/webapps/galaxy/controllers/admin_toolshed.py', line 1888 in update_to_changeset_revision
  irmm.generate_metadata_for_changeset_revision()
File '/usr/local/galaxy/galaxy-dist/lib/tool_shed/metadata/metadata_generator.py', line 443 in generate_metadata_for_changeset_revision
  metadata_dict )
File '/usr/local/galaxy/galaxy-dist/lib/tool_shed/metadata/metadata_generator.py', line 668 in generate_tool_metadata
  elif len( values ) == 1:
TypeError: object of type 'bool' has no len()

Roadmap: Galaxy Interactive Environments

Issue for tracking planned IE enhancements

IEs on a remote host

IEs should be able to be launched on remote hosts #356
Should use -P flag of docker. Per @jmchilton's comments and some discussion on IRC, it was obvious that this is the solution we've been looking for; offload port detection to docker, avoiding any race conditions, and allowing docker on a remote host to function 100% properly. #790
~~It would be nice if we could somehow take advantage of the built-in job scheduling in Galaxy and the docker job scheduling available as part of that.~~ Not useful with swarm & friends, and dedicated docker preferred by admins to keep images away from jobs.

General IE improvements

IEs with multiple images

This is more of a wishlist feature...

it'd be nice if IEs could be launched from docker-compose.yml files, somehow. #852

e.g. I'm slowly working on an Apache Zeppelin IE, it'd be great if I could re-use existing Zeppelin images and link an Apache Spark image, rather than having to bake everything into one.

This might involve shipping a docker-compose.yml in the settings folder, and applying a config variable somewhere marking this as a multi-image container. I worry about networking+picking out the correct ports and running multiple containers. Maybe we'd create a temp dir, copy in docker-compose, launch. We could do docker-compose ps and grab ports there maybe. This means config would have to either specially name the proxy image/we'd provide a self-killing proxy/the config would allow specifying the name of the main/proxy image.

IE Image selection

have a single viz which allows launching N different docker images #1403

IEs as Tools

~~[ ] Allow running IEs as tools OR converting them into .py + tool.xml files #1923~~ It would be a nice project, but a lot of work. If someone is motivated they will discover the issue and produce something, no need to keep this issue open just for that.

IE UX

provide a clear 'shutdown' button for IEs (from #1586)
make the Jupyter tooltip readable (it is too long for default width of history panel) (Addressed in #1403)
provide a way to start Jupyter without interacting with a dataset (Addressed in #1403)
Provide a sharable link to a notebook that would start Jupyter with the NB loaded in.
~~[ ] Provide an API to launching GIEs (?)~~ Why? We need a better use case for this.

Tool help text is missing in workflow mode

Every tool should offer a better parameter description in it's help section. But this information is not displayed in the workflow mode. We can include a small symbol next to every parameter, which displays the help text on mouse hover.

from @bgruening on trello

lib/galaxy/datatypes/binary.py:register_sniffable_binary_format() is un-documented

Tool grouping in the tool bar fails for bwa_mem

Originally opened as galaxyproject/tools-devteam#197 by @kaktus42 .

Confirmed also on my server with 3 different entries in the tool menu (and no Versions select menu in the tool central panel) for bwa_mem tool, while everything works fine for bwa tool (both from https://toolshed.g2.bx.psu.edu/view/devteam/bwa/ TS repository).

I suspect the grouping fails because of the underscore in the tool id.

Automatically build and test new versions of wheels

We want to update to new minor versions of our dependency packages whenever they become available, and this needs to be automated or else it will never be done on a reasonable time frame.

@jxtx suggested pinning to an exact version of dependencies but automatically running Galaxy unit tests for new versions and committing the version change if tests pass. Ideally we'd just do this with Travis.

xref #428

Rename PostJobAction not working on 'resume job dependencies'

If a workflow has one or more 'rename postjobactions' and the run fails, rerunning the failed tool with 'resume job dependencies' will not run the rename actions (or perhaps with wrong values). Unfortunately, for some of our workflows the renaming action is essential.

from @scholtalbers on trello

upload modal and switching history

@jennaj just showed me that while uploading 12 files through upload dialogue on the cloud she encountered a bug. When you switch history let's say after 5 files are added to the history the remaining 7 files will be added to the new history instead.

This is probably a bug. @guerler

reports_wsgi.ini should be renamed

reports_wsgi.ini should be renamed to reports.ini to be consistent with galaxy.ini

from @bgruening on trello

Maintenance scripts - ability to remove "deleted permanently" histories

We have a few users that make intensive use of galaxy via the API. They also usually create a new history for each "process". "Process" being roughly equivalent to launching a workflow, saving results and deleting the history.

The side-effect of this approach is that over a period of a year they have managed to create over 10000 histories under their user.

While the maintenance scripts include steps to purge deleted histories, users can still see them up under "advanced search". For the users mentioned above, this means over 200 pages of histories.

So ultimately we would like to have a way to delete (completely) histories that are in "deleted permanently" state and are older than a specified age.

Reported by @unode on trello

Confirmed by @erasche as of this morning. Separating "deleted" and "deleted permanently" in the UI might also solve this issue. I can go to "deleted" histories, select a page worth, click "delete permanently", and they continue to show up as "deleted" on the first page.

Data Library parameter allows to select deleted dataset versions

The data library input parameter fails to properly display multiple versions of the same underlying library dataset. The library API call should either not return deleted datasets or deleted datasets need to be filtered from its response. See also: #132 (comment)

Tool XML for minimum Galaxy/flag for internet

Minimum galaxy version.
- In tool XML file or at toolshed repository level?
Flag denoting internet access is required trello, simplifying jobs_conf.xml configuration.

Improve logging during package installation

It would be absolutely lovely if package installation logs could be prefixed.

E.g.

Installing /home/hxr/work/galaxy/tool_dependencies/package_perl_bundle_jbrowse/1.0/iuc/package_perl_Installing /home/hxr/work/galaxy/tool_dependencies/package_perl_bundle_jbrowse/1.0/iuc/package_perl_bundle_jbrowse_1_0/74e7c7dc6c01/bin/bp_gccalc.pl
Installing /home/hxr/work/galaxy/tool_dependencies/package_perl_bundle_jbrowse/1.0/iuc/package_perl_bundle_jbrowse_1_0/74e7c7dc6c01/bin/bp_search2gff.pl
Writing /home/hxr/work/galaxy/tool_dependencies/package_perl_bundle_jbrowse/1.0/iuc/package_perl_bundle_jbrowse_1_0/74e7c7dc6c01/lib/perl5/x86_64-linux/auto/Bio/.packlist
bundle_jbrowse_1_0/74e7c7dc6c01/bin/bp_download_query_genbank.pl

Would be:

[DEBUG:galaxy.package.install:package:package_bundle_jbrowse_1_0] Installing /home/hxr/work/galaxy/tool_dependencies/package_perl_bundle_jbrowse/1.0/iuc/package_perl_Installing /home/hxr/work/galaxy/tool_dependencies/package_perl_bundle_jbrowse/1.0/iuc/package_perl_bundle_jbrowse_1_0/74e7c7dc6c01/bin/bp_gccalc.pl
[DEBUG:galaxy.package.install:package:package_bundle_jbrowse_1_0] Installing /home/hxr/work/galaxy/tool_dependencies/package_perl_bundle_jbrowse/1.0/iuc/package_perl_bundle_jbrowse_1_0/74e7c7dc6c01/bin/bp_search2gff.pl
[DEBUG:galaxy.package.install:package:package_bundle_jbrowse_1_0] Writing /home/hxr/work/galaxy/tool_dependencies/package_perl_bundle_jbrowse/1.0/iuc/package_perl_bundle_jbrowse_1_0/74e7c7dc6c01/lib/perl5/x86_64-linux/auto/Bio/.packlist
[DEBUG:galaxy.package.install:package:package_bundle_jbrowse_1_0] bundle_jbrowse_1_0/74e7c7dc6c01/bin/bp_download_query_genbank.pl

<filter> tag does not work on output collections

Hello,
I have an instance where I am trying to allow users to run a subsampling program on files with support for paired-end subsampling. I use a conditional to specify the list:paired input and the user should get a matching list:paired output dataset with list:paired input. However, the filter xml tag does not work on dataset collections or their nested "data" or "collection" children. Some example code which returns a very cryptic error message with ouput, implicit collections, and a few things. Any help is appreciated

 <inputs>
   <conditional name="single_vs_paired">
     <param name="single_vs_paired_selector" type="select" label="What type of Fastq files do you have?">
<option value="single_files">Single fastq files.</option>
    <option value="list_of_files">A list of fastq files</option>
    <option value="list_of_paired_files">A list of paired-end fastq files</option>
      </param>
      <when value="single_files">
    <param name="single_fastq_files" type="data" format="fastq,fastqsanger" multiple="true" label="Single fastq files." />
      </when>
      <when value="list_of_files">
    <param name="list_fastq_files" type="data_collection" collection_type="list" format="fastq,fastqsanger" label="List of fastq files" help="Read documentation on creating a list of files." />
      </when>
      <when value="list_of_paired_files">
    <param name="paired_fastq_files" type="data_collection" collection_type="paired" format="fastq,fastqsanger" label="Paired list of fastq files" help="Read documentation on creating a list of paired files." />
      </when>
    </conditional>    
    <param name="readsRequested" type="integer" min="1" value="100000" label="Total number of reads you want from the file"/>
  </inputs>

  <outputs>
    <data format="fastq" name="single_output" label="Subsampled single file ${on_string}" >
      <filter>single_vs_paired["single_vs_paired_selector"] == "single_files"</filter>
    </data>

    <collection name="list_output" type="list" structured_like="list_fastq_files" inherits_format="true" label="Subsampled ${on_string}">
    <filter>single_vs_paired["single_vs_paired_selector"] == "list_of_files"</filter>
    </collection>
    <collection name="paired_list_output" type="list:paired" structured_like="paired_fastq_files" inherits_format="true" label="Subsampled ${on_string}">
      <collection name="pairs" type="paired">
    <data format="fastq">
      <filter>single_vs_paired["single_vs_paired_selector"] == "list_of_paired_files"</filter>
    </data>
    <data format="fastq">
      <filter>single_vs_paired["single_vs_paired_selector"] == "list_of_paired_files"</filter>
    </data>
      </collection>
    </collection>
  </outputs>

creating new datasets in history inserts them in the middle of it

As you can see on the following pictures the latest dataset is numbered 310 but if I run fastqc it will create datasets numbered 16 and 17 and after refresh it will put them in the middle of the history.

I also think the history have been imported from @nekrut .

History link: https://usegalaxy.org/u/martenson/h/f4

ping @carlfeberhard

Installing tools+versions needed for a workflow

When I download the workflow I want to use, I am having hard time figuring out which tools and their version are the exact ones that the workflow needs. Alternatively, each tool and version could have searchable hash codes. I would then obtain all hashes my workflow needs and installed them one by one from toolshed.

from @biomonika on trello

@peterjc notes:

When an Admin installs a workflow this seems a natural thing to offer, reusing the existing dependency interface.

When a non-Admin imports a workflow, at least providing a summary of missing workflows would be good (that could then be passed on to an Admin).

@erasche note: this had 5 +1s on trello.

API key used isn't logged anywhere

As far as I can tell...the API key used isn't logged anywhere (apache, uWSGI, paste, handler logs).

If old API keys are kept around, ostensibly for auditing purposes, the API key used for a given request should probably be logged?

(here I understand "auditing purposes" to mean, for example, "someone uploaded bad things to our galaxy")

submitted by @erasche on trello

show a more user-friendly error when the webserver port is blocked

Currently the user sees the following:

galaxy.queue_worker INFO 2015-07-06 14:26:12,785 Binding and starting galaxy control worker for main
Starting server in PID 22842.
Traceback (most recent call last):
  File "./scripts/paster.py", line 37, in <module>
    serve.run()
  File "/home/xsebi/programs/galaxy/galaxy/lib/galaxy/util/pastescript/serve.py", line 1049, in run
    invoke(command, command_name, options, args[1:])
  File "/home/xsebi/programs/galaxy/galaxy/lib/galaxy/util/pastescript/serve.py", line 1055, in invoke
    exit_code = runner.run(args)
  File "/home/xsebi/programs/galaxy/galaxy/lib/galaxy/util/pastescript/serve.py", line 220, in run
    result = self.command()
  File "/home/xsebi/programs/galaxy/galaxy/lib/galaxy/util/pastescript/serve.py", line 670, in command
    serve()
  File "/home/xsebi/programs/galaxy/galaxy/lib/galaxy/util/pastescript/serve.py", line 654, in serve
    server(app)
  File "/home/xsebi/programs/galaxy/galaxy/lib/galaxy/util/pastescript/loadwsgi.py", line 292, in server_wrapper
    **context.local_conf)
  File "/home/xsebi/programs/galaxy/galaxy/lib/galaxy/util/pastescript/loadwsgi.py", line 97, in fix_call
    val = callable(*args, **kw)
  File "/home/xsebi/programs/galaxy/galaxy/eggs/Paste-1.7.5.1-py2.7.egg/paste/httpserver.py", line 1342, in server_runner
    serve(wsgi_app, **kwargs)
  File "/home/xsebi/programs/galaxy/galaxy/eggs/Paste-1.7.5.1-py2.7.egg/paste/httpserver.py", line 1291, in serve
    request_queue_size=request_queue_size)
  File "/home/xsebi/programs/galaxy/galaxy/eggs/Paste-1.7.5.1-py2.7.egg/paste/httpserver.py", line 1134, in __init__
    request_queue_size=request_queue_size)
  File "/home/xsebi/programs/galaxy/galaxy/eggs/Paste-1.7.5.1-py2.7.egg/paste/httpserver.py", line 1113, in __init__
    request_queue_size=request_queue_size)
  File "/home/xsebi/programs/galaxy/galaxy/eggs/Paste-1.7.5.1-py2.7.egg/paste/httpserver.py", line 328, in __init__
    HTTPServer.__init__(self, server_address, RequestHandlerClass)
  File "/usr/lib64/python2.7/SocketServer.py", line 423, in __init__
    self.server_close()
  File "/home/xsebi/programs/galaxy/galaxy/eggs/Paste-1.7.5.1-py2.7.egg/paste/httpserver.py", line 1105, in server_close
    self.thread_pool.shutdown(60)
AttributeError: 'WSGIThreadPoolServer' object has no attribute 'thread_pool'

How to reproduce:

start any other server on port 8080 before starting galaxy

TS search at /api/tools is encoding tool id

given it is a string it should not encode it

"Join two Datasets" (join1) does not preserve column headers

I am referring to the tool https://github.com/galaxyproject/galaxy/blob/dev/tools/filters/joiner.xml implemented as the Python script https://github.com/galaxyproject/galaxy/blob/dev/tools/filters/join.py currently still part of the main Galaxy distribution.

Based on the example in the help text, consider the following as tabular files with a simple header:

#chr    start   end gene
chr1    10  20  geneA
chr1    50  80  geneB
chr5    10  40  geneL

and:

#gene   description
geneA   tumor-supressor
geneB   Foxp2
geneC   Gnas1
geneE   INK4a

Current output with default options,

Keep lines of first input that do not join with second input: No
Keep lines of first input that are incomplete: No
Fill empty columns: No

chr1    10  20  geneA   geneA   tumor-supressor
chr1    50  80  geneB   geneB   Foxp2

My desired output would preserve the column headers:

#chr    start   end     gene    gene    description
chr1    10  20  geneA   geneA   tumor-supressor
chr1    50  80  geneB   geneB   Foxp2

Likewise changing the options to:

Keep lines of first input that do not join with second input: Yes
Keep lines of first input that are incomplete: Yes
Fill empty columns: Yes
Only fill unjoined rows: Yes
Fill Columns by: Single fill value
Fill value: .

gives:

#chr    start   end gene    .   .
chr1    10  20  geneA   geneA   tumor-supressor
chr1    50  80  geneB   geneB   Foxp2
chr5    10  40  geneL   .   .

Here I would like the column headers from the second file to be included:

#chr    start   end gene    gene    description
chr1    10  20  geneA   geneA   tumor-supressor
chr1    50  80  geneB   geneB   Foxp2
chr5    10  40  geneL   .   .

Note separately that if the first dataset is considered as a bed file, then the output file is also considered bed, and this appears to wrongly mark column c6 (description) as strand.

from @peterjc on trello

Support for OAuth2/broken openID support for Google

OpenID 2.0 for Google Accounts has gone away https://support.google.com/accounts/answer/6206245?p=openid&rd=1

Now that myopenid has been away for a while and Google is shutting down openid2.0 (and already disabled it for new domains: https://developers.google.com/accounts/docs/OpenID#shutdown-timetable ) - I would really like to see "OpenID Connect" support in Galaxy.

from @scholtalbers on trello and trello

Improve quota display by distinguishing active and deleted but not purged space

More often than not users delete datasets or histories without purging resulting in unexpected quota being used.
In the case of full histories it's somewhat hidden how you can list which deleted but not purged entries you may have.

In addition, the set_user_disk_usage.py script only displays:

user <email> old usage: 1.5GB change: none

So it's also not easy for an administrator to see where the quota is being used.

I therefore suggest 2 enhancements, one on the set_user_disk_usage.py script that instead shows:

user <email> old usage: 1.5GB (active: 500MB, deleted: 1GB) change: none

Additionally the galaxy user interface could display a tooltip (clicking or hovering the quota widget) which would inform the user that a given % of their quota is being taken by unpurged datasets or histories.

from @unode on trello

Better tool_config_file default in galaxy.ini

Currently the tool_config_file section of galaxy.ini reads like this:

# Tool config files, defines what tools are available in Galaxy.
# Tools can be locally developed or installed from Galaxy tool sheds.
# (config/tool_conf.xml.sample will be used if left unset and
# config/tool_conf.xml does not exist).
#tool_config_file = config/tool_conf.xml,shed_tool_conf.xml

If you activate that setting by just removing the # from the last line, then, for an otherwise unmodified clone of Galaxy, you're getting an error

because the last entry should be config/shed_tool_conf.xml (include the path)
because Galaxy then stops using config/tool_conf.xml.sample as a fallback if config/tool_conf.xml does not exist
even config/shed_tool_conf.xml does not initially exist

So users have to change three things (add the missing path, and manually copy and rename both tool_conf.xml.sample and shed_tool_conf.xml.sample) to be able to use this line.

Either:

the .sample files should always be used as fallback, not just with the line commented out
the line should list the .sample files
Galaxy should auto-generate tool_conf.xml and shed_tool_conf.xml from the .sample files on first run (didn't it do that in earlier releases ?)

from trello

IEs don't work with upstream proxying rules

The default proxy rules suggest the following:

RewriteRule   ^/galaxy/static/style/(.*)     /home/galaxy/galaxy/static/june_2007_style/blue/$1   [L]
RewriteRule   ^/galaxy/plugins/(.*)          /home/galaxy/galaxy/config/plugins/$1 [L]
RewriteRule   ^/galaxy/static/scripts/(.*)   /home/galaxy/galaxy/static/scripts/packed/$1         [L]
RewriteRule   ^/galaxy/static/(.*)           /home/galaxy/galaxy/static/$1                        [L]
RewriteRule   ^/galaxy/favicon.ico           /home/galaxy/galaxy/static/favicon.ico               [L]
RewriteRule   ^/galaxy/robots.txt            /home/galaxy/galaxy/static/robots.txt                [L]

Specifically we take note of

RewriteRule   ^/galaxy/plugins/(.*)          /home/galaxy/galaxy/config/plugins/$1 [L]

which intends to match plugin JS files. Unfortunately this fails for interactive environments which still appear under a URL with visualization in it. I.e. https://FQDN/galaxy/plugins/visualizations/ipython/static/js/ipython.js should be https://FQDN/galaxy/plugins/interactive_environments/ipython/static/js/ipython.js

This doesn't appear without the upstream proxy, and short of symlinking IE files into the visualizations directory, there's no obvious solution.
CC @carlfeberhard

Fix incorrect fix in #336

We are to stringing a raw numeric ID instead of encoding it first.

See conversation on PR #336.

Cryptic error/exception while generating a list output from single data input

Custom galaxy tools that create a list collection (most often without a data collection input) produce the following json, which I assume comes from some exception or error, but no clue as to how to correct the issue or why a single datum input ->list mapping is undesirable or forbidden.

{
    "implicit_collections": [],
    "jobs": [],
    "output_collections": [],
    "outputs": []
 }

From the following xml inputs and outputs

  <inputs>
    <param name="sra_data" type="data" label="Provide an SRA txt file "/>
  </inputs>
  <outputs>
    <data format="txt" name="log" label="log"/>
    <collection name="sraFiles" type="list" label="sra files">
      <data name="sraFile" />
    </collection>
  </outputs>

Numpad doesn't work for integer inputs

Just had another user who could reproduce this (on a different OS+browser), so I'm reporting it here.

Numerical inputs (just an type="integer") won't accept entry from numpads (num lock is set correctly), only from the normal numeric keys above the keyboard on a standard QWERTY keyboard.

cc @guerler

from @erasche and @unode on trello.

Confirmed this is still a bug on :dev for numerical inputs. Text inputs work just fine.

Implement message categorization in logging and error reporting system

As discussed with @dannon and @erasche on IRC, it would be good to be able to tell apart logged messages based on who should be receiving them.
This is particularly useful when combined with Sentry which at the moment tends to exacerbate the criticality of a lot of the logged entries.

At the moment there are at least 3 kinds of "audience":

Developers
- Core
- Tools
Administrators
Users
- API (bioblend, etc.)
- Web (Galaxy UI)

Administrators may not want to worry about errors they cannot fix themselves or that are of the responsibility of developers/users.
On the other end, API users need to be given information about what went wrong but that exact same information is insufficient (lacks context) for a developer or administrator to decide if it's relevant or not (for them).

Additionally, and this concerns mostly Sentry, in some cases the error message itself contains unique information (job_id, name of /tmp/ folder, PID, etc..) that prevents aggregation.
Every single message is logged as "new", resulting in excessive noise.

Extending the current model by providing extra information as metadata and including who should receive the message, would allow better filtering.

Some examples (note these are anecdotal examples, final implementation should hopefully be less verbose):

log.warning("Interface of module X is deprecated, use Y instead", developer=True)
log.critical("Failed to connect database", administrator=True, extra=db_connection_backtrace)
log.warning("Size of upload is too large", user=True, extra=size_of_upload)  # user=True means Web and API

and a few complex cases:

log.error("Failed to run tool X", admin_msg="Cannot write to folder", admin=True, user=True, extra={"exception": tool_backtrace, "folder": output_folder})
log.exception(code_exception, developer=True, admin_msg="Internal exception while processing request of type X", admin=True, extra={"request": request})

The use of the extra attribute is to allow customizing the message and strip sensitive information as well as ensure repeated errors can be easily aggregated.

from @unode on trello

Repository installation throws exception and is left in inconsistent state when an unrelated repository is installed before it

I've been working to duplicate all of our installed repositories from production using scripted API calls for reproducibility. I've run into a problem where an installed repository appears to conflict somehow with a later installation of a completely different repository, throwing an unhandled exception and leaving the installing repository in an inconsistent state.

(Note: this is the only failing case I've reduced to the bare minimum from the ~110 tools I was trying to install. Before trying to reduce to a test case I provoked similar errors from at least three other repositories, so the problem doesn't seem to be limited to this combination.)

Against a brand new 15.05 instance, I ran:

$ python ./scripts/api/install_tool_shed_repositories.py --local $GALAXY_INSTALL_URL --api $GALAXY_INSTALL_KEY --tool-deps --repository-deps --url https://toolshed.g2.bx.psu.edu --owner iuc --name samtools_sort --revision 38ea74bd4054

followed by:

$ python ./scripts/api/install_tool_shed_repositories.py --local $GALAXY_INSTALL_URL --api $GALAXY_INSTALL_KEY --tool-deps --repository-deps --url https://toolshed.g2.bx.psu.edu --owner lparsons --name htseq_count --revision 6f920f33c5eb

The first repository, samtools_sort, installed fine. However, installation of htseq_count next failed and logged the following:

tool_shed.galaxy_install.repository_dependencies.repository_dependency_manager DEBUG 2015-06-20 02:31:33,466 Creating repository dependency objects...
tool_shed.util.shed_util_common DEBUG 2015-06-20 02:31:34,552 Adding new row for repository 'htseq_count' in the tool_shed_repository table, status set to 'New'.
tool_shed.util.shed_util_common DEBUG 2015-06-20 02:31:34,880 Adding new row for repository 'package_numpy_1_7' in the tool_shed_repository table, status set to 'New'.
tool_shed.galaxy_install.repository_dependencies.repository_dependency_manager DEBUG 2015-06-20 02:31:34,892 Skipping installation of revision 95d2c4aefb5f of repository 'package_samtools_0_1_19' because it was installed with the (possibly updated) revision 95d2c4aefb5f and its current installation status is 'Installed'.
tool_shed.util.shed_util_common DEBUG 2015-06-20 02:31:35,242 Adding new row for repository 'package_pysam_0_7_7' in the tool_shed_repository table, status set to 'New'.
tool_shed.galaxy_install.repository_dependencies.repository_dependency_manager DEBUG 2015-06-20 02:31:35,255 Building repository dependency relationships...
tool_shed.galaxy_install.repository_dependencies.repository_dependency_manager DEBUG 2015-06-20 02:31:35,266 Creating new repository_dependency record for installed revision 0c288abd2a1e of repository: package_numpy_1_7 owned by devteam.
tool_shed.galaxy_install.repository_dependencies.repository_dependency_manager DEBUG 2015-06-20 02:31:35,332 Creating new repository_dependency record for installed revision b62538c8c664 of repository: package_pysam_0_7_7 owned by iuc.
galaxy.web.framework.decorators ERROR 2015-06-20 02:31:35,387 Uncaught exception in exposed API method:
Traceback (most recent call last):
  File "/mnt/scdata/scdata_03/galaxy/containers/galaxy-builder/stable/lib/galaxy/web/framework/decorators.py", line 251, in decorator
    rval = func( self, trans, *args, **kwargs)
  File "/mnt/scdata/scdata_03/galaxy/containers/galaxy-builder/stable/lib/galaxy/webapps/galaxy/api/tool_shed_repositories.py", line 246, in install_repository_revision
    payload )
  File "/mnt/scdata/scdata_03/galaxy/containers/galaxy-builder/stable/lib/tool_shed/galaxy_install/install_manager.py", line 709, in install
    install_options
  File "/mnt/scdata/scdata_03/galaxy/containers/galaxy-builder/stable/lib/tool_shed/galaxy_install/install_manager.py", line 801, in __initiate_and_install_repositories
    return self.install_repositories(tsr_ids, decoded_kwd, reinstalling=False)
  File "/mnt/scdata/scdata_03/galaxy/containers/galaxy-builder/stable/lib/tool_shed/galaxy_install/install_manager.py", line 844, in install_repositories
    reinstalling=reinstalling )
  File "/mnt/scdata/scdata_03/galaxy/containers/galaxy-builder/stable/lib/tool_shed/galaxy_install/install_manager.py", line 864, in install_tool_shed_repository
    repo_info_tuple = repo_info_dict[ tool_shed_repository.name ]
KeyError: u'package_pysam_0_7_7'

This left the htseq_count and package_numpy_1_7 repositories in the "New" state and the package_pysam_0_7_7 repository in the "Cloning" state, even after a restart of the server.

However, without installing samtools_sort first, htseq_count and its dependencies install fine.

from @bcclaywell on trello

(Note from @erasche, untested, moving due to recency + test cases provided)

Workflow tool upgrade path is messy

From an IRC conversation:

jelle_ | how does Galaxy handle tool updates that would normally break workflows? They just break and one has to figure out how to reconnect newer tools? I guess you could at least get a workaround where you delete older workflows that people imported - and even import the newer one in their workflow overview but my problem is more that I do not know which tools people depend on, that's why I keep them installed but hidden. Because as soon as I delete it, they might have broken workflows and for some workflows it is hard to figure out what the appropriate parameters were

Partially I believe this would be solved by read-only workflows that users can run without importing.

from @erasche and @scholtalbers on trello