The biobitbot from dcdanko

BioBitBot is a tool to build interactive charts from your data. It is intended to allow users to easily access bioinformatics data analysis.

BioBitBot is based on MultiQC by Phil Ewels. MultiQC aggregates quality control output from a number of common bioinformatics tools into a single quality control report for a set of samples. BioBitBot extends this concept to data analysis: it combines a number of common data analysis 'modules' into a single report. Which modules a report includes depends on the type of experiment performed.

As an example, the 'microbiome' report type includes:

Principal Component Analysis of the samples
Phylogeny trees
MA and Volcano charts
and many more...

As an example of how data analysis modules are assigned to reports, the 'microarray' report includes the MA chart module and the PCA module but does not include the Phylogeny Tree module.
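The relationship between report types and modules can be pictured as a simple mapping. This is a minimal sketch only: the dictionary, the module identifiers, and the function name are hypothetical and are not taken from the BioBitBot codebase. It reflects only the memberships stated above.

```python
# Hypothetical mapping from report type to the modules it includes.
# Only the memberships stated in the text are listed; real report
# types include more modules than this.
REPORT_MODULES = {
    "microbiome": ["pca", "phylogeny_tree", "ma_chart", "volcano_chart"],
    "microarray": ["pca", "ma_chart"],
}


def modules_for(report_type):
    """Return a copy of the module list for a report type.

    Raises ValueError if the report type is not known.
    """
    try:
        return list(REPORT_MODULES[report_type])
    except KeyError:
        raise ValueError("unknown report type: %r" % report_type)
```

With this layout, a module such as the phylogeny tree is simply absent from the 'microarray' entry, so it never runs for that report type.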

BioBitBot is intended to work in concert with a data analysis pipeline. The pipeline does all computationally intensive work. Ideally the pipeline would output a series of chart-ready data tables; in practice BioBitBot does a fair amount of work to collate and lightly interpret the output of a pipeline. As a rough guide, it should be possible to generate a BioBitBot report in less than a minute on a desktop machine; if it takes any longer, consider offloading some computation to the pipeline.

Every BioBitBot report type should include a specification stating the file types it requires to build a report. Data analysis modules should include a specification (at least in the source code) stating the data type they expect. BioBitBot is a research tool that is intended to adapt quickly to changing needs.
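One way such a file-type specification could be written and checked is sketched below. Everything in it is an assumption made for illustration: the report type name, the file suffixes, and the helper function are hypothetical and do not describe how BioBitBot actually declares its requirements.

```python
# Hypothetical specification: the file suffixes a report type needs
# before it can be built. The names and suffixes are placeholders.
REQUIRED_SUFFIXES = {
    "example_report": [".counts.tsv", ".diff.tsv"],
}


def missing_inputs(report_type, filenames):
    """Return the required suffixes that no supplied file name ends with."""
    required = REQUIRED_SUFFIXES[report_type]
    return [
        suffix
        for suffix in required
        if not any(name.endswith(suffix) for name in filenames)
    ]
```

A report builder could call a check like this first and stop with a clear message when the returned list is not empty.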

It is perfectly acceptable to build a module which is only intended to run in a single report type. Data analysis modules are NOT intended to be perfectly modular. While a smaller, less redundant codebase is easier to maintain, a codebase which allows some redundancy (or 'reinventing the wheel') is often easier to extend and easier for novice programmers to understand.

Proficient programmers should bear in mind that BioBitBot is intended to support scientific research. Many bioinformaticians are relatively inexperienced programmers who need their code to 'Just Work'. Their contributions should be guided and checked, but they should not be discouraged because they aren't written to a high standard. Less experienced programmers should, above all, make sure their contributions are well documented.

BioBitBot is actively supported and developed. You can contact David Danko at [email protected] for help, but the best way to get in touch is with an issue on GitHub. In BioBitBot there are no stupid questions.

Terminology:

analysis, a collection of modules for a certain type of pipeline output, e.g. uarray, ubiome.
module, a cohesive piece of data analysis, e.g. significance plots, PCA.
report, the single HTML file which is the tangible result of an analysis.
pipeline, a separate piece of software that produces data files which BioBitBot can interpret.

In BioBitBot, analyses and modules together fill the role that modules play in MultiQC.

Deprecated Terminology: BioBitBot is based on MultiQC and is very actively developed. A number of terms may show up in the codebase which are no longer relevant to the function of BioBitBot.

sname, or sample name

BioBitBot was originally developed (from MultiQC!) at the Kennedy Institute of Rheumatology at Oxford. The work was supported by Dr. Nicholas Ilott and Prof. Fiona Powrie. At present David Danko has done the majority of work to modify MultiQC into BioBitBot.

Separate package instead of fork?

Hello!

I have a suggestion about how we could develop code alongside each other. Could it make sense to develop this code in a separate package instead of a fork of the main tool? I can't think of any functionality that you'd miss out on that way, and it would mean that your code wouldn't diverge from the core MultiQC package.

The documentation on how to do this is a bit lacking, as I haven't done loads of it myself yet, but it's essentially the same concept as in MultiQC_NGI - using setuptools to add new functions, modules, templates and command line options.

The entry points are pretty powerful in this sense. For instance, you could also create your own namespace alongside the MultiQC one for your common functions, so that other packages can use yours as a dependency, e.g.:

entry_points = {
    # To tie into core MultiQC execution
    'multiqc.modules.v1': [
        'yourmodule = your_extension.yourmod:MultiqcModule'
    ],
    # Your common functions for others to import into their code
    'yourmodule.functions.v1': [
        'scatter_plot = your_extension.yourfuncs:ScatterPlot'
    ]
},

then someone else's code:

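import pkg_resources

# Arguments are (distribution, group, name); 'your_extension' is assumed to be the distribution name
scatter_plot = pkg_resources.load_entry_point(
    'your_extension', 'yourmodule.functions.v1', 'scatter_plot')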

(I think - this is slightly different from what I do, but hopefully you get the idea)

Your package can essentially be as stand-alone as you like - you can import whatever core functions you want through import multiqc statements and you can export any functions you like using your own entry_points. As long as you add a multiqc entry_point so that your code is triggered when MultiQC runs, it should all tie together.
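To check that an extension package has registered itself, the installed entry points for a group can be listed. The sketch below uses the standard library's importlib.metadata instead of pkg_resources; the helper name find_entry_points is hypothetical, and the group name 'multiqc.modules.v1' is the one used in the example above.

```python
from importlib import metadata


def find_entry_points(group):
    """Return the entry points that installed packages register under `group`."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10 and later expose a select() method for filtering.
        return list(eps.select(group=group))
    # Python 3.8 and 3.9 return a plain dict keyed by group name.
    return list(eps.get(group, []))
```

Calling find_entry_points('multiqc.modules.v1') and printing each entry's name and value shows which module classes would be picked up; an empty list means nothing is registered under that group in the current environment.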

Does that make sense? What do you think?

Phil
