
biobitbot's Introduction

BioBitBot is a tool to build interactive charts from your data. It is intended to allow users to easily access bioinformatics data analysis.

BioBitBot is based on MultiQC by Phil Ewels. MultiQC aggregates quality control results from a number of common bioinformatics tools into a single report for a set of samples. BioBitBot extends this concept to data analysis: it combines a number of common data analysis 'modules' into a single report. Which modules are included in a report depends on the type of experiment performed.

As an example the 'microbiome' report type includes:

  • Principal Component Analysis of the samples
  • Phylogeny trees
  • MA and Volcano charts
  • and many more...

As an example of how data analytics modules work, consider that the 'microarray' report includes the MA chart module and the PCA module but does not include the Phylogeny Tree module.
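The relationship between report types and modules can be sketched as a simple lookup table. This is only an illustration: `REPORT_MODULES`, `modules_for`, `shared_modules` and the module names are hypothetical and are not taken from the BioBitBot codebase, and only the modules named above are listed.

```python
# Hypothetical registry: report type -> ordered list of module names.
# Only modules mentioned in the text are listed; real reports contain more.
REPORT_MODULES = {
    "microbiome": ["pca", "phylogeny_tree", "ma_chart", "volcano_chart"],
    "microarray": ["pca", "ma_chart"],
}


def modules_for(report_type):
    """Return a copy of the module names that make up a report type."""
    try:
        return list(REPORT_MODULES[report_type])
    except KeyError:
        raise ValueError(f"unknown report type: {report_type!r}") from None


def shared_modules(first, second):
    """Modules present in both report types, in the order of the first."""
    other = set(modules_for(second))
    return [name for name in modules_for(first) if name in other]
```

Under these assumptions, `modules_for("microarray")` contains no phylogeny module, while `shared_modules("microbiome", "microarray")` returns the PCA and MA chart modules that both report types reuse.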

--

BioBitBot is intended to work in concert with a data analytics pipeline. The pipeline does all computationally intensive work. Ideally the pipeline would output a series of chart-ready data tables; in practice BioBitBot does a fair amount of work to collate and lightly interpret the output of a pipeline. As a rough guide, it should be possible to generate a BioBitBot report in less than a minute on a desktop machine; any longer and one should consider offloading some computation to a pipeline.

Every BioBitBot report type should include a specification stating the file types it requires to build a report. Data analytics modules should include a specification (at least in the source code) stating the data type they expect. BioBitBot is a research tool that is intended to quickly adapt to changing needs.
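One way such a specification could look is a table of required file suffixes that is checked before a report is built. This is a sketch under stated assumptions: `REQUIRED_SUFFIXES`, `missing_inputs` and the suffixes themselves are placeholders invented for illustration, not BioBitBot's actual requirements.

```python
from pathlib import Path

# Hypothetical specification: report type -> file suffixes it needs.
# The suffixes are placeholders, not BioBitBot's real input formats.
REQUIRED_SUFFIXES = {
    "microarray": [".diff_exp.tsv", ".pca.tsv"],
}


def missing_inputs(report_type, filenames):
    """Return the required suffixes that none of the supplied files satisfy."""
    names = [Path(f).name for f in filenames]
    return [
        suffix
        for suffix in REQUIRED_SUFFIXES[report_type]
        if not any(name.endswith(suffix) for name in names)
    ]
```

A report builder could call `missing_inputs` first and stop with a clear message when the returned list is not empty, which keeps the requirements documented in one place.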

It is perfectly acceptable to build a module type which is only intended to run in a single report type. Data analytics modules are NOT intended to be perfectly modular. While a smaller, less redundant codebase is easier to maintain, a codebase which allows some redundancy (or 'reinventing the wheel') is often easier to extend and easier for novice programmers to understand.

Proficient programmers should bear in mind that BioBitBot is intended to support scientific research. Many bioinformaticians are relatively inexperienced programmers who need their code to 'Just Work'. Their contributions should be guided and checked, but they should not be discouraged because they aren't written to a high standard. Less experienced programmers should, above all, make sure their contributions are well documented.

BioBitBot is actively supported and developed. You can contact David Danko at [email protected] for help, but the best way to get in touch is with an issue on GitHub. In BioBitBot there are no stupid questions.

--

Terminology:

  • analysis, a collection of modules for a certain type of pipeline output, e.g. uarray, ubiome.
  • module, a cohesive piece of data analysis, e.g. significance plots, PCA.
  • report, the single HTML file which is the tangible result of an analysis.
  • pipeline, a separate piece of software that produces data files which BioBitBot can interpret.

In BioBitBot, analyses and modules together share the role that modules play in MultiQC.

Deprecated Terminology: BioBitBot is based on MultiQC and is very actively developed. A number of terms may show up in the codebase which are no longer relevant to the function of BioBitBot.

  • sname, or sample name

--

BioBitBot was originally developed (from MultiQC!) at the Kennedy Institute of Rheumatology at Oxford. The work was supported by Dr. Nicholas Ilott and Prof. Fiona Powrie. At present David Danko has done the majority of work to modify MultiQC into BioBitBot.

biobitbot's People

Contributors

avilella, chapmanb, dakl, dcdanko, ewels, guillermo-carrasco, lpantano, moonso, robinandeer

biobitbot's Issues

Separate package instead of fork?

Hello!

I have a suggestion about how to mutually develop code. Could it make sense to develop this code in a separate package instead of a fork of the main tool? I can't think of any functionality that you'd miss out on that way, and it would mean that your code wouldn't diverge from the core MultiQC package.

The documentation on how to do this is a bit lacking, as I haven't done loads of it myself yet, but it's essentially the same concept as in MultiQC_NGI - using setuptools to add new functions, modules, templates and command line options.

The entry points are pretty powerful in this sense. For instance, you could also create your own namespace alongside for the common functions so that other packages can use yours as a dependency, e.g.:

entry_points = {
    # To tie into core MultiQC execution
    'multiqc.modules.v1': [
        'yourmodule = your_extension.yourmod:MultiqcModule'
    ],
    # Your common functions for others to import into their code
    'yourmodule.functions.v1': [
        'scatter_plot = your_extension.yourfuncs:ScatterPlot'
    ]
},

then someone else's code:

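import pkg_resources

# Args: (distribution, group, name); 'your_extension' as the distribution name is assumed.
scatter_plot = pkg_resources.load_entry_point('your_extension', 'yourmodule.functions.v1', 'scatter_plot')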

(I think - this is slightly different from what I do, but hopefully you get the idea)

Your package can essentially be as stand-alone as you like - you can import whatever core functions you want through import multiqc statements and you can export any functions you like using your own entry_points. As long as you add a multiqc entry_point so that your code is triggered when MultiQC runs, it should all tie together.
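As a rough sketch of the consuming side, the standard library's `importlib.metadata` can discover everything registered under an entry point group. The helper name `load_plugins` is hypothetical, and the group name in the usage note is the illustrative one from the snippet above rather than a group any real package is known to register.

```python
from importlib.metadata import entry_points


def load_plugins(group):
    """Load every object registered under an entry point group.

    Returns a dict mapping entry point name -> loaded object.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selection API.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9 return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}
```

With a package installed that declares the group, `load_plugins('yourmodule.functions.v1')` would return its registered objects keyed by name; with nothing installed it simply returns an empty dict.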

Does that make sense? What do you think?

Phil

Refactor modules into collections of modules

Both the microarray and metagenomic modules are collections of plots. It may make sense to refactor the current modules into sets of modules that are run for the specific analysis.

This is similar to how the original MultiQC works but will have to handle files and data sharing within modules differently.

Side by side plots

Some plots would look better if they were side by side.

This applies particularly to the volcano and MA plots.
