igh-evaluation

Datasets and results to evaluate methods for analysing immunoglobulin sequences

Data

  • aborigin_simulations: A random selection of 1000 simulated sequences used to evaluate Ab-origin, downloaded from here.

  • igblast_vmatch: These data were used by Ye et al. (2013) to evaluate IgBLAST and comprise 100 IGH sequences with no mutations in IGHV; hence, these sequences should contain few or no mutations in the D and J regions. The data were downloaded as a PDF from the supplementary information here and copied to a text file.

  • igscueal_simulations: These are simulated human IGH sequences described by Frost et al. (2015).

  • N152: This is a dataset of 11 clonally related IGH sequences taken from the HIV-1-infected donor N152, from whom the broadly neutralizing antibody 10E8 was isolated (Zhu et al. (2013)). The sequences were downloaded from GenBank.

  • ohmlaursen: This is a dataset of 6329 clonally unrelated IGH sequences obtained from individuals homozygous for IGHV3-23*01 and IGHJ6*02, and amplified using primers that were intended to be specific for IGHV3-23. The original study also amplified a number of IGHV3-h pseudogenes, which have been excluded from the file. This dataset was used to evaluate Ab-origin. The full dataset (including pseudogenes) was obtained from GenBank.

  • PNG: 1108 human IGH sequences from individuals in Papua New Guinea, known to contain clonally related sequences. A CSV file containing germline assignments was downloaded from here. Sequences were downloaded directly from GenBank, and sequences and annotations were extracted using an IJulia notebook. Unfortunately, the (manual) assignments of sequences to clones are not available.

  • PW57: A dataset of 57 clonally related IGH sequences from IgD+ IgM- CD38+ B cells described by Wilson et al. (2000), corresponding to GenBank accessions AF262145–AF262201, extracted from PopSet 8810007.

  • PW99: A dataset of 106 clonally related IGH sequences from IgD+ IgM- CD38+ B cells described by Zheng et al. (2005), corresponding to GenBank accessions EF544883–EF544988. The dataset is often referred to as PW99, which suggests a subset of 99 sequences; however, I was unable to determine which sequences were excluded from this larger set of 106.

  • S22: The Stanford S22 dataset is a set of 13,153 human IGH sequences derived from an individual who was fully genotyped. The performance of a utility is determined by the proportion of sequences that are assigned to a germline gene that is absent from the individual. These files were downloaded from the iHMMune website.
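
The S22 criterion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the input layout (a mapping from sequence ID to assigned germline call, plus a list of germline calls present in the individual), and the toy data are all hypothetical and are not taken from the S22 files. Comparing at the gene level by stripping the allele suffix after `*` is also an assumption; an allele-level comparison would skip that step.

```python
from typing import Iterable, Mapping


def gene_name(call: str) -> str:
    """Strip the allele suffix, e.g. 'IGHV3-23*01' -> 'IGHV3-23'."""
    return call.split("*", 1)[0]


def absent_gene_proportion(
    assignments: Mapping[str, str], genotype: Iterable[str]
) -> float:
    """Proportion of sequences assigned to a gene that is not in the genotype."""
    if not assignments:
        raise ValueError("no assignments supplied")
    present = {gene_name(g) for g in genotype}
    absent = sum(
        1 for call in assignments.values() if gene_name(call) not in present
    )
    return absent / len(assignments)


# Toy, made-up data for illustration only (not taken from S22).
toy_genotype = ["IGHV3-23*01", "IGHV1-69*01", "IGHV4-34*01"]
toy_assignments = {
    "seq1": "IGHV3-23*01",
    "seq2": "IGHV1-69*06",  # same gene, different allele: present at gene level
    "seq3": "IGHV3-30*18",  # gene not in the toy genotype: counted as absent
    "seq4": "IGHV4-34*01",
}

rate = absent_gene_proportion(toy_assignments, toy_genotype)
print(f"{rate:.2f}")  # 1 of 4 calls is to an absent gene -> 0.25
```

Under this reading, a smaller proportion means fewer assignments that are known to be wrong for the genotyped individual.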

Contributors

sdwfrost
