gelog / adam-ibs
Ports the IBS/MDS/IBD functionality of Plink to Spark / ADAM
License: Apache License 2.0
EZ: IBD sharing expected value, based only on the .fam/.ped relationship.
Implement algorithm for RT: relationship type inferred from the .fam/.ped file.
EZ is a field of the file produced by the --genome command.
Similar to the plink --genome option. See the wiki on the IBS-MDS Process and the diagram for the Genome file.
The input files are those created in #2.
The fields required for --cluster and --mds-plot are:
There are more fields, but they will be done in Part II.
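For illustration only, a minimal in-memory model of one pair-wise --genome record could look like the sketch below. The field names follow the standard plink .genome columns and are an assumption, not this project's final model; the fields deferred to Part II are simply omitted.

// Hypothetical sketch of an in-memory model for one pair-wise --genome record.
// Field names mirror the standard plink .genome columns (an assumption, not the
// project's final model).
case class GenomePair(
  fid1: String, iid1: String,          // first individual (family ID, individual ID)
  fid2: String, iid2: String,          // second individual
  rt: String,                          // RT: relationship type inferred from the .fam/.ped
  ez: Double,                          // EZ: IBD sharing expected from the pedigree alone
  z0: Double, z1: Double, z2: Double,  // P(IBD = 0), P(IBD = 1), P(IBD = 2)
  piHat: Double,                       // PI_HAT = P(IBD = 2) + 0.5 * P(IBD = 1)
  dst: Double,                         // DST: average IBS similarity
  ppc: Double)                         // PPC: IBS binomial test p-value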
Add a comment to this issue with:
Add a comment to this issue describing how this will be implemented in Spark, and how it differs from plink.
Also update the class diagram on the wiki page describing PLink formats (when incomplete) and add a class diagram describing the models implemented in Scala for this feature on the wiki page on the MGL804 formats.
The implementation should use:
Important note: The model can live in memory only for now, but you'll need to integrate it into the ADAM format later on. You'll probably need to create a new record type.
This feature allows persistence of the PED/MAP fields imported in issue #2.
The Spark models should integrate with the ADAM format. The ADAM models for Variant and Genotypes should be a good start, but other record types may need to be added to the ADAM format.
The ADAM format can be added as a Maven dependency. The structure of the ADAM format is defined in an Avro file. The Avro file can be compiled into Java classes.
For updates to the ADAM format, choose the simplest solution for you, as long as the changes to the Avro file are easy to diff.
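As a stepping stone before wiring in the real ADAM/Avro record types, a minimal sketch of persisting an in-memory model to Parquet with Spark SQL could look like this. The SampleRecord fields, output path and Spark 1.x API usage are assumptions, not the project's actual code; ADAM itself stores its Avro records in Parquet containers, so the same storage layer applies.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical sketch: persist an in-memory model to Parquet with Spark SQL.
// SampleRecord and the output path are illustrative assumptions.
case class SampleRecord(familyId: String, individualId: String, phenotype: Double)

object PersistenceSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("adam-ibs-persist").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val records = sc.parallelize(Seq(SampleRecord("FAM1", "IND1", 2.0)))
    records.toDF().write.parquet("sample-records.parquet") // Parquet is also the container ADAM uses
    sc.stop()
  }
}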
Add a comment to this issue with:
Add a comment to this issue describing how this will be implemented in Spark, and explain how the persistence model will work.
The implementation should use:
I would need help managing the dependencies with Maven in order to include this library in the project.
This feature adds the --ibm constraint(s) on the --cluster option described in issue #7. For more info, check: http://pngu.mgh.harvard.edu/~purcell/plink/strat.shtml#options
The input file is the model created in #3.
Add a comment to this issue with:
the --ibm option
Add a comment to this issue describing how this will be implemented in Spark, and how it differs from plink.
The implementation should use:
This feature adds the --assoc, --model, --fisher, --linear, --logistic, --ci, --counts, --fisher, --cell, --within, --mh, --mh2, --bd, --homog, --gene-drop, --T2, --qt-means, --gxe, --covar, --reference-allele, --beta, --standard-beta, --genotypic, --hethom, --dominant, --recessive option(s) based on the input file described in #2.
This feature also generates the following file formats: ASSOC, FISHER, MODEL, BEST.PERM, BEST.MPERM, GEN.PERM, TREND.PERM, DOM.PERM, REC.PERM, CMH, CMH2, HOMOG, T2.PERM, T2.MPERM, QASSOC, ASSOC.PERM, ASSOC.MPERM, QASSOC.MEANS, QASSOC.GXE, ASSOC.LINEAR, ASSOC.LOGISTIC. For more info, check: http://pngu.mgh.harvard.edu/~purcell/plink/anal.shtml
The scope of this feature may be too large to implement completely during MGL804. However, many of the file formats seem to share a similar structure... In any case, the scope needs to be refined with Beatriz.
From a development point of view, it is interesting to note that this feature is quite distinct from the others, so it could easily be implemented in parallel with them.
Note: I stopped listing the options and file formats at the section Covariates and interactions.
Add a comment to this issue with:
Add a comment to this issue describing how this will be implemented in Spark, and how it differs from plink.
Also update the class diagram on the wiki page describing PLink formats (when incomplete) and add a class diagram describing the models implemented in Scala for this feature on the wiki page on the MGL804 formats.
The implementation should integrate with the models implemented in Scala by this project and use:
In the same way as plink does, we need to log the program execution.
I have created a WriteLog Scala class which I call whenever I want to write a log entry. For now, it only does "println"; logging to a file still needs to be implemented.
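A minimal sketch of what the file-logging part could look like (the method name and log path are assumptions, not the actual WriteLog API):

import java.io.{File, FileWriter, PrintWriter}
import java.util.Date

// Hypothetical sketch of WriteLog once file output is added; the log path and
// method name are assumptions, not the project's actual API.
object WriteLog {
  private val logFile = new File("adam-ibs.log")

  def log(message: String): Unit = {
    val line = s"${new Date()} $message"
    println(line)                                                // keep the current console behaviour
    val writer = new PrintWriter(new FileWriter(logFile, true))  // append to the log file
    try writer.println(line) finally writer.close()
  }
}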
Similar to the plink --make-bed option. See the wiki on the IBS-MDS Process.
The input files are PED and MAP. However, a relational model similar to the FAM - BED - BIM class diagram is better, and should be used internally.
Add a comment to this issue with:
Add a comment to this issue describing how this will be implemented in Spark, and how it differs from plink.
Also update the class diagram on the wiki page describing PLink formats (when incomplete) and add a class diagram describing the models implemented in Scala for this feature on the wiki page on the MGL804 formats.
The implementation should use:
Important note: The model can live in memory only for now, but you'll need to integrate it into the ADAM format later on. The relevant records from the ADAM model are Variant and Genotypes, but some fields are missing and will need to be added.
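For illustration, an in-memory relational model along the FAM - BED - BIM lines could start from case classes like the ones below; the names and field choices follow the plink documentation and are assumptions, not the project's final classes.

// Hypothetical in-memory model, loosely following the plink FAM / BIM / BED fields.
// Names are illustrative assumptions, not the project's final classes.
case class FamRecord(
  familyId: String, individualId: String,
  paternalId: String, maternalId: String,
  sex: Int,                // 1 = male, 2 = female, 0 = unknown
  phenotype: Double)       // -9/0 = missing, 1 = unaffected, 2 = affected (or a quantitative value)

case class BimRecord(
  chromosome: String, snpId: String,
  geneticDistance: Double, // in morgans, often left at 0
  position: Long,          // base-pair coordinate
  allele1: String, allele2: String)

// The BED part is the genotype matrix: one call per (individual, SNP) pair.
case class GenotypeCall(individualId: String, snpId: String, call: Int) // 0/1/2, -1 = missing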
Complete the wiki with the MDS analysis data formats (part 5).
Note: please include the link to the official documentation.
Complete the wiki with the IBS clustering data formats (part 4).
Note: please include the link to the official documentation.
This feature allows persistence of the models created in issue #21 (NEIGHBOUR).
The Spark models should integrate with the ADAM format. New record types may need to be added to the ADAM format.
The ADAM formats can be added as a Maven dependency. The structure of the ADAM format is defined in an Avro file. The Avro file can be compiled into Java classes.
For updates to the ADAM format, choose the simplest solution for you, as long as the changes to the Avro file are easy to diff.
Add a comment to this issue with:
Add a comment to this issue describing how this will be implemented in Spark, and explain how the persistence model will work.
The implementation should use:
All the dependencies used in this project (e.g. ADAM's formats, CLI parser, etc.) should be loaded externally via a build tool / dependency manager like Maven or SBT.
Update the README file at the root of the repo to describe how to build the project.
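For illustration, if SBT were chosen, the external dependencies might be declared as in the build.sbt sketch below; the coordinates and versions are assumptions to verify on Maven Central, not the project's actual build definition, and the equivalent <dependency> entries would go in the pom.xml if Maven is kept.

// build.sbt sketch (illustrative only; coordinates and versions are assumptions)
libraryDependencies ++= Seq(
  "org.apache.spark"           %% "spark-core"  % "1.4.1" % "provided", // Spark runtime
  "org.bdgenomics.bdg-formats"  % "bdg-formats" % "0.4.0",              // ADAM's Avro record definitions
  "com.github.scopt"           %% "scopt"       % "3.3.0"               // one possible CLI parser
)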
Calculate IBD (--genome)
This feature allows persistence of the model created in issue #15.
The Spark models should integrate with the ADAM format. New record types may need to be added to the ADAM format.
The ADAM formats can be added as a Maven dependency. The structure of the ADAM format is defined in an Avro file. The Avro file can be compiled into Java classes.
For updates to the ADAM format, choose the simplest solution for you, as long as the changes to the Avro file are easy to diff.
Add a comment to this issue with:
Add a comment to this issue describing how this will be implemented in Spark, and explain how the persistence model will work.
The implementation should use:
We have Scala objects containing the information. To provide the same functionality as Plink (writing to a text file), those objects need to be saved to a file.
Task:
Example of use:
PS: all the examples are related to our code!
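A minimal sketch of such a writer, assuming a hypothetical IbsResult class and a plink-like column layout (not the project's actual model or writer):

import java.io.{File, PrintWriter}

// Hypothetical sketch of plink-style text output from in-memory objects.
// The IbsResult class and column layout are illustrative assumptions.
case class IbsResult(fid1: String, iid1: String, fid2: String, iid2: String, dst: Double)

def writeIbsFile(results: Seq[IbsResult], out: File): Unit = {
  val writer = new PrintWriter(out)
  try {
    writer.println("FID1 IID1 FID2 IID2 DST")   // header line, as plink does
    results.foreach { r =>
      writer.println(f"${r.fid1} ${r.iid1} ${r.fid2} ${r.iid2} ${r.dst}%.6f")
    }
  } finally writer.close()
}

// Example of use:
// writeIbsFile(Seq(IbsResult("FAM1", "IND1", "FAM2", "IND2", 0.8543)), new File("output.genome"))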
This feature adds the --cc constraint(s) on the --cluster option described in issue #7. For more info, check: http://pngu.mgh.harvard.edu/~purcell/plink/strat.shtml#options
The input file is the model created in #3.
Add a comment to this issue with:
the --cc option
Add a comment to this issue describing how this will be implemented in Spark, and how it differs from plink.
The implementation should use:
All these plink features use different command line options, and some of these options can be combined (e.g. --cluster can be called alone or with multiple options). Each option can take 0, 1 or more parameters (e.g. --mcc A B).
I think the best way to think about this is with a tree-like structure. For each invocation of the program there is one main use case (e.g. clustering). Each use case then requires some mandatory options and offers optional ones. The optional options can have a default value. Some options are shared by many use cases.
With plink, it is not very clear which options are applicable to a use case. To improve this, we need to find a flexible, easy-to-use, and light-weight option-parsing library for Scala (see the sketch after the git example below).
I'm a big fan of the approach taken by the git software:
$ git
usage: git [--version] [--help] [-C <path>] [-c name=value]
[--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
[-p|--paginate|--no-pager] [--no-replace-objects] [--bare]
[--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
<command> [<args>]
The most commonly used git commands are:
add Add file contents to the index
bisect Find by binary search the change that introduced a bug
branch List, create, or delete branches
checkout Checkout a branch or paths to the working tree
clone Clone a repository into a new directory
commit Record changes to the repository
diff Show changes between commits, commit and working tree, etc
fetch Download objects and refs from another repository
grep Print lines matching a pattern
init Create an empty Git repository or reinitialize an existing one
log Show commit logs
merge Join two or more development histories together
mv Move or rename a file, a directory, or a symlink
pull Fetch from and integrate with another repository or a local branch
push Update remote refs along with associated objects
rebase Forward-port local commits to the updated upstream head
reset Reset current HEAD to the specified state
rm Remove files from the working tree and from the index
show Show various types of objects
status Show the working tree status
tag Create, list, delete or verify a tag object signed with GPG
'git help -a' and 'git help -g' lists available subcommands and some
concept guides. See 'git help <command>' or 'git help <concept>'
to read about a specific subcommand or concept.
$ git branch --help
(shows help on the git branch command)
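For illustration, a git-like tree of commands could be expressed with a library such as scopt (one candidate among several, assumed here; the command and option names below are illustrative, not the project's final CLI):

import scopt.OptionParser

// Hypothetical sketch of a git-like command tree using scopt 3.x.
object AdamIbsCli {
  case class Config(command: String = "", file: String = "plink", out: String = "",
                    k: Int = 0, ppc: Double = 0.0)

  val parser = new OptionParser[Config]("adam-ibs") {
    head("adam-ibs", "0.1.0")
    // options shared by several use cases
    opt[String]("file").action((v, c) => c.copy(file = v)).text(".ped/.map filename prefix")
    opt[String]("out").action((v, c) => c.copy(out = v)).text("output filename prefix")

    cmd("genome").action((_, c) => c.copy(command = "genome"))
      .text("Calculate IBS distances between all individuals")

    cmd("cluster").action((_, c) => c.copy(command = "cluster"))
      .text("IBS clustering")
      .children( // optional constraints specific to this use case
        opt[Int]("K").action((v, c) => c.copy(k = v)).text("stop when K clusters remain"),
        opt[Double]("ppc").action((v, c) => c.copy(ppc = v)).text("PPC test threshold"))
  }

  def main(args: Array[String]): Unit =
    parser.parse(args, Config()) match {
      case Some(config) => println(s"running '${config.command}'") // dispatch to the feature here
      case None         => // scopt already printed the usage / error message
    }
}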
Add a comment to this issue with:
Add a comment to this issue describing the "tree" of available commands for the scope of the features implemented in MGL804.
This needs to be implemented in all features produced during MGL804.
A new task to track the progress of the development environment setup.
Implement algorithm for RT: relationship type inferred from the .fam/.ped file.
RT is a field of the file produced by the --genome command.
Describe the "tree" of available commands for the scope of the features implemented in MGL804.
For each of the repos below:
Comment:
Similar to the plink --cluster option. See the wiki on the IBS-MDS Process and the diagram for the Genome file.
More information can be found on the --cluster and --genome-full options in the section on Pairwise IBD estimation of the plink manual.
The input file is the model created in #3.
Add a comment to this issue with:
Add a comment to this issue describing how this will be implemented in Spark, and how it differs from plink.
Also update the class diagram on the wiki page describing PLink formats (when incomplete) and add a class diagram describing the models implemented in Scala for this feature on the wiki page on the MGL804 formats.
The implementation should use:
Important note: The model can live in memory only for now, but you'll need to integrate it into the ADAM format later on. You'll probably need to create a new record type.
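As an illustration of the pair-wise computation that clustering starts from, a rough Spark sketch could look like this. The 0/1/2 genotype encoding, the handling of missing calls and all names are assumptions; plink's actual --cluster also applies the PPC test and merge constraints, which are not shown.

import org.apache.spark.rdd.RDD

// Rough sketch of pair-wise IBS distances in Spark (assumptions: genotypes are
// encoded 0/1/2 with -1 for missing; missing calls are simply skipped, unlike
// plink's own handling).
case class Individual(id: String, genotypes: Array[Int])

def ibsDistance(a: Array[Int], b: Array[Int]): Double = {
  var shared = 0
  var total = 0
  var i = 0
  while (i < a.length) {
    if (a(i) >= 0 && b(i) >= 0) {           // skip missing calls
      shared += 2 - math.abs(a(i) - b(i))   // alleles identical by state at this SNP (0, 1 or 2)
      total += 2
    }
    i += 1
  }
  if (total == 0) 1.0 else 1.0 - shared.toDouble / total
}

// Every unordered pair of individuals with its IBS distance
// (plink's DST column is the similarity, i.e. 1 minus this distance).
def pairwiseDistances(people: RDD[Individual]): RDD[(String, String, Double)] =
  people.cartesian(people)
    .filter { case (p, q) => p.id < q.id }
    .map { case (p, q) => (p.id, q.id, ibsDistance(p.genotypes, q.genotypes)) }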
This feature adds the --mds-cluster and --within constraint(s) on the --mds-plot option described in issue #17. For more info, check: http://pngu.mgh.harvard.edu/~purcell/plink/mds.shtml#options
Add a comment to this issue with:
the --mds-cluster option
the --within option
Add a comment to this issue describing how this will be implemented in Spark, and how it differs from plink.
The implementation should use:
This feature allows persistence of the model created in issue #7 - #1 via the --cluster option.
The Spark models should integrate with the ADAM format. New record types may need to be added to the ADAM format.
The ADAM formats can be added as a Maven dependency. The structure of the ADAM format is defined in an Avro file. The Avro file can be compiled into Java classes.
For updates to the ADAM format, choose the simplest solution for you, as long as the changes to the Avro file are easy to diff.
Add a comment to this issue with:
Add a comment to this issue describing how this will be implemented in Spark, and explain how the persistence model will work.
The implementation should use:
Describe the general idea of the solution
Compiling the adam-ibs-data/ module works, but adam-ibs-core does not:
23% [:~/workspace … /adam-ibs-core] master* ± mvn clean compile
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Adam IBS Core 0.1.0
[INFO] ------------------------------------------------------------------------
Downloading: https://repo.maven.apache.org/maven2/com/ets/mgl804/adam-ibs/0.1.0/adam-ibs-0.1.0.pom
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.666 s
[INFO] Finished at: 2015-10-29T13:56:53-04:00
[INFO] Final Memory: 14M/309M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project adam-ibs-core: Could not resolve dependencies for project com.ets.mgl804:adam-ibs-core:jar:0.1.0: Failed to collect dependencies at com.ets.mgl804:adam-ibs-data:jar:0.1.0: Failed to read artifact descriptor for com.ets.mgl804:adam-ibs-data:jar:0.1.0: Could not find artifact com.ets.mgl804:adam-ibs:pom:0.1.0 in central (https://repo.maven.apache.org/maven2) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
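A probable explanation (an assumption based on the error message, not verified on this build): adam-ibs-core depends on the sibling module adam-ibs-data and on the parent POM com.ets.mgl804:adam-ibs:0.1.0, neither of which is published to Maven Central, so the module cannot be built in isolation. Running mvn install once from the repository root should build the whole multi-module reactor and place the parent POM and adam-ibs-data in the local repository; after that, mvn clean compile inside adam-ibs-core should resolve.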
The goal of this task is to propose a mapping of the Plink data formats onto the ADAM format.
Proposed mapping for the .ped, .map, .bim, .fam and .bed extensions.
Ideally this tool would automatically look at the code, set a quality score, and help us improve the code quality.
It would be nice if this tool could generate a badge in our README.md file as well.
A non-exhaustive list of these tools can be found on http://shields.io/ (for example CodeCov, Scrutinizer, etc.).
Obviously the tool needs to understand the language of the repo (Scala, etc.)
This feature adds the --matrix and --distance-matrix option(s) based on the input file described in #3. For more info, check: http://pngu.mgh.harvard.edu/~purcell/plink/strat.shtml#matrix
Add a comment to this issue with:
the --matrix option
the --distance-matrix option
Add a comment to this issue describing how this will be implemented in Spark, and how it differs from plink.
Also update the class diagram on the wiki page describing PLink formats (when incomplete) and add a class diagram describing the models implemented in Scala for this feature on the wiki page on the MGL804 formats.
The implementation should use:
Important note: The model can live in memory only for now, but you'll need to integrate it into the ADAM format later on. You'll probably need to create a new record type.
Compilation works on a clean Linux machine (Ubuntu 14.04), but execution does not:
docker run --rm -ti ubuntu:14.04
apt-get update
export JAVA_VERSION="7u85"
export JAVA_HOME="/usr/lib/jvm/java-7-openjdk-amd64"
DEBIAN_FRONTEND=noninteractive apt-get install -y openjdk-7-jdk=$JAVA_VERSION\*
apt-get install -y git
git clone https://github.com/GELOG/adam-ibs.git
cd adam-ibs/
apt-get install -y maven
mvn package
root@7d5f110a4f03:/adam-ibs# java -jar adam-ibs-core/target/adam-ibs-core-0.1.0-jar-with-dependencies.jar --help
23:33:40.516 [main] [INFO ] [c.e.m.c.Main$] : Begin with arguments : --help
--file <name> Specify .ped + .map filename prefix (default 'plink')
--genome Calculate IBS distances between all individuals [needs
--file and --out]
--make-bed Create a new binary fileset. Specify .ped and .map files
[needs --file and --out]
--out <name> Specify the output filename
--show-parquet Show shema and data sample ostored in a parquet file [needs
--file]
-h, --help <arg> Show help message
root@7d5f110a4f03:/adam-ibs# java -jar adam-ibs-core/target/adam-ibs-core-0.1.0-jar-with-dependencies.jar --file DATA/
test --out output --make-bed
23:22:23.977 [main] [INFO ] [c.e.m.c.Main$] : Begin with arguments : --file DATA/test --out output --make-bed
23:22:25.346 [main] [ERROR] [o.a.s.SparkContext] : Error initializing SparkContext.
com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.version'
at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124) ~[adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:145) ~[adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:151) ~[adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159) ~[adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164) ~[adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:206) ~[adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at akka.actor.ActorSystem$Settings.<init>(ActorSystem.scala:168) ~[adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:504) ~[adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at akka.actor.ActorSystem$.apply(ActorSystem.scala:141) ~[adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at akka.actor.ActorSystem$.apply(ActorSystem.scala:118) ~[adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:122) ~[adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54) ~[adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53) ~[adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1991) ~[adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:142) ~[adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1982) ~[adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56) ~[adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at org.apache.spark.rpc.akka.AkkaRpcEnvFactory.create(AkkaRpcEnv.scala:245) ~[adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:52) ~[adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:247) ~[adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:188) ~[adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267) ~[adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at org.apache.spark.SparkContext.<init>(SparkContext.scala:424) ~[adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at com.ets.mgl804.core.AppContext$.<init>(AppContext.scala:16) [adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at com.ets.mgl804.core.AppContext$.<clinit>(AppContext.scala) [adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at com.ets.mgl804.core.cli.PlinkMethod$.<init>(PlinkMethod.scala:19) [adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at com.ets.mgl804.core.cli.PlinkMethod$.<clinit>(PlinkMethod.scala) [adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at com.ets.mgl804.core.Main$$anonfun$main$2.apply(Main.scala:26) [adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at com.ets.mgl804.core.Main$$anonfun$main$2.apply(Main.scala:23) [adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) [adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34) [adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at com.ets.mgl804.core.Main$.main(Main.scala:22) [adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
at com.ets.mgl804.core.Main.main(Main.scala) [adam-ibs-core-0.1.0-jar-with-dependencies.jar:na]
Exception in thread "main" java.lang.ExceptionInInitializerError
at com.ets.mgl804.core.cli.PlinkMethod$.<init>(PlinkMethod.scala:19)
at com.ets.mgl804.core.cli.PlinkMethod$.<clinit>(PlinkMethod.scala)
at com.ets.mgl804.core.Main$$anonfun$main$2.apply(Main.scala:26)
at com.ets.mgl804.core.Main$$anonfun$main$2.apply(Main.scala:23)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
at com.ets.mgl804.core.Main$.main(Main.scala:22)
at com.ets.mgl804.core.Main.main(Main.scala)
Caused by: com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.version'
at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:145)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:151)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164)
at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:206)
at akka.actor.ActorSystem$Settings.<init>(ActorSystem.scala:168)
at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:504)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:118)
at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:122)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1991)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:142)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1982)
at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56)
at org.apache.spark.rpc.akka.AkkaRpcEnvFactory.create(AkkaRpcEnv.scala:245)
at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:52)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:247)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:188)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:424)
at com.ets.mgl804.core.AppContext$.<init>(AppContext.scala:16)
at com.ets.mgl804.core.AppContext$.<clinit>(AppContext.scala)
... 8 more
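This is a classic symptom of packaging a Spark/Akka application as a single fat jar: every Akka module ships its own reference.conf, and a naive assembly keeps only one of them, so keys such as akka.version disappear. A usual remedy (suggested here as an assumption, not a verified fix for this build) is to have the shade/assembly step concatenate all reference.conf files, for example with the maven-shade-plugin AppendingTransformer on the reference.conf resource, or to launch the jar through spark-submit instead of java -jar so that Spark's own classpath provides the configuration.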
This feature adds the --mds-plot option(s) based on the input file described in #7. For more info, check: http://pngu.mgh.harvard.edu/~purcell/plink/strat.shtml#mds
Add a comment to this issue with:
the --mds-plot option
Add a comment to this issue describing how this will be implemented in Spark, and how it differs from plink.
Also update the class diagram on the wiki page describing PLink formats (when incomplete) and add a class diagram describing the models implemented in Scala for this feature on the wiki page on the MGL804 formats.
The implementation should use:
Important note: The model can live in memory only for now, but you'll need to integrate it into the ADAM format later on. You'll probably need to create a new record type.
This feature adds the --match and --match-type constraint(s) on the --cluster option described in issue #7. For more info, check: http://pngu.mgh.harvard.edu/~purcell/plink/strat.shtml#options
The input file is the model created in #3.
Add a comment to this issue with:
the --match option
the --match-type option
Add a comment to this issue describing how this will be implemented in Spark, and how it differs from plink.
The implementation should use:
This feature allows persistence of the models created in issues #3 and #4 via the --genome and --genome-full parameters.
The Spark models should integrate with the ADAM format. New record types may need to be added to the ADAM format.
The ADAM formats can be added as a Maven dependency. The structure of the ADAM format is defined in an Avro file. The Avro file can be compiled into Java classes.
For updates to the ADAM format, choose the simplest solution for you, as long as the changes to the Avro file are easy to diff.
Add a comment to this issue with:
Add a comment to this issue describing how this will be implemented in Spark, and explain how the persistence model will work.
The implementation should use:
Complete the wiki with the IBS clustering outliers data format (part 6).
Note: please include the link to the official documentation.
This feature adds the --rel-check, --min, --max, --het, --indep, --indep-pairwise, --homozyg, --homozyg-snp, --homozyg-kb, --homozyg-window-het, --homozyg-window-missing, --homozyg-window-threshold, --homozyg-density, --homozyg-gap, --homozyg-group, --pool-size, --homozyg-match, --consensus-match, --homozyg-verbose, --ibs-test, --segment, --all-pairs, --segment-length, --segment-snp, --segment-group, --segment-verbose, --mperm option(s) based on the input file described in #3.
This feature also generates the following file formats: HET, HOM, HOM.OVERLAP, SEGMENT, SEGMENT.SUMMARY, SEGMENT.SUMMARY.MPERM. For more info, check: http://pngu.mgh.harvard.edu/~purcell/plink/ibdibs.shtml
Add a comment to this issue with:
Add a comment to this issue describing how this will be implemented in Spark, and how it differs from plink.
Also update the class diagram on the wiki page describing PLink formats (when incomplete) and add a class diagram describing the models implemented in Scala for this feature on the wiki page on the MGL804 formats.
The implementation should integrate with the models implemented in Scala by this project and use:
A task to track our meetings: agendas and minutes.
This feature adds the --K constraint(s) on the --cluster option described in issue #7. For more info, check: http://pngu.mgh.harvard.edu/~purcell/plink/strat.shtml#options
The input file is the model created in #3.
Add a comment to this issue with:
the --K option
Add a comment to this issue describing how this will be implemented in Spark, and how it differs from plink.
The implementation should use:
The acceptance test must include:
This feature adds the --neighbour option(s) based on the input file described in #3. For more info, check: http://pngu.mgh.harvard.edu/~purcell/plink/strat.shtml#outlier
Add a comment to this issue with:
NN
MIN_DST
Z
PROP_DIFF
Add a comment to this issue describing how this will be implemented in Spark, and how it differs from plink.
Also update the class diagram on the wiki page describing PLink formats (when incomplete) and add a class diagram describing the models implemented in Scala for this feature on the wiki page on the MGL804 formats.
The implementation should integrate with the models implemented in Scala by this project and use:
Important note: The model can live in memory only for now, but you'll need to integrate it into the ADAM format later on. You'll probably need to create a new record type.
This feature adds the --ppc and --ppc-gap constraint(s) on the --cluster option described in issue #7. For more info, check: http://pngu.mgh.harvard.edu/~purcell/plink/strat.shtml#options
The input file is the model created in #3.
Add a comment to this issue with:
the --ppc option
the --ppc-gap option
Add a comment to this issue describing how this will be implemented in Spark, and how it differs from plink.
The implementation should use:
It should be possible to build the project with a single command, i.e. without having to change folders or perform other manual operations.
This feature allows persistence of the models created in issues #17 and #18 (MDS, MIBS, MDIST).
The Spark models should integrate with the ADAM format. New record types may need to be added to the ADAM format.
The ADAM formats can be added as a Maven dependency. The structure of the ADAM format is defined in an Avro file. The Avro file can be compiled into Java classes.
For updates to the ADAM format, choose the simplest solution for you, as long as the changes to the Avro file are easy to diff.
Add a comment to this issue with:
Add a comment to this issue describing how this will be implemented in Spark, and explain how the persistence model will work.
The implementation should use:
We should agree on coding rules for the new Scala code (to make future maintenance easier). I propose the following points:
Feel free to add other points.
Similar to the plink --cluster option with the --mc or --mcc options. See the wiki on the IBS-MDS Process and the diagram for the Genome file.
More information can be found on the --cluster and --genome-full options in the section on Pairwise IBD estimation of the plink manual.
The input file is the model created in #3.
This feature adds a constraint on the --cluster option described in issue #7.
Add a comment to this issue with:
the --mc option
the --mcc option
Add a comment to this issue describing how this will be implemented in Spark, and how it differs from plink.
The implementation should use:
Similar to the plink --genome-full option. See the wiki on the IBS-MDS Process and the diagram for the Genome file.
More information can be found on the --genome and --genome-full options in the section on Pairwise IBD estimation of the plink manual.
The input files are those created in #2.
This feature completes feature #3 by adding the missing fields.
Add a comment to this issue with:
Add a comment to this issue describing how this will be implemented in Spark, and how it differs from plink.
Also update the class diagram on the wiki page describing PLink formats (when incomplete) and add a class diagram describing the models implemented in Scala for this feature on the wiki page on the MGL804 formats.
The implementation should use:
Important note: The model can live in memory only for now, but you'll need to integrate it into the ADAM format later on. You'll probably need to create a new record type.