ReliefSeq

DEPRECATED

ReliefSeq has been deprecated and reimplemented in inbix on GitHub.

A feature selection tool for biological SEQuence data

Description

ReliefSeq is a free, open-source command-line tool for analyzing GWAS (SNP) data and other types of biological data. Several algorithm modes (for example, ReliefF and ReliefSeq) are available for different types of analysis.

ReliefSeq is developed by the In Silico Research Group at the Tandy School of Computer Science of the University of Tulsa. Our research is sponsored by the NIH and the William K. Warren Foundation. For more details, visit our research website.

Dependencies

  • GNU Scientific Library (libgsl)
  • Boost system, filesystem, and program_options libraries
  • OpenMP, required for the parallelized distance-matrix calculations in ReliefF. With GCC it is provided by the compiler toolchain (libgomp, enabled with -fopenmp).

Compilation Environment and Instructions

Building ReliefSeq requires a GNU toolchain (the autotools and g++). ReliefSeq has been built and run successfully on:

  • Linux (64-bit Ubuntu), gcc 4.6
  • Linux (64-bit Debian), gcc 4.4.5 (Debian 4.4.5-8)

To build ReliefSeq, first run the bootstrap script:

./bootstrap.sh

The script calls autoreconf to generate the configure script; any extraneous warnings it prints can be ignored.

Next, set any OS- or system-specific environment variables. For example, on OS X with gcc 4.9.2 installed through Homebrew:

export BOOST_ROOT="/usr/local"
export CXXFLAGS="-I/usr/local/Cellar/gcc/4.9.2_1/lib/gcc/4.9/gcc/x86_64-apple-darwin13.4.0/4.9.2/include $CXXFLAGS"
export LDFLAGS="-L/usr/local/Cellar/gcc/4.9.2_1/lib/gcc/4.9 $LDFLAGS"

From this point, the standard build procedure:

./configure && make && sudo make install

will generate the Makefile, compile and link the code, and install the executable under the installation prefix (default: /usr/local).
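If sudo is not available, the same steps can install into a directory you own. The sketch below wraps them in a shell function; the function name build_reliefseq and the default prefix $HOME/local are illustrative choices, while --prefix is the standard autoconf option for changing the installation prefix.

```shell
# Hypothetical helper: build ReliefSeq from the top of a source checkout and
# install it under a user-writable prefix (default: $HOME/local), no sudo.
build_reliefseq() {
  local prefix="${1:-$HOME/local}"
  ./bootstrap.sh || return 1                  # runs autoreconf, generates ./configure
  ./configure --prefix="$prefix" || return 1  # standard autoconf prefix option
  make || return 1                            # compile and link
  make install                                # copy the executable under "$prefix"
}
```

For example, run build_reliefseq "$HOME/local" and then export PATH="$HOME/local/bin:$PATH", assuming the executable is installed in the usual bin subdirectory of the prefix.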

Usage

reliefseq:

--help                                produce help message
--verbose                             verbose output
--convert                             convert data set to data set - does not
                                      run reliefseq
--write-best-k                        optimize k, write best k's
--write-each-k-scores                 optimize k, write best scores for each
                                      k
-c [ --config-file ] arg              read configuration options from file -
                                      command line overrides these
-s [ --snp-data ] arg                 read SNP attributes from genotype
                                      filename: txt, ARFF, plink (map/ped,
                                      binary, raw)
--snp-file-type arg                   Ignore file extension and use type:
                                      textwhitesp, wekaarff, plinkped,
                                      plinkbed, plinkraw, dge, birdseed
-n [ --numeric-data ] arg             read continuous attributes from
                                      PLINK-style covar file
-X [ --numeric-transform ] arg        perform numeric transformation:
                                      normalize, standardize, zscore, log,
                                      sqrt, anscombe
-a [ --alternate-pheno-file ] arg     specifies an alternative
                                      phenotype/class label file; one value
                                      per line
-g [ --algorithm-mode ] arg (=relieff)
                                      Relief algorithm mode
                                      (relieff|reliefseq)
--seq-algorithm-mode arg (=snr)       Relief algorithm mode (snr|tstat)
--seq-snr-mode arg (=snr)             Seq interaction algorithm SNR mode
                                      (snr|relieff)
--seq-tstat-mode arg (=pval)          Seq interaction algorithm t-statistic
                                      mode (pval|abst|rawt)
--seq-algorithm-s0 arg (=0.050000000000000003)
                                      Seq interaction algorithm s0 (0.0 <= s0
                                      <= 1.0)
-t [ --num-target ] arg               target number of attributes to keep
                                      after backwards selection
-r [ --iter-remove-n ] arg            number of attributes to remove per
                                      iteration of backwards selection
-p [ --iter-remove-percent ] arg      percentage of attributes to remove per
                                      iteration of backwards selection
--normalize-scores arg (=0)           normalize ReliefF scores? (0|1)
-O [ --out-dataset-filename ] arg     write a new tab-delimited data set with
                                      EC filtered attributes
-o [ --out-files-prefix ] arg (=reliefseq_default)
                                      use prefix for all output files
--snp-metric arg (=gm)                metric for determining the difference
                                      between subjects (gm|am|nca|nca6)
-B [ --snp-metric-nn ] arg (=gm)      metric for determining the difference
                                      between subjects (gm|am|nca|nca6|km)
-W [ --snp-metric-weights ] arg (=gm) metric for determining the difference
                                      between SNPs (gm|am|nca|nca6)
-N [ --numeric-metric ] arg (=manhattan)
                                      metric for determining the difference
                                      between numeric attributes
                                      (manhattan=|euclidean)
-x [ --snp-exclusion-file ] arg       file of SNP names to be excluded
-k [ --k-nearest-neighbors ] arg (=10)
                                      set k nearest neighbors (0=optimize k)
--kopt-begin arg (=1)                 optimize k starting with kopt-begin
--kopt-end arg (=1)                   optimize k ending with kopt-end
--kopt-step arg (=1)                  optimize k incrementing with kopt-step
-m [ --number-random-samples ] arg (=0)
                                      number of random samples (0=all|1 <= n
                                      <= number of samples)
-b [ --weight-by-distance-method ] arg (=equal)
                                      weight-by-distance method
                                      (equal|one_over_k|exponential)
--weight-by-distance-sigma arg (=2)   weight by distance sigma
-d [ --diagnostic-tests ] arg         performs diagnostic tests and sends
                                      output to filename without running EC
-D [ --diagnostic-levels-file ] arg   write diagnostic attribute level counts
                                      to filename
--dge-counts-data arg                 read digital gene expression counts
                                      from text file
--dge-norm-factors arg                read digital gene expression
                                      normalization factors from text file
--birdseed-snps-data arg              read SNP data from a birdseed formatted
                                      file
--birdseed-phenos-data arg            read birdseed subjects phenotypes from
                                      a text file
--birdseed-subjects-labels arg        read subject labels from filename to
                                      override names from data file
--birdseed-include-snps arg           include the SNP IDs listed in the text
                                      file
--birdseed-exclude-snps arg           exclude the SNP IDs listed the text
                                      file
--distance-matrix arg                 create a distance matrix for the loaded
                                      samples and exit
--gain-matrix arg                     create a GAIN matrix for the loaded
                                      samples and exit
--dump-titv-file arg                  file for dumping SNP
                                      transition/transversion information
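The -c/--config-file option lets a set of options live in a file. A minimal sketch, assuming ReliefSeq reads that file with the Boost.Program_options config-file syntax (one long option name, an equals sign, and a value per line; # starts a comment). That is the usual pattern for a Boost-based command-line tool but is not documented here, and the file name reliefseq.cfg and the values are examples only.

```shell
# Write an example config file; keys are long option names without "--".
# Assumes Boost.Program_options config syntax (name=value, '#' comments).
cat > reliefseq.cfg <<'EOF'
# input genotypes and output prefix
snp-data=snpdata.ped
out-files-prefix=result
# number of nearest neighbors for ReliefF
k-nearest-neighbors=10
EOF
```

Running reliefseq -c reliefseq.cfg -k 20 would then use k = 20, since, per the help text, command-line options override those read from the file.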

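Every command needs an input data file, most commonly SNP data given with -s/--snp-data (other input options, such as -n/--numeric-data and --dge-counts-data, are listed above), and, optionally, an output file prefix (-o/--out-files-prefix).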
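For continuous attributes instead of SNPs, the help text points to -n/--numeric-data (a PLINK-style covariate file), optionally combined with -a/--alternate-pheno-file for the class labels and -X/--numeric-transform for preprocessing. The sketch below wraps such a run in a shell function; the function name, the file names, and the RELIEFSEQ variable (a hypothetical override for the binary path, e.g. RELIEFSEQ=./reliefseq for an uninstalled build) are illustrative and not part of ReliefSeq.

```shell
# Hypothetical wrapper: rank continuous attributes, taking class labels from a
# separate file (one value per line) and standardizing the values first.
rank_numeric() {
  local data="$1" pheno="$2" prefix="${3:-numeric}"
  "${RELIEFSEQ:-reliefseq}" \
    --numeric-data "$data" \
    --alternate-pheno-file "$pheno" \
    --numeric-transform standardize \
    --out-files-prefix "$prefix"
}
```

For example: rank_numeric covar.txt pheno.txt run1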

To perform a standard analysis with all parameters at their defaults:

./reliefseq -s snpdata.ped -o result

This uses the genotype and phenotype information in snpdata.ped, a plain-text PLINK GWAS file, for feature selection. The names of all output files are prefixed with 'result'.

The main output is result.reliefseq, in which the SNPs are ranked in descending order of their scores.
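Two non-default runs can be sketched as shell functions. The first searches k from 5 to 50 in steps of 5 (the help text says -k 0 selects k optimization, with the range set by the --kopt-* options) and writes the best k values; the second uses backwards selection to remove 50 attributes per iteration until 100 remain. The function names, the chosen values, and the RELIEFSEQ override variable are illustrative assumptions, and how these options interact is inferred from the help text only.

```shell
# Hypothetical wrappers around two non-default runs on one PLINK file.
# RELIEFSEQ overrides the binary path (default: reliefseq on the PATH).

# Search k = 5, 10, ..., 50 (-k 0 = optimize k) and write the best k values.
optimize_k() {
  "${RELIEFSEQ:-reliefseq}" -s "$1" -k 0 \
    --kopt-begin 5 --kopt-end 50 --kopt-step 5 \
    --write-best-k -o "${2:-kopt}"
}

# Backwards selection: remove 50 attributes per iteration until 100 remain.
select_top() {
  "${RELIEFSEQ:-reliefseq}" -s "$1" -t 100 -r 50 -o "${2:-selected}"
}
```

For example, optimize_k snpdata.ped or select_top snpdata.ped top100.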

For additional examples, see the ReliefSeq page on our website.

Contributors

See AUTHORS file.

reliefseq's Issues

Boost problem, Ubuntu 14.04

I compiled Reliefseq last summer and had it running on a Ubuntu 12.04 system, then did not use it for a while. In the meantime, the system was upgraded to Ubuntu 14.04. When I tried Reliefseq again, it complained "error while loading shared libraries: libboost_program_options.so.1.46.1"
The upgraded system has Boost 1.47, but I thought perhaps not all Boost libraries had been upgraded, so I ran apt-get install libboost-all-dev to make sure all libraries were updated (several packages were installed, so that seemed to have been worthwhile). After that I removed the previously-compiled version of Reliefseq, cloned the Github repo, and set BOOST_ROOT to /usr/local/include/boost, then ran bootstrap.sh and configure, both of which seemed to complete without problems. When I run make, however, things go wrong after a while. What seems to me to be the last command executed is pasted below, followed by a series of error messages about boost_program_options. The output from ./configure indicated that the BOOST_PROGRAM_OPTIONS variables were set without issues, so I'm not sure how to proceed.

/bin/bash ./libtool --tag=CXX --mode=link g++ -I. -I/usr/local/include/boost/include -fopenmp -O3 -Wall -I/usr/local/include -g -O2 -fopenmp -L/usr/local/lib -L/usr/local/Cellar/gcc/4.9.2_1/lib/gcc/4.9 -lgomp -o reliefseq ReliefSeqCLI.o Insilico.o DistanceMetrics.o Statistics.o Dataset.o ArffDataset.o PlinkDataset.o PlinkBinaryDataset.o PlinkRawDataset.o DgeData.o BirdseedData.o DatasetInstance.o AttributeRanker.o ChiSquared.o ReliefF.o RReliefF.o SNReliefF.o ReliefFSeq.o ReliefSeqController.o -L/usr/local/lib -L/usr/local/Cellar/gcc/4.9.2_1/lib/gcc/4.9 -lgomp -lgsl -lgslcblas -lboost_program_options -L. -lgomp
libtool: link: g++ -I. -I/usr/local/include/boost/include -fopenmp -O3 -Wall -I/usr/local/include -g -O2 -fopenmp -o reliefseq ReliefSeqCLI.o Insilico.o DistanceMetrics.o Statistics.o Dataset.o ArffDataset.o PlinkDataset.o PlinkBinaryDataset.o PlinkRawDataset.o DgeData.o BirdseedData.o DatasetInstance.o AttributeRanker.o ChiSquared.o ReliefF.o RReliefF.o SNReliefF.o ReliefFSeq.o ReliefSeqController.o -L/usr/local/lib -L/usr/local/Cellar/gcc/4.9.2_1/lib/gcc/4.9 -lgsl -lgslcblas -lboost_program_options -L. -lgomp -fopenmp
ReliefSeqCLI.o: In function `std::basic_string<char, std::char_traits<char>, std::allocator<char> > const& boost::program_options::validators::get_single_string<char>(std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool)':
/usr/local/include/boost/include/boost/program_options/detail/value_semantic.hpp:58: undefined reference to `boost::program_options::validation_error::validation_error(boost::program_options::validation_error::kind_t, std::string const&, std::string const&)'
/usr/local/include/boost/include/boost/program_options/detail/value_semantic.hpp:62: undefined reference to `boost::program_options::validation_error::validation_error(boost::program_options::validation_error::kind_t, std::string const&, std::string const&)'
ReliefSeqCLI.o:(.rodata._ZTVN5boost16exception_detail19error_info_injectorINS_15program_options20invalid_option_valueEEE[_ZTVN5boost16exception_detail19error_info_injectorINS_15program_options20invalid_option_valueEEE]+0x20): undefined reference to `boost::program_options::validation_error::what() const'
ReliefSeqCLI.o:(.rodata._ZTVN5boost16exception_detail10clone_implINS0_19error_info_injectorINS_15program_options20invalid_option_valueEEEEE[_ZTVN5boost16exception_detail10clone_implINS0_19error_info_injectorINS_15program_options20invalid_option_valueEEEEE]+0x20): undefined reference to `boost::program_options::validation_error::what() const'
ReliefSeqCLI.o:(.rodata._ZTVN5boost16exception_detail19error_info_injectorINS_15program_options16validation_errorEEE[_ZTVN5boost16exception_detail19error_info_injectorINS_15program_options16validation_errorEEE]+0x20): undefined reference to `boost::program_options::validation_error::what() const'
ReliefSeqCLI.o:(.rodata._ZTVN5boost16exception_detail10clone_implINS0_19error_info_injectorINS_15program_options16validation_errorEEEEE[_ZTVN5boost16exception_detail10clone_implINS0_19error_info_injectorINS_15program_options16validation_errorEEEEE]+0x20): undefined reference to `boost::program_options::validation_error::what() const'
ReliefSeqCLI.o:(.rodata._ZTVN5boost15program_options20invalid_option_valueE[_ZTVN5boost15program_options20invalid_option_valueE]+0x20): undefined reference to `boost::program_options::validation_error::what() const'
collect2: error: ld returned 1 exit status
make[1]: *** [reliefseq] Error 1
make[1]: Leaving directory `/home/ross/software/reliefseq'
make: *** [all] Error 2
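Undefined references to boost::program_options::validation_error like the ones above usually indicate that the code was compiled against headers from one Boost version and linked against a libboost_program_options from another. The compile line also shows -I/usr/local/include/boost/include, which suggests that configure appended /include to BOOST_ROOT. The function below is a hypothetical diagnostic helper, not part of ReliefSeq: it prints BOOST_LIB_VERSION from boost/version.hpp in an include directory (by default $BOOST_ROOT/include, falling back to /usr/local/include) and lists the program_options libraries known to the dynamic linker, so the two versions can be compared.

```shell
# Hypothetical diagnostic: compare the Boost version of the headers with the
# program_options shared libraries the dynamic linker knows about.
boost_po_report() {
  local incdir="${1:-${BOOST_ROOT:-/usr/local}/include}"
  local header="$incdir/boost/version.hpp"
  if [ -r "$header" ]; then
    echo "headers: $(grep -m1 '^#define BOOST_LIB_VERSION' "$header")"
  else
    echo "headers: $header not found"
  fi
  echo "libraries:"
  # ldconfig may live in /sbin and be missing from PATH; its errors are hidden.
  ldconfig -p 2>/dev/null | grep boost_program_options || echo "  none listed by ldconfig"
}
```

If the versions differ, pointing BOOST_ROOT at the prefix of a single Boost installation (the directory that contains both its include/ and lib/ subdirectories; the OS X example in the build instructions uses /usr/local) and re-running configure is one way to make headers and library match.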

OpenMP test?

I compiled and installed ReliefSeq on an Ubuntu 12.04 system that has Boost and OpenMPI libraries installed. The program ran successfully using the --dge-counts-data option as outlined on the web page (http://insilico.utulsa.edu/ReliefSeq.php), although the screen output said it would use 24 threads and the 'top' command showed only one processor used by the reliefseq process. When I try using a PLINK-covariate-file-format input file with the -n option, and a phenotype file specified by the -a option, the program output again says the program will use all 24 available cores, but hangs and does nothing for hours. Is there a way to test the program to make sure it compiled correctly to give access to the openmpi libraries?