harry's Introduction

  • 🌈 Hi, I'm @rieck
  • 🎓 I am a Professor at TU Berlin, where I head the Chair of Machine Learning and Security.
  • 🛠️ Although I rarely find the time anymore, I love programming and tinkering with code.

harry's People

Contributors

chwress, k-freeman, msagency, rieck, tastuteche

harry's Issues

Support for Python 3

It would be great if the Python module for Harry could be extended to support Python 3.

Support for compression of stdout and stdin

If possible, Harry should support reading zlib-compressed input from stdin as well as writing compressed output to stdout. This feature needs to be merged with the compress flag in the output configuration.
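
As a rough illustration of the requested behaviour (not Harry's implementation, which is written in C), the following Python sketch streams zlib-compressed data in and out using the standard `zlib` module. The function names and the chunk size are assumptions made for this example only.

```python
import io
import zlib

CHUNK = 64 * 1024  # arbitrary chunk size chosen for this sketch


def read_zlib_stream(src):
    """Yield decompressed chunks from a binary file object holding zlib data."""
    decomp = zlib.decompressobj()
    while True:
        block = src.read(CHUNK)
        if not block:
            break
        data = decomp.decompress(block)
        if data:
            yield data
    # Emit whatever the decompressor still holds in its internal buffer.
    tail = decomp.flush()
    if tail:
        yield tail


def write_zlib_stream(chunks, dst):
    """Compress an iterable of byte chunks and write them to a binary file object."""
    comp = zlib.compressobj()
    for chunk in chunks:
        dst.write(comp.compress(chunk))
    dst.write(comp.flush())
```

On the command line, `sys.stdin.buffer` and `sys.stdout.buffer` would take the place of `src` and `dst`.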

Testing of Python module

The Python module requires thorough testing:

  • Does the mapping of command-line options to keyword arguments work?
  • What happens if an external configuration file is loaded?
  • Is the module robust to incorrect usage, e.g. tampering with the formats?

Happy testing.
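
One way to start on the first question is to pin down the expected mapping in a small function and assert on its output. The sketch below is hypothetical: `kwargs_to_argv` is not part of Harry's Python module, and the real module may translate options differently. Only the flags -n, -v and -x are taken from the command lines quoted in these issues.

```python
def kwargs_to_argv(**options):
    """Translate keyword arguments into a flat list of command-line arguments.

    True becomes a bare switch, None and False are skipped, and every
    other value is passed as "flag value".
    """
    argv = []
    for name, value in options.items():
        # Single-letter names map to short flags, longer names to long flags.
        flag = ("-" if len(name) == 1 else "--") + name
        if value is None or value is False:
            continue
        if value is True:
            argv.append(flag)
        else:
            argv.extend([flag, str(value)])
    return argv
```

A test could then compare the result for, say, `kwargs_to_argv(n=4, v=True, x=":10")` against the argument list that the command-line tool would receive.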

Split computation broken

There is a bug in the new split computation if certain ranges are selected. Example:

./harry -x :10 -y -10: harry.c /dev/null
harry: hmatrix.c:159: hmatrix_inferspec: Assertion `spec->b_left + spec->a + spec->b_right == width' failed.
Aborted

./harry -x 0:2 -y 3:4 -v harry.c /dev/null
harry: hmatrix.c:159: hmatrix_inferspec: Assertion `spec->b_left + spec->a + spec->b_right == width' failed.
Aborted

Comparison of string pairs

There are analysis tasks where only the similarity between given pairs of strings needs to be computed. In this setting, computing a similarity matrix over all strings is clearly overkill, and it would be great if Harry could support it, e.g. using a special command-line option.
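
To make the difference concrete, here is a minimal Python sketch that evaluates a measure only for the requested index pairs instead of filling a full matrix. The Levenshtein distance serves as a stand-in measure, and `compare_pairs` is a hypothetical helper, not an existing Harry function or option.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def compare_pairs(strings, pairs, measure=levenshtein):
    """Evaluate `measure` only for the requested (i, j) index pairs."""
    return {(i, j): measure(strings[i], strings[j]) for i, j in pairs}
```

For n strings and p pairs this performs p comparisons rather than the roughly n²/2 needed for a full symmetric matrix.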

Error when compiling

./configure --prefix=/opt/harry --with-libarchive
make
Error message:
make[3]: Entering directory `/home/harry-0.4.0/src'
/bin/sh ../libtool --tag=CC --mode=link gcc -g -O2 -DNDEBUG -std=c99 -fgnu89-inline -Wall -fPIC -fopenmp -o harry harry.o libharry.la -lpthread -larchive -lz -lm -lconfig
libtool: link: gcc -g -O2 -DNDEBUG -std=c99 -fgnu89-inline -Wall -fPIC -fopenmp -o harry harry.o ./.libs/libharry.a -lpthread -larchive -lz -lm -lconfig -fopenmp
./.libs/libharry.a(input_arc.o): In function `input_arc_open':
/home/harry-0.4.0/src/input/input_arc.c:50: undefined reference to `archive_read_support_filter_all'
collect2: ld returned 1 exit status
make[3]: *** [harry] Error 1
make[3]: Leaving directory `/home/harry-0.4.0/src'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/harry-0.4.0/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/harry-0.4.0'
make: *** [all] Error 2

I have checked the installed version of libarchive: it is libarchive-devel-2.8.3-4.el6_2.x86_64.
My system is CentOS release 6.5.

Please help me fix this issue.
Thanks!

Number of running threads decreases

During the computation of a similarity matrix, the number of running threads steadily decreases. The bug occurs on Linux and Mac OS X. It's unclear if it has been introduced in the last commits or has been around for a long time.

Reproduce with

src/harry -v -n 4 examples/dna/dna_hg16.txt /dev/null

At around 26% the first of the four threads stops running and idles.

Uniform splitting of full square matrices.

Internally, Harry only stores the upper triangle of full square matrices. If these matrices are split row-wise using the command-line option -s, the first block will require far more computations than lower blocks.
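
The imbalance can be avoided by choosing row boundaries according to the number of stored entries rather than the number of rows. The Python sketch below is one possible way to do this; it assumes that row i of an n x n upper triangle (diagonal included) holds n - i entries. The function names are invented for this example, and Harry's internal layout may differ.

```python
def balanced_row_splits(n, k):
    """Split rows 0..n-1 of an upper triangle into k contiguous blocks of similar cost.

    Returns a list of half-open (start, end) row ranges.
    """
    total = n * (n + 1) // 2
    splits, start, done = [], 0, 0
    for b in range(1, k + 1):
        end = start
        # Add rows until the cumulative work reaches the share b/k of the total.
        while end < n and done * k < total * b:
            done += n - end
            end += 1
        splits.append((start, end))
        start = end
    return splits


def block_cost(n, start, end):
    """Number of upper-triangle entries in rows start..end-1."""
    return sum(n - i for i in range(start, end))
```

With this scheme the upper blocks contain fewer rows than the lower ones, so that each block covers a similar number of matrix entries.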

problem building Harry 0.4.1 with "undefined reference to 'config_init'" error

Hi,
This is the first time that I have tried to build Harry (with the latest version, 0.4.1). However, I ran into some build errors while running "make" (the ./configure step appeared to go well), such as
.../src/harry.c:357: undefined reference to 'config_init'
.../src/harry.c:360: undefined reference to 'config_read_file'
....

All of the reported errors are related to missing config_* references (apparently from the libconfig C API). I'm using libconfig_1.5, which I recently built myself. Any idea what went wrong?

Thanks for reading.

Support for bit substrings

776a9ec introduces support for comparing strings on the level of bits. However, similarity measures that rely on the efficient extraction of substrings are not yet supported.
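
To illustrate why substring extraction is more involved at the bit level, the following Python sketch pulls an arbitrary bit range out of a byte string. It is not taken from Harry's code: the helper name is hypothetical and the MSB-first bit order is an assumption.

```python
def extract_bits(data, start, length):
    """Return bits [start, start + length) of `data` as an integer (MSB-first)."""
    if start < 0 or length < 0 or start + length > 8 * len(data):
        raise ValueError("bit range out of bounds")
    # Interpret the whole buffer as one big-endian integer, then shift and mask.
    value = int.from_bytes(data, "big")
    shift = 8 * len(data) - (start + length)
    return (value >> shift) & ((1 << length) - 1)
```

Unlike byte substrings, bit substrings generally do not start on a byte boundary, so each extraction needs shifting and masking instead of a plain pointer offset.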

Testing of new input feature

The current master contains a new feature: The command-line tool can be called with either one or two input sources. This feature needs to be tested:

  • Does the new feature work with different input formats?
  • What happens if x-ranges and y-ranges are provided?
  • Are there cases where the new feature breaks functionality?

Happy testing. It might be possible to construct a test case and add it to the tests.

Support for automatic splitting of matrices

Harry should support automatically splitting a large matrix into smaller pieces. In particular, the user should be able to specify the number of required splits and the specific split to compute. A possible command-line option could look like this:

harry -s 5:2 ...

where Harry would compute the 2nd out of 5 splits. This feature can be used to compute a similarity matrix in a distributed fashion by concurrently running -s 5:1, -s 5:2, ... -s 5:5.

As the merging of the splits depends on the selected output format, merging should be postponed to a later version.
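
A minimal Python sketch of how such an option could be interpreted is given below. It assumes a plain row-wise split into blocks with nearly equal row counts; `parse_split` and `split_rows` are hypothetical names for illustration, not Harry functions.

```python
def parse_split(spec):
    """Parse "N:K" into (num_splits, index), with 1 <= index <= num_splits."""
    num_str, idx_str = spec.split(":")
    num, idx = int(num_str), int(idx_str)
    if num < 1 or not 1 <= idx <= num:
        raise ValueError("invalid split specification: %r" % spec)
    return num, idx


def split_rows(num_rows, num, idx):
    """Half-open row range [start, end) of the idx-th of num row-wise splits."""
    start = (idx - 1) * num_rows // num
    end = idx * num_rows // num
    return start, end
```

Because consecutive ranges share their boundaries, running all splits from 1 to N covers every row exactly once.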

Increase size of symbols to 64 bit

When splitting a string into words using delimiters, Harry represents each word by a hash value. Currently, this hash value is 16 bits wide, so collisions are likely if many different words are present.

The hash size should be increased to 64 bits, such that the hash values still fit into a generic data type. This requires reviewing and updating all similarity measures for side effects.
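
The effect of the hash width can be estimated with the birthday bound. The Python sketch below assumes uniformly distributed hash values, which a real hash function only approximates; the function name is chosen for this example.

```python
import math


def collision_probability(num_words, hash_bits):
    """Approximate probability that at least two of num_words distinct words
    share a hash value, assuming uniformly distributed hashes."""
    buckets = 2.0 ** hash_bits
    # Birthday bound: p is roughly 1 - exp(-n(n-1) / (2m)); expm1 keeps
    # precision when the exponent is tiny.
    return -math.expm1(-num_words * (num_words - 1) / (2.0 * buckets))
```

Under this model, 1,000 distinct words collide with a probability above 99% for 16-bit hashes, whereas the probability is negligible for 64-bit hashes.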
