
streamstats.jl's Introduction

StreamStats

Julia functions to calculate

  • Shannon Entropy
  • Monte Carlo Pi-Deviation
  • Chi-Squared
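
As a rough illustration of what these three statistics measure, here is a minimal Python sketch. It is not the StreamStats implementation: the function names are made up for this example, and the Monte Carlo sampling scheme (consecutive byte pairs used as point coordinates) is an assumption that may differ from what the package does.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (between 0.0 and 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    # Sum over distinct byte values, not over input positions.
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def chi_squared(data: bytes) -> float:
    """Chi-squared statistic against a uniform distribution over 256 byte values."""
    if not data:
        return 0.0
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(256))


def monte_carlo_pi_deviation(data: bytes) -> float:
    """Absolute error of a pi estimate that treats byte pairs as (x, y) points."""
    pairs = len(data) // 2
    if pairs == 0:
        raise ValueError("need at least two bytes")
    inside = 0
    for i in range(pairs):
        # Scale each byte to [0, 1] and test whether the point lies in the unit quarter circle.
        x = data[2 * i] / 255.0
        y = data[2 * i + 1] / 255.0
        if x * x + y * y <= 1.0:
            inside += 1
    return abs(4.0 * inside / pairs - math.pi)
```

For well-mixed random bytes the entropy approaches 8 bits per byte and the pi deviation shrinks as the input grows. Highly repetitive data pushes the entropy toward 0 and the chi-squared statistic up.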

Installation Steps

  1. Install Julia from source or with your preferred package manager.
  2. Add the package from its repository:
julia -e 'using Pkg; Pkg.add(PackageSpec(url="https://github.com/z4rd0s/StreamStats.jl"))'

Usage from julia

  1. Start the Julia REPL and run:
using StreamStats

random(n) = rand(UInt8, n)
data = StreamStats.get_all(random(1000))

Usage from python

Install

  1. python3 -m venv .venv && source .venv/bin/activate
  2. pip install -r requirements.txt
  3. python -c "import julia; julia.install()"

Usage

Python 3.6.10 (default, Jan 16 2020, 09:12:04) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> from julia.api import Julia
>>> Julia(compiled_modules=False)
>>> from julia import StreamStats
>>> import os
>>> data = bytearray(os.urandom(1000))
>>> stats = StreamStats.get_all(data)

Usage as Binary

Creating Binary

  1. Precompile StreamStats.jl
cd bin
cp ../src/StreamStats.jl .
julia --startup-file=no --trace-compile=StreamStats_precompile.jl StreamStats.jl "<anyfile.bin>example.pdf"
  2. Generate a custom Julia image containing the precompiled code
julia --startup-file=no -J"<path to julia/sys.so>/usr/share/julia-1.4.2/lib/julia/sys.so" --output-o StreamStats-sys.o StreamStats_image_generator.jl
gcc -shared -o StreamStats-sys.so -fPIC -Wl,--whole-archive StreamStats-sys.o -Wl,--no-whole-archive -L"<path to julia/lib>/usr/share/julia-1.4.2/lib/" -ljulia
  3. Build the executable
gcc -DJULIAC_PROGRAM_LIBNAME=\"StreamStats-sys.so\" -o StreamStats.bin StreamStats.c StreamStats-sys.so -O2 -fPIE -I'<path to julia>/usr/share/julia-1.4.2/include/julia' -L'<path to julia binary>/usr/share/julia-1.4.2/lib' -ljulia -Wl,-rpath,'<path to julia binary at remote destination>/usr/share/julia-1.4.2/lib:$ORIGIN'
  4. Run the executable. StreamStats.bin and StreamStats-sys.so must be located in the same directory.
$ ./StreamStats.bin <path to any file>

Calling Binary from python

import json
import logging
import subprocess
from typing import List, Optional


def run(cmd: List[str]) -> Optional[bytes]:
    """Executes a command locally.

    :param cmd: The command and its arguments
    :type cmd: List[str]
    :return: The command's standard output if it exited successfully,
                otherwise None
    :rtype: Optional[bytes]
    """
    try:
        proc = subprocess.Popen(
                cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        output, error = proc.communicate(timeout=15)

        if proc.returncode == 0:
            return output
        logging.error("Command <%s> failed: %s",
                      cmd, error.decode(errors="replace"))
        return None
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.communicate()  # reap the killed process
        logging.error("Command <%s> timed out", cmd, exc_info=True)
        return None
    except Exception:
        logging.error("Command <%s> failed", cmd, exc_info=True)
        return None

cmd_StreamStats = ["./bin/StreamStats.bin", "./bin/example.pdf"]
output = run(cmd_StreamStats)
# Only parse the output if the command succeeded
data = json.loads(output) if output is not None else None

streamstats.jl's People

Contributors

z4rd0s, jloehel

Forkers

jloehel

streamstats.jl's Issues

Shannon entropy

The test data: TestInlyseTestRandom320940923u4kkljsdflkajsdf

Calculating the entropy in Python with pandas and SciPy gives 2.9960008231368387:

Python 3.6.10 (default, Jan 16 2020, 09:12:04) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> from scipy.stats import entropy
>>> data = bytearray(b"TestInlyseTestRandom320940923u4kkljsdflkajsdf")
>>> pd_series = pd.Series(data)
>>> counts = pd_series.value_counts()
>>> entropy = entropy(counts)
>>> print(entropy)
2.9960008231368387

StreamStats returns 0.6235 for the same data.

I think there is something wrong with the frequency array in the variable d. It holds one count per input byte (45 entries) instead of one count per distinct byte value:

test_data = Vector{UInt8}("TestInlyseTestRandom320940923u4kkljsdflkajsdf")
...
1|debug> n
In compute_shanon(data) at /home/jloehel/Projekte/github.com/jloehel/StreamStats.jl/src/StreamStats.jl:6
  9      data -- data bytes
 10  """
 11  entropy::Float16 = 0
 12  d = Array([(count(x->x==i,data)) for i in data])
>13  for i in d
 14      it = float(i/length(data))
 15      if i > 0
 16          entropy += - it * log(it ,2)
 17      end

About to run: <(iterate)([2, 3, 5, 2, 1, 2, 3, 1, 5, 3, 2, 3, 5, 2, 1, 2, 2, 3, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, ...>
1|debug> w add d
1] d: [2, 3, 5, 2, 1, 2, 3, 1, 5, 3, 2, 3, 5, 2, 1, 2, 2, 3, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 3, 3, 3, 2, 5, 3, 2, 3, 3, 2, 2, 5, 3, 2]

Can you please check this?
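
For comparison, the following Python sketch computes the entropy of the same test string from one count per distinct byte value, and also builds the per-position count list that matches the `d` array in the debugger output. It is only an illustration, not the StreamStats code, and the helper name `entropy_from_counts` is made up. Note that `scipy.stats.entropy` uses the natural logarithm by default, so the 2.996 above is in nats, while the Julia snippet uses base 2.

```python
import math
from collections import Counter

data = b"TestInlyseTestRandom320940923u4kkljsdflkajsdf"

# One count per distinct byte value: this is what the entropy sum should run over.
per_symbol = Counter(data)

# One count per input position: mirrors `[count(x->x==i, data) for i in data]`.
per_element = [data.count(b) for b in data]


def entropy_from_counts(counts, total, base=2.0):
    """Shannon entropy of a frequency table, in units given by `base`."""
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


bits = entropy_from_counts(per_symbol.values(), len(data))          # base 2
nats = entropy_from_counts(per_symbol.values(), len(data), math.e)  # natural log

print(len(per_symbol), len(per_element))
print(bits, nats)
```

The string has 22 distinct byte values but 45 positions, so iterating over `per_element` adds each symbol's term once per occurrence. Computed over `per_symbol`, the entropy is about 4.32 bits, which is the same quantity as the 2.996 nats reported by SciPy.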

[DOCS] Python installation

Maybe we can use python3 in the README to be clear about the version. There is also a small typo: the file name should be requirements.txt. Furthermore, pip 20.1.1 fails while installing the requirements:

pip install -r requirements.txt     
Collecting julia==0.5.3
  Using cached julia-0.5.3-py2.py3-none-any.whl (62 kB)
ERROR: Could not find a version that satisfies the requirement pkg-resources==0.0.0 (from -r requirements.txt (line 2)) (from versions: none)
ERROR: No matching distribution found for pkg-resources==0.0.0 (from -r requirements.txt (line 2))

Please see: https://bugs.launchpad.net/ubuntu/+source/python-pip/+bug/1635463
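
According to the linked bug, the `pkg-resources==0.0.0` line is an artifact of `pip freeze` on some Debian/Ubuntu setups rather than a real dependency. One possible workaround, sketched here in Python with a hypothetical helper name, is to drop that line from the requirements before installing:

```python
def strip_pkg_resources(requirements_text: str) -> str:
    """Return the requirements text without any pkg-resources pin."""
    kept = []
    for line in requirements_text.splitlines():
        # Compare only the package name, ignoring case and underscore/hyphen differences.
        name = line.split("==")[0].strip().lower().replace("_", "-")
        if name == "pkg-resources":
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


# Example input modeled on the two requirement lines visible in the pip log above.
example = "julia==0.5.3\npkg-resources==0.0.0\n"
print(strip_pkg_resources(example), end="")
```

Removing the line from requirements.txt in the repository itself would have the same effect and avoids the extra step.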

[DOCS] Installation

The installation method from the README did not work for me:

julia> using Pkg

julia> pkg.add("https://github.com/z4rd0s/StreamStats.jl")
ERROR: UndefVarError: pkg not defined
Stacktrace:
 [1] top-level scope at none:0

julia> Pkg.add("https://github.com/z4rd0s/StreamStats.jl")
ERROR: https://github.com/z4rd0s/StreamStats.jl is not a valid packagename
Stacktrace:
 [1] pkgerror(::String) at /home/abuild/rpmbuild/BUILD/julia-1.0.3/usr/share/julia/stdlib/v1.0/Pkg/src/Types.jl:120
 [2] check_package_name(::String) at /home/abuild/rpmbuild/BUILD/julia-1.0.3/usr/share/julia/stdlib/v1.0/Pkg/src/API.jl:22
 [3] iterate at ./generator.jl:47 [inlined]
 [4] collect(::Base.Generator{Array{String,1},typeof(Pkg.API.check_package_name)}) at ./array.jl:619
 [5] #add_or_develop#11 at /home/abuild/rpmbuild/BUILD/julia-1.0.3/usr/share/julia/stdlib/v1.0/Pkg/src/API.jl:28 [inlined]
 [6] #add_or_develop at ./none:0 [inlined]
 [7] #add_or_develop#10 at /home/abuild/rpmbuild/BUILD/julia-1.0.3/usr/share/julia/stdlib/v1.0/Pkg/src/API.jl:27 [inlined]
 [8] #add_or_develop at ./none:0 [inlined]
 [9] #add#18 at /home/abuild/rpmbuild/BUILD/julia-1.0.3/usr/share/julia/stdlib/v1.0/Pkg/src/API.jl:69 [inlined]
 [10] add(::String) at /home/abuild/rpmbuild/BUILD/julia-1.0.3/usr/share/julia/stdlib/v1.0/Pkg/src/API.jl:69
 [11] top-level scope at none:0

I installed it like this:

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.0.3 (2018-12-16)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |

(v1.0) pkg> add "https://github.com/z4rd0s/StreamStats.jl"
  Updating registry at `~/.julia/registries/General`
  Updating git-repo `https://github.com/JuliaRegistries/General.git`
   Cloning git-repo `https://github.com/z4rd0s/StreamStats.jl`
  Updating git-repo `https://github.com/z4rd0s/StreamStats.jl`
 Resolving package versions...
 Installed OffsetArrays ─ v1.1.0
  Updating `~/.julia/environments/v1.0/Project.toml`
  [3001c9e7] + StreamStats v0.1.0 #master (https://github.com/z4rd0s/StreamStats.jl)
  Updating `~/.julia/environments/v1.0/Manifest.toml`
  [6fe1bfb0] + OffsetArrays v1.1.0
  [3001c9e7] + StreamStats v0.1.0 #master (https://github.com/z4rd0s/StreamStats.jl)

[DOCS] Usage

Can you please add the following to the usage:

using StreamStats
using Random

urandom(n) = rand(RandomDevice(), UInt8, n)

chi = StreamStats.chi_squared(urandom(10))
shanon = StreamStats.compute_shanon(urandom(10))
carlo = StreamStats.compute_monte_carlo(urandom(10))

[DOCS] Python Usage

Can we please add the following to the python usage:

>>> from julia.api import Julia
>>> jl = Julia(compiled_modules=False) 
>>> from julia import StreamStats
>>> import os
>>> data = bytearray(os.urandom(10))
>>> chi = StreamStats.chi_squared(data)
>>> shanon = StreamStats.compute_shanon(data)
>>> carlo = StreamStats.compute_monte_carlo(data)
