cbmira01 / featureranking

A data science technique implemented in OpenCL.

License: Other

Python 82.94% C 16.70% Batchfile 0.36%

data-science entropy feature-reduction machine-learning-algorithms numpy opencl pyopencl python windows-10

featureranking's Issues

Needs tests

Needs tests for:

each OpenCL kernel
dataset entropy
column entropy
compare OpenCL and NumPy results

Do this issue before the refactoring issues.
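
For the last item in the list, a comparison between OpenCL and NumPy results usually has to be tolerance-based, because single-precision kernels rarely match double-precision host results bit for bit. The following is a minimal stdlib-only sketch; `results_match` and its default tolerances are hypothetical and not part of the project.

```python
import math

def results_match(reference, candidate, rel_tol=1e-5, abs_tol=1e-8):
    """Return True if two result sequences agree element-wise within tolerance.

    `reference` stands in for the NumPy result, `candidate` for the OpenCL one.
    A NaN on either side never compares as close, so NaN results fail the check.
    """
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )
```

A test for each kernel could then assert `results_match(numpy_result, opencl_result)`, with the tolerance loosened for single-precision devices.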

Need this project to work on Linux

What will docs look like?

What will device driver setup look like?

How will project deployment work?

What will different Linux distributions look like? Or just target one (Mint/Ubuntu, CentOS)?

Move performance timing into get_entropy function

We want to measure computation effort, not trial runner management.
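
One way to keep trial runner overhead out of the measurement is to time only the computation call itself. This is a sketch under assumptions: `timed_call` is a hypothetical helper, and where exactly it would sit inside get_entropy depends on code not shown here.

```python
import time

def timed_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds) for that call alone."""
    start = time.perf_counter()           # monotonic, high-resolution clock
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

Inside get_entropy, only the kernel launch or NumPy computation would be wrapped, so data loading and menu handling are excluded from the reported time.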

Fix console run of opencl-handler.py

The script cannot iterate over the device list when run from the console.

Also: make sure each module can run at console, either to run a test or to direct to the main module.

Feature ranking is completely wrong.

Re-read Kantardzic p68: Implement this properly in the trial runner.

Fix the Wiki report: restate properly how features are selected for removal in the round-robin trial.
Also, remove the existing workstation results from the report.

Fix other wording as encountered.
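
As a reference point for the rewrite, here is a stdlib-only sketch of a similarity-based entropy ranking with sequential backward removal. It assumes the measure usually attributed to Kantardzic (similarity S = exp(-alpha * D), with alpha = -ln(0.5) / mean distance); the page itself is not quoted here, so check every step against the book. All function names are hypothetical.

```python
import math

def distances(rows, features):
    """Normalized Euclidean distance for every pair of rows, over `features`."""
    spans = {}
    for k in features:
        col = [row[k] for row in rows]
        spans[k] = (max(col) - min(col)) or 1.0  # constant column: avoid 0/0
    result = []
    for i in range(len(rows) - 1):
        for j in range(i + 1, len(rows)):
            total = sum(((rows[i][k] - rows[j][k]) / spans[k]) ** 2 for k in features)
            result.append(math.sqrt(total))
    return result

def entropy(rows, features):
    """Similarity-based entropy of the dataset (needs at least two rows)."""
    d = distances(rows, features)
    mean_d = sum(d) / len(d)
    if mean_d == 0.0:
        return 0.0
    alpha = -math.log(0.5) / mean_d   # assumption: alpha derived from mean distance
    total = 0.0
    for dist in d:
        s = math.exp(-alpha * dist)
        for p in (s, 1.0 - s):
            if 0.0 < p < 1.0:         # treat 0 * log(0) as 0
                total -= p * math.log(p)
    return total

def rank_features(rows, n_features):
    """Return feature indices ordered from least to most important."""
    remaining = list(range(n_features))
    removed = []
    while len(remaining) > 1:
        base = entropy(rows, remaining)
        diffs = {
            f: abs(base - entropy(rows, [k for k in remaining if k != f]))
            for f in remaining
        }
        # Remove the feature whose absence changes the entropy the least.
        least = min(remaining, key=lambda f: diffs[f])
        remaining.remove(least)
        removed.append(least)
    return removed + remaining
```

In this sketch a constant column is always removed first, because dropping it leaves every pairwise distance unchanged.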

OpenCL double-precision option

Look at kernel double-precision floats (cl_khr_fp64 option), use if possible.

Look for platforms that DO NOT have this option, and see how that works.
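
A hedged sketch of the selection logic: in PyOpenCL, `device.extensions` is a space-separated string, so the check can be written against a plain string and tested without a device. The `REAL` macro and both function names are hypothetical; kernels would have to be written in terms of `REAL` for this to apply.

```python
def has_fp64(extensions):
    """True if a space-separated OpenCL extension string lists cl_khr_fp64."""
    return "cl_khr_fp64" in extensions.split()

def kernel_build_options(extensions):
    """Pick a hypothetical REAL type for kernels: double if available, else float."""
    if has_fp64(extensions):
        # Kernels targeting OpenCL versions before 1.2 also need
        # "#pragma OPENCL EXTENSION cl_khr_fp64 : enable" in the source.
        return ["-D", "REAL=double"]
    return ["-D", "REAL=float"]
```

The returned list could be passed as the `options` argument when building the program, so platforms without the extension fall back to single precision instead of failing to compile.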

Need a Wiki

This project needs a wiki!

Make host code less grindy

Is there any way to make the host-code portion less verbose and easier to read?
Or is OpenCL host code just boilerplate-heavy to begin with?

Refactor ranking protocol

'Ranking protocol' code is duplicated in the feature reduction test runners, and needs to be pulled out into its own 'trial runner' module. That's also the place to make a 'trial context' object that knows everything about how to run a test.
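
A 'trial context' could start as a small dataclass. Every field below is an illustrative guess at what a trial needs to know; none of the names come from the existing code.

```python
from dataclasses import dataclass, field

@dataclass
class TrialContext:
    """Hypothetical container for everything one ranking trial needs."""
    dataset_path: str
    feature_names: list
    use_opencl: bool = False
    removed: list = field(default_factory=list)  # features removed so far, in order

    def remaining(self):
        """Features still in play, in their original order."""
        return [name for name in self.feature_names if name not in self.removed]
```

The test runners would then build a context and hand it to the trial runner module, rather than each carrying its own copy of the protocol.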

Clean up data

Put feature names in the CSVs
Do a better job of indicating key and target fields
Clean up descriptive text

Fix equation rendering in Wiki report section.

Here is a sample codecog:

https://latex.codecogs.com/svg.image?D_i_j&space;=&space;\left&space;[&space;\sum_{k=1}^{n}&space;((x_i_k&space;-&space;x_j_k)&space;/&space;(max_k&space;-&space;min_k))^{2}&space;\right&space;]^{1/2}
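
For reference, the URL-encoded expression above decodes to roughly the following LaTeX. The double subscripts are regrouped as `D_{ij}`, `x_{ik}` and `x_{jk}`, and reading `max_k - min_k` as the value range of feature k is an assumption.

```latex
D_{ij} = \left[ \sum_{k=1}^{n} \left( \frac{x_{ik} - x_{jk}}{\max_k - \min_k} \right)^{2} \right]^{1/2}
```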

Work on logging

This project should have had logging early, to support refactoring and optimization.
Log files should be ignored by git.
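
A minimal starting point using the standard `logging` module. The logger name, default file name and format string are placeholders, not project conventions.

```python
import logging

def configure_logging(log_path="featureranking.log", level=logging.INFO):
    """Attach one file handler to a project-wide logger and return the logger."""
    logger = logging.getLogger("featureranking")
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

With this file name, a `*.log` line in `.gitignore` would keep the output out of the repository.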

Precompute column entropies and value ranges

Pre-compute column entropies and value ranges.
Big performance gain here.
Do the 'trial runner' first, because that's where these pre-computations will go.
Work on logging and OpenCL reduction before pre-computation.
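
The value ranges are the simpler half: they depend only on the data, so they can be computed once per dataset and reused across every trial. A stdlib-only sketch with a hypothetical helper name:

```python
def column_ranges(rows):
    """Return a (min, max) pair per column, computed once for the whole dataset."""
    columns = list(zip(*rows))  # transpose rows into columns
    return [(min(col), max(col)) for col in columns]
```

The trial runner could hold the returned list and pass it to the distance computation instead of rescanning each column on every trial.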

NAN failure in cardio dataset trial

A NaN is being returned by the OpenCL get_entropy function. Track this down please.
At the least, guard against it and fail cleanly before digging deeper.
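
The guard itself can be small. `checked_entropy` is a hypothetical wrapper; the root cause is not known here, although a 0/0 from a constant column or a 0 * log(0) term are common suspects worth checking.

```python
import math

def checked_entropy(value, label="dataset"):
    """Return value unchanged, or raise if it is NaN or infinite."""
    if not math.isfinite(value):
        raise ValueError(
            f"get_entropy returned a non-finite value ({value}) for {label}"
        )
    return value
```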

Fix OpenCL sum reduction

We need OpenCL sum reduction to work properly, because at least two big calculations depend on it.

A fix here may generalize to min/max reductions.
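
A host-side reference can help when debugging the kernel. The sketch below simulates a pairwise (tree) reduction in plain Python, padding the input to a power of two with the operation's identity element, which is one common way to handle sizes that are not a power of two. It is not the project's kernel, and whether padding is the actual bug is unknown.

```python
import operator

def tree_reduce(values, op=operator.add, identity=0.0):
    """Pairwise reduction in the style of a work-group, padded with `identity`."""
    if not values:
        return identity
    size = 1
    while size < len(values):
        size *= 2                         # next power of two >= len(values)
    buf = list(values) + [identity] * (size - len(values))
    stride = size // 2
    while stride > 0:
        for i in range(stride):           # each i plays the role of one work-item
            buf[i] = op(buf[i], buf[i + stride])
        stride //= 2
    return buf[0]
```

The same structure covers min and max by swapping the operation and its identity, for example `tree_reduce(xs, op=min, identity=float("inf"))`.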

Strip input

Strip whitespace from the menu input so that entering a choice is less confusing.
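
A small sketch of what that could look like; `parse_menu_choice` and the idea of a fixed set of valid choices are assumptions about the menu, not taken from the code.

```python
def parse_menu_choice(raw, valid_choices):
    """Normalize raw console input; return the choice, or None if it is invalid."""
    choice = raw.strip().lower()  # drop surrounding whitespace and the trailing newline
    return choice if choice in valid_choices else None
```

The menu loop can then re-prompt whenever the result is None instead of acting on stray whitespace or mixed case.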
