
data_mining_tools's Introduction


Python Data Mining Tools

  1. Pandas is used for general data wrangling.

  2. Data can be read directly from a text file into a pandas DataFrame, or via an SQL database.

  3. Regression and classification machine learning tasks are performed using scikit-learn.
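The two data-reading routes in item 2 can be sketched roughly as follows. This is a minimal illustration, not the repository's own reading code: the sample data, the table name `samples`, and the use of an in-memory SQLite database are all assumptions chosen so the example is self-contained.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical sample data standing in for a text file on disk.
csv_text = "x,y\n1.0,2.0\n2.0,4.1\n3.0,5.9\n"

# Route 1: read the text straight into a pandas DataFrame.
df_text = pd.read_csv(io.StringIO(csv_text))

# Route 2: store the same rows in an SQL database (in-memory SQLite here,
# an assumption), then query them back into a DataFrame.
conn = sqlite3.connect(":memory:")
df_text.to_sql("samples", conn, index=False)
df_sql = pd.read_sql_query("SELECT x, y FROM samples ORDER BY x", conn)
conn.close()

print(df_text)
print(df_sql)
```

Either route ends with the same kind of object, a DataFrame, so downstream wrangling code does not need to know where the data came from.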


Files and Directories

process_data.in - example input deck file containing the input parameters
process_data.py - main program
README.md - this readme file
run - script to run the main program

libClean/ - library containing the subroutines that read and clean the data
    clean.py - source to read data directly into a pandas DataFrame
    clean_sql.py - source to read data into an SQL database

libInputDeck/ - library containing the source to process the input deck
    input_deck.py - source to process the input deck

libMachineLearning/ - library containing the supervised machine learning tools
    classification.py - classification-specific class
    machine_learning.py - general base class
    regression.py - regression-specific class

libVisualisation/ - library containing the subroutines for visualising the output
    visualisation.py - source to produce standard matplotlib line plots
    visualisation_sb.py - source to produce seaborn correlation and pair plots

libTheano/ - library of neural networks implemented using Theano; the source code needs to be revised to integrate with the libraries above
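The base-class layout described for libMachineLearning/ (a general base class with regression-specific and classification-specific subclasses) might look something like the sketch below. The class names, method names, and choice of estimators are hypothetical and are not taken from the repository's source; the data is synthetic and only there to make the example runnable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split


class MachineLearning:
    """Hypothetical base class: splits the data and wraps a scikit-learn estimator."""

    def __init__(self, estimator, X, y, test_size=0.25, seed=0):
        self.estimator = estimator
        self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(
            X, y, test_size=test_size, random_state=seed
        )

    def fit(self):
        self.estimator.fit(self.X_train, self.y_train)
        return self

    def score(self):
        # scikit-learn convention: R^2 for regressors, mean accuracy for classifiers.
        return self.estimator.score(self.X_test, self.y_test)


class Regression(MachineLearning):
    """Hypothetical regression-specific class (linear regression is an assumption)."""

    def __init__(self, X, y, **kwargs):
        super().__init__(LinearRegression(), X, y, **kwargs)


class Classification(MachineLearning):
    """Hypothetical classification-specific class (logistic regression is an assumption)."""

    def __init__(self, X, y, **kwargs):
        super().__init__(LogisticRegression(), X, y, **kwargs)


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y_reg = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.01 * rng.normal(size=200)
y_cls = (X[:, 0] + X[:, 1] > 0).astype(int)

r2 = Regression(X, y_reg).fit().score()
accuracy = Classification(X, y_cls).fit().score()
print(f"regression R^2: {r2:.3f}, classification accuracy: {accuracy:.3f}")
```

Keeping the train/test split and scoring in the base class means each subclass only has to choose an estimator, which is one plausible reason for this kind of layout.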


To Do List

  1. add pandas and SQL to a master database class, with additional options for MongoDB, Hadoop and Spark

  2. integrate the Theano neural network source into the machine learning library

  3. upload data, add examples and associated results

Contributors

vassilikitsios
