genetic_algorithm_challenge's Introduction

genetic_algorithm_challenge

Genetic Algorithm Challenge for Learn Python for Data Science #6 by @Sirajology on YouTube

Overview

This is the code for Genetic Algorithms by @Sirajology on YouTube. In this demo we use the MAGIC Gamma Telescope dataset to build a classifier. The classifier trains on the dataset and then predicts whether a given observation is gamma radiation or hadron radiation. Instead of guessing and checking which ML model and hyperparameters work best, we use a genetic programming library called TPOT, which tries out many candidates for us. See this link for an IPython notebook version of this code.

Dependencies

  • NumPy
  • tpot
  • scikit-learn
  • pandas

Use pip to install any missing dependencies.

Usage

After installing the dependencies, run the demo from a terminal:

python3 demo.py

Challenge

The challenge for this video is to use the TPOT library to make a discovery based on a question you pose.

Step 1 - Download this Climate Change Dataset

Step 2 - Think of a question that this dataset will help you answer, like "Has the temperature in India risen over the past 20 years?"

Step 3 - Clean the data and use TPOT to help you build a machine learning pipeline to answer your question

Step 4 - Post your GitHub link in the comments!

Credits

Credit for the vast majority of code here goes to Randy Olson. I've merely created a wrapper around all of the important functions to get people started.

genetic_algorithm_challenge's People

Contributors

llsourcell

genetic_algorithm_challenge's Issues

I ran sudo pip install tpot, but I still get ImportError: cannot import name TPOT

Whether I run "python demo.py" or "python3 demo.py", the error is the same.



➜  python git:(jifen) ✗ sudo pip install tpot
The directory '/Users/tonylibai/Library/Caches/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/Users/tonylibai/Library/Caches/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Requirement already satisfied: tpot in /Library/Python/2.7/site-packages
Requirement already satisfied: numpy in /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python (from tpot)
Requirement already satisfied: scipy in /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python (from tpot)
Requirement already satisfied: scikit-learn in /Library/Python/2.7/site-packages (from tpot)
Requirement already satisfied: deap in /Library/Python/2.7/site-packages (from tpot)
Requirement already satisfied: update_checker in /Library/Python/2.7/site-packages (from tpot)
Requirement already satisfied: tqdm in /Library/Python/2.7/site-packages (from tpot)
Requirement already satisfied: requests>=2.3.0 in /Library/Python/2.7/site-packages (from update_checker->tpot)
➜  python git:(jifen) ✗ python /Users/tonylibai/code/python/genetic_algorithm_challenge/demo.py
Traceback (most recent call last):
  File "/Users/tonylibai/code/python/genetic_algorithm_challenge/demo.py", line 1, in <module>
    from tpot import TPOT
ImportError: cannot import name TPOT
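
One possible cause, assuming a newer TPOT release is installed: later TPOT versions expose the classes TPOTClassifier and TPOTRegressor (the names the other snippets on this page import) rather than a class named TPOT, so the import fails even though the package is present. The sketch below is a hedged workaround, not the repository's code. The helper name import_first_available is hypothetical; it tries a list of candidate names and returns the first one the module actually provides.

```python
import importlib


def import_first_available(module_name, candidate_names):
    """Return (name, attribute) for the first candidate found in the module."""
    module = importlib.import_module(module_name)
    for name in candidate_names:
        if hasattr(module, name):
            return name, getattr(module, name)
    raise ImportError(
        "none of {} found in module {!r}".format(list(candidate_names), module_name)
    )


if __name__ == "__main__":
    try:
        # Assumption: newer releases export TPOTClassifier, older ones TPOT.
        name, cls = import_first_available("tpot", ["TPOTClassifier", "TPOT"])
        print("using tpot.{}".format(name))
    except ImportError as exc:
        print("tpot is not usable from this interpreter: {}".format(exc))
```

If TPOTClassifier is the name that resolves, changing the first line of demo.py to import it directly is the simpler long-term fix.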

Cannot Import tpot

from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
                                                    train_size=0.75, test_size=0.25)

tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_mnist_pipeline.py')

Error:

Traceback (most recent call last):
  File "test.py", line 1, in <module>
    from tpot import TPOTClassifier
ModuleNotFoundError: No module named 'tpot'

Pip3 list shows tpot installed

skrebate (0.4)
spacy (1.8.0)
stopit (1.1.1)
tensorflow (1.2.1)
termcolor (1.1.0)
thinc (6.5.2)
toolz (0.8.2)
TPOT (0.9.2)
tqdm (4.19.4)
ujson (1.35)
update-checker (0.16)
urllib3 (1.22)
Werkzeug (0.12.2)
wheel (0.30.0)
wrapt (1.10.11)

This is on macOS.
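
A common cause of this mismatch, offered here as a guess rather than a confirmed diagnosis, is that pip3 installs into a different Python installation than the one that runs test.py. The sketch below reports which interpreter is executing and whether it can locate a module. The helper name describe_module is hypothetical.

```python
import importlib.util
import sys


def describe_module(module_name):
    """Report the running interpreter and where (if anywhere) it finds a module."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "found": spec is not None,
        "location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    info = describe_module("tpot")
    print("interpreter:", info["interpreter"])
    print("tpot found: ", info["found"], info["location"])
```

If tpot is reported as not found, installing it with that same interpreter (for example, python3 -m pip install tpot) keeps pip and the script pointed at one installation.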

UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character

runfile('C:/Users/Shrishti D Hore/OneDrive/Documents/Genetic_algorithms.py', wdir='C:/Users/Shrishti D Hore/OneDrive/Documents')
Traceback (most recent call last):

File "", line 1, in
runfile('C:/Users/Shrishti D Hore/OneDrive/Documents/Genetic_algorithms.py', wdir='C:/Users/Shrishti D Hore/OneDrive/Documents')

File "C:\Users\Shrishti D Hore\Miniconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 786, in runfile
execfile(filename, namespace)

File "C:\Users\Shrishti D Hore\Miniconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "C:/Users/Shrishti D Hore/OneDrive/Documents/Genetic_algorithms.py", line 14, in <module>
telescope=pd.read_csv('‪C:\Users\Shrishti D Hore\OneDrive\Documents\gamma.csv')

File "C:\Users\Shrishti D Hore\Miniconda3\lib\site-packages\pandas\io\parsers.py", line 702, in parser_f
return _read(filepath_or_buffer, kwds)

File "C:\Users\Shrishti D Hore\Miniconda3\lib\site-packages\pandas\io\parsers.py", line 429, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)

File "C:\Users\Shrishti D Hore\Miniconda3\lib\site-packages\pandas\io\parsers.py", line 895, in __init__
self._make_engine(self.engine)

File "C:\Users\Shrishti D Hore\Miniconda3\lib\site-packages\pandas\io\parsers.py", line 1122, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)

File "C:\Users\Shrishti D Hore\Miniconda3\lib\site-packages\pandas\io\parsers.py", line 1853, in __init__
self._reader = parsers.TextReader(src, **kwds)

File "pandas\_libs\parsers.pyx", line 387, in pandas._libs.parsers.TextReader.__cinit__

File "pandas\_libs\parsers.pyx", line 686, in pandas._libs.parsers.TextReader._setup_parser_source

UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character

I don't understand what this error means. How can I solve it?
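
The error is reported at position 0 of the path string, which suggests that an invisible character sits before C: in the read_csv call. This often happens when a path is copied from a Windows file-properties dialog, which can prepend the Unicode control character U+202A (left-to-right embedding). This is a likely explanation, not a confirmed one. Retyping the path by hand usually fixes it. The sketch below shows another option: stripping invisible format characters before the path is used. The helper name strip_invisible and the example path are hypothetical.

```python
import unicodedata


def strip_invisible(path):
    """Remove Unicode format characters (category Cf), such as U+202A."""
    return "".join(ch for ch in path if unicodedata.category(ch) != "Cf")


# A hypothetical path as it might look after copy-pasting;
# the leading \u202a does not show up on screen.
pasted = "\u202aC:/Users/example/Documents/gamma.csv"
cleaned = strip_invisible(pasted)

print(repr(pasted[0]))  # the hidden character at position 0
print(cleaned)          # C:/Users/example/Documents/gamma.csv
```

The cleaned string can then be passed to pd.read_csv in place of the pasted one.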

Here's the code for the YouTube video's example. (The code uploaded here doesn't run as-is, so I corrected it.)

Precondition: I assume you got the 'MAGIC Gamma Telescope Data.csv' file from my previous post. Just copy and paste it into Notepad and save it as a .csv file.


from tpot import TPOTRegressor
from sklearn.model_selection import train_test_split
from pandas import read_csv

# Load the data
df = read_csv('MAGIC Gamma Telescope Data.csv')

# Clean the data: encode the class labels 'g' and 'h' as 0 and 1
df['Class'] = df['Class'].map({'g': 0, 'h': 1})
features = df.drop('Class', axis=1).values  # X
target = df['Class'].values  # y

# Split the data
X_train, X_test, y_train, y_test = train_test_split(features, target, train_size=0.8, test_size=0.2)

# Let genetic programming find the best ML model and hyperparameters.
# Note: Class is a binary label, so TPOTClassifier would be the more usual choice;
# TPOTRegressor is kept here as in the original post.
# The defaults for population_size and offspring_size are usually fine.
# To get a result sooner, lower generations only.
tpot = TPOTRegressor(generations=5, verbosity=2)
tpot.fit(X_train, y_train)

# Score the fitted pipeline on the held-out test set (computed once, then printed)
test_score = tpot.score(X_test, y_test)
print("Test score: {}".format(test_score))

# Export the generated code
tpot.export('tpot_test1_pipeline.py')

Optimization Progress: 13%|█▎ | 78/600 [27:19<3:55:49, 27.11s/pipeline]
Optimization Progress: 34%|███▎ | 202/600 [1:06:22<2:12:22, 19.96s/pipeline]Generation 1 - Current best internal CV score: 0.09374578856689733
Optimization Progress: 39%|███▉ | 233/600 [1:21:01<39:46, 6.50s/pipeline]

  • The run is still in progress, but the internal CV score is already close to 0. TPOTRegressor's default scoring is based on mean squared error (MSE), where 0 means a perfect fit.
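
To make the scoring remark concrete, here is a small worked example of mean squared error on 0/1 labels like the mapped Class column. It is a plain-Python sketch; the numbers are made up for illustration and are not taken from the dataset or from the run above.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical targets (0 = gamma, 1 = hadron) and regressor outputs.
y_true = [0, 1, 1, 0]
y_pred = [0.0, 1.0, 0.5, 0.5]

perfect = mean_squared_error(y_true, y_true)
imperfect = mean_squared_error(y_true, y_pred)
print(perfect)    # 0.0
print(imperfect)  # 0.125
```

Two of the four predictions are off by 0.5, so the error is (0.25 + 0.25) / 4 = 0.125. Only predictions that match every target give exactly 0.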
