diabetes-mellitus-prediction-in-pima-indians's Introduction

Diabetes-mellitus-prediction-in-Pima-Indians

This repository hosts the material for the workshop given at the "Switch Fb developer circle", scheduled for 05-08-2017.

Title: "CRISP-DM process applied to the prediction of diabetes mellitus with public datasets"

The content of the workshop is divided into two stages:

Theoretical

1.-The CRISP-DM and BAB process.

2.-Agile Scrum and how to combine it with Data Science, based on my experience.

3.-Explanation of the problem to solve, and of the structure and issues we face in the dataset.

Hands-on

1.-Understanding and characterization of the data.

2.-EDA for Data Understanding

3.-Data preparation

4.-Application of logistic regression and Random Forest models, and of the GridSearch algorithm

5.-Performance analysis

6.-Conclusions
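
Hands-on steps 3 to 5 could look roughly like the sketch below. It is not the workshop's own code: to run offline it uses a synthetic dataset from scikit-learn's make_classification as a stand-in for the Pima data, and the parameter grids, variable names and the choice of accuracy as the metric are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Pima data: 768 rows, 8 numeric features, binary target.
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, random_state=0
)

# Data preparation: hold out a stratified test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Logistic regression on standardized features, tuning C with a grid search.
logit = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
logit_search = GridSearchCV(
    logit,
    param_grid={"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},  # illustrative grid
    cv=5,
    scoring="accuracy",
)
logit_search.fit(X_train, y_train)

# Random forest with a small, illustrative grid.
forest_search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=5,
    scoring="accuracy",
)
forest_search.fit(X_train, y_train)

# Performance analysis on the held-out test set.
results = {
    "logistic": accuracy_score(y_test, logit_search.predict(X_test)),
    "random_forest": accuracy_score(y_test, forest_search.predict(X_test)),
}
for name, acc in results.items():
    print(f"{name}: test accuracy = {acc:.3f}")
```

Wrapping the scaler and the model in one pipeline means the scaling is refit inside each cross-validation fold, so the held-out fold does not leak into the preprocessing.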

In the future: TBD, according to the feedback from the first audience at the Dev Circle in Aug/2017 in SCL-CL

May the force be with you

diabetes-mellitus-prediction-in-pima-indians's Issues

ANOVA test example

import pandas as pd
from pandas import read_csv
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
# Notebook-only setup: render plots inline at retina resolution
get_ipython().run_line_magic('matplotlib', 'inline')
get_ipython().run_line_magic('config', "InlineBackend.figure_format='retina'")
#import plotly
#import plotly.plotly as py
#import plotly.graph_objs as go
#from plotly.tools import FigureFactory as FF
import seaborn as sbs
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import KFold
from sklearn.model_selection import train_test_split
# Evaluate using Cross Validation
from sklearn.model_selection import cross_val_score
from sklearn import metrics
from sklearn.linear_model import LogisticRegression

# The following snippet loads the Pima Indians diabetes onset dataset
# Link to the data: https://archive.ics.uci.edu/ml/datasets/pima+indians+diabetes
url = "https://goo.gl/vhm1eU"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
df = read_csv(url, names=names)
df.head()

import scipy.stats as stats

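# One-way ANOVA: does mean diastolic blood pressure ('pres') differ between the two classes?
pres_frame = df[['pres', 'class']]
groups = pres_frame.groupby("class").groups

keys = list(groups.keys())

# Blood pressure values for each class label
c0 = pres_frame.loc[groups[keys[0]], 'pres']
c1 = pres_frame.loc[groups[keys[1]], 'pres']

# Returns the F statistic and its p-value
stats.f_oneway(c0, c1)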
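
To help read the result of the test above: scipy's f_oneway returns an F statistic and a p-value, and with only two groups the one-way ANOVA is equivalent to a pooled-variance two-sample t-test (F equals t squared, and the p-values match). The sketch below checks this on hypothetical samples drawn with NumPy; the group sizes, means, spreads and the 0.05 threshold are assumptions for illustration, not values taken from the dataset.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical samples standing in for a measurement split by a binary class.
group_a = rng.normal(loc=68.0, scale=18.0, size=500)
group_b = rng.normal(loc=71.0, scale=21.0, size=268)

f_stat, p_anova = stats.f_oneway(group_a, group_b)
t_stat, p_ttest = stats.ttest_ind(group_a, group_b)  # pooled variance by default

# With two groups, F equals t squared and both p-values coincide.
print(f"F = {f_stat:.4f}, t^2 = {t_stat ** 2:.4f}")
print(f"p (ANOVA) = {p_anova:.6f}, p (t-test) = {p_ttest:.6f}")

alpha = 0.05  # conventional significance level, chosen here for illustration
reject_null = bool(p_anova < alpha)
print("Reject equal means:", reject_null)
```

A p-value below the chosen threshold suggests the group means differ; it says nothing about how large or how clinically relevant the difference is.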