simplebiostats's Introduction

Simple Biostatistics in Python

Motivation

This package is intended as a set of convenience functions based (mostly) on StatsModels (and SciPy.Stats), since it can be challenging for a newcomer to Python to find the tools needed for simple biostatistics.

This package presupposes some knowledge of pandas and is intended to be used within a Jupyter notebook.

Advanced users are advised to use the full packages (StatsModels and SciPy.Stats) directly.

How to use

For Jupyter notebooks, download the simplebiostats folder and place it in your notebooks folder. The folder structure should look like this:

project
│
└───notebooks
    │   notebook1.ipynb
    │   notebook2.ipynb
    │   ...
    │
    ├───data
    │   │   data.csv
    │   │   ...
    │
    └───simplebiostats
        │   simplebiostats.py
        │   ...

Once the folder is in place, you can import a function with: from simplebiostats.simplebiostats import <name of the function>

A typical use case looks like this:

import pandas
from simplebiostats.simplebiostats import check_normality

data_df = pandas.read_csv('data/data.csv')

check_normality(data_df, 'variable')

You can also import .dta or .xlsx files with pandas.read_stata(...) and pandas.read_excel(...).

Functions

In the following, data_df is a pandas DataFrame containing the data, with each variable stored in its own column, and variable is a string containing the name of the column of interest in data_df.

summarize(data_df)

Provides basic descriptive statistics for all the variables in the data: the mean, the standard deviation, the interquartile range (IQR), the standard error, the skewness, the kurtosis, the median, and the 2.5%, 25%, 75% and 97.5% percentiles.
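As a rough illustration, the statistics listed above can be computed with plain pandas. This is only a sketch, not the package's implementation: summarize_sketch and the example columns are hypothetical names, and the output layout of the real summarize may differ.

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
data_df = pd.DataFrame({
    "height": [1.0, 2.0, 3.0, 4.0, 5.0],
    "weight": [2.0, 4.0, 4.0, 4.0, 6.0],
})

def summarize_sketch(df):
    """Return one row of summary statistics per numeric column."""
    num = df.select_dtypes("number")
    q = num.quantile([0.025, 0.25, 0.5, 0.75, 0.975])
    return pd.DataFrame({
        "mean": num.mean(),
        "std": num.std(),        # sample standard deviation (ddof=1)
        "sem": num.sem(),        # standard error of the mean
        "iqr": q.loc[0.75] - q.loc[0.25],
        "skewness": num.skew(),
        "kurtosis": num.kurt(),  # excess kurtosis
        "median": q.loc[0.5],
        "p2.5": q.loc[0.025],
        "p25": q.loc[0.25],
        "p75": q.loc[0.75],
        "p97.5": q.loc[0.975],
    })

summary = summarize_sketch(data_df)
print(summary)
```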

check_normality(data_df, variable)

Plots a boxplot, a histogram with a fitted normal distribution, and a Q-Q plot.

get_ci(data_df, variable, alpha=0.05)

Returns the confidence interval of the mean, assuming a normal distribution. The default alpha=0.05 gives a 95% interval; adjust alpha if needed.
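For reference, a confidence interval of the mean can be sketched with SciPy as follows. This assumes a Student's t critical value, which may or may not be what get_ci uses internally; mean_ci_sketch is a hypothetical helper, not part of the package.

```python
import numpy as np
from scipy import stats

def mean_ci_sketch(values, alpha=0.05):
    """Two-sided (1 - alpha) confidence interval of the mean (t-based)."""
    values = np.asarray(values, dtype=float)
    n = values.size
    mean = values.mean()
    sem = values.std(ddof=1) / np.sqrt(n)          # standard error of the mean
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)  # two-sided critical value
    return mean - t_crit * sem, mean + t_crit * sem

low, high = mean_ci_sketch([4.8, 5.1, 5.0, 4.9, 5.2])
print(low, high)
```

A smaller alpha gives a wider interval.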

get_pi(data_df, variable, alpha=0.05)

Same as above, but for the prediction interval, i.e. the interval expected to contain a new observation.
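The difference from the confidence interval of the mean is the extra variance of a single new observation. A minimal sketch, assuming normally distributed data and a t critical value (prediction_interval_sketch is a hypothetical name, not the package's function):

```python
import numpy as np
from scipy import stats

def prediction_interval_sketch(values, alpha=0.05):
    """Two-sided (1 - alpha) prediction interval for one new observation."""
    values = np.asarray(values, dtype=float)
    n = values.size
    mean = values.mean()
    sd = values.std(ddof=1)
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
    # sqrt(1 + 1/n): variance of a new observation plus variance of the mean
    half_width = t_crit * sd * np.sqrt(1 + 1 / n)
    return mean - half_width, mean + half_width

pi_low, pi_high = prediction_interval_sketch([4.8, 5.1, 5.0, 4.9, 5.2])
print(pi_low, pi_high)
```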

ttest_1samp(data_df, variable, hypothesis=0)

One-sample t-test comparing the mean of a variable to a hypothesized value, which is 0 by default.
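The equivalent call in SciPy looks like the sketch below. The data are made up for illustration, and the return format of the package's own ttest_1samp may differ.

```python
from scipy import stats

values = [4.8, 5.1, 5.0, 4.9, 5.2]  # illustrative data

# Null hypothesis: the population mean is 0 (the default hypothesis above).
result_zero = stats.ttest_1samp(values, popmean=0.0)

# Null hypothesis: the population mean is 5.
result_five = stats.ttest_1samp(values, popmean=5.0)

print(result_zero.statistic, result_zero.pvalue)
print(result_five.statistic, result_five.pvalue)
```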

ttest_ind_by_group(data_df, group_variable, variable)

T-test for a variable where the data is split into 2 groups according to the group_variable. The 2 groups are assumed to be independent.
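A sketch of the same idea with pandas and SciPy is given below. The helper name and the data are hypothetical, and the sketch assumes equal variances in both groups (SciPy's default), which may not match the package's choice.

```python
import pandas as pd
from scipy import stats

# Illustrative data with exactly two groups.
data_df = pd.DataFrame({
    "group": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "value": [5.1, 4.9, 5.0, 5.2, 6.1, 5.9, 6.0, 6.2],
})

def ttest_two_groups_sketch(df, group_variable, variable):
    """Independent two-sample t-test between the two levels of group_variable."""
    groups = [g[variable].dropna() for _, g in df.groupby(group_variable)]
    if len(groups) != 2:
        raise ValueError("expected exactly two groups")
    return stats.ttest_ind(groups[0], groups[1])  # equal variances assumed

result = ttest_two_groups_sketch(data_df, "group", "value")
print(result.statistic, result.pvalue)
```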

ftest_std_by_group(data_df, group_variable, variable)

F-test of equal variance between 2 groups, as defined by the group_variable.
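SciPy has no single function for the two-sample variance-ratio F-test, so it is usually written by hand. The sketch below is one common formulation with a two-sided p-value; ftest_variances_sketch is a hypothetical helper and not necessarily how the package computes it.

```python
import numpy as np
from scipy import stats

def ftest_variances_sketch(x, y):
    """Variance-ratio F-test; returns (F statistic, two-sided p-value)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    f_stat = x.var(ddof=1) / y.var(ddof=1)
    dfn, dfd = x.size - 1, y.size - 1
    # Two-sided p-value: twice the smaller tail probability, capped at 1.
    tail = min(stats.f.cdf(f_stat, dfn, dfd), stats.f.sf(f_stat, dfn, dfd))
    return f_stat, min(1.0, 2 * tail)

f_stat, p_value = ftest_variances_sketch([5.1, 4.9, 5.0, 5.2], [6.3, 5.5, 6.0, 6.6])
print(f_stat, p_value)
```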

bootstrap_by_group(data_df, group_variable, variable, n_iter)

Bootstrap method to evaluate the distribution of the mean.
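The core of a bootstrap of the mean is resampling with replacement. A minimal sketch for a single sample with NumPy follows; the helper name, the seed argument and the data are assumptions for illustration, and the package applies the idea per group.

```python
import numpy as np

def bootstrap_means_sketch(values, n_iter=1000, seed=0):
    """Return n_iter bootstrap estimates of the mean of values."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Each row is one resample of the same size as the data, drawn with replacement.
    resamples = rng.choice(values, size=(n_iter, values.size), replace=True)
    return resamples.mean(axis=1)

boot_means = bootstrap_means_sketch([5.1, 4.9, 5.0, 5.2, 4.8, 5.3], n_iter=2000)

# Percentile interval from the bootstrap distribution of the mean.
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(ci_low, ci_high)
```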

ranksums_by_group(data_df, group_variable, variable)

Non-parametric (Wilcoxon rank-sum) test of equality between two groups.
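Judging by the function name, this corresponds to the Wilcoxon rank-sum test available in SciPy. A short sketch on made-up data (the package's own output format is not shown here):

```python
from scipy import stats

# Illustrative data: every value in group_a is smaller than every value in group_b.
group_a = [5.1, 4.9, 5.0, 5.2, 4.8]
group_b = [6.1, 5.9, 6.0, 6.2, 5.8]

result = stats.ranksums(group_a, group_b)
print(result.statistic, result.pvalue)
```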

power_calculation_ttest(mean_diff, std, alpha=0.05, power=None, n_participants=None)

Calculates either the power or the number of participants needed for a t-test. Set exactly one of power and n_participants; the function computes the other.
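A comparable calculation can be sketched with StatsModels. This assumes an independent two-sample t-test with equal group sizes and treats n_participants as the number per group; both are assumptions, since the text does not say which design the package uses. power_ttest_sketch is a hypothetical name.

```python
import numpy as np
from statsmodels.stats.power import TTestIndPower

def power_ttest_sketch(mean_diff, std, alpha=0.05, power=None, n_participants=None):
    """Solve for whichever of power / n_participants (per group) is left as None."""
    if (power is None) == (n_participants is None):
        raise ValueError("set exactly one of power and n_participants")
    effect_size = mean_diff / std  # standardized difference (Cohen's d)
    value = TTestIndPower().solve_power(
        effect_size=effect_size,
        nobs1=n_participants,
        alpha=alpha,
        power=power,
        ratio=1.0,  # equal group sizes (assumption)
    )
    return float(np.squeeze(value))

n_needed = power_ttest_sketch(mean_diff=5.0, std=10.0, power=0.8)
achieved_power = power_ttest_sketch(mean_diff=5.0, std=10.0, n_participants=64)
print(n_needed, achieved_power)
```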

plot_scatter_blandalt(input_df, var1, var2)

Plots a scatter plot of var1 vs var2 and a Bland-Altman plot for these variables (the difference vs the mean of each pair). This is used for paired data to evaluate the "same distribution" assumption.
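The quantities behind a Bland-Altman plot are simple to compute. The sketch below leaves out the plotting and uses the conventional 1.96 standard deviation limits of agreement, which is an assumption about what the package draws; the helper and column names are hypothetical.

```python
import pandas as pd

def bland_altman_sketch(df, var1, var2):
    """Return per-pair mean/difference, the bias, and the limits of agreement."""
    mean = (df[var1] + df[var2]) / 2
    diff = df[var1] - df[var2]
    bias = diff.mean()
    spread = 1.96 * diff.std()  # conventional 95% limits of agreement
    points = pd.DataFrame({"mean": mean, "diff": diff})
    return points, bias, (bias - spread, bias + spread)

input_df = pd.DataFrame({
    "method1": [10.0, 12.0, 14.0, 16.0],
    "method2": [9.0, 12.0, 13.0, 16.0],
})
points, bias, limits = bland_altman_sketch(input_df, "method1", "method2")
print(points)
print(bias, limits)
```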

ttest_paired(data_df, var1, var2)

T-test for paired data, i.e. two measurements (var1 and var2) taken on the same subjects.
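A paired t-test is equivalent to a one-sample t-test on the pairwise differences. The sketch below shows this with SciPy on made-up data:

```python
from scipy import stats

# Illustrative paired measurements on the same five subjects.
before = [120.0, 132.0, 118.0, 125.0, 140.0]
after = [115.0, 130.0, 119.0, 120.0, 133.0]

paired = stats.ttest_rel(before, after)

# Same test expressed as a one-sample t-test on the differences.
diffs = [b - a for b, a in zip(before, after)]
one_sample = stats.ttest_1samp(diffs, popmean=0.0)

print(paired.statistic, paired.pvalue)
print(one_sample.statistic, one_sample.pvalue)
```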

anova_by_group(data_df, resp_var, group_var)

ANalysis Of VAriance (ANOVA) to compare the means of multiple groups. It includes Bartlett's test of equal variances between the groups.
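Both tests are available in SciPy. The following sketch runs them on made-up data split by a grouping column; the column names are hypothetical, and the package itself may rely on StatsModels and report more detail.

```python
import pandas as pd
from scipy import stats

# Illustrative data with three groups.
data_df = pd.DataFrame({
    "dose": ["low"] * 4 + ["mid"] * 4 + ["high"] * 4,
    "response": [5.1, 4.9, 5.0, 5.2, 6.1, 5.9, 6.0, 6.2, 7.3, 6.5, 7.0, 7.6],
})

# One array of responses per group.
samples = [g["response"].to_numpy() for _, g in data_df.groupby("dose")]

anova = stats.f_oneway(*samples)     # H0: all group means are equal
bartlett = stats.bartlett(*samples)  # H0: all group variances are equal

print(anova.statistic, anova.pvalue)
print(bartlett.statistic, bartlett.pvalue)
```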

kruskal_by_group(data_df, resp_var, group_var)

Kruskal-Wallis test of whether the groups come from the same distribution. It is a non-parametric alternative to the one-way ANOVA.
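In SciPy the Kruskal-Wallis test takes one sample per group. A short sketch on made-up data without ties:

```python
from scipy import stats

# Illustrative data: three fully separated groups.
low = [5.1, 4.9, 5.0, 5.2]
mid = [6.1, 5.9, 6.0, 6.2]
high = [7.3, 6.5, 7.0, 7.6]

result = stats.kruskal(low, mid, high)
print(result.statistic, result.pvalue)
```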

LinearRegression(data_df, resp_var, cont_var=None, cat_var=None, with_interactions=False)

Performs a linear regression. You first need to create the object before you can call the methods below on it. It is meant for very simple cases with one continuous variable (cont_var) and one categorical variable (cat_var). resp_var is the response variable (or the outcome). The with_interactions flag controls whether the interaction between the 2 variables is included.

fit()

Fits the model. It returns the coefficients of the fit as well as an ANOVA table.

check_model()

Plots a scatter plot of the residuals vs the predictions and another of the residuals vs the continuous variable. The goal is to make sure that the residuals are independent of both. It also plots the figures from the check_normality function on the residuals, to make sure they are normally distributed.

plot_predictions(alpha=0.05)

Plots the data with the prediction intervals along the continuous variable.

predict(data_df, alpha=0.05)

Calculates the predictions for an input data set, including confidence intervals at the given alpha.
