Comments (11)
Hi jijo7 -
Sure, let's take a look at this example. First, import the libraries:
import pandas as pd
from dython import nominal
Next, let's create some test data. We'll use a DataFrame with four columns: Month, Day, Temperature and WorkingHours (the length of a working day at the office). The first two are nominal (categorical), the latter two are numerical:
data = {'Month':['August','August','August','August','August','August','August','August','August','August','August','August',
'February','February','February','February','February','February','February','February','February','February','February','February'],
'Day':['Sunday','Monday','Tuesday','Sunday','Monday','Tuesday','Sunday','Monday','Tuesday','Sunday','Monday','Tuesday',
'Sunday','Monday','Tuesday','Sunday','Monday','Tuesday','Sunday','Monday','Tuesday','Sunday','Monday','Tuesday',],
'Temperature':[34,32,33,36,37,35,29,32,33,32,36,30,
19,22,21,17,15,14,19,20,22,20,19,18],
'WorkingHours':[0,9.5,8.5,0,9,8.5,0,10,9.5,0,8,8.5,
0,8.5,9,0,9,9,0,10,8,0,8.5,9.5]}
df = pd.DataFrame(data)
Now all you need to do is use the associations function, and state which columns are the nominal (categorical) ones. That's it, no additional conversions are required:
nominal.associations(df, nominal_columns=['Month','Day'])
This will yield the following heat-map:
Each pair of features is measured with the association metric that fits its types:
- The association between Month and Day is computed using Cramer's V (this could be replaced with Theil's U by adding theil_u=True to the parameters of nominal.associations)
- The association between Month and Temperature is computed using Correlation Ratio (same for Day and WorkingHours)
- The association between Temperature and WorkingHours is computed using Pearson's R (correlation)
So, if you would like to have separate plots, one only for categorical and one only for numeric, simply filter the columns you pass to the function:
nominal.associations(df[['Month','Day']], nominal_columns='all')
nominal.associations(df[['Temperature','WorkingHours']])
Hope this clears things for you!
from dython.
@shakedzy Hi!
Thank you so much for sharing your time and knowledge.
Really, I learned a lot from your code.
Could you please let me know whether it is possible to draw two separate plots for categorical features, i.e., one for correlation ratio and another for Cramer's V? Maybe some extra cells need to be masked each time.
Thank you in advance.
Warmest regards,
from dython.
@jijo7 -
It is not possible to plot Correlation Ratio for categorical features only, as by definition Correlation Ratio is computed for a categorical feature and a numerical feature.
If you wish to plot Cramer's V for categorical features only, simply pass only the categorical columns to the function, like I posted at the bottom of my previous comment:
nominal.associations(df[['Month','Day']], nominal_columns='all')
Where ['Month','Day'] are the only categorical columns in df.
from dython.
@shakedzy Hi
Thank you very much. Could you please take a look at the following code?
import numpy as np
import seaborn as sns

corrmat = df.corr()
# Generate a mask for the upper triangle (True = hidden cell)
mask = np.zeros_like(corrmat, dtype=bool)
mask[np.triu_indices_from(mask)] = True
hm = sns.heatmap(corrmat, mask=mask, cbar=True, annot=True,
                 square=True, fmt='.2f', annot_kws={'size': 7},
                 yticklabels=corrmat.columns.values,
                 xticklabels=corrmat.columns.values)
or something like this:
# get the correlation coefficient between the different columns
corr = df3.iloc[:, 1:].corr()
arr_corr = corr.to_numpy(copy=True)
# mask out the top triangle
arr_corr[np.triu_indices_from(arr_corr)] = np.nan
Surely, both the numeric and the categorical variables have to be passed in order to get correlation ratio and Cramer's V, but is it possible to mask the correlation matrix before passing it to sns.heatmap?
Thank you in advance.
Warmest regards,
from dython.
@jijo7 I cannot understand what you are trying to do. I don't know what you have in df or df3, or what the general purpose is. Also, you don't use any of my code, so I don't understand how this is related to my library.
from dython.
@shakedzy Sorry, df or df3 is just provided as an example to show what we could do if we want to plot the lower triangle of the correlation matrix.
Now, I want to know if it is possible to use such a method to plot correlation ratio and Cramer's V separately.
Surely, I want to use your library, but I need Pearson's correlation, correlation ratio and Cramer's V to be plotted separately (i.e., 3 plots).
Thanks in advance.
from dython.
@jijo7 As I've replied before, if you want Cramer's V separately, pass only the categorical columns. Correlation Ratio is for categorical and numerical together.
If you want to mask one of the triangles, yes, you can: Cramer's V and Correlation Ratio are symmetrical (note that Theil's U isn't).
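As a rough sketch of masking one triangle, assume the association matrix is a square, symmetric pandas DataFrame. The numbers below are placeholders chosen for illustration only, not real output of the library.

```python
import numpy as np
import pandas as pd

cols = ['Month', 'Day', 'Temperature', 'WorkingHours']
# Placeholder symmetric values for illustration only (not real dython output)
values = np.array([[1.0, 0.1, 0.2, 0.3],
                   [0.1, 1.0, 0.4, 0.5],
                   [0.2, 0.4, 1.0, 0.6],
                   [0.3, 0.5, 0.6, 1.0]])
assoc = pd.DataFrame(values, index=cols, columns=cols)

# True marks the cells to hide: the diagonal and everything above it
mask = np.triu(np.ones(assoc.shape, dtype=bool))

# Lower triangle only; hidden cells become NaN
lower = assoc.mask(mask)
print(lower)

# With seaborn installed, the same mask can be passed straight to the plot:
# sns.heatmap(assoc, mask=mask, annot=True, fmt='.2f')
```

This only loses information when the matrix is symmetric, which is why it should not be applied to a Theil's U matrix.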
from dython.
@shakedzy Could you please let me know how to mask the correlation matrix to plot just the following part (correlation ratio) in your example?
from dython.
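import seaborn as sns

# Compute the associations without plotting, then plot only the nominal-vs-numerical block
assoc = nominal.associations(df, nominal_columns=['Month','Day'], plot=False, return_results=True)
sns.heatmap(assoc.loc[['Day','Month'],['Temperature','WorkingHours']], annot=True, fmt='.2f')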
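The same .loc selection can be extended to get all three blocks asked about earlier in the thread. The sketch below assumes the associations come back as a square DataFrame indexed by column name; `split_association_blocks` is a hypothetical helper, not part of dython, and the matrix values are placeholders rather than real results.

```python
import pandas as pd


def split_association_blocks(assoc, nominal_columns):
    """Split a square association matrix into its three metric blocks.

    Hypothetical helper for illustration; not part of dython.
    """
    nominal = [c for c in assoc.columns if c in nominal_columns]
    numeric = [c for c in assoc.columns if c not in nominal_columns]
    return {
        'nominal_vs_nominal': assoc.loc[nominal, nominal],   # e.g. Cramer's V
        'nominal_vs_numeric': assoc.loc[nominal, numeric],   # e.g. Correlation Ratio
        'numeric_vs_numeric': assoc.loc[numeric, numeric],   # e.g. Pearson's R
    }


cols = ['Month', 'Day', 'Temperature', 'WorkingHours']
# Placeholder values for illustration only (not real dython output)
assoc_example = pd.DataFrame(
    [[1.0, 0.1, 0.2, 0.3],
     [0.1, 1.0, 0.4, 0.5],
     [0.2, 0.4, 1.0, 0.6],
     [0.3, 0.5, 0.6, 1.0]],
    index=cols, columns=cols)

blocks = split_association_blocks(assoc_example, ['Month', 'Day'])
for name, block in blocks.items():
    print(name)
    print(block)
```

Each block can then be passed to its own sns.heatmap call to get three separate plots.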
from dython.
@shakedzy How can one increase the plot size when using nominal?
from dython.
Use figsize. Please see the documentation at shakedzy.xyz/dython
from dython.