correlation_ratio produces key error in flatten to cat_measures (shakedzy/dython)

tpfd commented on May 30, 2024

Ah ok, many thanks!
Yes, sorry, that is because the example data contains only the first 10 rows. The full 20,000 rows have 7 classes across them.

Will have a go in light of 1) and report back! Many thanks.

from dython.

madhur0006 commented on May 30, 2024

Hi there,
please help with the questions above.

shakedzy commented on May 30, 2024

Hi, thanks for the feedback! Could you please post the array you're sending? I don't know what df_dython = df_prep.drop(['Bedrock value'], axis = 1) is.

tpfd commented on May 30, 2024

Apologies, I should have attached that from the start. That drop is just me removing the pre-encoded data from the df, as your function does that itself. Example data from the input array is attached.

example_df.zip

shakedzy commented on May 30, 2024

Hi, two things:

1. You're passing several columns to correlation_ratio as the second argument, which is not how you use it. Check the function's documentation: you should be passing a sequence of continuous measurements. I see how this could be confusing, as I accidentally wrote that you can pass a DataFrame. I fixed that in the documentation, so thanks.
2. The Bedrock column only has a single value in it (at least in the data you uploaded), so there's no real meaning to a Correlation Ratio, as there is only one class.

Perhaps you could elaborate on what exactly you're trying to achieve, and I can try to guide you through it.
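
The two points above can be sketched as follows. This is a minimal illustration, not dython's implementation: the `correlation_ratio` helper is a compact re-statement of the formula used in this thread, and the DataFrame, its column names other than `Bedrock`, and its values are made up for the example.

```python
import numpy as np
import pandas as pd


def correlation_ratio(categories, measurements):
    """Correlation ratio (eta) between ONE categorical and ONE continuous sequence."""
    categories = np.asarray(categories)
    measurements = np.asarray(measurements, dtype=float)
    fcat, _ = pd.factorize(categories)          # integer code per category
    overall_mean = measurements.mean()
    numerator = 0.0
    for i in range(fcat.max() + 1):
        group = measurements[fcat == i]         # measurements belonging to category i
        numerator += len(group) * (group.mean() - overall_mean) ** 2
    denominator = ((measurements - overall_mean) ** 2).sum()
    return 0.0 if numerator == 0 else float(np.sqrt(numerator / denominator))


# Hypothetical data: one categorical column and two continuous columns.
df = pd.DataFrame({
    "Bedrock": ["granite", "granite", "shale", "shale", "basalt", "basalt"],
    "Depth": [1.0, 1.2, 3.1, 2.9, 5.0, 5.2],
    "Porosity": [0.30, 0.10, 0.20, 0.25, 0.15, 0.22],
})

# Point 2: with a single class the ratio carries no information, so check first.
if df["Bedrock"].nunique() < 2:
    raise ValueError("Need at least two classes in the categorical column")

# Point 1: pass one continuous column at a time, never a whole DataFrame.
results = {col: correlation_ratio(df["Bedrock"], df[col])
           for col in ["Depth", "Porosity"]}
print(results)
```

Looping over the continuous columns keeps each call to one categorical sequence and one continuous sequence, which is the shape the function expects.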

henanksha commented on May 30, 2024

Hi, firstly, thank you so much for the brilliant article! I can't tell you how helpful it has been in clarifying my doubts about correlations. I appreciate all your work.

I am facing an issue while implementing the Correlation Ratio.
~ Dataset: Kaggle's Titanic Train dataset (https://www.kaggle.com/startupsci/titanic-data-science-solutions/data)
~ Aim: Calculate the Correlation Ratio between 2 categorical features ('Survived' & 'Gender') and 2 continuous features ('Fare' & 'Age')
~ Code I tried:
numcols = ['Fare','Age']
catcols = ['Survived','Sex']

def correlation_ratio(categories, measurements):
    fcat, _ = pd.factorize(categories)
    cat_num = np.max(fcat)+1
    y_avg_array = np.zeros(cat_num)
    n_array = np.zeros(cat_num)
    for i in range(0,cat_num):
        cat_measures = measurements[np.argwhere(fcat == i).flatten()]
        n_array[i] = len(cat_measures)
        y_avg_array[i] = np.average(cat_measures)
    y_total_avg = np.sum(np.multiply(y_avg_array,n_array))/np.sum(n_array)
    numerator = np.sum(np.multiply(n_array,np.power(np.subtract(y_avg_array,y_total_avg),2)))
    denominator = np.sum(np.power(np.subtract(measurements,y_total_avg),2))
    if numerator == 0:
        eta = 0.0
    else:
        eta = numerator/denominator
    return eta

correlation_ratio(catcols, numcols)
~ Error in line: cat_measures = measurements[np.argwhere(fcat == i).flatten()]
~ Error: TypeError: only integer scalar arrays can be converted to a scalar index

Can you please help me understand where I am going wrong? I would really appreciate any help in this regard. Thanks in advance.

shakedzy commented on May 30, 2024

If the code you pasted is exactly what you ran, then numcols and catcols are simply lists of strings, not the columns of the data. You didn't extract the actual columns.
Also, you can't pass two columns at once: categories and measurements each hold only one column. Please read the function's documentation.
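
As a rough sketch of what extracting the actual columns looks like, the snippet below calls the function with one categorical column and one continuous column. The function body is the version pasted elsewhere in this thread (with `np.sqrt`); the small DataFrame is made-up, Titanic-like data and is only an assumption for illustration.

```python
import numpy as np
import pandas as pd


def correlation_ratio(categories, measurements):
    # Function as pasted in this thread, with indentation restored.
    fcat, _ = pd.factorize(categories)
    cat_num = np.max(fcat) + 1
    y_avg_array = np.zeros(cat_num)
    n_array = np.zeros(cat_num)
    for i in range(0, cat_num):
        cat_measures = measurements[np.argwhere(fcat == i).flatten()]
        n_array[i] = len(cat_measures)
        y_avg_array[i] = np.average(cat_measures)
    y_total_avg = np.sum(np.multiply(y_avg_array, n_array)) / np.sum(n_array)
    numerator = np.sum(np.multiply(n_array, np.power(np.subtract(y_avg_array, y_total_avg), 2)))
    denominator = np.sum(np.power(np.subtract(measurements, y_total_avg), 2))
    if numerator == 0:
        eta = 0.0
    else:
        eta = np.sqrt(numerator / denominator)
    return eta


# Hypothetical stand-in for the Titanic training data.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Fare": [7.25, 71.28, 7.92, 8.05, 53.10, 8.46],
})

# Pass the column VALUES (one categorical, one continuous), not lists of column names.
eta = correlation_ratio(df["Sex"].to_numpy(), df["Fare"].to_numpy())
print(eta)
```

Converting each column with `.to_numpy()` makes the positional indexing inside the function independent of the DataFrame's index labels.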

madhur0006 commented on May 30, 2024

Thanks for the great article.
I am getting the error below while using this code.

REPLACE = 'replace'
DROP = 'drop'
DROP_SAMPLES = 'drop_samples'
DROP_FEATURES = 'drop_features'
SKIP = 'skip'
DEFAULT_REPLACE_VALUE = 0.0

def correlation_ratio(categories, measurements):
    fcat, _ = pd.factorize(categories)
    cat_num = np.max(fcat)+1
    y_avg_array = np.zeros(cat_num)
    n_array = np.zeros(cat_num)
    for i in range(0,cat_num):
        cat_measures = measurements[np.argwhere(fcat == i).flatten()]
        n_array[i] = len(cat_measures)
        y_avg_array[i] = np.average(cat_measures)
    y_total_avg = np.sum(np.multiply(y_avg_array,n_array))/np.sum(n_array)
    numerator = np.sum(np.multiply(n_array,np.power(np.subtract(y_avg_array,y_total_avg),2)))
    denominator = np.sum(np.power(np.subtract(measurements,y_total_avg),2))
    if numerator == 0:
        eta = 0.0
    else:
        eta = np.sqrt(numerator/denominator)
    return eta

correlation_ratio(car_sales_cat.columns,car_sales_num.columns)

Error:
TypeError: unsupported operand type(s) for /: 'str' and 'int'

madhur0006 commented on May 30, 2024

please help

shakedzy commented on May 30, 2024

Where's the data you're using? You just pasted my function.

madhur0006 commented on May 30, 2024

This is the function of yours that I am using:

def correlation_ratio(categories, measurements):
    fcat, _ = pd.factorize(categories)
    cat_num = np.max(fcat)+1
    y_avg_array = np.zeros(cat_num)
    n_array = np.zeros(cat_num)
    for i in range(0,cat_num):
        cat_measures = measurements[np.argwhere(fcat == i).flatten()]
        n_array[i] = len(cat_measures)
        y_avg_array[i] = np.average(cat_measures)
    y_total_avg = np.sum(np.multiply(y_avg_array,n_array))/np.sum(n_array)
    numerator = np.sum(np.multiply(n_array,np.power(np.subtract(y_avg_array,y_total_avg),2)))
    denominator = np.sum(np.power(np.subtract(measurements,y_total_avg),2))
    if numerator == 0:
        eta = 0.0
    else:
        eta = np.sqrt(numerator/denominator)
    return eta

Data:
categories: car_sales_cat.columns (categorical columns)
measurements: car_sales_num.columns (numerical columns)

shakedzy commented on May 30, 2024

What is car_sales_cat? How do you expect me to debug this without the data?

madhur0006 commented on May 30, 2024

Car_sales.zip

Hi, I am really sorry, I should have attached the file at the beginning. From the attached file I separated the categorical and numerical data, and I am trying to pass them to the function like this:
categories: car_sales_cat.columns (categorical columns)
measurements: car_sales_num.columns (numerical columns)

shakedzy commented on May 30, 2024

Dude, you're making it super hard to help you, as I still don't know how you split the data. You might be doing it wrong.
Anyway, if I assume that car_sales_cat and car_sales_num are pandas DataFrames, that means you're passing the column names, not the actual data. I answered this exact same thing in the comment right above your question.

madhur0006 commented on May 30, 2024

Hi, thanks. Yes, those are pandas DataFrames: car_sales_cat has only the categorical data and car_sales_num has only the numerical data.
Please suggest a way to pass them to the function.
I tried car_sales[car_sales_cat], but this is also not working. Please help, as I am new to Python.

shakedzy commented on May 30, 2024

I answered your question in my last comment:

you're passing the column names, not the actual data

Refer to the Pandas documentation and DataFrame API.
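
For readers new to pandas, the difference between column names and column data can be seen in a few lines. This is only an illustrative sketch: the `car_sales_num` frame below is a made-up stand-in (its column names and values are assumptions), not the attached file.

```python
import pandas as pd

# Hypothetical numeric frame standing in for car_sales_num.
car_sales_num = pd.DataFrame({
    "Price": [21.5, 28.4, 42.0],
    "Horsepower": [140.0, 225.0, 210.0],
})

# .columns holds the column NAMES (strings), not the measurements.
column_names = car_sales_num.columns
print(list(column_names))

# Selecting a single column by name returns the actual DATA as a Series.
price_values = car_sales_num["Price"]
print(price_values.tolist())
```

Passing `column_names` to a numeric routine means asking it to average strings, which would explain an error such as `unsupported operand type(s) for /: 'str' and 'int'`; passing `price_values` (or its `.to_numpy()` array) gives it numbers.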

Issue status: closed (16 comments)
