Comments (16)

tpfd commented on May 30, 2024
  1. Ah ok, many thanks!

  2. Yes, sorry: that is because the example data contains only the first 10 rows. Across all 20,000 rows there are 7 classes.

Will have a go in light of 1) and report back! Many thanks.

from dython.

madhur0006 commented on May 30, 2024

Hi there,
please help with the questions above.

shakedzy commented on May 30, 2024

Hi, thanks for the feedback! Could you please post the array you're sending? I don't know what df_dython = df_prep.drop(['Bedrock value'], axis = 1) is.

tpfd commented on May 30, 2024

Apologies, I should have attached that from the start. That drop is just me removing the pre-encoded data from the df, as your function does that itself. Example data from the input array is attached.

example_df.zip

shakedzy commented on May 30, 2024

Hi, two things:

  1. You're passing several columns to correlation_ratio as the second argument, which is not how you use it. Check the function's documentation: you should be passing "A sequence of continuous measurements". I see how this could be confusing, as I accidentally wrote that you can pass a DataFrame. I fixed that in the documentation, so thanks.
  2. The Bedrock column only has a single value in it (at least in the data you uploaded), so a Correlation Ratio has no real meaning, as there is only one class.

Perhaps you could elaborate on what exactly you're trying to achieve, and I can try to guide you through it.
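
To illustrate point 2, here is a minimal sketch of the correlation ratio written with a pandas groupby. It is not dython's actual implementation; the function name `eta` and all sample values are made up for illustration. With a single class, the class mean equals the overall mean, so the between-class variation is zero and the ratio carries no information.

```python
import numpy as np
import pandas as pd


def eta(categories, measurements):
    """Correlation ratio between one categorical and one continuous sequence."""
    df = pd.DataFrame({"cat": list(categories), "y": list(measurements)})
    overall_mean = df["y"].mean()
    groups = df.groupby("cat")["y"]
    # Between-class variation: class size times squared distance of the class mean
    between = (groups.count() * (groups.mean() - overall_mean) ** 2).sum()
    # Total variation of all measurements around the overall mean
    total = ((df["y"] - overall_mean) ** 2).sum()
    if between == 0 or total == 0:
        return 0.0
    return float(np.sqrt(between / total))


values = [1.0, 2.0, 3.0, 4.0]        # hypothetical measurements
one_class = ["a", "a", "a", "a"]     # a single class, like the uploaded Bedrock column
two_classes = ["a", "a", "b", "b"]   # two classes with different means

single = eta(one_class, values)
split = eta(two_classes, values)
print(single, split)
```

Note that each argument is one sequence (one column), never a whole DataFrame.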

henanksha commented on May 30, 2024

Hi, firstly, thank you so much for the brilliant article! I can't tell you how helpful your article has been in clarifying my doubts about correlations. I appreciate all your work.

I am facing an issue while implementing the Correlation Ratio.
~ Dataset: Kaggle's Titanic Train dataset (https://www.kaggle.com/startupsci/titanic-data-science-solutions/data)
~ Aim: Calculate Correlation Ratio between 2 categorical features ('Survived' & 'Gender') and 2 continuous features ('Fare' & 'Age')
~ Code I tried:
numcols = ['Fare','Age']
catcols = ['Survived','Sex']

def correlation_ratio(categories, measurements):
    fcat, _ = pd.factorize(categories)
    cat_num = np.max(fcat)+1
    y_avg_array = np.zeros(cat_num)
    n_array = np.zeros(cat_num)
    for i in range(0,cat_num):
        cat_measures = measurements[np.argwhere(fcat == i).flatten()]
        n_array[i] = len(cat_measures)
        y_avg_array[i] = np.average(cat_measures)
    y_total_avg = np.sum(np.multiply(y_avg_array,n_array))/np.sum(n_array)
    numerator = np.sum(np.multiply(n_array,np.power(np.subtract(y_avg_array,y_total_avg),2)))
    denominator = np.sum(np.power(np.subtract(measurements,y_total_avg),2))
    if numerator == 0:
        eta = 0.0
    else:
        eta = numerator/denominator
    return eta

correlation_ratio(catcols, numcols)
~ Error in line: cat_measures = measurements[np.argwhere(fcat == i).flatten()]
~ Error: TypeError: only integer scalar arrays can be converted to a scalar index

Can you please help me understand where I am going wrong? I would really appreciate any help in this regard. Thanks in advance.

shakedzy commented on May 30, 2024

If the code you pasted is exactly what you ran, then numcols and catcols are simply lists of strings, not the columns of the data. You didn't extract the actual columns.
Also, you can't pass two columns at once: categories and measurements each hold only one column. Please read the function's documentation.
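
A rough sketch of what that looks like in practice: extract each column from the DataFrame and call the function once per (categorical, continuous) pair. The function below follows the one quoted in this thread, lightly condensed; the DataFrame is a hypothetical stand-in for the Titanic data with made-up values.

```python
import numpy as np
import pandas as pd


def correlation_ratio(categories, measurements):
    """Correlation ratio for ONE categorical and ONE continuous sequence."""
    fcat, _ = pd.factorize(categories)
    measurements = np.asarray(measurements, dtype=float)
    cat_num = np.max(fcat) + 1
    y_avg_array = np.zeros(cat_num)
    n_array = np.zeros(cat_num)
    for i in range(cat_num):
        cat_measures = measurements[fcat == i]  # measurements of class i
        n_array[i] = len(cat_measures)
        y_avg_array[i] = np.average(cat_measures)
    y_total_avg = np.sum(y_avg_array * n_array) / np.sum(n_array)
    numerator = np.sum(n_array * (y_avg_array - y_total_avg) ** 2)
    denominator = np.sum((measurements - y_total_avg) ** 2)
    return 0.0 if numerator == 0 else float(np.sqrt(numerator / denominator))


# Hypothetical stand-in for the Titanic training data (made-up values)
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 1],
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Fare": [7.25, 71.28, 7.92, 8.05, 53.10, 8.46],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, 54.0],
})

catcols = ["Survived", "Sex"]
numcols = ["Fare", "Age"]

results = {}
for cat in catcols:
    for num in numcols:
        # df[cat] and df[num] are the actual columns, not their names
        results[(cat, num)] = correlation_ratio(df[cat], df[num])

for pair, value in results.items():
    print(pair, round(value, 3))
```

The real Age column contains missing values, so drop or fill those rows before calling the function; otherwise the averages become NaN.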

madhur0006 commented on May 30, 2024

Thanks for the great article.
I am getting the error below while using this code.

REPLACE = 'replace'
DROP = 'drop'
DROP_SAMPLES = 'drop_samples'
DROP_FEATURES = 'drop_features'
SKIP = 'skip'
DEFAULT_REPLACE_VALUE = 0.0

def correlation_ratio(categories, measurements):
    fcat, _ = pd.factorize(categories)
    cat_num = np.max(fcat)+1
    y_avg_array = np.zeros(cat_num)
    n_array = np.zeros(cat_num)
    for i in range(0,cat_num):
        cat_measures = measurements[np.argwhere(fcat == i).flatten()]
        n_array[i] = len(cat_measures)
        y_avg_array[i] = np.average(cat_measures)
    y_total_avg = np.sum(np.multiply(y_avg_array,n_array))/np.sum(n_array)
    numerator = np.sum(np.multiply(n_array,np.power(np.subtract(y_avg_array,y_total_avg),2)))
    denominator = np.sum(np.power(np.subtract(measurements,y_total_avg),2))
    if numerator == 0:
        eta = 0.0
    else:
        eta = np.sqrt(numerator/denominator)
    return eta

correlation_ratio(car_sales_cat.columns,car_sales_num.columns)

Error:
TypeError: unsupported operand type(s) for /: 'str' and 'int'

madhur0006 commented on May 30, 2024

please help

shakedzy commented on May 30, 2024

Where's the data you're using? You just pasted my function.

madhur0006 commented on May 30, 2024

This is the function of yours that I am using:

def correlation_ratio(categories, measurements):
    fcat, _ = pd.factorize(categories)
    cat_num = np.max(fcat)+1
    y_avg_array = np.zeros(cat_num)
    n_array = np.zeros(cat_num)
    for i in range(0,cat_num):
        cat_measures = measurements[np.argwhere(fcat == i).flatten()]
        n_array[i] = len(cat_measures)
        y_avg_array[i] = np.average(cat_measures)
    y_total_avg = np.sum(np.multiply(y_avg_array,n_array))/np.sum(n_array)
    numerator = np.sum(np.multiply(n_array,np.power(np.subtract(y_avg_array,y_total_avg),2)))
    denominator = np.sum(np.power(np.subtract(measurements,y_total_avg),2))
    if numerator == 0:
        eta = 0.0
    else:
        eta = np.sqrt(numerator/denominator)
    return eta

data
categories: car_sales_cat.columns(categorical column)
measurements: car_sales_num.columns(numerical column)

shakedzy commented on May 30, 2024

What is car_sales_cat? How do you expect me to debug this without the data?

madhur0006 commented on May 30, 2024

Car_sales.zip

Hi, I'm really sorry, I should have attached the file at the beginning. From the attached file I separated the categorical and numerical data, and I am trying to pass them to the function
like this:
categories: car_sales_cat.columns (categorical column)
measurements: car_sales_num.columns (numerical column)

shakedzy commented on May 30, 2024

Dude, you're making it super hard to help you, as I still don't know how you split the data. You might be doing it wrong.
Anyway, if I assume that car_sales_cat and car_sales_num are pandas DataFrames, that means you're passing the column names, not the actual data. I answered this exact same thing in the comment right above your question.
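
A small sketch of the difference, assuming the two variables really are pandas DataFrames. The column names `fuel` and `price` and the values are hypothetical, since the actual split was never shown.

```python
import pandas as pd

# Hypothetical frames standing in for car_sales_cat and car_sales_num
car_sales_cat = pd.DataFrame({"fuel": ["petrol", "diesel", "petrol"]})
car_sales_num = pd.DataFrame({"price": [10.5, 12.0, 9.8]})

names = car_sales_num.columns    # an Index of column labels (strings only)
data = car_sales_num["price"]    # a Series holding the actual measurements

print(list(names))
print(data.tolist())
```

Passing `.columns` hands the function a list of labels, so it ends up trying to average strings. Passing `car_sales_cat["fuel"]` and `car_sales_num["price"]` gives it one categorical and one continuous sequence, which is what it expects.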

madhur0006 commented on May 30, 2024

Hi, thanks. Yes, those are pandas DataFrames: car_sales_cat has only the categorical data and car_sales_num has only the numerical data.
Please suggest a way to pass them to the function.
I tried car_sales[car_sales_cat], but this is also not working. Please help, as I am new to Python.

shakedzy commented on May 30, 2024

I answered your question in my last comment:

you're passing the columns names, not the actual data

Refer to the Pandas documentation and DataFrame API.
