/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/smogn/phi.py:81: Run


redefine phi relevance function: all points are 0, about nickkunz/smogn

gauraviiita commented on June 3, 2024

I was facing the same problem, but reducing rel_coef to 0.50 or less helped me.
Thank you, Nick.

devGichanLee commented on June 3, 2024

> Thank you for using this Python implementation of SMOGN. I apologize for the delay. It appears that perhaps the distribution of your y response variable does not contain box plot extremes in order for the Φ function to automatically determine which range of values to over-sample.
>
> Please consider either reducing the rel_coef argument's default value or manually specifying the range of values to over-sample and under-sample, as exhibited here: https://github.com/nickkunz/smogn/blob/master/examples/smogn_example_3_adv.ipynb

I checked and redefined rel_coef and rg_mtrx, but it doesn't work. I have seen many issues opened about this. Is there any plan to fix it? Thanks.

jruots commented on June 3, 2024

I'm experiencing the same issue as well.
With rel_method = 'auto' I have not, for the life of me, managed to overcome the "all points are 0" issue. What's even stranger is that it worked fine on a 100-row subset of my dataset, but on the original 500k-row dataset I get this error. I've painstakingly checked that the subset is a good representation of the original dataset, but can't spot the issue.

With rel_method = 'manual' I've had better luck, but it's still not great. The rel_ctrl_pts_rg array becomes huge with a large dataset, because you need to define many values that you want to over-sample and many that you want to under-sample. This in turn makes smogn.smoter() very slow: with the 500k rows of data I'm looking at about 36 hours to complete the operation. With the manual method, it would be nice to still be able to set a simple threshold, e.g. assign relevance 1 to all values greater than or equal to 5 and relevance 0 to all values below 5.

Of course, it is possible that smogn.smoter() is simply slow on large datasets regardless of whether rel_method is 'auto' or 'manual'. It would be nice to confirm this by trying both methods, though.
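The threshold asked for above can, under some assumptions, be expressed with only a few control points. Assuming each row of `rel_ctrl_pts_rg` is `[y value, relevance, slope]` (the three-column layout of `rg_mtrx` in the linked example notebook) and that smogn interpolates relevance between rows, a step at 5 needs rows only at the minimum of y, just below the threshold, at the threshold, and at the maximum of y, rather than one row per value. `threshold_ctrl_pts` and its `eps` gap are hypothetical names for this sketch.

```python
import numpy as np

def threshold_ctrl_pts(y, threshold, eps=1e-6):
    """Hypothetical helper: build a small [y value, relevance, slope] matrix
    that gives relevance 0 below `threshold` and 1 at or above it.
    Assumes smogn interpolates relevance between these control points."""
    y = np.asarray(y, dtype=float)
    lo, hi = float(y.min()), float(y.max())
    if not lo < threshold <= hi:
        raise ValueError("threshold must lie inside the range of y")
    rows = [[lo, 0.0, 0.0]]                     # smallest observed value: not relevant
    if threshold - eps > lo:
        rows.append([threshold - eps, 0.0, 0.0])  # just below the threshold: not relevant
    rows.append([float(threshold), 1.0, 0.0])   # the threshold itself: fully relevant
    if hi > threshold:
        rows.append([hi, 1.0, 0.0])             # largest observed value: fully relevant
    return rows

# Illustrative data only: the matrix stays at most 4 rows however long y is.
y = np.array([0.5, 1.2, 3.3, 4.8, 5.0, 7.1, 9.4])
rg = threshold_ctrl_pts(y, threshold=5.0)
for row in rg:
    print(row)
```

The matrix would then be passed as `smogn.smoter(data=df, y="target", rel_method="manual", rel_ctrl_pts_rg=rg)`, with `df` and `"target"` as placeholders. This only keeps the control-point matrix small; whether it changes the 36-hour runtime is untested, since generating the synthetic samples may be the dominant cost.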

nickkunz commented on June 3, 2024

Thank you for using this Python implementation of SMOGN. I apologize for the delay. It appears that perhaps the distribution of your y response variable does not contain box plot extremes in order for the Φ function to automatically determine which range of values to over-sample.

Please consider either reducing the rel_coef argument's default value or manually specifying the range of values to over-sample and under-sample, as exhibited here: https://github.com/nickkunz/smogn/blob/master/examples/smogn_example_3_adv.ipynb
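One way to check this explanation before calling smogn is to count how many y values fall beyond the box-plot whiskers for a few candidate coefficients. This is a rough sketch, not smogn's code: it uses `np.percentile` quartiles instead of the Tukey hinges an R-style box plot uses, `count_extremes` is a hypothetical helper, and it assumes that values beyond the whiskers are what the automatic method treats as the rare range to over-sample.

```python
import numpy as np

def count_extremes(y, coef):
    """Count values below q1 - coef*IQR and above q3 + coef*IQR,
    using percentile-based quartiles (a simplification of box-plot hinges)."""
    y = np.asarray(y, dtype=float)
    q1, q3 = np.percentile(y, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - coef * iqr, q3 + coef * iqr
    return int(np.sum(y < lower)), int(np.sum(y > upper))

# Evenly spread illustrative data: 0, 1, ..., 100
y = np.linspace(0.0, 100.0, 101)
for coef in (1.5, 1.0, 0.4):
    low, high = count_extremes(y, coef)
    print(f"coef={coef}: {low} low / {high} high extremes")
```

On evenly spread data like this, the conventional 1.5 and also 1.0 yield no extremes, while 0.4 does; a real y needs its own scan. If the counts stay at zero for every reasonable coefficient, specifying the ranges manually is likely the more practical route.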

Bahar1978 commented on June 3, 2024

>> Thank you for using this Python implementation of SMOGN. I apologize for the delay. It appears that perhaps the distribution of your y response variable does not contain box plot extremes in order for the Φ function to automatically determine which range of values to over-sample.
>>
>> Please consider either reducing the rel_coef argument's default value or manually specifying the range of values to over-sample and under-sample, as exhibited here: https://github.com/nickkunz/smogn/blob/master/examples/smogn_example_3_adv.ipynb
>
> I checked and redefined rel_coef and rg_mtrx, but it doesn't work. I have seen many issues opened about this. Is there any plan to fix it? Thanks.

Hi, could you please let me know how you calculated the rel_coef?

SafetyMary commented on June 3, 2024

Similar issue here. "redefine phi relevance function: all points are 1"

I dug into the code and found the problem.

TL;DR: update lines 71-81 in box_plot_stats.py to make sure boxplot_stats["stats"] and boxplot_stats["xtrms"] are not empty arrays.

The root cause seems to be that smogn.box_plot_stats() does not generate a valid dictionary describing the distribution of y. A valid dictionary should contain 'stats' and 'xtrms'. The "all points are 1" error is due to an empty 'xtrms' array, and the "all points are 0" error is due to an empty 'stats' array. This in turn leads smogn.phi_ctrl_pts() to produce an invalid phi_params dictionary (missing the under-sample or over-sample ctrl_pts), so the relevance function phi is not valid.
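The empty-'xtrms' condition described in this diagnosis can be reproduced outside smogn with a small R-style box-plot summary that returns the same two keys. This is a hypothetical re-implementation (`fivenum` and `box_plot_stats_sketch` are illustrative names), assuming smogn's `box_plot_stats` follows R's `boxplot.stats`: hinges from Tukey's five-number summary, whiskers at the most extreme points within `coef` times the IQR of the hinges, and everything beyond them collected as extremes. It is not a patch for lines 71-81.

```python
import math
import numpy as np

def fivenum(x):
    """Tukey's five-number summary (min, lower hinge, median, upper hinge, max), as in R's fivenum()."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    n4 = math.floor((n + 3) / 2) / 2
    d = [1, n4, (n + 1) / 2, n + 1 - n4, n]  # 1-based positions into the sorted data
    return np.array([0.5 * (xs[math.floor(p) - 1] + xs[math.ceil(p) - 1]) for p in d])

def box_plot_stats_sketch(y, coef=1.5):
    """Hypothetical R-style boxplot.stats returning {'stats': ..., 'xtrms': ...}."""
    y = np.asarray(y, dtype=float)
    stats = fivenum(y)
    iqr = stats[3] - stats[1]
    out = (y < stats[1] - coef * iqr) | (y > stats[3] + coef * iqr)
    inside = y[~out]
    stats[0], stats[4] = inside.min(), inside.max()  # whiskers: most extreme non-outliers
    return {"stats": stats, "xtrms": y[out]}

even = np.linspace(0.0, 100.0, 101)
print(box_plot_stats_sketch(even)["xtrms"])    # no extremes: empty array
skewed = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(box_plot_stats_sketch(skewed)["xtrms"])  # the value 100 lies beyond the upper whisker
```

For the evenly spread sample, 'xtrms' comes back empty, which according to the analysis above is the situation behind "all points are 1".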

parisaazimaee commented on June 3, 2024

@nickkunz Hi Nick, I am encountering the same issue and was wondering how you specified the range of values to over-sample and under-sample in the example.

Bahar1978 commented on June 3, 2024

Hello, thanks for SMOGN. Unfortunately, I have the same issue. Could you please advise us on how to solve it?

dptrsa-300 commented on June 3, 2024

+1 Same issue here. Issue #13 is also a duplicate of this.

faridelya commented on June 3, 2024

I am facing this issue even though everything is the same as in this library's example. Please help us out. My file is

#12 KeyError Traceback (most recent call last)
C:\Anaconda\envs\datasynthetic\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3360 try:
-> 3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:

C:\Anaconda\envs\datasynthetic\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'VOLUME'

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_12836\1609191792.py in
3
4 data= df,
----> 5 y ="VOLUME"
6 )

C:\Anaconda\envs\datasynthetic\lib\site-packages\smogn\smoter.py in smoter(data, y, k, pert, samp_method, under_samp, drop_na_col, drop_na_row, replace, rel_thres, rel_method, rel_xtrm_type, rel_coef, rel_ctrl_pts_rg)
135
136 ## determine column position for response variable y
--> 137 y_col = data.columns.get_loc(y)
138
139 ## move response variable y to last column

C:\Anaconda\envs\datasynthetic\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
-> 3363 raise KeyError(key) from err
3364
3365 if is_scalar(key) and isna(key) and not self.hasnans:

KeyError: 'VOLUME'
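The traceback shows `data.columns.get_loc(y)` inside `smoter` raising `KeyError: 'VOLUME'`, i.e. the DataFrame passed in has no column whose label is exactly `"VOLUME"`. A quick pre-check can tolerate stray whitespace or case differences and list near matches. `resolve_column` is a hypothetical standard-library helper; whether whitespace or case was the actual cause in this report is not known.

```python
import difflib

def resolve_column(columns, name):
    """Hypothetical pre-check before smogn.smoter(data=df, y=name):
    return the exact column label, tolerating stray whitespace and case,
    or raise a KeyError that lists close matches."""
    columns = list(columns)
    if name in columns:
        return name
    # Map normalized labels back to the original labels
    normalized = {str(c).strip().lower(): c for c in columns}
    key = name.strip().lower()
    if key in normalized:
        return normalized[key]
    close = difflib.get_close_matches(name, [str(c) for c in columns], n=3)
    raise KeyError(f"{name!r} not in columns; close matches: {close}")

print(repr(resolve_column(["DATE", " VOLUME ", "PRICE"], "VOLUME")))  # ' VOLUME '
```

The returned label can be passed as `y`, or the headers can be cleaned once with `df.columns = df.columns.str.strip()` before calling `smogn.smoter`.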

Saumyadav commented on June 3, 2024

Issue resolved by resetting the DataFrame's index before passing it in:
## conduct smogn
train_smogn = smogn.smoter(data=t_data.reset_index(drop=True), y="Labels")
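This workaround suggests that `smoter` expects a default 0..n-1 row index; that is an inference from this thread, not something confirmed from smogn's source. A DataFrame that has been filtered, sampled or concatenated keeps its old row labels, as the following pandas sketch shows, along with the `reset_index(drop=True)` call that renumbers it. The column values are made up for illustration.

```python
import pandas as pd

# Illustrative data; "Labels" stands in for the response variable y
t_data = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "Labels": [10.0, 20.0, 30.0, 40.0]})

# Filtering keeps the original index labels, leaving a gap at 0
subset = t_data[t_data["x"] > 1.0]
print(list(subset.index))  # [1, 2, 3]

# reset_index(drop=True) renumbers rows 0..n-1 and discards the old labels
clean = subset.reset_index(drop=True)
print(list(clean.index))   # [0, 1, 2]
```

`clean` is what would then be passed as `data=` to `smogn.smoter`.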

Saumyadav commented on June 3, 2024

I am getting the error "ValueError: redefine phi relevance function: all points are 1".
I changed rel_coef, but it is not working for me. Is there a specific method for choosing the rel_coef value?

imprasukjain commented on June 3, 2024

Do you happen to have any updates regarding this error? How do I resolve it? I am also getting the same error.
