
Comments (13)

gauraviiita commented on June 3, 2024 8

I was facing the same problem, but reducing rel_coef to 0.50 or less helped me.
Thank you, Nick.

from smogn.

devGichanLee commented on June 3, 2024 7

Thank you for using this Python implementation of SMOGN. I apologize for the delay. It appears that perhaps the distribution of your y response variable does not contain box plot extremes in order for the Φ function to automatically determine which range of values to over-sample.

Please consider either reducing the rel_coef argument's default value or manually specifying the range of values to over-sample and under-sample, as exhibited here: https://github.com/nickkunz/smogn/blob/master/examples/smogn_example_3_adv.ipynb

I checked and redefined rel_coef and rg_mtrx, but it still doesn't work. I have seen many issues opened about this. Is there any plan to fix it? Thanks.


jruots commented on June 3, 2024 4

I'm experiencing the same issue as well.
With rel_method = 'auto' I have not, for the life of me, managed to overcome the "all points are 0" issue. What's even weirder is that with a subset of my dataset (100 rows) this has worked fine, but with the original data set (500k rows) I get this issue. I've painstakingly checked that the subset is a good representation of the original data set, but can't spot the issue.

With rel_method = 'manual' I've had better luck, but it's still not great. The array for rel_ctrl_pts_rg becomes huge with a large dataset, because you need to define a lot of values that you are interested in oversampling and a lot of values you are interested in undersampling. This then makes smogn.smoter() very slow. With the 500k rows of data I'm looking at about 36 hours to complete the operation. With the manual method it would be nice if you could still set a simple threshold, e.g. assign relevance of 1 to all values equal to or greater than 5 and relevance 0 to all values less than 5.
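
The simple threshold asked for above could look like the following sketch. It is not part of smogn: `threshold_relevance` and `split_by_relevance` are hypothetical helpers, the sample values are made up, and the 0.5 cut-off is only an assumed relevance threshold. It shows the idea of assigning relevance 1 at or above a value and 0 below it, without listing many control points.

```python
def threshold_relevance(y_values, threshold):
    """Relevance 1.0 for values >= threshold, 0.0 otherwise (hypothetical helper)."""
    return [1.0 if y >= threshold else 0.0 for y in y_values]


def split_by_relevance(relevance, rel_thres=0.5):
    """Return (rare, normal) row positions; rare rows are over-sampling candidates."""
    rare = [i for i, r in enumerate(relevance) if r >= rel_thres]
    normal = [i for i, r in enumerate(relevance) if r < rel_thres]
    return rare, normal


# Made-up response values, using the threshold of 5 from the comment above.
y = [1.0, 2.5, 5.0, 7.2, 4.9]
phi = threshold_relevance(y, threshold=5)
rare, normal = split_by_relevance(phi)
print(phi)           # [0.0, 0.0, 1.0, 1.0, 0.0]
print(rare, normal)  # [2, 3] [0, 1, 4]
```

A step function like this needs only one pass over y, so it would not grow with the number of distinct values the way a long control-point array does.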

Of course, it is possible that smogn.smoter() is very slow for large datasets regardless of whether rel_method is 'auto' or 'manual'. It would be nice to be able to confirm this by trying both methods for rel_method.


nickkunz commented on June 3, 2024 2

Thank you for using this Python implementation of SMOGN. I apologize for the delay. It appears that perhaps the distribution of your y response variable does not contain box plot extremes in order for the Φ function to automatically determine which range of values to over-sample.

Please consider either reducing the rel_coef argument's default value or manually specifying the range of values to over-sample and under-sample, as exhibited here: https://github.com/nickkunz/smogn/blob/master/examples/smogn_example_3_adv.ipynb
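
To see why lowering the coefficient can help, here is a minimal standard-library sketch. It assumes the usual Tukey box plot rule (points beyond Q1 - coef * IQR or Q3 + coef * IQR count as extremes); it is not smogn's own code, and `box_plot_extremes` and the sample data are made up for illustration.

```python
import statistics


def box_plot_extremes(values, coef=1.5):
    """Return the values outside the fences [Q1 - coef*IQR, Q3 + coef*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - coef * iqr, q3 + coef * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up response values with a fairly compact distribution.
y = [10, 11, 12, 12, 13, 13, 14, 15, 16, 18]

print(box_plot_extremes(y, coef=1.5))  # [] -> no extremes to over-sample
print(box_plot_extremes(y, coef=0.5))  # [10, 18] -> narrower fences expose extremes
```

With the wider fences nothing qualifies as an extreme, which matches the situation described above; with a smaller coefficient the same data yields extremes at both tails.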


Bahar1978 commented on June 3, 2024 2

Thank you for using this Python implementation of SMOGN. I apologize for the delay. It appears that perhaps the distribution of your y response variable does not contain box plot extremes in order for the Φ function to automatically determine which range of values to over-sample.
Please consider either reducing the rel_coef argument's default value or manually specifying the range of values to over-sample and under-sample, as exhibited here: https://github.com/nickkunz/smogn/blob/master/examples/smogn_example_3_adv.ipynb

I checked and redefined rel_coef and rg_mtrx, but it still doesn't work. I have seen many issues opened about this. Is there any plan to fix it? Thanks.

Hi, could you please let me know how you calculated rel_coef?


SafetyMary commented on June 3, 2024 2

Similar issue here. "redefine phi relevance function: all points are 1"

I dug into the code and found the problem.

TL;DR: update lines 71-81 in box_plot_stats.py to make sure boxplot_stats["stats"] and boxplot_stats["xtrms"] are not empty arrays.

The root cause seems to be that smogn.box_plot_stats() does not generate a valid dictionary describing the distribution of y. A valid dictionary should contain non-empty 'stats' and 'xtrms' arrays. The "all points are 1" error is due to an empty 'xtrms' array, and the "all points are 0" error is due to an empty 'stats' array. This in turn leads to smogn.phi_ctrl_pts() not generating a valid phi_params dictionary (the under-sample or over-sample ctrl_pts are missing), so the relevance function phi is not valid.
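
The check described above can be sketched as a small guard. This is only an illustration of the diagnosis in this comment, not a patch to smogn: `diagnose_box_plot_stats` is a hypothetical helper, and it assumes the dictionary uses the 'stats' and 'xtrms' keys named above.

```python
def diagnose_box_plot_stats(box_stats):
    """Map each empty array to the phi error it is reported to cause."""
    messages = []
    if len(box_stats.get("stats", [])) == 0:
        messages.append("empty 'stats': expect 'all points are 0'")
    if len(box_stats.get("xtrms", [])) == 0:
        messages.append("empty 'xtrms': expect 'all points are 1'")
    return messages


# Made-up dictionary: summary statistics present, but no extremes found.
example = {"stats": [10, 12, 13, 14.75, 18], "xtrms": []}
for message in diagnose_box_plot_stats(example):
    print(message)
```

Running a check like this on the dictionary before building the relevance function would point at the failing part of the distribution instead of surfacing later as a ValueError.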


parisaazimaee commented on June 3, 2024 1

@nickkunz Hi Nick, I am encountering the same issue, and I was wondering how you specified the range of values to over-sample and under-sample in the example.


Bahar1978 commented on June 3, 2024 1

Hello, thanks for SMOGN. Unfortunately, I have the same issue. Could you please guide us on how to solve it?


dptrsa-300 commented on June 3, 2024 1

+1 Same issue here. Issue #13 is also a duplicate of this.


faridelya commented on June 3, 2024 1

I am facing this issue. Everything is the same as in the example for this library. Please help us out. My file is:
[image]

#12 KeyError Traceback (most recent call last)
C:\Anaconda\envs\datasynthetic\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3360 try:
-> 3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:

C:\Anaconda\envs\datasynthetic\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

C:\Anaconda\envs\datasynthetic\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'VOLUME'

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_12836\1609191792.py in
3
4 data= df,
----> 5 y ="VOLUME"
6 )

C:\Anaconda\envs\datasynthetic\lib\site-packages\smogn\smoter.py in smoter(data, y, k, pert, samp_method, under_samp, drop_na_col, drop_na_row, replace, rel_thres, rel_method, rel_xtrm_type, rel_coef, rel_ctrl_pts_rg)
135
136 ## determine column position for response variable y
--> 137 y_col = data.columns.get_loc(y)
138
139 ## move response variable y to last column

C:\Anaconda\envs\datasynthetic\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
-> 3363 raise KeyError(key) from err
3364
3365 if is_scalar(key) and isna(key) and not self.hasnans:

KeyError: 'VOLUME'
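
The traceback shows data.columns.get_loc(y) raising KeyError, which means no column is named exactly 'VOLUME'. The actual cause in this case is not visible from the thread; stray whitespace or different letter case in the header is one common possibility. The sketch below is a hypothetical helper (`suggest_columns` is not part of pandas or smogn, and the column names are made up) that looks for near matches in a list of column names such as list(df.columns).

```python
import difflib


def suggest_columns(columns, wanted):
    """Return columns that match `wanted` ignoring case/whitespace, else close matches."""
    normalized = {str(c).strip().lower(): c for c in columns}
    key = wanted.strip().lower()
    if key in normalized:
        return [normalized[key]]
    close = difflib.get_close_matches(key, list(normalized), n=3, cutoff=0.6)
    return [normalized[k] for k in close]


# Made-up header with a leading space in front of VOLUME.
columns = ["DATE", " VOLUME", "price"]
print(suggest_columns(columns, "VOLUME"))  # [' VOLUME']
```

If a near match turns up, passing that exact name as y, or renaming the column first, should get past this KeyError.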


Saumyadav commented on June 3, 2024 1

Issue resolved by resetting the DataFrame's index before calling smoter:
## conduct smogn
train_smogn = smogn.smoter(data=t_data.reset_index(drop=True), y="Labels")


Saumyadav commented on June 3, 2024

I am getting the error "ValueError: redefine phi relevance function: all points are 1".
I changed rel_coef, but that did not work for me. Is there a specific method for finding a rel_coef value?


imprasukjain commented on June 3, 2024

Do you have any updates regarding this error? How do I resolve it? I am getting the same error.

