Comments (13)
I was facing the same problem, but reducing rel_coef to 0.50 or less helped me.
Thank you, Nick.
from smogn.
I checked and redefined rel_coef and rg_mtrx, but it still doesn't work. I have seen many issues opened about this. Is there any plan to fix it? Thanks.
from smogn.
I'm experiencing the same issue as well.
With rel_method = 'auto' I have not, for the life of me, managed to overcome the "all points are 0" issue. What's even weirder is that with a subset of my dataset (100 rows) this has worked fine, but with the original data set (500k rows) I get this issue. I've painstakingly checked that the subset is a good representation of the original data set, but can't spot the issue.
With rel_method = 'manual' I've had better luck, but it's still not great. The array for rel_ctrl_pts_rg becomes huge with a large dataset, because you need to define a lot of values that you are interested in oversampling and a lot of values you are interested in undersampling. This then makes smogn.smoter() very slow. With the 500k rows of data I'm looking at about 36 hours to complete the operation. With the manual method it would be nice if you could still set a simple threshold, e.g. assign relevance of 1 to all values equal to or greater than 5 and relevance 0 to all values less than 5.
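A simple threshold like the one described above might be approximated with just two control points instead of a long array. The sketch below assumes each row of rel_ctrl_pts_rg is [y value, relevance, derivative], as in the linked example notebook, and that relevance stays flat outside the outermost control points; neither assumption has been verified against smogn's source. The helper names threshold_ctrl_pts and resample are hypothetical and not part of smogn.

```python
def threshold_ctrl_pts(threshold, gap):
    """Build two control points that approximate a step in relevance.

    Rows follow the assumed [y value, relevance, derivative] layout.
    `gap` is the width of the transition zone just below the threshold.
    """
    if gap <= 0:
        raise ValueError("gap must be positive")
    return [
        [threshold - gap, 0, 0],  # just below the threshold: relevance 0
        [threshold, 1, 0],        # at the threshold: relevance 1
    ]


# Relevance 1 for y >= 5, relevance 0 for y <= 4.99 (under the stated assumptions).
rg_mtrx = threshold_ctrl_pts(threshold=5, gap=0.01)
print(rg_mtrx)


def resample(df, y_col, ctrl_pts):
    """Pass the control points to smogn; smogn is imported only when called."""
    import smogn

    return smogn.smoter(
        data=df,
        y=y_col,
        rel_method="manual",
        rel_ctrl_pts_rg=ctrl_pts,
    )
```

Whether this shortens the run time on 500k rows is untested; it only keeps the control-point array small.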
Of course, it is possible that smogn.smoter() is simply very slow for large datasets, regardless of whether rel_method is 'auto' or 'manual'. It would be nice to confirm this by trying both methods.
from smogn.
Thank you for using this Python implementation of SMOGN. I apologize for the delay. It appears that the distribution of your y response variable may not contain the box plot extremes that the Φ function needs to automatically determine which range of values to over-sample.
Please consider either reducing the rel_coef argument's default value or manually specifying the range of values to over-sample and under-sample, as exhibited here: https://github.com/nickkunz/smogn/blob/master/examples/smogn_example_3_adv.ipynb
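One way to check this before calling smogn.smoter() is to count how many values of y fall outside the box plot fences for a given coefficient. The sketch below uses plain Tukey fences computed from NumPy percentiles; smogn's own box_plot_stats() may compute the quartiles differently, so treat the result as a rough guide only. The helper names tukey_extremes and largest_coef_with_extremes are hypothetical and not part of smogn.

```python
import numpy as np


def tukey_extremes(y, coef=1.5):
    """Return the values of y outside Q1 - coef*IQR and Q3 + coef*IQR."""
    y = np.asarray(y, dtype=float)
    q1, q3 = np.percentile(y, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - coef * iqr, q3 + coef * iqr
    return y[(y < lower) | (y > upper)]


def largest_coef_with_extremes(y, coefs=(1.5, 1.25, 1.0, 0.75, 0.5, 0.25)):
    """Return the first (largest) coefficient that yields any extremes, else None."""
    for coef in coefs:
        if tukey_extremes(y, coef).size > 0:
            return coef
    return None


# Evenly spread toy response with no tails (assumption: stands in for your y column).
y = np.linspace(0.0, 10.0, 101)
print("extremes at default 1.5:", tukey_extremes(y, 1.5).size)
print("largest coefficient with extremes:", largest_coef_with_extremes(y))
```

If no coefficient in the list produces extremes, specifying the ranges manually is probably the more practical route.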
from smogn.
Hi, could you please let me know how you calculated rel_coef?
from smogn.
Similar issue here: "redefine phi relevance function: all points are 1".
I dug into the code and found the problem.
TL;DR: update lines 71-81 in box_plot_stats.py to make sure boxplot_stats["stats"] and boxplot_stats["xtrms"] are not empty arrays.
The root cause seems to be that smogn.box_plot_stats() does not generate a valid dictionary describing the distribution of y. A valid dictionary should contain 'stats' and 'xtrms'. The "all points are 1" error is due to an empty 'xtrms' array, and the "all points are 0" error is due to an empty 'stats' array. This in turn leads smogn.phi_ctrl_pts() to generate an invalid phi_params dictionary (missing the under-sample or over-sample ctrl_pts), so the relevance function phi is not valid.
from smogn.
@nickkunz Hi Nick, I am encountering the same issue, and I was wondering how you specified the range of values to over-sample and under-sample in the example.
from smogn.
Hello, thanks for SMOGN. Unfortunately, I have the same issue. Could you please guide us on how to solve it?
from smogn.
+1 Same issue here. Issue #13 is also a duplicate of this.
from smogn.
I am facing this issue even though everything is the same as in the example for this library. Please help us out. My file is
#12 KeyError Traceback (most recent call last)
C:\Anaconda\envs\datasynthetic\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3360 try:
-> 3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
C:\Anaconda\envs\datasynthetic\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
C:\Anaconda\envs\datasynthetic\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'VOLUME'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_12836\1609191792.py in
3
4 data= df,
----> 5 y ="VOLUME"
6 )
C:\Anaconda\envs\datasynthetic\lib\site-packages\smogn\smoter.py in smoter(data, y, k, pert, samp_method, under_samp, drop_na_col, drop_na_row, replace, rel_thres, rel_method, rel_xtrm_type, rel_coef, rel_ctrl_pts_rg)
135
136 ## determine column position for response variable y
--> 137 y_col = data.columns.get_loc(y)
138
139 ## move response variable y to last column
C:\Anaconda\envs\datasynthetic\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
-> 3363 raise KeyError(key) from err
3364
3365 if is_scalar(key) and isna(key) and not self.hasnans:
KeyError: 'VOLUME'
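The traceback shows the KeyError coming from data.columns.get_loc(y), which means no column label matches 'VOLUME' exactly. A quick check along the following lines may help; the DataFrame here is hypothetical, and stray whitespace in the header is only one possible cause (different capitalization or a missing column would produce the same error).

```python
import pandas as pd

# Hypothetical frame whose header carries stray whitespace, e.g. from a CSV export.
df = pd.DataFrame({" VOLUME ": [10.0, 12.5, 11.0], "PRICE": [1.0, 2.0, 3.0]})
y = "VOLUME"

if y not in df.columns:
    # repr() makes leading and trailing spaces visible.
    print("available columns:", [repr(c) for c in df.columns])
    df.columns = df.columns.str.strip()

# The same lookup that failed in the traceback; it succeeds once the labels match.
y_col = df.columns.get_loc(y)
print("position of y:", y_col)
```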
from smogn.
Issue resolved by resetting the DataFrame's index before calling smogn.smoter():
## conduct smogn
train_smogn = smogn.smoter(data=t_data.reset_index(drop=True), y="Labels")
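For context, a DataFrame that has been filtered or split keeps its original row labels, so its index is no longer 0, 1, 2, and so on. The sketch below shows what reset_index(drop=True) changes, using a hypothetical frame; that smogn expects a contiguous index is an assumption drawn from the fix above, not something confirmed in its documentation.

```python
import pandas as pd

# Hypothetical data; t_data mimics a training subset taken from a larger frame.
df = pd.DataFrame({"x": range(10), "Labels": [float(i) for i in range(10)]})
t_data = df[df["x"] % 2 == 0]

print(list(t_data.index))  # row labels inherited from df, with gaps

# drop=True discards the old labels instead of adding them as an "index" column.
clean = t_data.reset_index(drop=True)
print(list(clean.index))   # contiguous labels starting at 0
```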
from smogn.
I am getting the error "ValueError: redefine phi relevance function: all points are 1".
I changed rel_coef, but that did not work for me. Is there a specific method for finding a suitable rel_coef value?
from smogn.
Do you happen to have any updates regarding this error? How do I resolve it? I am getting the same error.
from smogn.