Comments (7)
@shakedzy Yeah, read my comment above yours :p.
from dython.
Any chance you can paste the data and code that caused this here? I thought I tested it, but perhaps I missed something.
from dython.
The minimal example would be
pd.crosstab(['test1', 'test2'], ['somethignelse', 'somethingelse2'])
which already returns the error. And both replace functions always return lists.
The code I was using called associations(real, cat_cols=cat_cols), where real is a dataframe loaded from this file: https://github.com/Baukebrenninkmeijer/Table_Evaluator/blob/master/data/real_test_sample.csv
The traceback is this:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-16-eb6b6d47f27c> in <module>
----> 1 assoc = associations(real, cat_cols=cat_cols)
~\Projects\TableEvaluator\table_evaluator\helpers.py in associations(dataset, cat_cols, mark_columns, theil_u, plot, return_results, **kwargs)
316 corr[columns[i]][columns[j]] = theils_u(dataset[columns[j]], dataset[columns[i]])
317 else:
--> 318 cell = cramers_v(dataset[columns[i]], dataset[columns[j]])
319 corr[columns[i]][columns[j]] = cell
320 corr[columns[j]][columns[i]] = cell
~\AppData\Local\Continuum\anaconda3\lib\site-packages\dython\nominal.py in cramers_v(x, y, nan_strategy, nan_replace_value)
78 elif nan_strategy == DROP:
79 x, y = remove_incomplete_samples(x, y)
---> 80 confusion_matrix = pd.crosstab(x,y)
81 chi2 = ss.chi2_contingency(confusion_matrix)[0]
82 n = confusion_matrix.sum().sum()
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\reshape\pivot.py in crosstab(index, columns, values, rownames, colnames, aggfunc, margins, margins_name, dropna, normalize)
509
510 from pandas import DataFrame
--> 511 df = DataFrame(data, index=common_idx)
512 if values is None:
513 df['__dummy__'] = 0
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
390 dtype=dtype, copy=copy)
391 elif isinstance(data, dict):
--> 392 mgr = init_dict(data, index, columns, dtype=dtype)
393 elif isinstance(data, ma.MaskedArray):
394 import numpy.ma.mrecords as mrecords
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\construction.py in init_dict(data, index, columns, dtype)
210 arrays = [data[k] for k in keys]
211
--> 212 return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
213
214
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype)
49 # figure out the index, if necessary
50 if index is None:
---> 51 index = extract_index(arrays)
52 else:
53 index = ensure_index(index)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\construction.py in extract_index(data)
306
307 if not indexes and not raw_lengths:
--> 308 raise ValueError('If using all scalar values, you must pass'
309 ' an index')
310
ValueError: If using all scalar values, you must pass an index
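For readers following the traceback: the final ValueError comes from the DataFrame constructor, which raises it when every value in the dict it receives is a scalar and no index is given. Below is a minimal sketch of that underlying condition, independent of crosstab and dython. The helper name `scalar_dict_error` and the dict keys are made up for illustration, and the assumption (not confirmed in this thread) is that crosstab ended up treating each string in the flat list as one scalar "array".

```python
import pandas as pd


def scalar_dict_error():
    """Return the message pandas raises for a dict of scalars with no index.

    Hypothetical helper for illustration only; it is not part of dython.
    """
    try:
        # Every value is a scalar and no index is passed, so pandas
        # cannot infer how many rows the frame should have.
        pd.DataFrame({"row_0": "test1", "row_1": "test2"})
    except ValueError as exc:
        return str(exc)
    return None


message = scalar_dict_error()
print(message)
```

Passing an explicit index (for example `index=[0]`) or wrapping the values in lists avoids the error in this reduced case.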
from dython.
FYI, this used to work in my tests as well. So it might be a pandas API change or something.
from dython.
I'll dive into this. Thanks!
from dython.
So I did a bit more research. Since you pass nan_strategy=SKIP by default to cramers_v (and others) in associations, the error isn't currently triggered in your code. However, I was using an adapted older version of associations that didn't have this. I'll update my code, which should make it work.
However, I still think this does not work as intended in your code, since the other nan strategies always return a list, which crosstab cannot handle. Do you test for these specific cases (Cramér's V on two categorical lists with a strategy other than SKIP)?
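One possible way to sidestep the list question, sketched under assumptions: wrap each input in a pandas Series before calling crosstab, so each argument is unambiguously a single one-dimensional array. The helper name `crosstab_from_lists` is hypothetical and not part of dython. Whether a bare list of strings is accepted appears to depend on the installed pandas version, which is an inference from the two differing reports in this thread and not something verified here.

```python
import pandas as pd


def crosstab_from_lists(x, y):
    """Build a contingency table from two equal-length sequences.

    Hypothetical helper: converting to Series first means crosstab
    receives one array per axis, never a list of scalar strings.
    """
    x_series = pd.Series(list(x), name="x")
    y_series = pd.Series(list(y), name="y")
    return pd.crosstab(x_series, y_series)


table = crosstab_from_lists(
    ["test1", "test2"],
    ["somethingelse", "somethingelse2"],
)
print(table)
```

A regression test for the non-SKIP strategies could feed their list output through a wrapper like this and check that the table's total count equals the number of samples.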
from dython.
I ran the code with your CSV and no error occurred:
import pandas as pd
from dython.nominal import associations
df = pd.read_csv("data.csv", header=0)
associations(df, nominal_columns=["trans_type","trans_operation","trans_k_symbol"])
Also, I re-checked the pandas crosstab documentation, and this function accepts lists too, so that's not the issue either. So I guess the issue is with your version of the code.
from dython.