
quantipy3's Introduction

Quantipy3

Python for people data

Quantipy is an open-source data processing, analysis and reporting software project that builds on the excellent pandas and numpy libraries. Aimed at people data, Quantipy offers support for native handling of special data types like multiple choice variables, statistical analysis using case or observation weights, DataFrame metadata and pretty data exports.

Quantipy for Python 3

This repository is a port of Quantipy from Python 2.x to Python 3.

Key features

  • Reads plain .csv, converts from Dimensions, SPSS, Decipher, or Ascribe
  • Open metadata format to describe and manage datasets
  • Powerful, metadata-driven cleaning, editing, recoding and transformation of datasets
  • Computation and assessment of data weights
  • Easy-to-use analysis interface

Features not yet supported in Python 3 version

  • Structured analysis and reporting via Chain and Cluster containers
  • Exports to SPSS, Dimensions ddf/mdd, MS Excel and Powerpoint with flexible layouts and various options
  • Full Python 3.8 support (3.5, 3.6 and 3.7 are supported)

Origins

Contributors

Installation

pip install quantipy3

or

python3 -m pip install quantipy3

Note that the package is called quantipy3 on pip, but it is still imported as quantipy (import quantipy as qp).

Create a virtual environment

If you want to create a virtual environment when using Quantipy:

with conda

conda create -n envqp python=3

with venv

python -m venv [your_env_name]

5 minutes to Quantipy

Get started

If you are working with SPSS, import your .sav file:

import quantipy as qp
dataset = qp.DataSet("My dataset, wave 1")
dataset.read_spss('my_file.sav')

You can start straight away by exploring what variables are in your file.

dataset.variables()
['gender',
 'agecat',
 'price_satisfaction',
 'numitems_satisfaction',
 'org_satisfaction',
 'service_satisfaction',
 'quality_satisfaction',
 'overall_satisfaction',
 'weight']

If you want more details on a variable, explore its metadata.

dataset.meta('agecat')
single                 codes  texts  missing
agecat: Age category
1                          1  18-24     None
2                          2  25-34     None
3                          3  35-49     None
4                          4  50-64     None
5                          5  64+       None

Quantipy understands SPSS metadata out of the box: all codes and labels are the same as in the .sav file.

Calculate some results: counts or percentages

dataset.crosstab('price_satisfaction', 'agecat')
Question                                                   agecat. Age category
Values                                                      All  18-24  25-34  35-49  50-64   64+
Question                                Values
price_satisfaction. Price satisfaction  All               582.0   46.0  127.0  230.0  147.0  32.0
                                        Strongly Negative  72.0    8.0   20.0   22.0   17.0   5.0
                                        Somewhat Negative 135.0   10.0   30.0   52.0   38.0   5.0
                                        Neutral           140.0    9.0   32.0   59.0   36.0   4.0
                                        Somewhat Positive 145.0   12.0   25.0   63.0   33.0  12.0
                                        Strongly Positive  90.0    7.0   20.0   34.0   23.0   6.0

You can also filter

dataset.crosstab('price_satisfaction', 'agecat', f={'gender':1})

and use a weight column

dataset.crosstab('price_satisfaction', 'agecat', f={'gender':1}, w="weight")

Variables can be created, recoded or edited with DataSet methods, e.g. derive():

mapper = [(1, '18-35 years old', {'agecat': [1,2]}),
          (2, '36 and older', {'agecat': [3,4,5]})]

dataset.derive('two_age_groups', 'single', dataset.text("Older or younger than 35"), mapper)
dataset.meta('two_age_groups')
single                                              codes     texts              missing
two_age_groups: "Older or younger than 35"
1                                                       1     18-35 years old    None
2                                                       2     36 and older       None
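
The derived variable can then be used like any other variable, for example in a crosstab (an illustrative call only; output omitted):

dataset.crosstab('two_age_groups', 'gender', w='weight')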

The DataSet case data component can be inspected with the []-indexer, as known from a pd.DataFrame:

dataset[['gender', 'agecat']].head(5)
   gender  agecat
0     1.0     1.0
1     2.0     1.0
2     2.0     2.0
3     1.0     NaN
4     NaN     1.0

Weighting

If your data hasn't been weighted yet, you can use Quantipy's RIM weighting algorithm.

Assuming we have the same variables as before, gender and agecat, we can weight the dataset with these two variables:

from quantipy.core.weights.rim import Rim

age_targets = {'agecat':{1:5.0, 2:30.0, 3:26.0, 4:19.0, 5:20.0}}
gender_targets = {'gender':{1:49.0, 2:51.0}}
scheme = Rim('gender_and_age')
scheme.set_targets(targets=[age_targets, gender_targets])
dataset.weight(scheme, unique_key='respondentId',
               weight_name="my_weight",
               inplace=True)

Quantipy will show you a weighting report:

Weight variable       weights_gender_and_age
Weight group                  _default_name_
Weight filter                           None
Total: unweighted                 582.000000
Total: weighted                   582.000000
Weighting efficiency               60.009826
Iterations required                14.000000
Mean weight factor                  1.000000
Minimum weight factor               0.465818
Maximum weight factor               6.187700
Weight factor ratio                13.283522

And you can test whether the weighting has worked by running crosstabs:

dataset.crosstab('agecat', ci=['c%'], w='my_weight')
Question                      agecat. Age category
Question              Values
agecat. Age category  All                    100.0
                      18-24                    5.0
                      25-34                   30.0
                      35-49                   26.0
                      50-64                   19.0
                      64+                     20.0
dataset.crosstab('gender', ci=['c%'], w='my_weight')
Question                gender. Gender
Question        Values
gender. Gender  All              100.0
                Male              49.0
                Female            51.0
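
Because the weight factors are stored in the case data alongside the original variables, the []-indexer shown earlier can also be used to inspect them (an illustrative call only; output omitted):

dataset[['gender', 'agecat', 'my_weight']].head(5)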

Contributing

The test suite for Quantipy can be run with the command

python3 -m pytest tests

But when developing a specific aspect of Quantipy, it might be quicker to run (e.g. for the DataSet)

python3 -m unittest tests.test_dataset

Tests for unsupported features are skipped; see here for which tests are supported.

We welcome volunteers and supporters. Please include a test case with any pull request, especially those that run calculations.

quantipy3's People

Contributors

alasdaire, alexbuchhammer, alextanski, andersfreund, biggihs, directions-dev, g3org3mb, geirfreysson, jamesrkg, majeed-sahebzadha, nitsrek, pablincho, roxanamarianeagu, roxananeagu, tokensalad


quantipy3's Issues

'IndexError: positional indexers are out-of-bounds' after cloning and filtering DataSet object

I'm trying to loop through different values for s

        keep = {'S1': [s]}
        copy_ds.filter(alias=name, condition=keep, inplace=True)

but after doing this, I am getting the following error report.

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py in _get_list_axis(self, key, axis)
   2129         try:
-> 2130             return self.obj.take(key, axis=axis)
   2131         except IndexError:

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in take(self, indices, axis, is_copy, **kwargs)
   3603         new_data = self._data.take(
-> 3604             indices, axis=self._get_block_manager_axis(axis), verify=True
   3605         )

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py in take(self, indexer, axis, verify, convert)
   1388         if convert:
-> 1389             indexer = maybe_convert_indices(indexer, n)
   1390 

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexers.py in maybe_convert_indices(indices, n)
    200     if mask.any():
--> 201         raise IndexError("indices are out-of-bounds")
    202     return indices

IndexError: indices are out-of-bounds
IndexError                                Traceback (most recent call last)
<ipython-input-14-0085af4bc4bf> in <module>
     25         # loop questions
     26         if (c in qtype_dict['with_m']) or (c in qtype_dict['only_m']):
---> 27             t_cnt=copy_ds.crosstab(c,stats=True)
     28             t_pct=copy_ds.crosstab(c,ci='c%',stats=True)
     29 

</opt/anaconda3/lib/python3.7/site-packages/decorator.py:decorator-gen-312> in crosstab(self, x, y, w, f, ci, stats, sig_level, rules, decimals, xtotal, painted)

/opt/anaconda3/lib/python3.7/site-packages/quantipy/core/tools/qp_decorators.py in _to_list(func, *args, **kwargs)
    167                 args = tuple(a if not x == val_index else v
    168                              for x, a in enumerate(args))
--> 169         return func(*args, **kwargs)
    170 
    171     if to_list:

</opt/anaconda3/lib/python3.7/site-packages/decorator.py:decorator-gen-311> in crosstab(self, x, y, w, f, ci, stats, sig_level, rules, decimals, xtotal, painted)

/opt/anaconda3/lib/python3.7/site-packages/quantipy/core/tools/qp_decorators.py in _deco(func, *args, **kwargs)
    141             if arg is None: continue
    142             func = dec(func)
--> 143         return func(*args, **kwargs)
    144 
    145     if categorical and not isinstance(categorical, list): categorical = [categorical]

</opt/anaconda3/lib/python3.7/site-packages/decorator.py:decorator-gen-4920> in crosstab(self, x, y, w, f, ci, stats, sig_level, rules, decimals, xtotal, painted)

/opt/anaconda3/lib/python3.7/site-packages/quantipy/core/tools/qp_decorators.py in _var_in_ds(func, *args, **kwargs)
     41             var = kwargs.get(variable, args[v_index])
     42             if var is None:
---> 43                 return func(*args, **kwargs)
     44             if not isinstance(var, list):
     45                 var = [var]

/opt/anaconda3/lib/python3.7/site-packages/quantipy/core/dataset.py in crosstab(self, x, y, w, f, ci, stats, sig_level, rules, decimals, xtotal, painted)
   1904         else:
   1905             idx = self.take(f)
-> 1906         data = self._data.copy().iloc[idx]
   1907         stack = qp.Stack(name='ct', add_data={'ct': (data, self._meta)})
   1908         if xtotal or not y:

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1422 
   1423             maybe_callable = com.apply_if_callable(key, self.obj)
-> 1424             return self._getitem_axis(maybe_callable, axis=axis)
   1425 
   1426     def _is_scalar_access(self, key: Tuple):

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   2146         # a list of integers
   2147         elif is_list_like_indexer(key):
-> 2148             return self._get_list_axis(key, axis=axis)
   2149 
   2150         # a single integer

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py in _get_list_axis(self, key, axis)
   2131         except IndexError:
   2132             # re-raise with different error message
-> 2133             raise IndexError("positional indexers are out-of-bounds")
   2134 
   2135     def _getitem_axis(self, key, axis: int):

IndexError: positional indexers are out-of-bounds

Thank you so much in advance for helping me troubleshoot this!

dataset.tabulate() - new method based on dataset.crosstab()

We need a new function that calls dataset.crosstab() but is friendlier to use than the crosstab function.

Currently there are only two display options for the result of dataset.crosstab(). Users can select pct=True/False (see screenshots below).

DataSet.tabulate() will have the following parameters:

Passed directly to crosstab

  • x
  • y
  • w (weight)
  • f (filter)

Additional parameters:

  • show - a string ('all') or a list of strings ('count', 'base', 'pct', 'ubase', 'ucount') for counts, base, percentages, unweighted base and unweighted counts (the unweighted views are only valid if the w parameter is supplied)
  • include_varname - a boolean that indicates whether variables should have the variable name in front of them. If it is true, the columns are shown as in the screenshot below; if it is false, "gender. Gender" becomes "Gender", i.e. the variable name is only shown if the variable has no label. A rough sketch of such a wrapper follows the screenshots below.

Screenshot 2019-10-09 16 31 24
Screenshot 2019-10-09 16 31 35
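
A rough sketch of how such a wrapper could sit on top of the existing crosstab() call is shown below. The mapping from show values onto crosstab's ci cell items and the label handling are assumptions for illustration, not the final API:

def tabulate(dataset, x, y=None, w=None, f=None, show='all', include_varname=False):
    """Illustrative wrapper around DataSet.crosstab() for the proposed API."""
    # assumed mapping from the proposed 'show' values to crosstab cell items;
    # 'base', 'ubase' and 'ucount' handling is left out of this sketch
    view_map = {'count': 'counts', 'pct': 'c%'}
    if show == 'all':
        views = list(view_map.values())
    else:
        wanted = [show] if isinstance(show, str) else show
        views = [view_map[s] for s in wanted if s in view_map]
    result = dataset.crosstab(x, y, w=w, f=f, ci=views)
    if not include_varname:
        # turn "gender. Gender" into "Gender" on both axes
        strip = lambda lbl: lbl.split('. ', 1)[-1] if isinstance(lbl, str) else lbl
        result = result.rename(index=strip, columns=strip)
    return result

A call like tabulate(dataset, 'price_satisfaction', 'agecat', w='weight', show=['pct']) would then return the column percentages with the variable names stripped from the labels.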

In Dimensions/reader.py levels.ix is deprecated

The levels.ix call in ddf_to_pandas is deprecated and does not work on Python 3.7.
It should be replaced by levels.loc.

Line:
new_table_name = levels.ix[table_name,'DSCTableName']
Should be replaced with:
new_table_name = levels.loc[table_name,'DSCTableName']

reader.zip

Installation

I followed the installation instructions but it fails every time. I was able to get all dependencies loaded and can import the Quantipy3 library, but there don't seem to be any methods associated with it. Just an empty module. Any help would be much appreciated! I am working on Windows 10.

Error while installing quantipy

@geirfreysson @KushagraPan
I am having this error while installing quantipy in a Python 2.7 environment. Kindly help.

ERROR: Could not find a version that satisfies the requirement ftfy==5.5.1 (from quantipy==0.2.1->-r requirements_dev.txt (line 4)) (from versions: 1.0, 2.0, 2.0.1, 2.0.2, 3.0, 3.0.1, 3.0.2, 3.0.3, 3.0.4, 3.0.5, 3.1.0, 3.1.1, 3.1.2, 3.1.3, 3.2.0, 3.3.0, 3.4.0, 4.0.0, 4.1.0, 4.1.1, 4.2.0, 4.3.1, 4.4, 4.4.1, 4.4.2, 4.4.3, 5.0, 5.0.1, 5.0.2, 5.1, 5.1.1, 5.2.0, 5.3.0)
ERROR: No matching distribution found for ftfy==5.5.1 (from quantipy==0.2.1->-r requirements_dev.txt (line 4))

significance testing results different than SPSS

In comparing results from SPSS and Quantipy, we discovered that the dataset.crosstab() function gives different significance results for comparing categorical distributions than SPSS does. I see in the sandbox.py document that chisq is used for calculating significance, which should give us the same results. Could this be because Quantipy does not recognize variables that are ordinal (Likert scales)?

Windows Compatibility?

Hi Developers,

I have been trying to give a shot at quantipy3, but something is not working for me. Based on Geir's post on towardsdatascience I see he is running this on mac.
If you have tried this on Windows can you tell me the exact version of anaconda, python, and windows that you have tried this build on?
I get the following error while executing "dataset.read_spss('SelectOnlyUK.sav',ioLocale=None)"
"TypeError: LoadLibrary() argument 1 must be str, not None." Here is the full stack trace
"Traceback (most recent call last):
File "test.py", line 13, in
dataset.read_spss('SelectOnlyUK.sav',ioLocale=None)
File "C:\Users\Ari\quantipy3\quantipy\core\dataset.py", line 601, in read_spss
self._meta, self.data = r_spss(path_sav+'.sav', **kwargs)
File "C:\Users\Ari\quantipy3\quantipy\core\tools\dp\io.py", line 328, in read_spss
meta, data = parse_sav_file(path_sav, **kwargs)
File "C:\Users\Ari\quantipy3\quantipy\core\tools\dp\spss\reader.py", line 36, in parse_sav_file
data = extract_sav_data(filepath, ioLocale=ioLocale, ioUtf8=ioUtf8)
File "C:\Users\Ari\quantipy3\quantipy\core\tools\dp\spss\reader.py", line 44, in extract_sav_data
with sr.SavReader(sav_file, returnHeader=True, ioLocale=ioLocale, ioUtf8=ioUtf8) as reader:
File "C:\Users\Ari\quantipy3\savReaderWriter\savReader.py", line 57, in init
ioUtf8, ioLocale)
File "C:\Users\Ari\quantipy3\savReaderWriter\header.py", line 31, in init
super(Header, self).init(savFileName, ioUtf8, ioLocale)
File "C:\Users\Ari\quantipy3\savReaderWriter\generic.py", line 30, in init
self.libc = cdll.LoadLibrary(ctypes.util.find_library("c"))
File "C:\ProgramData\Anaconda3\envs\envqp\lib\ctypes_init
.py", line 442, in LoadLibrary
return self.dlltype(name)
File "C:\ProgramData\Anaconda3\envs\envqp\lib\ctypes_init
.py", line 364, in init
self._handle = _dlopen(self._name, mode)
TypeError: LoadLibrary() argument 1 must be str, not None"

If I try to load an mdd/ddf file with the following line "meta, data = read_dimensions('S19029641.mdd', 'S19029641.ddf')"
I get the following error: "Traceback (most recent call last):
File "AA.py", line 13, in <module>
meta, data = read_dimensions('S19029641.mdd', 'S19029641.ddf')
File "C:\Users\Ari\quantipy3\quantipy\core\tools\dp\io.py", line 297, in read_dimensions
meta, data = quantipy_from_dimensions(path_mdd, path_ddf)
File "C:\Users\Ari\quantipy3\quantipy\core\tools\dp\dimensions\reader.py", line 992, in quantipy_from_dimensions
meta, ddf = mdd_to_quantipy(path_mdd, data=L1)
File "C:\Users\Ari\quantipy3\quantipy\core\tools\dp\dimensions\reader.py", line 774, in mdd_to_quantipy
meta, columns, data = get_columns_meta(xml, meta, data, map_values=True)
File "C:\Users\Ari\quantipy3\quantipy\core\tools\dp\dimensions\reader.py", line 737, in get_columns_meta
xml, column, data, map_values
File "C:\Users\Ari\quantipy3\quantipy\core\tools\dp\dimensions\reader.py", line 414, in get_meta_values
values = [int(v) for v in byProperty_values]
File "C:\Users\Ari\quantipy3\quantipy\core\tools\dp\dimensions\reader.py", line 414, in <listcomp>
values = [int(v) for v in byProperty_values]
ValueError: invalid literal for int() with base 10: 'UKC11'"

Can you give me a hint about what I am doing wrong? I have re-installed Anaconda several times and followed the readme closely. I have also tried different versions of Anaconda, as the SPSS reading error seems to happen because Python cannot find some libraries. But I had no luck.
Thanks!

Issue loading in SPSS file

Hi,
I'm having issues loading an SPSS file using quantipy. I have quantipy running on Python 3.6 and have already tested that I can load the file into a notebook using pyreadstat. Not sure why it is erroring here. Any help would be much appreciated!

Code:
dataset = qp.DataSet("test_dataset")
dataset.read_spss('Softlaunch data_CLIENT.sav')

Error:

TypeError Traceback (most recent call last)
in
1 dataset = qp.DataSet("test_dataset")
----> 2 dataset.read_spss('Softlaunch data_CLIENT.sav')

~\Anaconda3\lib\site-packages\quantipy\core\dataset.py in read_spss(self, path_sav, **kwargs)
642 """
643 if path_sav.endswith('.sav'): path_sav = path_sav.replace('.sav', '')
--> 644 self._meta, self._data = r_spss(path_sav+'.sav', **kwargs)
645 self._set_file_info(path_sav)
646 self._rename_blacklist_vars()

~\Anaconda3\lib\site-packages\quantipy\core\tools\dp\io.py in read_spss(path_sav, **kwargs)
375 def read_spss(path_sav, **kwargs):
376
--> 377 meta, data = parse_sav_file(path_sav, **kwargs)
378 return meta, data
379

~\Anaconda3\lib\site-packages\quantipy\core\tools\dp\spss\reader.py in parse_sav_file(filename, path, name, ioLocale, ioUtf8, dichot, dates_as_strings, text_key)
35 filepath = os.path.abspath(filepath)
36 data = extract_sav_data(filepath, ioLocale=ioLocale, ioUtf8=ioUtf8)
---> 37 meta, data = extract_sav_meta(filepath, name="", data=data, ioLocale=ioLocale,
38 ioUtf8=ioUtf8, dichot=dichot, dates_as_strings=dates_as_strings,
39 text_key=text_key)

~\Anaconda3\lib\site-packages\quantipy\core\tools\dp\spss\reader.py in extract_sav_meta(sav_file, name, data, ioLocale, ioUtf8, dichot, dates_as_strings, text_key)
170 # into data, which is immediately prior to the start of the
171 # dichotomous set columns
--> 172 dls_idx = data.columns.tolist().index(varNames[0])
173 if metadata.multRespDefs[mrset]['setType'] == 'C':
174 # Raise if value object of columns is not equal

TypeError: 'map' object is not subscriptable

Reading SAV file from your repository does not work :(

Hi!

It seems that reading the SAV file is the showstopper. I did the workaround mentioned somewhere:
used pyreadstat and imported the dataframe with dataset.from_components(df).

Unfortunately I miss the SPSS metadata goodies. BTW, is it possible to somehow import the pyreadstat meta part?

What I get with:
import quantipy as qp
dataset = qp.DataSet("My dataset, wave 1")
dataset.read_spss('"D:/tmp/Example Data (A).sav"')

is an error (using PyCharm and Python 3.7 venv):

Traceback (most recent call last):
File "C:/Users/zbatagelj/PycharmProjects/Valiconer/t.py", line 10, in <module>
dataset.read_spss("D:/tmp/Example Data (B).sav")
File "C:\Users\zbatagelj\quantipy3\quantipy\core\dataset.py", line 601, in read_spss
self._meta, self._data = r_spss(path_sav+'.sav', **kwargs)
File "C:\Users\zbatagelj\quantipy3\quantipy\core\tools\dp\io.py", line 331, in read_spss
meta, data = parse_sav_file(path_sav, **kwargs)
File "C:\Users\zbatagelj\quantipy3\quantipy\core\tools\dp\spss\reader.py", line 40, in parse_sav_file
data = extract_sav_data(filepath, ioLocale=ioLocale, ioUtf8=ioUtf8)
File "C:\Users\zbatagelj\quantipy3\quantipy\core\tools\dp\spss\reader.py", line 48, in extract_sav_data
with sr.SavReader(sav_file, returnHeader=True, ioLocale=ioLocale, ioUtf8=ioUtf8) as reader:
File "C:\Users\zbatagelj\quantipy3\savReaderWriter\savReader.py", line 60, in __init__
ioUtf8, ioLocale)
File "C:\Users\zbatagelj\quantipy3\savReaderWriter\header.py", line 31, in __init__
super(Header, self).__init__(savFileName, ioUtf8, ioLocale)
File "C:\Users\zbatagelj\quantipy3\savReaderWriter\generic.py", line 30, in __init__
self.libc = cdll.LoadLibrary(ctypes.util.find_library("c"))
File "C:\Users\zbatagelj\AppData\Local\Programs\Python\Python37\lib\ctypes\__init__.py", line 442, in LoadLibrary
return self._dlltype(name)
File "C:\Users\zbatagelj\AppData\Local\Programs\Python\Python37\lib\ctypes\__init__.py", line 364, in __init__
self._handle = _dlopen(self._name, mode)
TypeError: LoadLibrary() argument 1 must be str, not None

Not able to install Quantipy in Google Colab/Python 3

@geirfreysson I am having this error when I ran pip install quantipy

"Collecting quantipy
Downloading https://files.pythonhosted.org/packages/5c/cf/4c31a2054e045a553afa3adf5037476fa7c73ae170753b999f6af5a851af/QuantiPy-0.0.0.dev3.tar.gz (287kB)
|████████████████████████████████| 296kB 2.8MB/s
Building wheels for collected packages: quantipy
Building wheel for quantipy (setup.py) ... error
ERROR: Failed building wheel for quantipy
Running setup.py clean for quantipy
Failed to build quantipy
Installing collected packages: quantipy
Running setup.py install for quantipy ... error
ERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-so92uhnx/quantipy/setup.py'"'"'; file='"'"'/tmp/pip-install-so92uhnx/quantipy/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-besc6ke4/install-record.txt --single-version-externally-managed --compile Check the logs for full command output."

Kindly help

Can't install quantipy3

Hi,

I tried to install quantipy3 from the Anaconda PowerShell prompt, but because it requires old versions of pandas and numpy the installation fails. Is there any way to solve this? Thanks!

Variables shouldn't have to have "_" in name to be part of a delimited set

This works:
dataset.to_delimited_set('candy_choice', 'My delimited set', ['candy_1', 'candy_2', 'candy_3'])

While this doesn't
dataset.to_delimited_set('candy_choice', 'My delimited set', ['mars', 'snickers', 'twix'])

We need to fix to_delimited_set so that it doesn't rely on this naming convention.

Error using read_spss

Hello,

Attempted to use quantipy and read in a sav file. Received this error:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/ckheinz/Miniconda/miniconda3/envs/envqp/lib/python3.6/site-packages/quantipy/core/dataset.py", line 644, in read_spss
self._meta, self._data = r_spss(path_sav+'.sav', **kwargs)
File "/Users/ckheinz/Miniconda/miniconda3/envs/envqp/lib/python3.6/site-packages/quantipy/core/tools/dp/io.py", line 377, in read_spss
meta, data = parse_sav_file(path_sav, **kwargs)
File "/Users/ckheinz/Miniconda/miniconda3/envs/envqp/lib/python3.6/site-packages/quantipy/core/tools/dp/spss/reader.py", line 36, in parse_sav_file
data = extract_sav_data(filepath, ioLocale=ioLocale, ioUtf8=ioUtf8)
File "/Users/ckheinz/Miniconda/miniconda3/envs/envqp/lib/python3.6/site-packages/quantipy/core/tools/dp/spss/reader.py", line 44, in extract_sav_data
with sr.SavReader(sav_file, returnHeader=True, ioLocale=ioLocale, ioUtf8=ioUtf8) as reader:
File "/Users/ckheinz/Miniconda/miniconda3/envs/envqp/lib/python3.6/site-packages/savReaderWriter/savReader.py", line 57, in __init__
ioUtf8, ioLocale)
File "/Users/ckheinz/Miniconda/miniconda3/envs/envqp/lib/python3.6/site-packages/savReaderWriter/header.py", line 31, in __init__
super(Header, self).__init__(savFileName, ioUtf8, ioLocale)
File "/Users/ckheinz/Miniconda/miniconda3/envs/envqp/lib/python3.6/site-packages/savReaderWriter/generic.py", line 35, in __init__
self.spssio = self.loadLibrary()
File "/Users/ckheinz/Miniconda/miniconda3/envs/envqp/lib/python3.6/site-packages/savReaderWriter/generic.py", line 117, in loadLibrary
spssio = self._loadLibs("macos")
File "/Users/ckheinz/Miniconda/miniconda3/envs/envqp/lib/python3.6/site-packages/savReaderWriter/generic.py", line 89, in _loadLibs
return [load(os.path.join(path, lib)) for lib in libs][-1]
File "/Users/ckheinz/Miniconda/miniconda3/envs/envqp/lib/python3.6/site-packages/savReaderWriter/generic.py", line 89, in <listcomp>
return [load(os.path.join(path, lib)) for lib in libs][-1]
File "/Users/ckheinz/Miniconda/miniconda3/envs/envqp/lib/python3.6/ctypes/__init__.py", line 348, in __init__
self._handle = _dlopen(self._name, mode)
OSError: dlopen(/Users/ckheinz/Miniconda/miniconda3/envs/envqp/lib/python3.6/site-packages/savReaderWriter/spssio/macos/libicuuc48.1.dylib, 6): Library not loaded: @executable_path/../lib/libicudata48.1.dylib
Referenced from: /Users/ckheinz/Miniconda/miniconda3/envs/envqp/lib/python3.6/site-packages/savReaderWriter/spssio/macos/libicuuc48.1.dylib
Reason: image not found

My environment specs:

Name Version Build Channel
ca-certificates 2021.1.19 hecd8cb5_0
certifi 2020.12.5 py36hecd8cb5_0
chardet 4.0.0 pypi_0 pypi
decorator 4.4.2 pypi_0 pypi
ftfy 5.5.1 pypi_0 pypi
idna 2.10 pypi_0 pypi
importlib-metadata 3.7.0 pypi_0 pypi
libcxx 10.0.0 1
libedit 3.1.20191231 h1de35cc_1
libffi 3.3 hb1e8313_2
lxml 4.6.2 pypi_0 pypi
ncurses 6.2 h0a44026_1
numpy 1.14.5 pypi_0 pypi
openssl 1.1.1j h9ed2024_0
pandas 0.25.3 pypi_0 pypi
pillow 8.1.1 pypi_0 pypi
pip 21.0.1 py36hecd8cb5_0
prettytable 2.1.0 pypi_0 pypi
python 3.6.13 h88f2d9e_0
python-dateutil 2.8.1 pypi_0 pypi
python-pptx 0.6.18 pypi_0 pypi
pytz 2021.1 pypi_0 pypi
quantipy3 0.2.3 pypi_0 pypi
readline 8.1 h9ed2024_0
requests 2.25.1 pypi_0 pypi
scipy 1.2.1 pypi_0 pypi
setuptools 52.0.0 py36hecd8cb5_0
six 1.15.0 pypi_0 pypi
sqlite 3.33.0 hffcf06c_0
tk 8.6.10 hb0a8c7a_0
typing-extensions 3.7.4.3 pypi_0 pypi
urllib3 1.26.3 pypi_0 pypi
watchdog 2.0.2 pypi_0 pypi
wcwidth 0.2.5 pypi_0 pypi
wheel 0.36.2 pyhd3eb1b0_0
xlsxwriter 1.3.7 pypi_0 pypi
xmltodict 0.12.0 pypi_0 pypi
xz 5.2.5 h1de35cc_0
zipp 3.4.0 pypi_0 pypi
zlib 1.2.11 h1de35cc_3

My code:

import quantipy as qp
dataset = qp.DataSet("Test")
dataset.read_spss('Data.sav')

ds.populate() not populating meta data

As I understand, ds.populate() should populate the stack with meta data. When I use qp.Stack() to create a stack it works fine
image
but when I try to create a stack by calling ds.populate(), this happens
image
Any idea on why/how to work around it?

ds.crosstab() doesn't show sig when sig_level<0.01

There is a bug in the crosstab method's output styling that makes tests with alpha < 0.01 not show up in the results.

Example code:

sig_level=0.05/28
ds.crosstab('Q7','Region',sig_level=sig_level)

This only gives the cross-tabulation of counts, not the significant groups.

Reading dimensions mdd and ddf fails for 'field' type variable in Dimensions

Current Quantipy logic assumes that '.' only exists in grid/loop variable names. This is not true for field-type variables, especially when the field contains a loop variable.

For example, the following is valid in a Dimensions MDD:

MyField "My top level field variable"
Block fields
{
InnnerLoop "Inner Loop variable" loop {_1, _2}
(
innermostvariable "Inner most simple variable"
{
_X "Answer X",
Y "Answer Y"
}
)
}

The above will generate columns in the data like:
MyField.InnnerLoop[_1].innermostvariable
MyField.InnnerLoop[_2].innermostvariable

This variable name will make get_columns_meta() fail, in turn breaking the overall reading of the Dimensions input.

Option 1: Skip such variables: not the best solution, but it makes the rest of the data usable.
Option 2: Rename the variable up front.
Option 3: Change the current logic to read more information from the MDD XML, as all required information is available there.

Excel output example

Hi! Could you please provide an example for creating some simple output (e.g. a crosstab) in Excel?

Thanks!

Reading CSV

Hey, I have survey data recorded in a CSV file (exported via Google Sheets) and don't have a metadata (JSON) file for it.
How do I go about computing weights using Quantipy?
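
One possible route, pieced together from the weighting example in the README above and the from_components() calls mentioned in other issues. The file name, column names, unique key, codes and target percentages below are placeholders, not values from your data:

import pandas as pd
import quantipy as qp
from quantipy.core.weights.rim import Rim

# load the raw CSV exported from Google Sheets
df = pd.read_csv('survey.csv')

# build a DataSet directly from the DataFrame, without a separate metadata file
dataset = qp.DataSet('my survey')
dataset.from_components(df)

# define RIM targets for the weighting variables (placeholder codes and percentages)
gender_targets = {'gender': {1: 49.0, 2: 51.0}}
age_targets = {'agecat': {1: 20.0, 2: 30.0, 3: 50.0}}
scheme = Rim('demo_weights')
scheme.set_targets(targets=[gender_targets, age_targets])

dataset.weight(scheme, unique_key='respondent_id',
               weight_name='weight', inplace=True)

Note that the target variables must be integer-coded single-choice variables, as discussed in the RIM weighting issue further down.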

Only run tests for aspects of Quantipy that are supported in the python3 port

Currently the test suite has a lot of fails, because only part of Quantipy has been ported to Python 3.

We need to remove tests for aspects of Quantipy that aren't yet supported in python 3, so that we have a test suite that passes and is therefore usable.

A list of the tests that are not passing is linked to from the README. These should possibly be moved to another folder so that unittest in "discover" mode doesn't pick them up.

pip install quantipy3 fails on 3.9, numpy problem

Hi,

I'm interested in trying this out, but I am having build problems on MacOS, using Python from PyEnv.

~ pyenv version
geo3 (set by PYENV_VERSION environment variable)
~ python -V
Python 3.9.13
~ pip install quantipy3 >> ~/Desktop/error.txt

error.txt

Can anyone make a suggestion?

Dimensions cannot load the data when subquestions of loops have similar names

When trying to load data from DIM which contains several LOOP questions that contain the same subquestions, the method quantipy_clean(ddf) fails.

image

As you can see from the screenshot, the parent is not a str object, because there are multiple parents for the same element.
I'm not familiar enough with quantipy, but I believe some recursion must be implemented in this method to handle that.

Error.. kindly help please

@geirfreysson Everything was fine until I ran the pip install quantipy command mentioned in the instructions, when this error came up:

Using legacy setup.py install for scipy, since package 'wheel' is not installed.
Using legacy setup.py install for numpy, since package 'wheel' is not installed.
Using legacy setup.py install for pandas, since package 'wheel' is not installed.
Using legacy setup.py install for watchdog, since package 'wheel' is not installed.
Using legacy setup.py install for pathtools, since package 'wheel' is not installed.
Installing collected packages: numpy, scipy, six, python-dateutil, pytz, pandas, wcwidth, ftfy, xmltodict, lxml, xlsxwriter, pillow, prettytable, decorator, pathtools, watchdog, certifi, chardet, idna, urllib3, requests, python-pptx
Running setup.py install for numpy ... error
ERROR: Command errored out with exit status 1:
command: 'c:\users\hp[quantipy]\scripts\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\hp\AppData\Local\Temp\pip-install-53n5qc8x\numpy\setup.py'"'"'; file='"'"'C:\Users\hp\AppData\Local\Temp\pip-install-53n5qc8x\numpy\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\hp\AppData\Local\Temp\pip-record-33wvbjxf\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\hp[quantipy]\include\site\python3.8\numpy'
cwd: C:\Users\hp\AppData\Local\Temp\pip-install-53n5qc8x\numpy\

Support for XPT Files

I realise that XPT files are not supported by the read_spss function. It attempts to add the '.sav' extension as a suffix when a path is passed through. It would be great to get functionality that adds XPT support and parses format/metadata through a '.SAS' file.

Rim Weighting error: Variable "category" is unsuitable for Weighting.

Hey All,

I am trying to do some rim weighting. Here is my code...

    ##create quantipy dataset from pandas dataframe
    dataset = qp.DataSet(name='example', dimensions_comp=False)
    dataset.from_components(df_srv)

    ##create rim scheme
    scheme = qp.Rim('my_first_scheme')

    #create targets for rim weighting from population dataset
    category_targets = {}
    temp = df_apps['category'].value_counts(normalize=True).reset_index(drop=True)
    temp.index += 1 # I thought the zero-index might be the issue so this starts index from 1
    category_targets['category'] = temp.to_dict()
    scheme.set_targets(targets=category_targets, group_name='basic weights')
    dataset.weight(scheme, weight_name='weights_new', unique_key='cvrp_id') ##cvrp_id is unique

Here are the contents of category_targets:
{'category': {1: 0.619027907888717, 2: 0.34975999070463243, 3: 0.031212101406650546}}

I am getting the following error message:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Users\james.tamerius\Desktop\qc_jt\quantipy3\quantipy\core\dataset.py", line 3025, in weight
    engine.add_scheme(weight_scheme, key=unique_key, verbose=verbose)
  File "C:\Users\james.tamerius\Desktop\qc_jt\quantipy3\quantipy\core\weights\weight_engine.py", line 202, in add_scheme
    scheme._minimize_columns(self._df, key, verbose)
  File "C:\Users\james.tamerius\Desktop\qc_jt\quantipy3\quantipy\core\weights\rim.py", line 311, in _minimize_columns
    self._check_targets(verbose)
  File "C:\Users\james.tamerius\Desktop\qc_jt\quantipy3\quantipy\core\weights\rim.py", line 433, in _check_targets
    raise ValueError(vartype_err.format(self.name, group, target_col))
ValueError: *** Stopping: Scheme "my_first_scheme", group "basic weights" ***
Variable "category" is unsuitable for Weighting.
Target variables must be of type integer (convertable) / single categorical.

Any help would be much appreciated!
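
The error text says target variables must be integer-coded single categoricals, so one thing worth checking is whether the 'category' column holds string labels. A rough, untested sketch of recoding it to integer codes and keying the targets by those codes (reusing df_srv, df_apps and qp from the snippet above; code_map is a hypothetical helper mapping) could look like this:

    # map the string categories to stable integer codes
    labels = sorted(df_srv['category'].dropna().unique())
    code_map = {label: code for code, label in enumerate(labels, start=1)}
    df_srv['category'] = df_srv['category'].map(code_map)

    # rebuild the DataSet so it picks up the integer-coded column
    dataset = qp.DataSet(name='example', dimensions_comp=False)
    dataset.from_components(df_srv)

    # express the targets with the same codes; the README example uses
    # percentages summing to 100 rather than proportions
    shares = df_apps['category'].value_counts(normalize=True)
    category_targets = {'category': {code_map[label]: share * 100
                                     for label, share in shares.items()}}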

Missing label in Dimensions MDD fails reading dimensions

The current Quantipy reader.py begin_column code expects Dimensions variables to always have a label. This is not a requirement per the Dimensions MDD spec, so the label can be missing.

    `column['text'] = get_text_dict(xml.xpath(xpath__col_text)[0].getchildren())`

The above xml.xpath call returns an empty list when the label is missing, and the data read fails.

dimensions file not loading properly

Traceback (most recent call last):
File "F:\Python\data.py", line 5, in <module>
meta, data = qp.read_dimensions('KAP2.mdd', 'KAP2.ddf')
File "C:\Program Files\Python37\lib\site-packages\quantipy\core\tools\dp\io.py", line 300, in read_dimensions
meta, data = quantipy_from_dimensions(path_mdd, path_ddf)
File "C:\Program Files\Python37\lib\site-packages\quantipy\core\tools\dp\dimensions\reader.py", line 976, in quantipy_from_dimensions
ddf, levels = quantipy_clean(ddf_to_pandas(path_ddf))
File "C:\Program Files\Python37\lib\site-packages\quantipy\core\tools\dp\dimensions\reader.py", line 147, in quantipy_clean
if parent=='None':
File "C:\Program Files\Python37\lib\site-packages\pandas\core\generic.py", line 1555, in __nonzero__
self.__class__.__name__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

dataset.crosstab() should optionally return prettier output

Currently, the crosstab method always returns a dataframe with multiindex columns and indexes.

We should allow parameters that affect this, although the two multiindexes should always remain available for reuse in other methods.

I propose adding a pretty=True/False parameter (true by default) that is intelligent and (see the rough post-processing sketch after the screenshots below):

  • removes the "@" and replaces with the word "Total"
  • If only one variable is being calculated, don't show the same info twice as is done in the second screenshot
  • Removes the variable name so that "price. Price satisfaction" becomes "Price satisfaction"

See below for clarification:

The below image makes sense and should not change (although the variable name ahead of the label should disappear when pretty=True, so that "gender. Gender" becomes "Gender")
Screenshot 2019-10-09 16 24 40

The below image does not make sense. If there is only one variable being shown, we don't need to show its name in the columns index. So the top level column index should be dropped and "@" replaced with "Total"
Screenshot 2019-10-09 16 20 15
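
Until such a parameter exists, something along these lines could be applied to the DataFrame that crosstab() returns. This is a rough pandas sketch, assuming the "Question"/"Values" multiindex layout shown in the examples above; dropping the redundant top column level for single-variable tables is left out:

def prettify(df):
    """Replace '@' with 'Total' and strip 'varname. ' prefixes in a crosstab result."""
    def clean(label):
        if not isinstance(label, str):
            return label
        if label == '@':
            return 'Total'
        # turn "gender. Gender" into "Gender"
        return label.split('. ', 1)[-1]
    return df.rename(index=clean, columns=clean)

pretty = prettify(dataset.crosstab('price_satisfaction', 'agecat'))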

new crosstab() function doesn't show significance if arg ci='c%'

First of all, fantastic that crosstab() now allows adding a significance view. However, when we assign 'c%' to the argument ci, specifying sig_level no longer gives us the sig view. Also, when ci='c%' and xtotal=True, shouldn't the base now be 100.0? Thank you!
