
quantipy3's Introduction

Quantipy3

Python for people data

Quantipy is an open-source data processing, analysis and reporting software project that builds on the excellent pandas and numpy libraries. Aimed at people data, Quantipy offers support for native handling of special data types like multiple choice variables, statistical analysis using case or observation weights, DataFrame metadata and pretty data exports.

Quantipy for Python 3

This repository is a port of Quantipy from Python 2.x to Python 3.

Key features

  • Reads plain .csv, converts from Dimensions, SPSS, Decipher, or Ascribe
  • Open metadata format to describe and manage datasets
  • Powerful, metadata-driven cleaning, editing, recoding and transformation of datasets
  • Computation and assessment of data weights
  • Easy-to-use analysis interface

Features not yet supported in Python 3 version

  • Structured analysis and reporting via Chain and Cluster containers
  • Exports to SPSS, Dimensions ddf/mdd, MS Excel and Powerpoint with flexible layouts and various options
  • Full Python 3.8 support (3.5, 3.6 and 3.7 are supported)

Origins

Contributors

Installation

pip install quantipy3

or

python3 -m pip install quantipy3

Note that the package is called quantipy3 on pip, but it is still imported as quantipy (import quantipy as qp).

Create a virtual environment

If you want to create a virtual environment when using Quantipy:

with conda

conda create -n envqp python=3

with venv

python -m venv [your_env_name]

5 minutes to Quantipy

Get started

If you are working with SPSS, import your .sav file:

import quantipy as qp
dataset = qp.DataSet("My dataset, wave 1")
dataset.read_spss('my_file.sav')

You can start straight away by exploring what variables are in your file.

dataset.variables()
['gender',
 'agecat',
 'price_satisfaction',
 'numitems_satisfaction',
 'org_satisfaction',
 'service_satisfaction',
 'quality_satisfaction',
 'overall_satisfaction',
 'weight']

If you want more details on a variable, explore its metadata.

dataset.meta('agecat')
single                 codes  texts  missing
agecat: Age category
1                          1  18-24     None
2                          2  25-34     None
3                          3  35-49     None
4                          4  50-64     None
5                          5  64+       None

Quantipy understands SPSS metadata out of the box: all codes and labels are the same as in the .sav file.

Calculate some results: counts or percentages

dataset.crosstab('price_satisfaction', 'agecat')
Question                                                   agecat. Age category
Values                                                      All  18-24  25-34  35-49  50-64   64+
Question                                Values
price_satisfaction. Price satisfaction  All               582.0   46.0  127.0  230.0  147.0  32.0
                                        Strongly Negative  72.0    8.0   20.0   22.0   17.0   5.0
                                        Somewhat Negative 135.0   10.0   30.0   52.0   38.0   5.0
                                        Neutral           140.0    9.0   32.0   59.0   36.0   4.0
                                        Somewhat Positive 145.0   12.0   25.0   63.0   33.0  12.0
                                        Strongly Positive  90.0    7.0   20.0   34.0   23.0   6.0

You can also filter

dataset.crosstab('price_satisfaction', 'agecat', f={'gender':1})

and use a weight column

dataset.crosstab('price_satisfaction', 'agecat', f={'gender':1}, w="weight")

Variables can be created, recoded or edited with DataSet methods, e.g. derive():

mapper = [(1, '18-35 years old', {'agecat': [1,2]}),
          (2, '36 and older', {'agecat': [3,4,5]})]

dataset.derive('two_age_groups', 'single', dataset.text("Older or younger than 35"), mapper)
dataset.meta('two_age_groups')
single                                              codes     texts              missing
two_age_groups: "Older or younger than 35"
1                                                       1     18-35 years old    None
2                                                       2     36 and older       None
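
The derived variable can then be used like any other variable, for example in a crosstab (an illustrative call only; output omitted):

dataset.crosstab('two_age_groups', 'gender', w='weight')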

The DataSet case data component can be inspected with the []-indexer, as known from a pd.DataFrame:

dataset[['gender', 'agecat']].head(5)
   gender  agecat
0     1.0     1.0
1     2.0     1.0
2     2.0     2.0
3     1.0     NaN
4     NaN     1.0

Weighting

If your data hasn't been weighted yet, you can use Quantipy's RIM weighting algorithm.

Assuming we have the same variables as before, gender and agecat, we can weight the dataset with these two variables:

from quantipy.core.weights.rim import Rim

age_targets = {'agecat':{1:5.0, 2:30.0, 3:26.0, 4:19.0, 5:20.0}}
gender_targets = {'gender':{1:49.0, 2:51.0}}
scheme = Rim('gender_and_age')
scheme.set_targets(targets=[age_targets, gender_targets])
dataset.weight(scheme, unique_key='respondentId',
               weight_name="my_weight",
               inplace=True)

Quantipy will show you a weighting report:

Weight variable       weights_gender_and_age
Weight group                  _default_name_
Weight filter                           None
Total: unweighted                 582.000000
Total: weighted                   582.000000
Weighting efficiency               60.009826
Iterations required                14.000000
Mean weight factor                  1.000000
Minimum weight factor               0.465818
Maximum weight factor               6.187700
Weight factor ratio                13.283522

And you can test whether the weighting has worked by running crosstabs:

dataset.crosstab('agecat', ci=['c%'], w='my_weight')
Question                      agecat. Age category
Question              Values
agecat. Age category  All                    100.0
                      18-24                    5.0
                      25-34                   30.0
                      35-49                   26.0
                      50-64                   19.0
                      64+                     20.0
dataset.crosstab('gender', ci=['c%'], w='my_weight')
Question                gender. Gender
Question        Values
gender. Gender  All              100.0
                Male              49.0
                Female            51.0
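
Because the weight factors are stored in the case data alongside the original variables, the []-indexer shown earlier can also be used to inspect them (an illustrative call only; output omitted):

dataset[['gender', 'agecat', 'my_weight']].head(5)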

Contributing

The test suite for Quantipy can be run with the command

python3 -m pytest tests

But when developing a specific aspect of Quantipy, it might be quicker to run (e.g. for the DataSet)

python3 -m unittest tests.test_dataset

Tests for unsupported features are skipped; see here for which tests are supported.

We welcome volunteers and supporters. Please include a test case with any pull request, especially those that run calculations.

quantipy3's People

Contributors

alasdaire, alexbuchhammer, alextanski, andersfreund, biggihs, directions-dev, g3org3mb, geirfreysson, jamesrkg, majeed-sahebzadha, nitsrek, pablincho, roxanamarianeagu, roxananeagu, tokensalad


quantipy3's Issues

'IndexError: positional indexers are out-of-bounds' after cloning and filtering DataSet object

I'm trying to loop through different values for s

        keep = {'S1': [s]}
        copy_ds.filter(alias=name, condition=keep, inplace=True)

but after doing this, I am getting the following error report.

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py in _get_list_axis(self, key, axis)
   2129         try:
-> 2130             return self.obj.take(key, axis=axis)
   2131         except IndexError:

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in take(self, indices, axis, is_copy, **kwargs)
   3603         new_data = self._data.take(
-> 3604             indices, axis=self._get_block_manager_axis(axis), verify=True
   3605         )

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py in take(self, indexer, axis, verify, convert)
   1388         if convert:
-> 1389             indexer = maybe_convert_indices(indexer, n)
   1390 

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexers.py in maybe_convert_indices(indices, n)
    200     if mask.any():
--> 201         raise IndexError("indices are out-of-bounds")
    202     return indices

IndexError: indices are out-of-bounds
IndexError                                Traceback (most recent call last)
<ipython-input-14-0085af4bc4bf> in <module>
     25         # loop questions
     26         if (c in qtype_dict['with_m']) or (c in qtype_dict['only_m']):
---> 27             t_cnt=copy_ds.crosstab(c,stats=True)
     28             t_pct=copy_ds.crosstab(c,ci='c%',stats=True)
     29 

</opt/anaconda3/lib/python3.7/site-packages/decorator.py:decorator-gen-312> in crosstab(self, x, y, w, f, ci, stats, sig_level, rules, decimals, xtotal, painted)

/opt/anaconda3/lib/python3.7/site-packages/quantipy/core/tools/qp_decorators.py in _to_list(func, *args, **kwargs)
    167                 args = tuple(a if not x == val_index else v
    168                              for x, a in enumerate(args))
--> 169         return func(*args, **kwargs)
    170 
    171     if to_list:

</opt/anaconda3/lib/python3.7/site-packages/decorator.py:decorator-gen-311> in crosstab(self, x, y, w, f, ci, stats, sig_level, rules, decimals, xtotal, painted)

/opt/anaconda3/lib/python3.7/site-packages/quantipy/core/tools/qp_decorators.py in _deco(func, *args, **kwargs)
    141             if arg is None: continue
    142             func = dec(func)
--> 143         return func(*args, **kwargs)
    144 
    145     if categorical and not isinstance(categorical, list): categorical = [categorical]

</opt/anaconda3/lib/python3.7/site-packages/decorator.py:decorator-gen-4920> in crosstab(self, x, y, w, f, ci, stats, sig_level, rules, decimals, xtotal, painted)

/opt/anaconda3/lib/python3.7/site-packages/quantipy/core/tools/qp_decorators.py in _var_in_ds(func, *args, **kwargs)
     41             var = kwargs.get(variable, args[v_index])
     42             if var is None:
---> 43                 return func(*args, **kwargs)
     44             if not isinstance(var, list):
     45                 var = [var]

/opt/anaconda3/lib/python3.7/site-packages/quantipy/core/dataset.py in crosstab(self, x, y, w, f, ci, stats, sig_level, rules, decimals, xtotal, painted)
   1904         else:
   1905             idx = self.take(f)
-> 1906         data = self._data.copy().iloc[idx]
   1907         stack = qp.Stack(name='ct', add_data={'ct': (data, self._meta)})
   1908         if xtotal or not y:

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1422 
   1423             maybe_callable = com.apply_if_callable(key, self.obj)
-> 1424             return self._getitem_axis(maybe_callable, axis=axis)
   1425 
   1426     def _is_scalar_access(self, key: Tuple):

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   2146         # a list of integers
   2147         elif is_list_like_indexer(key):
-> 2148             return self._get_list_axis(key, axis=axis)
   2149 
   2150         # a single integer

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py in _get_list_axis(self, key, axis)
   2131         except IndexError:
   2132             # re-raise with different error message
-> 2133             raise IndexError("positional indexers are out-of-bounds")
   2134 
   2135     def _getitem_axis(self, key, axis: int):

IndexError: positional indexers are out-of-bounds

Thank you so much in advance for helping me troubleshoot this!

dataset.tabulate() - new method based on dataset.crosstab()

We need a new function that calls dataset.crosstab() but is friendlier to use than the crosstab function.

Currently there are only two display options for the result of dataset.crosstab(). Users can select pct=True/False (see screenshots below).

DataSet.tabulate() will have the following parameters:

Passed directly to crosstab

  • x
  • y
  • w (weight)
  • f (filter)

Additional parameters:

  • show - a string ('all') or a list of strings ('count', 'base', 'pct', 'ubase', 'ucount') for counts, base, percentages, unweighted base and unweighted counts (the unweighted views are only valid if the w parameter is supplied)
  • include_varname - a boolean that indicates whether variables should have the variable name in front of them. If it is true, the columns are shown as in the screenshot below; if it is false, "gender. Gender" becomes "Gender", i.e. the variable name is only shown if the variable has no label. A rough sketch of such a wrapper follows the screenshots below.

Screenshot 2019-10-09 16 31 24
Screenshot 2019-10-09 16 31 35
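
A rough sketch of how such a wrapper could sit on top of the existing crosstab() call is shown below. The mapping from show values onto crosstab's ci cell items and the label handling are assumptions for illustration, not the final API:

def tabulate(dataset, x, y=None, w=None, f=None, show='all', include_varname=False):
    """Illustrative wrapper around DataSet.crosstab() for the proposed API."""
    # assumed mapping from the proposed 'show' values to crosstab cell items;
    # 'base', 'ubase' and 'ucount' handling is left out of this sketch
    view_map = {'count': 'counts', 'pct': 'c%'}
    if show == 'all':
        views = list(view_map.values())
    else:
        wanted = [show] if isinstance(show, str) else show
        views = [view_map[s] for s in wanted if s in view_map]
    result = dataset.crosstab(x, y, w=w, f=f, ci=views)
    if not include_varname:
        # turn "gender. Gender" into "Gender" on both axes
        strip = lambda lbl: lbl.split('. ', 1)[-1] if isinstance(lbl, str) else lbl
        result = result.rename(index=strip, columns=strip)
    return result

A call like tabulate(dataset, 'price_satisfaction', 'agecat', w='weight', show=['pct']) would then return the column percentages with the variable names stripped from the labels.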

In Dimensions/reader.py levels.ix is deprecated

The levels.ix call in ddf_to_pandas is deprecated and does not work on Python 3.7.
It should be replaced by levels.loc.

Line:
new_table_name = levels.ix[table_name,'DSCTableName']
Should be replaced with:
new_table_name = levels.loc[table_name,'DSCTableName']

reader.zip

Installation

I followed the installation instructions but it fails every time. I was able to get all dependencies loaded and can import the Quantipy3 library, but there don't seem to be any methods associated with it. Just an empty module. Any help would be much appreciated! I am working on Windows 10.

Error while installing quantipy

@geirfreysson @KushagraPan
I am having this error while installing quantipy in a Python 2.7 environment. Kindly help.

ERROR: Could not find a version that satisfies the requirement ftfy==5.5.1 (from quantipy==0.2.1->-r requirements_dev.txt (line 4)) (from versions: 1.0, 2.0, 2.0.1, 2.0.2, 3.0, 3.0.1, 3.0.2, 3.0.3, 3.0.4, 3.0.5, 3.1.0, 3.1.1, 3.1.2, 3.1.3, 3.2.0, 3.3.0, 3.4.0, 4.0.0, 4.1.0, 4.1.1, 4.2.0, 4.3.1, 4.4, 4.4.1, 4.4.2, 4.4.3, 5.0, 5.0.1, 5.0.2, 5.1, 5.1.1, 5.2.0, 5.3.0)
ERROR: No matching distribution found for ftfy==5.5.1 (from quantipy==0.2.1->-r requirements_dev.txt (line 4))

significance testing results different than SPSS

In comparing results from SPSS and Quantipy, we discovered that the dataset.crosstab() function gives different significance results for comparing categorical distributions than SPSS does. I see in the sandbox.py document that chisq is used for calculating significance, which should give us the same results. Could this be because Quantipy does not recognize variables that are ordinal (Likert scales)?

Windows Compatibility?

Hi Developers,

I have been trying to give a shot at quantipy3, but something is not working for me. Based on Geir's post on towardsdatascience I see he is running this on mac.
If you have tried this on Windows can you tell me the exact version of anaconda, python, and windows that you have tried this build on?
I get the following error while executing "dataset.read_spss('SelectOnlyUK.sav',ioLocale=None)"
"TypeError: LoadLibrary() argument 1 must be str, not None." Here is the full stack trace
"Traceback (most recent call last):
File "test.py", line 13, in
dataset.read_spss('SelectOnlyUK.sav',ioLocale=None)
File "C:\Users\Ari\quantipy3\quantipy\core\dataset.py", line 601, in read_spss
self._meta, self.data = r_spss(path_sav+'.sav', **kwargs)
File "C:\Users\Ari\quantipy3\quantipy\core\tools\dp\io.py", line 328, in read_spss
meta, data = parse_sav_file(path_sav, **kwargs)
File "C:\Users\Ari\quantipy3\quantipy\core\tools\dp\spss\reader.py", line 36, in parse_sav_file
data = extract_sav_data(filepath, ioLocale=ioLocale, ioUtf8=ioUtf8)
File "C:\Users\Ari\quantipy3\quantipy\core\tools\dp\spss\reader.py", line 44, in extract_sav_data
with sr.SavReader(sav_file, returnHeader=True, ioLocale=ioLocale, ioUtf8=ioUtf8) as reader:
File "C:\Users\Ari\quantipy3\savReaderWriter\savReader.py", line 57, in init
ioUtf8, ioLocale)
File "C:\Users\Ari\quantipy3\savReaderWriter\header.py", line 31, in init
super(Header, self).init(savFileName, ioUtf8, ioLocale)
File "C:\Users\Ari\quantipy3\savReaderWriter\generic.py", line 30, in init
self.libc = cdll.LoadLibrary(ctypes.util.find_library("c"))
File "C:\ProgramData\Anaconda3\envs\envqp\lib\ctypes_init
.py", line 442, in LoadLibrary
return self.dlltype(name)
File "C:\ProgramData\Anaconda3\envs\envqp\lib\ctypes_init
.py", line 364, in init
self._handle = _dlopen(self._name, mode)
TypeError: LoadLibrary() argument 1 must be str, not None"

If I try to load an mdd/ddf file with the following line "meta, data = read_dimensions('S19029641.mdd', 'S19029641.ddf')"
I get the following error: "Traceback (most recent call last):
File "AA.py", line 13, in <module>
meta, data = read_dimensions('S19029641.mdd', 'S19029641.ddf')
File "C:\Users\Ari\quantipy3\quantipy\core\tools\dp\io.py", line 297, in read_dimensions
meta, data = quantipy_from_dimensions(path_mdd, path_ddf)
File "C:\Users\Ari\quantipy3\quantipy\core\tools\dp\dimensions\reader.py", line 992, in quantipy_from_dimensions
meta, ddf = mdd_to_quantipy(path_mdd, data=L1)
File "C:\Users\Ari\quantipy3\quantipy\core\tools\dp\dimensions\reader.py", line 774, in mdd_to_quantipy
meta, columns, data = get_columns_meta(xml, meta, data, map_values=True)
File "C:\Users\Ari\quantipy3\quantipy\core\tools\dp\dimensions\reader.py", line 737, in get_columns_meta
xml, column, data, map_values
File "C:\Users\Ari\quantipy3\quantipy\core\tools\dp\dimensions\reader.py", line 414, in get_meta_values
values = [int(v) for v in byProperty_values]
File "C:\Users\Ari\quantipy3\quantipy\core\tools\dp\dimensions\reader.py", line 414, in <listcomp>
values = [int(v) for v in byProperty_values]
ValueError: invalid literal for int() with base 10: 'UKC11'"

Can you give me a hint about what I am doing wrong? I have re-installed Anaconda several times and followed the readme closely. I have also tried different versions of Anaconda, as the SPSS reading error seems to happen because Python cannot find some libraries. But I had no luck.
Thanks!

Issue loading in SPSS file

Hi,
I'm having issues loading an SPSS file using quantipy. I have quantipy running on Python 3.6 and have already tested that I can load the file into a notebook using pyreadstat. Not sure why it is erroring here. Any help would be much appreciated!

Code:
dataset = qp.DataSet("test_dataset")
dataset.read_spss('Softlaunch data_CLIENT.sav')

Error:

TypeError Traceback (most recent call last)
in
1 dataset = qp.DataSet("test_dataset")
----> 2 dataset.read_spss('Softlaunch data_CLIENT.sav')

~\Anaconda3\lib\site-packages\quantipy\core\dataset.py in read_spss(self, path_sav, **kwargs)
642 """
643 if path_sav.endswith('.sav'): path_sav = path_sav.replace('.sav', '')
--> 644 self._meta, self._data = r_spss(path_sav+'.sav', **kwargs)
645 self._set_file_info(path_sav)
646 self._rename_blacklist_vars()

~\Anaconda3\lib\site-packages\quantipy\core\tools\dp\io.py in read_spss(path_sav, **kwargs)
375 def read_spss(path_sav, **kwargs):
376
--> 377 meta, data = parse_sav_file(path_sav, **kwargs)
378 return meta, data
379

~\Anaconda3\lib\site-packages\quantipy\core\tools\dp\spss\reader.py in parse_sav_file(filename, path, name, ioLocale, ioUtf8, dichot, dates_as_strings, text_key)
35 filepath = os.path.abspath(filepath)
36 data = extract_sav_data(filepath, ioLocale=ioLocale, ioUtf8=ioUtf8)
---> 37 meta, data = extract_sav_meta(filepath, name="", data=data, ioLocale=ioLocale,
38 ioUtf8=ioUtf8, dichot=dichot, dates_as_strings=dates_as_strings,
39 text_key=text_key)

~\Anaconda3\lib\site-packages\quantipy\core\tools\dp\spss\reader.py in extract_sav_meta(sav_file, name, data, ioLocale, ioUtf8, dichot, dates_as_strings, text_key)
170 # into data, which is immediately prior to the start of the
171 # dichotomous set columns
--> 172 dls_idx = data.columns.tolist().index(varNames[0])
173 if metadata.multRespDefs[mrset]['setType'] == 'C':
174 # Raise if value object of columns is not equal

TypeError: 'map' object is not subscriptable

Reading SAV file from your repository does not work :(

Hi!

It seems that reading the SAV file is the showstopper. I did the workaround mentioned somewhere:
used pyreadstat and imported the dataframe with dataset.from_components(df).

Unfortunately I miss the SPSS metadata goodies. BTW, is it possible to somehow import the pyreadstat meta part?

What I get with:
import quantipy as qp
dataset = qp.DataSet("My dataset, wave 1")
dataset.read_spss('"D:/tmp/Example Data (A).sav"')

is an error (using PyCharm and Python 3.7 venv):

Traceback (most recent call last):
File "C:/Users/zbatagelj/PycharmProjects/Valiconer/t.py", line 10, in <module>
dataset.read_spss("D:/tmp/Example Data (B).sav")
File "C:\Users\zbatagelj\quantipy3\quantipy\core\dataset.py", line 601, in read_spss
self._meta, self._data = r_spss(path_sav+'.sav', **kwargs)
File "C:\Users\zbatagelj\quantipy3\quantipy\core\tools\dp\io.py", line 331, in read_spss
meta, data = parse_sav_file(path_sav, **kwargs)
File "C:\Users\zbatagelj\quantipy3\quantipy\core\tools\dp\spss\reader.py", line 40, in parse_sav_file
data = extract_sav_data(filepath, ioLocale=ioLocale, ioUtf8=ioUtf8)
File "C:\Users\zbatagelj\quantipy3\quantipy\core\tools\dp\spss\reader.py", line 48, in extract_sav_data
with sr.SavReader(sav_file, returnHeader=True, ioLocale=ioLocale, ioUtf8=ioUtf8) as reader:
File "C:\Users\zbatagelj\quantipy3\savReaderWriter\savReader.py", line 60, in __init__
ioUtf8, ioLocale)
File "C:\Users\zbatagelj\quantipy3\savReaderWriter\header.py", line 31, in __init__
super(Header, self).__init__(savFileName, ioUtf8, ioLocale)
File "C:\Users\zbatagelj\quantipy3\savReaderWriter\generic.py", line 30, in __init__
self.libc = cdll.LoadLibrary(ctypes.util.find_library("c"))
File "C:\Users\zbatagelj\AppData\Local\Programs\Python\Python37\lib\ctypes\__init__.py", line 442, in LoadLibrary
return self._dlltype(name)
File "C:\Users\zbatagelj\AppData\Local\Programs\Python\Python37\lib\ctypes\__init__.py", line 364, in __init__
self._handle = _dlopen(self._name, mode)
TypeError: LoadLibrary() argument 1 must be str, not None

Not able to install Quantipy in Google Colab/Python 3

@geirfreysson I am having this error when I ran pip install quantipy

"Collecting quantipy
Downloading https://files.pythonhosted.org/packages/5c/cf/4c31a2054e045a553afa3adf5037476fa7c73ae170753b999f6af5a851af/QuantiPy-0.0.0.dev3.tar.gz (287kB)
|████████████████████████████████| 296kB 2.8MB/s
Building wheels for collected packages: quantipy
Building wheel for quantipy (setup.py) ... error
ERROR: Failed building wheel for quantipy
Running setup.py clean for quantipy
Failed to build quantipy
Installing collected packages: quantipy
Running setup.py install for quantipy ... error
ERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-so92uhnx/quantipy/setup.py'"'"'; file='"'"'/tmp/pip-install-so92uhnx/quantipy/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-besc6ke4/install-record.txt --single-version-externally-managed --compile Check the logs for full command output."

Kindly help

Can't install quantipy3

Hi,

I tried to install quantipy3 from the Anaconda PowerShell prompt, but because it requires old versions of pandas and numpy the installation fails. Is there any way to solve this? Thanks!

Variables shouldn't have to have "_" in name to be part of a delimited set

This works:
dataset.to_delimited_set('candy_choice', 'My delimited set', ['candy_1', 'candy_2', 'candy_3'])

While this doesn't
dataset.to_delimited_set('candy_choice', 'My delimited set', ['mars', 'snickers', 'twix'])

We need to fix to_delimited_set so that it doesn't rely on this naming convention.

Error using read_spss

Hello,

Attempted to use quantipy and read in a sav file. Received this error:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/ckheinz/Miniconda/miniconda3/envs/envqp/lib/python3.6/site-packages/quantipy/core/dataset.py", line 644, in read_spss
self._meta, self._data = r_spss(path_sav+'.sav', **kwargs)
File "/Users/ckheinz/Miniconda/miniconda3/envs/envqp/lib/python3.6/site-packages/quantipy/core/tools/dp/io.py", line 377, in read_spss
meta, data = parse_sav_file(path_sav, **kwargs)
File "/Users/ckheinz/Miniconda/miniconda3/envs/envqp/lib/python3.6/site-packages/quantipy/core/tools/dp/spss/reader.py", line 36, in parse_sav_file
data = extract_sav_data(filepath, ioLocale=ioLocale, ioUtf8=ioUtf8)
File "/Users/ckheinz/Miniconda/miniconda3/envs/envqp/lib/python3.6/site-packages/quantipy/core/tools/dp/spss/reader.py", line 44, in extract_sav_data
with sr.SavReader(sav_file, returnHeader=True, ioLocale=ioLocale, ioUtf8=ioUtf8) as reader:
File "/Users/ckheinz/Miniconda/miniconda3/envs/envqp/lib/python3.6/site-packages/savReaderWriter/savReader.py", line 57, in __init__
ioUtf8, ioLocale)
File "/Users/ckheinz/Miniconda/miniconda3/envs/envqp/lib/python3.6/site-packages/savReaderWriter/header.py", line 31, in __init__
super(Header, self).__init__(savFileName, ioUtf8, ioLocale)
File "/Users/ckheinz/Miniconda/miniconda3/envs/envqp/lib/python3.6/site-packages/savReaderWriter/generic.py", line 35, in __init__
self.spssio = self.loadLibrary()
File "/Users/ckheinz/Miniconda/miniconda3/envs/envqp/lib/python3.6/site-packages/savReaderWriter/generic.py", line 117, in loadLibrary
spssio = self._loadLibs("macos")
File "/Users/ckheinz/Miniconda/miniconda3/envs/envqp/lib/python3.6/site-packages/savReaderWriter/generic.py", line 89, in _loadLibs
return [load(os.path.join(path, lib)) for lib in libs][-1]
File "/Users/ckheinz/Miniconda/miniconda3/envs/envqp/lib/python3.6/site-packages/savReaderWriter/generic.py", line 89, in <listcomp>
return [load(os.path.join(path, lib)) for lib in libs][-1]
File "/Users/ckheinz/Miniconda/miniconda3/envs/envqp/lib/python3.6/ctypes/__init__.py", line 348, in __init__
self._handle = _dlopen(self._name, mode)
OSError: dlopen(/Users/ckheinz/Miniconda/miniconda3/envs/envqp/lib/python3.6/site-packages/savReaderWriter/spssio/macos/libicuuc48.1.dylib, 6): Library not loaded: @executable_path/../lib/libicudata48.1.dylib
Referenced from: /Users/ckheinz/Miniconda/miniconda3/envs/envqp/lib/python3.6/site-packages/savReaderWriter/spssio/macos/libicuuc48.1.dylib
Reason: image not found

My environment specs:

Name Version Build Channel
ca-certificates 2021.1.19 hecd8cb5_0
certifi 2020.12.5 py36hecd8cb5_0
chardet 4.0.0 pypi_0 pypi
decorator 4.4.2 pypi_0 pypi
ftfy 5.5.1 pypi_0 pypi
idna 2.10 pypi_0 pypi
importlib-metadata 3.7.0 pypi_0 pypi
libcxx 10.0.0 1
libedit 3.1.20191231 h1de35cc_1
libffi 3.3 hb1e8313_2
lxml 4.6.2 pypi_0 pypi
ncurses 6.2 h0a44026_1
numpy 1.14.5 pypi_0 pypi
openssl 1.1.1j h9ed2024_0
pandas 0.25.3 pypi_0 pypi
pillow 8.1.1 pypi_0 pypi
pip 21.0.1 py36hecd8cb5_0
prettytable 2.1.0 pypi_0 pypi
python 3.6.13 h88f2d9e_0
python-dateutil 2.8.1 pypi_0 pypi
python-pptx 0.6.18 pypi_0 pypi
pytz 2021.1 pypi_0 pypi
quantipy3 0.2.3 pypi_0 pypi
readline 8.1 h9ed2024_0
requests 2.25.1 pypi_0 pypi
scipy 1.2.1 pypi_0 pypi
setuptools 52.0.0 py36hecd8cb5_0
six 1.15.0 pypi_0 pypi
sqlite 3.33.0 hffcf06c_0
tk 8.6.10 hb0a8c7a_0
typing-extensions 3.7.4.3 pypi_0 pypi
urllib3 1.26.3 pypi_0 pypi
watchdog 2.0.2 pypi_0 pypi
wcwidth 0.2.5 pypi_0 pypi
wheel 0.36.2 pyhd3eb1b0_0
xlsxwriter 1.3.7 pypi_0 pypi
xmltodict 0.12.0 pypi_0 pypi
xz 5.2.5 h1de35cc_0
zipp 3.4.0 pypi_0 pypi
zlib 1.2.11 h1de35cc_3

My code:

import quantipy as qp
dataset = qp.DataSet("Test")
dataset.read_spss('Data.sav')

ds.populate() not populating meta data

As I understand, ds.populate() should populate the stack with meta data. When I use qp.Stack() to create a stack it works fine
image
but when I try to create a stack by calling ds.populate(), this happens
image
Any idea on why/how to work around it?

ds.crosstab() doesn't show sig when sig_level<0.01

There is a bug in the crosstab method's output styling that makes tests with alpha < 0.01 not show up in the results.

Example code:

sig_level=0.05/28
ds.crosstab('Q7','Region',sig_level=sig_level)

This only gives the cross-tabulation of counts, not the significant groups.

Reading dimensions mdd and ddf fails for 'field' type variable in Dimensions

Current Quantipy logic assumes that '.' only exists in grid/loop variable names. This is not true for field-type variables, especially when the field contains a loop variable.

For example, the following is valid in a Dimensions MDD:

MyField "My top level field variable"
Block fields
{
InnnerLoop "Inner Loop variable" loop {_1, _2}
(
innermostvariable "Inner most simple variable"
{
_X "Answer X",
Y "Answer Y"
}
)
}

The above will generate columns in the data like:
MyField.InnnerLoop[_1].innermostvariable
MyField.InnnerLoop[_2].innermostvariable

This variable name will make get_columns_meta() fail, in turn breaking the overall reading of the Dimensions input.

Option 1: Skip such variables: not the best solution, but it makes the rest of the data usable.
Option 2: Rename the variable up front.
Option 3: Change the current logic to read more information from the MDD XML, as all required information is available there.

Excel output example

Hi! Could you please provide an example for creating some simple output (e.g. a crosstab) in Excel?

Thanks!

Reading CSV

Hey, I have survey data recorded in a CSV file (exported via Google Sheets) and don't have a metadata (JSON) file for it.
How do I go about computing weights using Quantipy?
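
One possible route, pieced together from the weighting example in the README above and the from_components() calls mentioned in other issues. The file name, column names, unique key, codes and target percentages below are placeholders, not values from your data:

import pandas as pd
import quantipy as qp
from quantipy.core.weights.rim import Rim

# load the raw CSV exported from Google Sheets
df = pd.read_csv('survey.csv')

# build a DataSet directly from the DataFrame, without a separate metadata file
dataset = qp.DataSet('my survey')
dataset.from_components(df)

# define RIM targets for the weighting variables (placeholder codes and percentages)
gender_targets = {'gender': {1: 49.0, 2: 51.0}}
age_targets = {'agecat': {1: 20.0, 2: 30.0, 3: 50.0}}
scheme = Rim('demo_weights')
scheme.set_targets(targets=[gender_targets, age_targets])

dataset.weight(scheme, unique_key='respondent_id',
               weight_name='weight', inplace=True)

Note that the target variables must be integer-coded single-choice variables, as discussed in the RIM weighting issue further down.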

Only run tests for aspects of Quantipy that are supported in the python3 port

Currently the test suite has a lot of fails, because only part of Quantipy has been ported to Python 3.

We need to remove tests for aspects of Quantipy that aren't yet supported in python 3, so that we have a test suite that passes and is therefore usable.

A list of the tests that are not passing is linked to from the README. These should possibly be moved to another folder so that unittest in "discover" mode doesn't pick them up.

pip install quantipy3 fails on 3.9, numpy problem

Hi,

I'm interested in trying this out, but I am having build problems on MacOS, using Python from PyEnv.

~ pyenv version
geo3 (set by PYENV_VERSION environment variable)
~ python -V
Python 3.9.13
~ pip install quantipy3 >> ~/Desktop/error.txt

error.txt

Can anyone make a suggestion?

Dimensions cannot load the data when subquestions of loops have similar names

When trying to load data from DIM which contains several LOOP questions that contain the same subquestions, the method quantipy_clean(ddf) fails.

image

As you can see from the screenshot, the parent is not a str object, because there are multiple parents for the same element.
I'm not familiar enough with quantipy, but I believe some recursion must be implemented in this method to handle that.

Error.. kindly help please

@geirfreysson Everything was fine until I ran the pip install quantipy command mentioned in the instructions, when this error came up:

Using legacy setup.py install for scipy, since package 'wheel' is not installed.
Using legacy setup.py install for numpy, since package 'wheel' is not installed.
Using legacy setup.py install for pandas, since package 'wheel' is not installed.
Using legacy setup.py install for watchdog, since package 'wheel' is not installed.
Using legacy setup.py install for pathtools, since package 'wheel' is not installed.
Installing collected packages: numpy, scipy, six, python-dateutil, pytz, pandas, wcwidth, ftfy, xmltodict, lxml, xlsxwriter, pillow, prettytable, decorator, pathtools, watchdog, certifi, chardet, idna, urllib3, requests, python-pptx
Running setup.py install for numpy ... error
ERROR: Command errored out with exit status 1:
command: 'c:\users\hp[quantipy]\scripts\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\hp\AppData\Local\Temp\pip-install-53n5qc8x\numpy\setup.py'"'"'; file='"'"'C:\Users\hp\AppData\Local\Temp\pip-install-53n5qc8x\numpy\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\hp\AppData\Local\Temp\pip-record-33wvbjxf\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\hp[quantipy]\include\site\python3.8\numpy'
cwd: C:\Users\hp\AppData\Local\Temp\pip-install-53n5qc8x\numpy\

Support for XPT Files

I realise that XPT files are not supported by the read_spss function. It attempts to add the '.sav' extension as a suffix when a path is passed through. It would be great to get functionality that adds XPT support and parses format/metadata through a '.SAS' file.

Rim Weighting error: Variable "category" is unsuitable for Weighting.

Hey All,

I am trying to do some rim weighting. Here is my code...

    ##create quantipy dataset from pandas dataframe
    dataset = qp.DataSet(name='example', dimensions_comp=False)
    dataset.from_components(df_srv)

    ##create rim scheme
    scheme = qp.Rim('my_first_scheme')

    #create targets for rim weighting from population dataset
    category_targets = {}
    temp = df_apps['category'].value_counts(normalize=True).reset_index(drop=True)
    temp.index += 1 # I thought the zero-index might be the issue so this starts index from 1
    category_targets['category'] = temp.to_dict()
    scheme.set_targets(targets=category_targets, group_name='basic weights')
    dataset.weight(scheme, weight_name='weights_new', unique_key='cvrp_id') ##cvrp_id is unique

Here are the contents of category_targets:
{'category': {1: 0.619027907888717, 2: 0.34975999070463243, 3: 0.031212101406650546}}

I am getting the following error message:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Users\james.tamerius\Desktop\qc_jt\quantipy3\quantipy\core\dataset.py", line 3025, in weight
    engine.add_scheme(weight_scheme, key=unique_key, verbose=verbose)
  File "C:\Users\james.tamerius\Desktop\qc_jt\quantipy3\quantipy\core\weights\weight_engine.py", line 202, in add_scheme
    scheme._minimize_columns(self._df, key, verbose)
  File "C:\Users\james.tamerius\Desktop\qc_jt\quantipy3\quantipy\core\weights\rim.py", line 311, in _minimize_columns
    self._check_targets(verbose)
  File "C:\Users\james.tamerius\Desktop\qc_jt\quantipy3\quantipy\core\weights\rim.py", line 433, in _check_targets
    raise ValueError(vartype_err.format(self.name, group, target_col))
ValueError: *** Stopping: Scheme "my_first_scheme", group "basic weights" ***
Variable "category" is unsuitable for Weighting.
Target variables must be of type integer (convertable) / single categorical.

Any help would be much appreciated!
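
The error text says target variables must be integer-coded single categoricals, so one thing worth checking is whether the 'category' column holds string labels. A rough, untested sketch of recoding it to integer codes and keying the targets by those codes (reusing df_srv, df_apps and qp from the snippet above; code_map is a hypothetical helper mapping) could look like this:

    # map the string categories to stable integer codes
    labels = sorted(df_srv['category'].dropna().unique())
    code_map = {label: code for code, label in enumerate(labels, start=1)}
    df_srv['category'] = df_srv['category'].map(code_map)

    # rebuild the DataSet so it picks up the integer-coded column
    dataset = qp.DataSet(name='example', dimensions_comp=False)
    dataset.from_components(df_srv)

    # express the targets with the same codes; the README example uses
    # percentages summing to 100 rather than proportions
    shares = df_apps['category'].value_counts(normalize=True)
    category_targets = {'category': {code_map[label]: share * 100
                                     for label, share in shares.items()}}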

Missing label in Dimensions MDD fails reading dimensions

The current Quantipy reader.py begin_column code expects Dimensions variables to always have a label. This is not a requirement per the Dimensions MDD spec, so the label can be missing.

    `column['text'] = get_text_dict(xml.xpath(xpath__col_text)[0].getchildren())`

The above xml.xpath call returns an empty list when the label is missing, and the data read fails.

dimensions file not loading properly

Traceback (most recent call last):
File "F:\Python\data.py", line 5, in <module>
meta, data = qp.read_dimensions('KAP2.mdd', 'KAP2.ddf')
File "C:\Program Files\Python37\lib\site-packages\quantipy\core\tools\dp\io.py", line 300, in read_dimensions
meta, data = quantipy_from_dimensions(path_mdd, path_ddf)
File "C:\Program Files\Python37\lib\site-packages\quantipy\core\tools\dp\dimensions\reader.py", line 976, in quantipy_from_dimensions
ddf, levels = quantipy_clean(ddf_to_pandas(path_ddf))
File "C:\Program Files\Python37\lib\site-packages\quantipy\core\tools\dp\dimensions\reader.py", line 147, in quantipy_clean
if parent=='None':
File "C:\Program Files\Python37\lib\site-packages\pandas\core\generic.py", line 1555, in __nonzero__
self.__class__.__name__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

dataset.crosstab() should optionally return prettier output

Currently, the crosstab method always returns a dataframe with multiindex columns and indexes.

We should allow parameters that affect this, although the two multiindexes should always remain available for reuse in other methods.

I propose adding a pretty=True/False parameter (true by default) that is intelligent and (see the rough post-processing sketch after the screenshots below):

  • removes the "@" and replaces with the word "Total"
  • If only one variable is being calculated, don't show the same info twice as is done in the second screenshot
  • Removes the variable name so that "price. Price satisfaction" becomes "Price satisfaction"

See below for clarification:

The below image makes sense and should not change (although the variable name ahead of the label should disappear when pretty=True, so that "gender. Gender" becomes "Gender")
Screenshot 2019-10-09 16 24 40

The below image does not make sense. If there is only one variable being shown, we don't need to show its name in the columns index. So the top level column index should be dropped and "@" replaced with "Total"
Screenshot 2019-10-09 16 20 15
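
Until such a parameter exists, something along these lines could be applied to the DataFrame that crosstab() returns. This is a rough pandas sketch, assuming the "Question"/"Values" multiindex layout shown in the examples above; dropping the redundant top column level for single-variable tables is left out:

def prettify(df):
    """Replace '@' with 'Total' and strip 'varname. ' prefixes in a crosstab result."""
    def clean(label):
        if not isinstance(label, str):
            return label
        if label == '@':
            return 'Total'
        # turn "gender. Gender" into "Gender"
        return label.split('. ', 1)[-1]
    return df.rename(index=clean, columns=clean)

pretty = prettify(dataset.crosstab('price_satisfaction', 'agecat'))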

new crosstab() function doesn't show significance if arg ci='c%'

First of all, fantastic that crosstab() now allows adding a significance view. However, when we assign 'c%' to the argument ci, specifying sig_level no longer gives us the sig view. Also, when ci='c%' and xtotal=True, shouldn't the base now be 100.0? Thank you!
