
geirfreysson avatar geirfreysson commented on September 3, 2024

Quantipy should most certainly work with Likert scale-type variables. The significance code has been tested quite a bit using Unicom/Dimensions and the default code replicates the Dimensions settings. The parameters can be tweaked quite a bit which can affect the results - could you share the exact parameters you use when you run the sig-diff in SPSS?

from quantipy3.

tracyyuqichen avatar tracyyuqichen commented on September 3, 2024

Thanks for responding so quickly! Someone else actually created the references in SPSS, as I am not very familiar with SPSS myself. What parameters would they be looking for?

geirfreysson avatar geirfreysson commented on September 3, 2024

I'm not familiar with sig-diff testing in SPSS. All I can do is get the chi-square results in the Analyze > Crosstabs menu, and that doesn't test each category; it just tests the overall number (I think).

I've added an SPSS file to make testing/comparison easier with SPSS: tests/Example Data (A).sav

This is the exact same data we use to test Quantipy.

Ask your colleague to recreate what they are seeing with that file and post here so maybe we can figure out what parameters SPSS uses for their tests and then maybe we can see if the SPSS results are replicated with Quantipy.

tracyyuqichen avatar tracyyuqichen commented on September 3, 2024

OK, so I think I have some idea of what might not be working. I recoded locality in both Python and SPSS, and I think the issue lies in the sig calculation after recoding. Here's the recode in SPSS for locality -> Region:
[screenshot of the SPSS recode dialog]
and then I run this syntax:

CTABLES
  /VLABELS VARIABLES=locality ethnicity gender DISPLAY=LABEL
  /TABLE q2b [C][COUNT] BY Region [C]
  /CATEGORIES VARIABLES= q2b Region ORDER=A KEY=VALUE EMPTY=INCLUDE
  /CRITERIA CILEVEL=95
  /COMPARETEST TYPE=PROP ALPHA=0.05 ADJUST=BONFERRONI ORIGIN=COLUMN INCLUDEMRSETS=YES 
    CATEGORIES=ALLVISIBLE MERGE=YES STYLE=SIMPLE SHOWSIG=NO.

and get:

[screenshot of the SPSS CTABLES output]

Replicating the process in Python:

# add a new delimited-set column to the meta
meta['columns']['Region'] = {
    'type': 'delimited set',
    'text': {'en-GB': 'Locality Un-duped'},
    'values': [
        {'value': 1, 'text': {'en-GB': '1'}},
        {'value': 2, 'text': {'en-GB': '2'}},
        {'value': 3, 'text': {'en-GB': '3'}}
    ]
}
# recode locality (5 codes) down to 3 regions
data['Region'] = recode(
    meta, data,
    target='Region',
    mapper={
        1: {'locality': 1},
        2: {'locality': frange('2-3')},
        3: {'locality': frange('4-5')}
    },
    append=False
)
ds.crosstab('q2b', 'Region', sig_level=0.05)

and I get:

[screenshot of the Quantipy crosstab output]

Although the difference is slight here, when I apply this to multiple un-duped variables across 20+ x-variables, I get far more significant Test-IDs than I do in SPSS. It seems that you do have a Bonferroni correction in place somewhere in sandbox.py, so I'm not sure why the discrepancies are happening.

geirfreysson avatar geirfreysson commented on September 3, 2024

Thanks for the very detailed report. I don't think this has anything to do with the recode itself; the culprit is the Bonferroni correction, which isn't implemented in Quantipy.

If I run your SPSS script without the Bonferroni correction, the result matches Quantipy.

CTABLES
  /VLABELS VARIABLES=locality ethnicity gender DISPLAY=LABEL
  /TABLE q2b [C][COUNT] BY Region [C]
  /CRITERIA CILEVEL=95
  /CATEGORIES VARIABLES= q2b Region ORDER=A KEY=VALUE EMPTY=INCLUDE
  /COMPARETEST TYPE=PROP ALPHA=0.05 ADJUST=NONE ORIGIN=COLUMN INCLUDEMRSETS=YES 
    CATEGORIES=ALLVISIBLE MERGE=YES STYLE=SIMPLE SHOWSIG=NO.

[Screen Shot 2021-03-25 at 17 06 50: SPSS output without the Bonferroni correction]

I didn't write the sig-testing code myself, but I imagine the Bonferroni correction would happen somewhere here:
https://github.com/Quantipy/quantipy3/blob/master/quantipy/core/quantify/engine.py#L1977

and could be done with statsmodels
https://www.statsmodels.org/dev/generated/statsmodels.stats.multitest.multipletests.html
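As a rough sketch of what that correction could look like with statsmodels (illustrative p-values, not Quantipy's actual internals):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# p-values from a set of pairwise column-proportion tests (made-up values)
pvals = np.array([0.001, 0.02, 0.04, 0.30])

# Bonferroni multiplies each p-value by the number of tests (capped at 1)
# and rejects at the original alpha
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method='bonferroni')

print(reject)  # only the smallest p-value survives the correction
print(p_adj)
```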

I'll probably close this ticket for now but create a new one called "implement bonferroni corrections in sig-tests" or something like that.

tracyyuqichen avatar tracyyuqichen commented on September 3, 2024

That sounds great! Thank you so much for pointing me to engine.py.

tracyyuqichen avatar tracyyuqichen commented on September 3, 2024

Hi Geir, two new issues regarding the sig results:

  1. In the following screenshot you'll see that group A has a count of zero, yet the sig test returns group A as one of the groups with a significant difference compared to the others, which should not be the case:
    [screenshot of the crosstab output]
  2. I tried applying the Bonferroni correction via the sig_level argument of the crosstab() function, since all the correction does is lower the alpha at which the null hypothesis is rejected; the adjusted alpha is 0.00625, which should work as far as I can see. But crosstab() just returns the crosstab without the sig view, and I'm genuinely stumped. Do you know whom I can consult on this specific issue? Thanks!

geirfreysson avatar geirfreysson commented on September 3, 2024
  1. The "A" means that columns B, D and E are significantly higher than column A. That isn't necessarily incorrect: if you run a political poll and 0 people say they're voting for a fringe party while 100 people say they're voting for a mainstream party, the mainstream party has a significantly higher following than the one with 0 counts.

  2. Can you send me a code example of what exactly you are doing?
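On point 1, the zero-count case can be sanity-checked with a plain two-proportion z-test (statsmodels here, independent of Quantipy): 0 of 100 versus 100 of 100 is flagged as highly significant.

```python
from statsmodels.stats.proportion import proportions_ztest

# 0 of 100 vs 100 of 100 respondents choosing a party (the poll example above)
stat, pval = proportions_ztest(count=[0, 100], nobs=[100, 100])

# the difference is highly significant even though one count is zero
print(stat, pval)
```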

tracyyuqichen avatar tracyyuqichen commented on September 3, 2024

sig_level = 0.05 / 28  # Bonferroni: alpha divided by the number of pairwise comparisons
ds.crosstab('Q7', 'Region', sig_level=sig_level)

geirfreysson avatar geirfreysson commented on September 3, 2024

Thanks for that.

I can see now that there is a bug in the crosstab method's output styling that makes tests with alpha < 0.01 not show up in the results.

The result doesn't look as nice, but you can use the following to get the sig-test results:

x = 'q5_3'
y = 'gender'
# build a stack with a single link, then run the sig-test view directly
stack = qp.Stack(name='sig',
                 add_data={'sig': {'meta': dataset.meta(),
                                   'data': dataset.data()}})
stack.add_link(data_keys=['sig'],
               x=x,
               y=y,
               views=['c%', 'counts'])
link = stack['sig']['no_filter'][x][y]
test = qp.Test(link, 'x|f|:|||counts')
test = test.set_params(level=sig_level)
df = test.run()
Question          gender
Values                 1    2
Question Values
q5_3     1           NaN  NaN
         2           NaN  NaN
         3           NaN  NaN
         4           NaN  NaN
         5           NaN  NaN
         97          NaN  NaN
         98          NaN  NaN

tracyyuqichen avatar tracyyuqichen commented on September 3, 2024

Fantastic, thank you for your help! Should I open up a new issue for this bug?

geirfreysson avatar geirfreysson commented on September 3, 2024

Yes please, that would be great!

tracyyuqichen avatar tracyyuqichen commented on September 3, 2024

Hi @geirfreysson, sorry to bother you again, but is there a book or reference I can check for the methodology behind the sig test? Or at least the name of the methodology? Also, is there a way I can specify a chi-square test instead of the default t-test? Thank you!

geirfreysson avatar geirfreysson commented on September 3, 2024

Hi @tc423, no bother at all. The default sig-tests mimic SPSS, Dimensions and Askia, and are pairwise comparisons like the SPSS command COMPARETEST. Chi-square tests are also available, but I don't have any code examples at hand.

The SPSS documentation is here: COMPARETEST.

You should be able to use the "sandbox" in Quantipy to do a chi-square test; there is a method there called chi_sq (here).

tracyyuqichen avatar tracyyuqichen commented on September 3, 2024

I see that there's a weight engine, and I assume this is to help adjust for the data-overlap issue in the delimited-set data type by giving the weight of each individual response? On a somewhat related note, is there a way I can see the p-values between each group?

geirfreysson avatar geirfreysson commented on September 3, 2024
  1. The sig-tests deal correctly with multiple-response variables.
  2. The weight engine is the library that runs the RIM weighting algorithm. The sig-tests can correct for data overlap.
  3. You can't display the p-values easily in the output; you'd have to go into the engine library itself and output the values there.

Hope that helps!

tracyyuqichen avatar tracyyuqichen commented on September 3, 2024

I see. I also just realized that all of this is documented in comments in the engine.py file that I am now trying to edit, so my apologies. It looks like we're back to the problem of not wanting to compare A to B if B is zero. I tried tweaking the source code, but it doesn't seem to be working:

def set_params():
    ...
    if self.metric == 'proportions':
        ...
        self.valdiffs = np.array(
            [p1 - p2 if (p1 != 0) & (p2 != 0) else 0
             for p1, p2 in combinations(props, 2)]).T

The only difference is the if/else I added when calculating p1 - p2, but now when I run crosstab() with sig_level=0.05 it simply doesn't show any sig results. I don't think it's a maths error, because plenty of my data has a count of 0, hence a difference of zero, and that was never a problem before.
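To check that pairing logic in isolation, here is a minimal sketch of the same list comprehension with plain itertools, using made-up proportions outside Quantipy:

```python
from itertools import combinations

props = [0.0, 0.4, 0.8]  # column proportions; the first column has a zero count

# original behaviour: every pairwise difference is tested
diffs = [p1 - p2 for p1, p2 in combinations(props, 2)]

# modified behaviour: zero out any pair involving a zero proportion,
# so those pairs can never reach significance
masked = [p1 - p2 if (p1 != 0) and (p2 != 0) else 0
          for p1, p2 in combinations(props, 2)]

print(diffs)   # [-0.4, -0.8, -0.4]
print(masked)  # [0, 0, -0.4]
```

Note that the masking only changes the pairs that involve the zero column; the remaining pair is unaffected, so the test statistic array keeps the same shape.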

geirfreysson avatar geirfreysson commented on September 3, 2024

I'm glad you're making progress with this! If you manage to make it work you can add tests for it and a pull request and the additions will then be available to everyone.

I would first check whether your change works with the Stack/Test code I mentioned in a previous comment in this thread.

The crosstab method itself uses the "paint" mechanism that makes results pretty, so there are a few more steps where things can go wrong with crosstab than with that code. Try that first; if it works, the next step is to see why the crosstab mechanism isn't showing the results.
