
geirfreysson avatar geirfreysson commented on September 3, 2024

Quantipy should most certainly work with Likert scale-type variables. The significance code has been tested quite a bit using Unicom/Dimensions and the default code replicates the Dimensions settings. The parameters can be tweaked quite a bit which can affect the results - could you share the exact parameters you use when you run the sig-diff in SPSS?

from quantipy3.

tracyyuqichen avatar tracyyuqichen commented on September 3, 2024

Thanks for responding so quickly! Someone else actually created the references in SPSS, as I am not very familiar with SPSS myself. What parameters would they be looking for?

geirfreysson avatar geirfreysson commented on September 3, 2024

I'm not familiar with sig-diff testing in SPSS. All I can do is get the chi-square results in the Analyze > Crosstabs menu, and that doesn't test each category; it just tests the overall number (I think).

I've added an SPSS file to make testing/comparison easier with SPSS: tests/Example Data (A).sav

This is the exact same data we use to test Quantipy.

Ask your colleague to recreate what they are seeing with that file and post here so maybe we can figure out what parameters SPSS uses for their tests and then maybe we can see if the SPSS results are replicated with Quantipy.

tracyyuqichen avatar tracyyuqichen commented on September 3, 2024

OK, so I think I have some idea of what might not be working. I recoded locality in both Python and SPSS, and I think the issue lies in the sig calculation after recoding. Here's the recode in SPSS for locality -> Region:
[screenshot of the SPSS recode dialog]
and then I run this syntax:

CTABLES
  /VLABELS VARIABLES=locality ethnicity gender DISPLAY=LABEL
  /TABLE q2b [C][COUNT] BY Region [C]
  /CATEGORIES VARIABLES= q2b Region ORDER=A KEY=VALUE EMPTY=INCLUDE
  /CRITERIA CILEVEL=95
  /COMPARETEST TYPE=PROP ALPHA=0.05 ADJUST=BONFERRONI ORIGIN=COLUMN INCLUDEMRSETS=YES 
    CATEGORIES=ALLVISIBLE MERGE=YES STYLE=SIMPLE SHOWSIG=NO.

and get:

[screenshot of the SPSS CTABLES output]

Replicating the process in Python:

# add a new delimited-set column to the meta
meta['columns']['Region'] = {
    'type': 'delimited set',
    'text': {'en-GB': 'Locality Un-duped'},
    'values': [
        {'value': 1, 'text': {'en-GB': '1'}},
        {'value': 2, 'text': {'en-GB': '2'}},
        {'value': 3, 'text': {'en-GB': '3'}}
    ]
}
# recode locality (5 codes) down to 3 regions
data['Region'] = recode(
    meta, data,
    target='Region',
    mapper={
        1: {'locality': 1},
        2: {'locality': frange('2-3')},
        3: {'locality': frange('4-5')}
    },
    append=False
)
ds.crosstab('q2b', 'Region', sig_level=0.05)

and I get:

[screenshot of the Quantipy crosstab output]

Although the difference is slight here, when I apply this to multiple un-duped variables across 20+ x-variables, I get far more significant Test-IDs than I do in SPSS. It seems that you do have a Bonferroni correction in place somewhere in sandbox.py, so I'm not sure why the discrepancies are happening.

geirfreysson avatar geirfreysson commented on September 3, 2024

Thanks for the very detailed report. I don't think this has anything to do with the recode itself; the culprit is the Bonferroni correction, which isn't implemented in Quantipy.

If I run your SPSS script without the Bonferroni correction, the result matches Quantipy.

CTABLES
  /VLABELS VARIABLES=locality ethnicity gender DISPLAY=LABEL
  /TABLE q2b [C][COUNT] BY Region [C]
  /CRITERIA CILEVEL=95
  /CATEGORIES VARIABLES= q2b Region ORDER=A KEY=VALUE EMPTY=INCLUDE
  /COMPARETEST TYPE=PROP ALPHA=0.05 ADJUST=NONE ORIGIN=COLUMN INCLUDEMRSETS=YES 
    CATEGORIES=ALLVISIBLE MERGE=YES STYLE=SIMPLE SHOWSIG=NO.

[Screen Shot 2021-03-25 at 17 06 50: SPSS output without the Bonferroni correction]

I didn't write the sig-testing code myself, but I imagine the Bonferroni correction would happen somewhere here:
https://github.com/Quantipy/quantipy3/blob/master/quantipy/core/quantify/engine.py#L1977

and could be done with statsmodels
https://www.statsmodels.org/dev/generated/statsmodels.stats.multitest.multipletests.html
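As a rough sketch of what that correction could look like with statsmodels (illustrative p-values, not Quantipy's actual internals):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# p-values from a set of pairwise column-proportion tests (made-up values)
pvals = np.array([0.001, 0.02, 0.04, 0.30])

# Bonferroni multiplies each p-value by the number of tests (capped at 1)
# and rejects at the original alpha
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method='bonferroni')

print(reject)  # only the smallest p-value survives the correction
print(p_adj)
```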

I'll probably close this ticket for now but create a new one called "implement bonferroni corrections in sig-tests" or something like that.

tracyyuqichen avatar tracyyuqichen commented on September 3, 2024

That sounds great! Thank you so much for pointing me to engine.py.

tracyyuqichen avatar tracyyuqichen commented on September 3, 2024

Hi Geir, two new issues regarding the sig results:

  1. In the following screenshot you'll see that group A has a count of zero, yet the sig test returns group A as one of the groups with a significant difference compared to the others, which should not be the case:
    [screenshot of the crosstab output]
  2. I tried applying the Bonferroni correction via the sig_level argument of the crosstab() function, since all the correction does is lower the alpha at which the null hypothesis is rejected; the adjusted alpha is 0.00625, which should work as far as I can see. But crosstab() just returns the crosstab without the sig view, and I'm genuinely stumped. Do you know whom I can consult on this specific issue? Thanks!

geirfreysson avatar geirfreysson commented on September 3, 2024
  1. The "A" means that columns B, D and E are significantly higher than column A. That isn't necessarily incorrect: if you run a political poll and 0 people say they're voting for a fringe party while 100 people say they're voting for a mainstream party, the mainstream party has a significantly higher following than the one with 0 counts.

  2. Can you send me a code example of what exactly you are doing?
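On point 1, the zero-count case can be sanity-checked with a plain two-proportion z-test (statsmodels here, independent of Quantipy): 0 of 100 versus 100 of 100 is flagged as highly significant.

```python
from statsmodels.stats.proportion import proportions_ztest

# 0 of 100 vs 100 of 100 respondents choosing a party (the poll example above)
stat, pval = proportions_ztest(count=[0, 100], nobs=[100, 100])

# the difference is highly significant even though one count is zero
print(stat, pval)
```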

tracyyuqichen avatar tracyyuqichen commented on September 3, 2024

sig_level = 0.05 / 28  # Bonferroni: alpha divided by the number of pairwise comparisons
ds.crosstab('Q7', 'Region', sig_level=sig_level)

geirfreysson avatar geirfreysson commented on September 3, 2024

Thanks for that.

I can see now that there is a bug in the crosstab method's output styling that makes tests with alpha < 0.01 not show up in the results.

The result doesn't look as nice, but you can use the following to get the sig-test results:

x = 'q5_3'
y = 'gender'
# build a stack with a single link, then run the sig-test view directly
stack = qp.Stack(name='sig',
                 add_data={'sig': {'meta': dataset.meta(),
                                   'data': dataset.data()}})
stack.add_link(data_keys=['sig'],
               x=x,
               y=y,
               views=['c%', 'counts'])
link = stack['sig']['no_filter'][x][y]
test = qp.Test(link, 'x|f|:|||counts')
test = test.set_params(level=sig_level)
df = test.run()
Question          gender
Values                 1    2
Question Values
q5_3     1           NaN  NaN
         2           NaN  NaN
         3           NaN  NaN
         4           NaN  NaN
         5           NaN  NaN
         97          NaN  NaN
         98          NaN  NaN

tracyyuqichen avatar tracyyuqichen commented on September 3, 2024

Fantastic, thank you for your help! Should I open up a new issue for this bug?

geirfreysson avatar geirfreysson commented on September 3, 2024

Yes please, that would be great!

tracyyuqichen avatar tracyyuqichen commented on September 3, 2024

Hi @geirfreysson, sorry to bother you again, but is there a book or reference I can check for the methodology behind the sig test? Or at least the name of the methodology? Also, is there a way I can specify a chi-square test instead of the default t-test? Thank you!

geirfreysson avatar geirfreysson commented on September 3, 2024

Hi @tc423, no bother at all. The default sig-tests mimic SPSS, Dimensions and Askia, and are pairwise comparisons like the SPSS command COMPARETEST. Chi-square tests are also available, but I don't have any code examples at hand.

The SPSS documentation is here: COMPARETEST.

You should be able to use the "sandbox" in Quantipy to do a chi-square test; there is a method there called chi_sq (here).

tracyyuqichen avatar tracyyuqichen commented on September 3, 2024

I see that there's a weight engine, and I assume this is to help adjust for the data-overlap issue in the delimited-set data type by giving the weight of each individual response? On a somewhat related note, is there a way I can see the p-values between each group?

geirfreysson avatar geirfreysson commented on September 3, 2024
  1. The sig-tests deal correctly with multiple-response variables.
  2. The weight engine is the library that runs the RIM weighting algorithm. The sig-tests can correct for data overlap.
  3. You can't display the p-values easily in the output; you'd have to go into the engine library itself and output the values there.

Hope that helps!

tracyyuqichen avatar tracyyuqichen commented on September 3, 2024

I see. I also just realized that all of this is documented in comments in the engine.py file that I am now trying to edit, so my apologies. It looks like we're back to the problem of not wanting to compare A to B if B is zero. I tried tweaking the source code, but it doesn't seem to be working:

def set_params():
    ...
    if self.metric == 'proportions':
        ...
        self.valdiffs = np.array(
            [p1 - p2 if (p1 != 0) & (p2 != 0) else 0
             for p1, p2 in combinations(props, 2)]).T

The only difference is the if/else I added when calculating p1 - p2, but now when I run crosstab() with sig_level=0.05 it simply doesn't show any sig results. I don't think it's a maths error, because plenty of my data has a count of 0, hence a difference of zero, and that was never a problem before.
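To check that pairing logic in isolation, here is a minimal sketch of the same list comprehension with plain itertools, using made-up proportions outside Quantipy:

```python
from itertools import combinations

props = [0.0, 0.4, 0.8]  # column proportions; the first column has a zero count

# original behaviour: every pairwise difference is tested
diffs = [p1 - p2 for p1, p2 in combinations(props, 2)]

# modified behaviour: zero out any pair involving a zero proportion,
# so those pairs can never reach significance
masked = [p1 - p2 if (p1 != 0) and (p2 != 0) else 0
          for p1, p2 in combinations(props, 2)]

print(diffs)   # [-0.4, -0.8, -0.4]
print(masked)  # [0, 0, -0.4]
```

Note that the masking only changes the pairs that involve the zero column; the remaining pair is unaffected, so the test statistic array keeps the same shape.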

geirfreysson avatar geirfreysson commented on September 3, 2024

I'm glad you're making progress with this! If you manage to make it work you can add tests for it and a pull request and the additions will then be available to everyone.

I would first check whether your change works with the Stack/Test code I mentioned in a previous comment in this thread.

The crosstab method itself uses the "paint" mechanism that makes results pretty, so there are a few more steps where things can go wrong with crosstab than with that code. Try that first; if it works, the next step is to see why the crosstab mechanism isn't showing the results.
