Computational Thematic Analysis Toolkit

IMPORTANT

On May 1st, 2023, we learned that Reddit has informed its moderator community that pushshift.io was in violation of the Reddit API's new terms and has had its access revoked. https://www.reddit.com/r/modnews/comments/134tjpe/reddit_data_api_update_changes_to_pushshift_access/

As such, Pushshift.io functionality in the toolkit should be used with caution and careful consideration. A warning about this situation will be added to the source code of the data collection functionality that used the Pushshift API.

Reference

Robert P. Gauthier and James R. Wallace. 2022. The Computational Thematic Analysis Toolkit. Proc. ACM Hum.-Comput. Interact. 6, GROUP, Article 25 (January 2022), 15 pages. https://doi.org/10.1145/3492844

Installation Instructions

To access the most recent version: https://github.com/rpgauthier/ComputationalThematicAnalysisToolkit/releases/latest

Installers available for Windows 10 x64 and OSX

Toolkit Functionality:

The Toolkit is made up of interconnected modules.

Data Collection

Used by the researcher to import data into the toolkit. Once data is imported, the module visualizes its content so that the researcher can interact with the data at scale, become more familiar with it, and begin forming ideas for their analysis.

Data Cleaning & Filtering

Provides the researcher with the ability to: (1) see which rules the toolkit's internal computational techniques use to include and remove words; (2) review which words are included and removed by the rules; and (3) tune the rules to search for signals. During this process, researchers can become more familiar with the dataset as a whole by clicking on any word in the included or removed list to see how it is used.
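The include/remove idea can be illustrated with a minimal sketch. It is not the toolkit's implementation: the function names `apply_rules` and `contexts`, the whitespace tokenization, and the two rules (a stop-word list and a minimum count) are all assumptions chosen for illustration.

```python
from collections import Counter


def apply_rules(documents, stop_words=frozenset(), min_count=1):
    """Split the vocabulary into included and removed words.

    Hypothetical rules: a word is removed if it is in stop_words or if it
    occurs fewer than min_count times across all documents.
    """
    counts = Counter(word for doc in documents for word in doc.lower().split())
    included, removed = {}, {}
    for word, count in counts.items():
        if word in stop_words:
            removed[word] = "stop word"
        elif count < min_count:
            removed[word] = f"fewer than {min_count} occurrences"
        else:
            included[word] = count
    return included, removed


def contexts(documents, word):
    """Return the documents that use a word, similar to clicking on it in a list."""
    return [doc for doc in documents if word in doc.lower().split()]
```

Calling `apply_rules` again with a different `min_count` or stop-word set corresponds to tuning the rules, and `contexts` shows how a given word is used in the data.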

Modelling & Sampling

Provides the ability for the researcher to create a variety of purposive samples using iterative topic models that seek to group data based on signals such as common word groupings in the comments. The aim is a diverse set of models that capture samples of different sets of data. The researcher can use these samples both to familiarize themselves further with the data and to continue forming their inductive analytical framework.
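As a rough illustration of how topic model output could drive a sample, the sketch below picks, for each topic, the documents most strongly associated with it. This is an assumption about one possible approach, not the toolkit's sampling code. The function name `sample_by_topic` and the input format (document id mapped to a list of per-topic probabilities) are hypothetical.

```python
def sample_by_topic(doc_topic_probs, per_topic=2):
    """For each topic, pick the documents with the highest probability for it.

    doc_topic_probs maps a document id to a list of probabilities, one per
    topic, as a fitted topic model might produce (hypothetical input format).
    """
    if not doc_topic_probs:
        return {}
    n_topics = len(next(iter(doc_topic_probs.values())))
    samples = {}
    for topic in range(n_topics):
        # Rank document ids by how strongly they belong to this topic.
        ranked = sorted(
            doc_topic_probs,
            key=lambda doc_id: doc_topic_probs[doc_id][topic],
            reverse=True,
        )
        samples[topic] = ranked[:per_topic]
    return samples
```

Running this over the output of several different models would give several different samples of the same dataset.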

Coding

Provides the researcher with a place where data can be coded and reviewed in an iterative manner to develop, refine, and apply their analytical framework to sampled data in the form of a concrete set of codes.

Reviewing

Provides the researcher with a place to create themes, group codes within the themes, and visualize connections between codes and themes.

Reporting

Provides an interface to help the researcher choose quotes and keep track of which piece of data each quote came from, for each code and theme. If desired for ethical reasons, the researcher can also keep track of paraphrases of these quotations, so that the research team can review whether each paraphrase captured the original quotation properly.
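The relationship between themes, codes, quotes, and paraphrases can be pictured with a small data model. This is a hypothetical sketch for orientation only. The class names, fields, and the `pending_review` helper are assumptions and do not reflect the toolkit's internal classes.

```python
from dataclasses import dataclass, field


@dataclass
class Quote:
    source_id: str          # identifier of the piece of data the quote came from
    original: str           # the quotation as it appears in the data
    paraphrase: str = ""    # optional paraphrase written for ethical reasons
    approved: bool = False  # set once the research team accepts the paraphrase


@dataclass
class Code:
    name: str
    quotes: list = field(default_factory=list)


@dataclass
class Theme:
    name: str
    codes: list = field(default_factory=list)


def pending_review(themes):
    """List (theme, code, source_id) for paraphrases not yet approved."""
    return [
        (theme.name, code.name, quote.source_id)
        for theme in themes
        for code in theme.codes
        for quote in code.quotes
        if quote.paraphrase and not quote.approved
    ]
```

Keeping `source_id` on every quote is what lets a paraphrase be traced back to the original data during team review.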

To Modify or Build a New Version

  1. Download or fork the repository
  2. Open the src folder in an IDE (tested in VS Code on Windows and OSX)

Build Commands

Windows:

  1. pyinstaller pyinstaller-Windows10x64.spec --additional-hooks-dir=.
  2. run & compile innosetup_Windows10x64.iss

OSX running an Intel chip:

  1. change paths in pyinstaller-OSX.spec to where your python site-packages are installed
  2. python -m PyInstaller --windowed --additional-hooks-dir=. pyinstaller-OSX.spec
  3. run & build packages_OSX_x86_64.pkgproj

OSX running an M1 chip:

  1. change paths in pyinstaller-OSX.spec to where your python site-packages are installed
  2. python -m PyInstaller --windowed --additional-hooks-dir=. pyinstaller-OSX.spec
  3. run & build packages_OSX_arm64.pkgproj
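Step 1 of both OSX builds needs the location of your site-packages directory. One way to find it is to ask Python directly, assuming the interpreter you run here is the same one you will use for PyInstaller. The helper name `site_packages_dir` is hypothetical and not part of the toolkit.

```python
import sysconfig


def site_packages_dir():
    """Return the directory where pip installs pure-Python packages."""
    return sysconfig.get_paths()["purelib"]


if __name__ == "__main__":
    # Use this path when editing pyinstaller-OSX.spec.
    print(site_packages_dir())
```

Inside a virtual environment this prints the environment's own site-packages directory, which is usually the one the spec file should point at.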

Needed applications

  • python 3.10
  • pyinstaller 4.5.1 - For Windows
  • Inno Setup Compiler - For Windows
  • packages - For OSX

Needed Packages (there may be others)

  • pip install psutil
  • pip install wxPython
  • pip install pandas
  • pip install gensim
  • pip install bitermplus
  • pip install spacy
  • pip install nltk
  • pip install tweepy
  • pip install chardet
  • pip install dateparser
  • pip install jsonpickle
  • pip install wordcloud
  • pip install squarify
  • pip install networkx
  • pip install pyinstaller
  • pip install pytz
  • pip install lxml
  • pip install xmlschema
  • pip install scikit-learn
  • pip install top2vec
  • pip install bertopic

Additional Steps

  • python -m spacy download fr_core_news_sm
  • python -m spacy download en_core_web_sm
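Because the package list may be incomplete, it can help to check which of the listed packages and spaCy models the current interpreter can actually find before building. The sketch below is a convenience script, not part of the toolkit. The names `REQUIRED_PACKAGES`, `SPACY_MODELS`, and `missing` are assumptions. Note that a few packages are imported under a different name than the one used with pip (for example, wxPython is imported as `wx`).

```python
from importlib.util import find_spec

# pip name -> top-level module name; the two differ for a few packages.
REQUIRED_PACKAGES = {
    "psutil": "psutil", "wxPython": "wx", "pandas": "pandas",
    "gensim": "gensim", "bitermplus": "bitermplus", "spacy": "spacy",
    "nltk": "nltk", "tweepy": "tweepy", "chardet": "chardet",
    "dateparser": "dateparser", "jsonpickle": "jsonpickle",
    "wordcloud": "wordcloud", "squarify": "squarify",
    "networkx": "networkx", "pyinstaller": "PyInstaller", "pytz": "pytz",
    "lxml": "lxml", "xmlschema": "xmlschema", "scikit-learn": "sklearn",
    "top2vec": "top2vec", "bertopic": "bertopic",
}

# spaCy models are installed as importable packages under their own names.
SPACY_MODELS = ("fr_core_news_sm", "en_core_web_sm")


def missing(required):
    """Return the names whose module cannot be found by this interpreter."""
    return [name for name, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    for name in missing(REQUIRED_PACKAGES):
        print(f"missing package: pip install {name}")
    for model in missing({m: m for m in SPACY_MODELS}):
        print(f"missing model: python -m spacy download {model}")
```

The script only checks that each module can be located. It does not import anything, so it will not catch version conflicts or broken installs.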
