
masters-thesis

Installation

  • pip install -r requirements.txt
  • spaCy also requires its small English model:
    • python -m spacy download en_core_web_sm
  • Download the main dataset here and place it in the data directory, replacing any existing copy.
  • Finally, run jupyter lab to view the code.
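To confirm the installation steps worked before opening Jupyter, a quick import check can help. This is a minimal sketch, not part of the repository: the module list below is an illustrative assumption (requirements.txt is the authoritative list), and `missing_modules` is a hypothetical helper. It relies on the fact that spaCy models such as en_core_web_sm are installed as importable packages.

```python
import importlib.util

# Illustrative list only (assumption); requirements.txt is the authoritative source.
REQUIRED_MODULES = ["spacy", "jupyterlab"]
# spaCy models are installed as packages named after the model.
SPACY_MODEL = "en_core_web_sm"


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES + [SPACY_MODEL])
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Environment looks ready.")
```

If anything is reported missing, rerun the corresponding pip or spacy download command above.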

Data exploration

NOTE: make sure to run all scripts (i.e., Python files) from their own directory in the terminal.
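The note suggests (this is an assumption, since the scripts are not reproduced here) that the scripts use paths relative to their own folder. A relative path is resolved against the terminal's current working directory, not against the script's location, so launching a script from elsewhere makes it look in the wrong place. The sketch below illustrates this; `RELATIVE`, `resolve_from`, and `data_dir_for_script` are hypothetical names, and the directory layout is assumed.

```python
import os
from pathlib import Path

# Hypothetical relative path of the kind a script might use to reach the data.
RELATIVE = Path("..") / "data" / "en_data.csv"


def resolve_from(cwd, relative=RELATIVE):
    """Return where `relative` points when the process is started in `cwd`.

    The relative path is joined onto the working directory and then
    normalised lexically ("..") removes the previous component).
    """
    return Path(os.path.normpath(Path(cwd) / relative))


def data_dir_for_script(script_file):
    """Anchor the data directory at the script's own location instead.

    Assumes the hypothetical layout <repo>/<subdir>/<script>.py and <repo>/data/.
    Inside a script this would be called as data_dir_for_script(__file__).
    """
    return Path(script_file).resolve().parent.parent / "data"
```

Started from /repo/exploration, `RELATIVE` points at /repo/data/en_data.csv; started from /repo, the same path points outside the repository at /data/en_data.csv.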

  • First, run the retrieve_OCM_labels.py script. It creates two JSON files that are needed for data exploration.
  • The Data Exploration & Processing notebook is then ready to be executed. This may take some time, since it validates that each row's language is English. You don't need to run it: I ran all cells and saved the resulting en_data.csv in the data directory.
    • If you do want to run it, download the fpsc3 dataset and place it in the data directory.
  • The Chosen Categories and Cultures Distribution notebooks explore the categories that would potentially be chosen in my thesis, and the cultures within the eHRAF database given these categories.
  • All images will be saved within the exploration directory.
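The notebook's English check is not reproduced in this README, so the sketch below only illustrates the general filter-then-save step behind en_data.csv. `looks_english` is a deliberately crude stand-in (the share of common English function words) for a real language detector, and the `text` column name, the threshold, and all helper names are hypothetical.

```python
import csv

# Crude stand-in for a real language detector (assumption, not the notebook's method).
ENGLISH_STOPWORDS = {
    "the", "and", "of", "to", "in", "is", "that", "it",
    "was", "for", "a", "with", "as", "on", "are",
}


def looks_english(text, threshold=0.2):
    """Heuristic: True if enough tokens are common English function words."""
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return False
    hits = sum(1 for t in tokens if t in ENGLISH_STOPWORDS)
    return hits / len(tokens) >= threshold


def filter_english(rows, text_key="text"):
    """Keep only the rows whose text column passes the language check."""
    return [row for row in rows if looks_english(row[text_key])]


def write_csv(rows, path, fieldnames):
    """Save the filtered rows, e.g. to a file such as data/en_data.csv."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Checking every row this way is linear in the dataset size, which is consistent with the note that the real validation step takes a while on the full dataset.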

Training

NOTE: both models have already been executed, and the results match those listed in my paper. There is no need to run them and wait for training, but feel free to do so.

  • The Model 112 (Training102) notebook contains all the code needed to train the models. It uses data/en_data.csv as the dataset.
    • The results will be saved as pickle files.
  • In the Model 113 (Analysis, hidden cues, text) notebook, I analyze the results of the model by reading the results saved during training.
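Saving results as pickle files in one notebook and reading them back in another follows the pattern below. This is a minimal sketch: the file name `results/model_results.pkl`, the example result keys, and the helper names are hypothetical; the notebooks define their own.

```python
import pickle
from pathlib import Path

# Hypothetical location; the training notebook chooses its own file names.
RESULTS_PATH = Path("results") / "model_results.pkl"


def save_results(results, path=RESULTS_PATH):
    """Persist a results object (e.g. metrics, predictions) after training."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(results, f)


def load_results(path=RESULTS_PATH):
    """Read the saved results back, e.g. in the analysis notebook."""
    with Path(path).open("rb") as f:
        return pickle.load(f)
```

Because unpickling can execute arbitrary code, only load pickle files you created yourself or otherwise trust.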

Contributors

hasan-sh