colab-gensim-mallet's Introduction

Colab + Gensim + Mallet

September 14, 2021
Geoff Ford
https://polsci.github.io/
https://github.com/polsci/

See also: Binder + Gensim + Mallet

Introduction

This repository is designed for students in DIGI405 at the University of Canterbury to do topic modeling through their browser using Google Colab. It may also be useful to others who want to do topic modeling through a browser with their own corpus.

Note: The notebook has been updated to enforce Gensim v3.8 (the last version to support running topic models via Mallet).
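
As a rough illustration of that version constraint, the sketch below checks a Gensim version string before any Mallet code runs. The helper name `supports_mallet_wrapper` is hypothetical and not part of the notebook. It assumes, in line with the note above, that the Mallet wrapper is unavailable from Gensim 4.0 onward.

```python
def supports_mallet_wrapper(gensim_version):
    """Return True if the given Gensim version string is older than 4.0.

    Hypothetical helper for illustration only; it inspects the major
    version number and nothing else.
    """
    major = int(gensim_version.split(".")[0])
    return major < 4
```

In a notebook cell you could run `import gensim` and then call `supports_mallet_wrapper(gensim.__version__)` to confirm the installed version before training a model.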

A note to DIGI405 students

Save your notebook regularly, because Google Colab sessions time out. I believe this happens after 90 minutes; if you can find official Google documentation that confirms this, please let me know.

Steps for DIGI405:

  1. Launch the notebook in Google Colab (see below).
  2. Run the first cells to upgrade Gensim and install Java and Mallet.
  3. Run the cell to upload and extract the corpus zip file. Warning: uploads are quite slow.
  4. Use the notebook to create your topic model.

A note to everyone

Before running the notebook, please read the Google Colab FAQ.

Launch the notebook in Google Colab

Click here to run the notebook:
Launch on Google Colab

Not in DIGI405?

If you are not in this course, you can still upload your own corpus as a zip file. The corpus should consist of a single directory of txt files (one document per txt file). This isn't the fastest way to run topic models, but it lets you create a topic model through your browser without installing any software.
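
To make the expected corpus layout concrete, here is a minimal sketch that extracts a zip and reads every txt file into memory using only the Python standard library. It is not the notebook's own loading code: the function name `load_corpus_from_zip` and the paths in the usage note are hypothetical, and it assumes the files are UTF-8 text.

```python
import zipfile
from pathlib import Path


def load_corpus_from_zip(zip_path, extract_to):
    """Extract a corpus zip and return {filename: text} for each .txt file."""
    extract_dir = Path(extract_to)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(extract_dir)

    documents = {}
    # rglob searches subdirectories, so the txt files are found inside
    # the single directory that was zipped.
    for txt_file in sorted(extract_dir.rglob("*.txt")):
        # Skip the metadata folder that macOS adds to zip archives.
        if "__MACOSX" in txt_file.parts:
            continue
        # Assumption: files are UTF-8; undecodable bytes are replaced.
        documents[txt_file.name] = txt_file.read_text(
            encoding="utf-8", errors="replace"
        )
    return documents
```

For example, `documents = load_corpus_from_zip("corpus.zip", "corpus")` would give one entry per document, keyed by file name. Files other than txt files are ignored.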

A note about pyLDAvis

The environment should support pyLDAvis, but it is not used in the sample notebook. Add a cell like this to install it:

!pip install pyLDAvis

Add a cell like this to run it (note: this is sloooowwww and not recommended!):

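# Newer pyLDAvis releases name this module gensim_models; older ones call it gensim
try:
    import pyLDAvis.gensim_models as gensimvis
except ImportError:
    import pyLDAvis.gensim as gensimvis
import pyLDAvis
vis_data30 = gensimvis.prepare(gensimmodel30, doc_term_matrix, dictionary)
pyLDAvis.display(vis_data30)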

colab-gensim-mallet's Issues

Help with "Functions to load and preprocess the corpus..."

Hello! When I get to the "Functions to load and preprocess the corpus and create the document-term matrix" step I get stuck. I am just getting into coding through my digital humanities class, and know next to nothing. I am trying to run a topic model on a data set, but once I get to this step I am not sure what to add/change in the code. Any help would be greatly appreciated!

Copy Mallet Folder into '/content'

Hi, could you please explain how to copy the mallet folder to the '/content/' folder under Google Drive? I can only see as far as 'My Drive'.
