
functions-python-ldamodeling's Introduction

page_type: sample
languages: python
products: azure, azure-functions, azure-storage
description: The sample uses an HttpTrigger to accept a dataset from a blob and performs a set of tasks.

Topic Classification using Latent Dirichlet Allocation

Latent Dirichlet Allocation (LDA) is a statistical model that represents each document as a mixture of topics.

The sample uses an HttpTrigger to accept a dataset from a blob and performs the following tasks (a minimal sketch of this pipeline follows the list):

  • Tokenizes the entire set of documents using NLTK
  • Removes stop words and lemmatizes the documents using NLTK
  • Classifies documents into topics using the LDA APIs from the gensim Python library
  • Returns a visualization of topics from the dataset using the pyLDAvis Python library
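The sample's topic_classify.py defines clean_text and lemmatize helpers; the sketch below approximates the same pipeline on a hard-coded list of documents rather than the blob-backed dataset the function actually reads. It is illustrative, not the sample's exact code, and assumes the punkt, stopwords, and wordnet NLTK resources are already downloaded.

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from gensim import corpora
from gensim.models import LdaModel

# Toy input; the function reads its documents from a blob container instead.
documents = [
    "Azure Functions lets you run event-driven code in the cloud.",
    "Latent Dirichlet Allocation discovers topics in a collection of documents.",
]

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def clean_text(doc):
    # Tokenize, drop stop words and non-alphabetic tokens, then lemmatize.
    tokens = nltk.word_tokenize(doc.lower())
    tokens = [t for t in tokens if t.isalpha() and t not in stop_words]
    return [lemmatizer.lemmatize(t) for t in tokens]

token_data = [clean_text(doc) for doc in documents]

# Build a dictionary and bag-of-words corpus, then train the LDA model.
dictionary = corpora.Dictionary(token_data)
corpus = [dictionary.doc2bow(tokens) for tokens in token_data]
lda_model = LdaModel(corpus=corpus, id2word=dictionary, num_topics=5, passes=10)
print(lda_model.print_topics())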

Getting Started

Deploy to Azure

Prerequisites

  • Install Python 3.6+
  • Install Functions Core Tools
  • Install Docker
  • Note: If running on Windows, use Ubuntu WSL to run the deploy script

Steps

  • Click the Deploy to Azure button to deploy resources

[Deploy to Azure button]

or

  • Deploy through Azure CLI

    • Open the Azure CLI and run az group create -l [region] -n [resourceGroupName] to create a resource group in your Azure subscription (e.g. [region] could be westus2, eastus, etc.)
    • Run az group deployment create --name [deploymentName] --resource-group [resourceGroupName] --template-file azuredeploy.json
  • Run pip install nltk to install the NLTK Python package

  • Run python3 deploy/download.py to download the dataset, tokenizers, and stopwords from NLTK. These are typically downloaded to $HOME/nltk_data (a rough sketch of these downloads follows this list)

  • Make sure you have a service principal created. Follow instructions here

  • Run sh deploy/deploy.sh (in Ubuntu WSL or any shell) to deploy function code and content to blob containers.

  • Deploy Function App
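The exact contents of deploy/download.py are not reproduced here; at a minimum it is expected to fetch the NLTK resources the function depends on, roughly along these lines (the dataset download itself is omitted):

import nltk

# Fetch the tokenizer, stop-word, and lemmatizer resources into the default
# location (typically $HOME/nltk_data).
for resource in ("punkt", "stopwords", "wordnet"):
    nltk.download(resource)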

Test

  • Send the following body in an HTTP POST request (an example call using Python's requests library follows the body)
{
    "container_name" : "dataset",
    "num_topics" : "5" 
}
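For example, the request can be sent with the requests library; the URL below is a placeholder for your deployed endpoint and function key (the function name TrainLDA comes from this sample, the rest is an assumption):

import requests

# Placeholder endpoint: substitute your function app name and function key.
url = "https://<functionAppName>.azurewebsites.net/api/TrainLDA?code=<functionKey>"
payload = {"container_name": "dataset", "num_topics": "5"}

response = requests.post(url, json=payload)
print(response.status_code)
print(response.json())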
  • Sample response
{
    "lda_model_url": "https://ldamdlstore.blob.core.windows.net/ldamodel/ldamodel",
    "token_data_url": "https://ldamdlstore.blob.core.windows.net/ldamodel/token_data"
}
  • Visualizing topics through pyLDAvis (a sketch of this step follows)

    • Open the Jupyter notebook VisualizeTopics.ipynb using the instructions here

    • In the notebook, plug in the values from the sample response for LDA_MODEL_BLOB_URL and TOKEN_DATA_URL
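Roughly, the notebook loads the saved model and token data and hands them to pyLDAvis. The sketch below is illustrative rather than the notebook's exact code; it assumes the two blobs are readable from your machine and that the token data was pickled:

import pickle
import urllib.request

import pyLDAvis
import pyLDAvis.gensim_models  # older pyLDAvis releases expose this as pyLDAvis.gensim
from gensim import corpora
from gensim.models import LdaModel

LDA_MODEL_BLOB_URL = "<lda_model_url from the response>"
TOKEN_DATA_URL = "<token_data_url from the response>"

# Pull the two blobs down locally.
urllib.request.urlretrieve(LDA_MODEL_BLOB_URL, "ldamodel")
urllib.request.urlretrieve(TOKEN_DATA_URL, "token_data")

lda_model = LdaModel.load("ldamodel")
with open("token_data", "rb") as f:   # assumes the token data was pickled
    token_data = pickle.load(f)

# Rebuild the dictionary and corpus, then prepare the interactive visualization.
dictionary = corpora.Dictionary(token_data)
corpus = [dictionary.doc2bow(tokens) for tokens in token_data]
vis = pyLDAvis.gensim_models.prepare(lda_model, corpus, dictionary)
pyLDAvis.display(vis)                 # renders inline in a Jupyter notebook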



functions-python-ldamodeling's People

Contributors

microsoftopensource, msftgits, msha1026, priyaananthasankar, supernova-eng


functions-python-ldamodeling's Issues

Need to install NLTK before running python3 download.py

Please provide us with the following information:

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [ ] feature request
- [x] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Run download.py without nltk installed in the Python environment

Any log messages given by the failure

Expected/desired behavior

OS and Version?

Windows 7, 8 or 10. Linux (which distribution). macOS (Yosemite? El Capitan? Sierra?)

Versions

Mention any other details that might be useful
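One way to make this failure clearer would be a guard at the top of download.py; the snippet below is a hypothetical suggestion, not code from the current script:

# Hypothetical guard for download.py: fail with an actionable message.
try:
    import nltk
except ImportError:
    raise SystemExit("The nltk package is required. Install it first with: pip install nltk")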


Thanks! We'll be in touch soon.

FUNCTIONS_WORKER_RUNTIME needs to be set to python locally before function deployment

Please provide us with the following information:

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [ ] feature request
- [X] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Run func azure functionapp publish ldamodeling101 --build-native-deps without FUNCTIONS_WORKER_RUNTIME being set to python locally

Any log messages given by the failure

Your Azure Function App has 'FUNCTIONS_WORKER_RUNTIME' set to 'python' while your local project is set to 'None'. You can pass --force to update your Azure app with 'None' as a 'FUNCTIONS_WORKER_RUNTIME'

Expected/desired behavior

Successful deployment of function app

OS and Version?

Windows 10

Versions

Mention any other details that might be useful
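As the title says, the fix is to set FUNCTIONS_WORKER_RUNTIME to python in the local project before publishing. A small illustrative Python snippet that patches local.settings.json (the file layout below is an assumption about a typical Functions project, not taken from this repo):

import json

# Add FUNCTIONS_WORKER_RUNTIME=python to the local settings file.
path = "local.settings.json"   # assumed to sit in the function project root
with open(path) as f:
    settings = json.load(f)

settings.setdefault("Values", {})["FUNCTIONS_WORKER_RUNTIME"] = "python"

with open(path, "w") as f:
    json.dump(settings, f, indent=2)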


Thanks! We'll be in touch soon.

Unify to one deploy.sh

After the ARM template is used to deploy resources to Azure, unify all of the steps below into one deploy.sh to make things easier. The script must include all of the steps and check for prerequisites; a rough Python sketch of the intended flow follows the list.

  • Open the Azure CLI and run az group create -l [region] -n [resourceGroupName] to create a resource group in your Azure subscription (e.g. [region] could be westus2, eastus, etc.)

  • Run az group deployment create --name [deploymentName] --resource-group [resourceGroupName] --template-file azuredeploy.json --parameters parameters.json

  • Run python3 download.py to download the dataset, tokenizers, and stopwords from NLTK. These are typically downloaded to $HOME/nltk_data

  • Run deploy.sh to deploy function code and content to blob containers

  • Create/Activate virtual environment

  • Run func azure functionapp publish [functionAppName] --build-native-deps
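The unified script itself would be a shell script; purely to illustrate the intended flow, the same sequence is sketched below in Python with subprocess (resource names and file paths are placeholders):

import subprocess

RESOURCE_GROUP = "<resourceGroupName>"
REGION = "<region>"              # e.g. westus2
DEPLOYMENT_NAME = "<deploymentName>"
FUNCTION_APP = "<functionAppName>"

steps = [
    ["az", "group", "create", "-l", REGION, "-n", RESOURCE_GROUP],
    ["az", "group", "deployment", "create",
     "--name", DEPLOYMENT_NAME,
     "--resource-group", RESOURCE_GROUP,
     "--template-file", "azuredeploy.json",
     "--parameters", "parameters.json"],
    ["python3", "download.py"],
    ["sh", "deploy.sh"],
    ["func", "azure", "functionapp", "publish", FUNCTION_APP, "--build-native-deps"],
]

# Run each step in order and stop on the first failure.
for step in steps:
    subprocess.run(step, check=True)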

Running deploy.sh on windows

Please provide us with the following information:

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [ ] feature request
- [x] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Try to run deploy.sh on Windows.

Expected/desired behavior

Either instructions on how to run it on Windows or a Windows-friendly script.

OS and Version?

Windows 7, 8 or 10. Linux (which distribution). macOS (Yosemite? El Capitan? Sierra?)
Win10

Versions

Mention any other details that might be useful


Thanks! We'll be in touch soon.

Asavari Review

  1. Remove virtual environment creation for Azure deployment.

Host.json required to deploy function app

Please provide us with the following information:

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [ ] feature request
- [X] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Run func azure functionapp publish ldamodeling101 --build-native-deps without host.json

Any log messages given by the failure

'Unable to find project root. Expecting to find one of host.json in project root.'

Expected/desired behavior

Successful deployment of Azure function

OS and Version?

Windows 10

Versions

Mention any other details that might be useful


Thanks! We'll be in touch soon.

Understanding deploy.sh

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [ ] feature request
- [x] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Who is the target audience for this? Is there a way to make it easier/more descriptive to get the Service Principal App Id / Password / Tenant Id?

Thanks! We'll be in touch soon.

Running sample HTTP POST fails

Please provide us with the following information:

This issue is for a: (mark with an x)

- [X] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

HTTP POST to the deployed function in the Azure portal results in a 500 error and the logs below

Any log messages given by the failure

2019-04-19T22:05:53.597 [Error] Executed 'Functions.TrainLDA' (Failed, Id=c57c5a4c-2e7f-4235-a8b0-7ec9733d61b1)
Result: Failure
Exception: LookupError:


Resource wordnet not found.
Please use the NLTK Downloader to obtain the resource:

>>> import nltk
>>> nltk.download('wordnet')
Searched in:
- '/home/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- '/root/.pyenv/versions/3.6.8/nltk_data'
- '/root/.pyenv/versions/3.6.8/share/nltk_data'
- '/root/.pyenv/versions/3.6.8/lib/nltk_data'


Stack: File "/root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/azure/functions_worker/dispatcher.py", line 288, in _handle__invocation_request
self.__run_sync_func, invocation_id, fi.func, args)
File "/root/.pyenv/versions/3.6.8/lib/python3.6/concurrent/futures/thread.py", line 56, in run
result = self.fn(*self.args, **self.kwargs)
File "/root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/azure/functions_worker/dispatcher.py", line 347, in __run_sync_func
return func(**params)
File "/home/site/wwwroot/TrainLDA/init.py", line 16, in main
lda_blob_url = classifier.classify(container_name,num_topics)
File "/home/site/wwwroot/TrainLDA/topic_classify.py", line 55, in classify
tokens = clean_text(doc)
File "/home/site/wwwroot/TrainLDA/topic_classify.py", line 106, in clean_text
tokens = [lemmatize(token) for token in tokens]
File "/home/site/wwwroot/TrainLDA/topic_classify.py", line 106, in
tokens = [lemmatize(token) for token in tokens]
File "/home/site/wwwroot/TrainLDA/topic_classify.py", line 88, in lemmatize
lemma = wn.morphy(word)
File "/home/site/wwwroot/worker_venv/lib/python3.6/site-packages/nltk/corpus/util.py", line 116, in getattr
self.__load()
File "/home/site/wwwroot/worker_venv/lib/python3.6/site-packages/nltk/corpus/util.py", line 81, in __load
except LookupError: raise e
File "/home/site/wwwroot/worker_venv/lib/python3.6/site-packages/nltk/corpus/util.py", line 78, in __load
root = nltk.data.find('{}/{}'.format(self.subdir, self.__name))
File "/home/site/wwwroot/worker_venv/lib/python3.6/site-packages/nltk/data.py", line 675, in find
raise LookupError(resource_not_found)

Expected/desired behavior

Successful HTTP POST and reply with HTML link

OS and Version?

Versions

Mention any other details that might be useful
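A likely cause is that the wordnet corpus was never shipped with (or found by) the deployed app. One possible workaround, offered here as an assumption rather than a fix confirmed in this thread, is to download the corpus into a writable directory at startup and add it to NLTK's search path:

import os
import nltk

# Possible workaround sketch: fetch wordnet into a writable path at startup.
NLTK_DATA_DIR = "/tmp/nltk_data"   # any directory the function app can write to
os.makedirs(NLTK_DATA_DIR, exist_ok=True)
nltk.data.path.append(NLTK_DATA_DIR)

try:
    nltk.data.find("corpora/wordnet")
except LookupError:
    nltk.download("wordnet", download_dir=NLTK_DATA_DIR)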


Thanks! We'll be in touch soon.

Why install Docker?

This issue is for a: (mark with an x)

- [x] documentation issue or request

Why is Docker listed as a requirement?

Parameters.json missing

Please provide us with the following information:

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

The deploy-by-CLI instructions include --parameters parameters.json, but I do not see a corresponding file.

Expected/desired behavior

Include the parameters.json file or indicate where to get it.

OS and Version?

Windows 7, 8 or 10. Linux (which distribution). macOS (Yosemite? El Capitan? Sierra?)
Win10

Mention any other details that might be useful

This is in the section on deploying by CLI.


Thanks! We'll be in touch soon.
