amsterdam-internships / venue-accessibility-google-reviews

NLP pipeline for extracting insights related to venue accessibility in Amsterdam.

Shell 7.21% Python 91.88% TeX 0.91%

Venue Accessibility with Google Reviews

This project aims to highlight the perspective of people with Reduced Mobility (RM) living in Amsterdam via analysis of public venue reviews. This is done using Natural Language Processing (NLP) techniques such as Aspect Based Sentiment Analysis and Opinion Summarisation.

It extends the work of L. Da Rocha Bazilio by examining how well different models extract accessibility-related aspects from noisy data, i.e. reviews that cover a mixture of topics rather than accessibility alone.

In addition, it examines the impact of Opinion Summarisation on reviews about accessibility, and how this can make activity and journey planning easier for people with RM.

This is an example of the UI of the application that the pipeline will be connected to:

Project Folder Structure

There are the following folders in the structure:

data: This folder includes data for the following purposes:
1. external: Data from third-party sources.
2. interim: Intermediate data that has been transformed.
3. processed: Finalised datasets for modelling.
4. raw: The original immutable data.
datasets: This is where you should place your data for training and testing.
media: This is where the results of each step of the pipeline are stored as images.
models: Trained and serialized models, model predictions, or model summaries.
notebooks: This contains the notebooks of the pipeline.
reports: Generated analysis as HTML, PDF, LaTeX, etc.
1. figures: Generated graphics and figures to be used in reporting.
results: Here you will find the results in txt form.
src: Folder for all source files specific to this project.
scripts: Folder with example scripts for performing different tasks (could serve as usage documentation).
tests: All of the tests for the project.
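As a quick sanity check after cloning, the layout above can be verified with a few lines of Python. This is only a sketch: the function name `missing_folders` is hypothetical and not part of the repo, and it assumes the folders sit at the top level of the repository exactly as listed, with the numbered entries nested under `data` and `reports`.

```python
from pathlib import Path

# Folders named in this README. Assumption: they live directly under the
# repository root, with sub-folders nested under "data" and "reports".
EXPECTED_FOLDERS = [
    "data/external", "data/interim", "data/processed", "data/raw",
    "datasets", "media", "models", "notebooks",
    "reports/figures", "results", "src", "scripts", "tests",
]


def missing_folders(repo_root):
    """Return the expected folders that do not exist under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]
```

Calling `missing_folders(".")` from the repository root returns an empty list when every folder is present.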

Installation

Clone this repository:

git clone git@github.com:Amsterdam-Internships/Venue-Accessibility-Google-Reviews.git

Install all dependencies:

conda env create -f environment.yml or pip install -r requirements.txt

Setup

To set up the pipeline, you need to download the Google test data from here and the Euan's Guide data from here.

Place the Google test data in the test folder and the Euan's Guide dataset in the train folder.

To set up the environment, create your own .env file with a variable called LOCAL_ENV. This is where you keep the file path of the project's home directory on your machine, so that the relative paths of this repo can be resolved against it.

E.g. LOCAL_ENV=/Users/yourname/Venue-Accessibility-Google-Reviews
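How the scripts actually read this variable is not shown in this README. A minimal sketch of one way to do it with only the standard library follows. The helper names `read_env_file` and `resolve_repo_path` are hypothetical. The parser accepts both the plain `LOCAL_ENV=...` spelling and a `$LOCAL_ENV = ...` spelling with a leading `$` and spaces.

```python
import os
from pathlib import Path


def read_env_file(env_path):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    for line in Path(env_path).read_text().splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate a leading "$" on the key and spaces or quotes around the value.
        values[key.strip().lstrip("$")] = value.strip().strip("'\"")
    return values


def resolve_repo_path(relative_path, env_path=".env"):
    """Join a repo-relative path onto LOCAL_ENV (process environment first, then .env)."""
    base = os.environ.get("LOCAL_ENV") or read_env_file(env_path)["LOCAL_ENV"]
    return Path(base) / relative_path
```

With the example value above, `resolve_repo_path("data/raw")` would point at the `data/raw` folder inside your local clone.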

To run the pipeline, run the bash script 'full_pipeline.sh'.

How it works

The image below displays the workflow of the pipeline. These steps correspond to the folder structure of the pipeline.

Acknowledgements

This repository was created in collaboration with Amsterdam Intelligence for the City of Amsterdam. It is based on the prior work of Lizzy Da Rocha Bazilio.

venue-accessibility-google-reviews's Issues

Start Recording references

Turn the references that you collect into IEEE format so that you can use them directly in LaTeX

Ask Xander about drafts

Test new issue

https://www.notion.so/Test-new-issue-692248b8c522499e989ce0bbf327df8d

Untitled Page

Add back charts for the results section

Run the pre-trained pipeline (as is, without modifications) on the google reviews Aspect classification

Start Literature Review

Refresh your fundamentals

Refining Introduction

Write the discussion section

Python Scripting Improvements

Modify the code for the pipeline

Sentiment Analysis Evaluation Metrics Issues

The accuracy, precision and recall are suspiciously high. There could be an issue with how you have split the training and test set that impacts prediction performance.
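One quick way to test the suspicion above is to look for reviews that appear in both splits, since duplicated text between train and test inflates the scores. The sketch below is illustrative only: `split_overlap` is a hypothetical helper, and it assumes each split is available as a plain list of review strings.

```python
def split_overlap(train_texts, test_texts):
    """Return the test reviews whose text also appears in the training set."""

    def normalise(text):
        # Lower-case and collapse whitespace so trivial variants still match.
        return " ".join(text.lower().split())

    seen = {normalise(text) for text in train_texts}
    return [text for text in test_texts if normalise(text) in seen]
```

A non-empty result means the two sets share reviews and the split is worth revisiting. An empty result does not rule out other forms of leakage, such as several reviews of the same venue landing on both sides of the split.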

Refining & Checking Labelling

Review 2023-03-14

update readme

update description (can be 1 sentence)
update image (e.g. use general image from this post)
update folder structure
update installation (e.g. the link)

Other

add/update required packages (pip requirements, some conda yml, whatever). Requirements slide around here (for conda - something like conda env export > env.yml )
move requirements out of src
keep usage in mind (no need to have all args figured out right away though)
remove sections/content from the template that you definitely do not need
slowly remove template content (example images, tests?)
! please rename functions and arguments

Data

remove all data
possibly keep small files with sample data so people know what is expected (e.g. just a few FAKE reviews)
add in installation / setup section remark that you expect these files under that folder (feel free to add a remark that people can contact me or someone from the gemeente for the full dataset)
a warning for missing dataset before the code has crashed? :D
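The last item in the list above (warning about a missing dataset before the code crashes) could look roughly like this. It is a sketch under assumptions: `check_datasets` is a hypothetical name, and the `train` and `test` folder names are taken from the Setup section, with their parent folder passed in by the caller.

```python
from pathlib import Path


def check_datasets(dataset_dir, required=("train", "test")):
    """Raise a clear error if an expected dataset folder is missing or empty."""
    root = Path(dataset_dir)
    problems = []
    for name in required:
        folder = root / name
        if not folder.is_dir():
            problems.append(f"missing folder: {folder}")
        elif not any(folder.iterdir()):
            problems.append(f"no files in: {folder}")
    if problems:
        raise FileNotFoundError(
            "Dataset check failed (" + "; ".join(problems) + "). "
            "Download the data as described in the Setup section."
        )
```

Calling this at the very start of the pipeline turns a late, cryptic failure into an immediate message that says which folder needs attention.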

Random

metrics to plots and tables instead of prints (e.g. straight to LaTeX)
Iva to send set path example
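For the first item in the list above (metrics as tables rather than prints), a small formatter is enough to get started. The sketch below uses plain string formatting and no extra packages. `metrics_to_latex` is a hypothetical name, and the table layout is just one reasonable choice.

```python
def metrics_to_latex(metrics, caption="Evaluation metrics"):
    """Render a {metric_name: value} dict as a LaTeX table string."""
    rows = []
    for name, value in metrics.items():
        label = name.replace("_", "\\_")  # escape underscores for LaTeX
        rows.append(f"{label} & {value:.3f} \\\\")
    lines = [
        "\\begin{table}[h]",
        "\\centering",
        "\\begin{tabular}{lr}",
        "\\hline",
        "Metric & Value \\\\",
        "\\hline",
        *rows,
        "\\hline",
        "\\end{tabular}",
        f"\\caption{{{caption}}}",
        "\\end{table}",
    ]
    return "\n".join(lines)
```

The returned string can be written to a `.tex` file and pulled into the report with `\input{...}`, so the numbers in the text always match the latest run.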

Have a look at Lizzie’s code.

https://www.notion.so/Have-a-look-at-Lizzie-s-code-bda4152444be426e8c9c9a0b9aa82e2c

Have a look at Lizzie’s code

Create your own environment in Visual Studio Code so that you can run your code.
Look at the parts of Lizzie’s code that you think are useful.

Review 2023-05-02

update readme with setting up and usage instructions (.env + paths + scripts + args)
update steps -> make_dataset -> train -> ???
clean up / document dataset making/cleaning - no need for the 1line functions, improve naming (e.g. select aspects?!, removing nans does more than removing nans, etc)
no spaces in folder/file names (in processed)
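The last item above (no spaces in folder or file names) can be applied to existing outputs with a short script. This is a sketch, not code from the repo: `remove_spaces` is a hypothetical helper, and replacing spaces with underscores is an assumed naming convention.

```python
from pathlib import Path


def remove_spaces(folder, dry_run=True):
    """Rename files and folders under `folder` so their names contain no spaces.

    Returns the list of (old_path, new_path) pairs. Nothing is renamed
    while dry_run is True.
    """
    renames = []
    # Handle the deepest paths first so children are renamed before their parents.
    paths = sorted(Path(folder).rglob("*"), key=lambda p: len(p.parts), reverse=True)
    for path in paths:
        if " " not in path.name:
            continue
        target = path.with_name(path.name.replace(" ", "_"))
        renames.append((path, target))
        if not dry_run:
            if target.exists():
                raise FileExistsError(f"cannot rename {path}: {target} already exists")
            path.rename(target)
    return renames
```

Running it once with the default `dry_run=True` shows what would change. Passing `dry_run=False` performs the renames.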