# SF Open Dataset: Download from Kaggle: https://www.kaggle.com/datasets/san-francisco/sf-registered-business-locations-san-francisco
Original source & license info: https://data.sfgov.org/Economy-and-Community/Registered-Business-Locations-San-Francisco/g8m3-pdis
Requires a machine with at least 8 GB of RAM.
Shut down WSL and open the WSL config file from PowerShell:
wsl --shutdown
notepad "$env:USERPROFILE/.wslconfig"
.wslconfig contents:
[wsl2]
memory=4GB # limit the WSL 2 VM to 4 GB of memory
Inside WSL, raise the memory-map limit Elasticsearch needs:
sudo sysctl -w vm.max_map_count=262144
Install the loader: pip install 'elasticsearch-loader[parquet]' (the quotes keep the shell from expanding the brackets)
Run the loader from WSL on Windows; this takes about 3.5 minutes on my machine:
elasticsearch_loader --index my_app_scans --type scans parquet /mnt/c/Users/Andreas/Documents/GitHub/ElasticSearch-contact-tracing/data/sf_appscans.parquet.gzip
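Alternatively, the same parquet file can be pushed from Python with the official client's bulk helper. A minimal sketch: the index name `my_app_scans` comes from the loader call above; the connection URL and the relative path are assumptions.

```python
import pandas as pd

INDEX = "my_app_scans"

def to_actions(df, index=INDEX):
    """Yield one bulk-index action per DataFrame row."""
    for record in df.to_dict(orient="records"):
        yield {"_index": index, "_source": record}

def load_parquet(path, es_url="http://localhost:9200"):
    """Read the gzip parquet file and bulk-index it (assumes a local node)."""
    from elasticsearch import Elasticsearch, helpers  # pip install elasticsearch
    df = pd.read_parquet(path)
    es = Elasticsearch(es_url)
    helpers.bulk(es, to_actions(df))

# usage:
# load_parquet("data/sf_appscans.parquet.gzip")
```

This route is handy when you want to transform rows on the way in instead of re-creating the parquet file.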
pip install streamlit-folium
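A minimal sketch of the map page built on streamlit-folium. The field names `latitude`, `longitude`, and `dba_name` are assumptions about the document shape; adjust them to the actual mapping.

```python
def hits_to_points(hits):
    """Turn Elasticsearch hit dicts into (lat, lon, name) tuples.
    Field names latitude/longitude/dba_name are assumptions."""
    points = []
    for hit in hits:
        src = hit["_source"]
        if src.get("latitude") is not None and src.get("longitude") is not None:
            points.append(
                (float(src["latitude"]), float(src["longitude"]), src.get("dba_name", ""))
            )
    return points

def render_map(points):
    """Render the points on a folium map inside a Streamlit app."""
    import folium                        # pulled in by streamlit-folium
    import streamlit as st
    from streamlit_folium import st_folium

    m = folium.Map(location=[37.7749, -122.4194], zoom_start=12)  # San Francisco
    for lat, lon, name in points:
        folium.Marker([lat, lon], tooltip=name).add_to(m)
    st.title("SF app scans")
    st_folium(m, width=700)

# usage (inside a script started with `streamlit run app.py`):
# render_map(hits_to_points(response["hits"]["hits"]))
```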
- Before creating the parquet file, change the DataFrame's data types so they match the intended Elasticsearch mapping
- Cast the postal code to int when creating the parquet file, then load the data into Elasticsearch and verify it is mapped as an integer
- Use a group query (a terms aggregation) instead of scanning 1000 documents and then removing the duplicates in the Streamlit DataFrame
- Create a client that writes each new scan to Elasticsearch as it is created
- Create a Kibana dashboard with stats about locations or people
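A sketch of the first three items above. The column names `postal_code` and `dba_name` are hypothetical; pandas' nullable `Int64` type keeps missing postal codes as integers instead of floats through the parquet round-trip.

```python
import pandas as pd

def prepare(df):
    """Cast columns before writing the parquet file so Elasticsearch
    infers the intended types (column names are assumptions)."""
    out = df.copy()
    # nullable Int64: missing postal codes stay <NA> instead of forcing float
    out["postal_code"] = pd.to_numeric(out["postal_code"], errors="coerce").astype("Int64")
    out["dba_name"] = out["dba_name"].astype("string")
    return out

# Third item: let Elasticsearch group the documents instead of scanning 1000
# of them and deduplicating client-side -- a terms-aggregation request body:
UNIQUE_LOCATIONS_QUERY = {
    "size": 0,  # no raw hits, only aggregation buckets
    "aggs": {
        "by_location": {
            "terms": {"field": "dba_name.keyword", "size": 1000}
        }
    },
}

# usage:
# df = prepare(pd.read_csv("sf_appscans.csv"))
# df.to_parquet("data/sf_appscans.parquet.gzip", compression="gzip")
```

The `.keyword` sub-field is the usual target for aggregating a text field; if the mapping is explicit, a plain `keyword` field works the same way.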