🏠 Similar NYC Airbnb Listings Finder

Note

This is a take-home assignment completed in ~3 hours.

This app finds similar NYC Airbnb listings in other neighborhoods, given a listing ID. It provides a FastAPI backend that serves the similar listings and a Streamlit web app as a user-friendly interface.

Demo

Check out the live demo of the Streamlit app

Features

Search for similar Airbnb listings based on a given listing ID (the API and app both provide ways to retrieve a random ID if you don't have one)
Filter similar listings by price tolerance and accommodation capacity
Access through a FastAPI backend or Streamlit interface

Getting Started

Prerequisites

Python 3.10
Make (for running setup commands)

Installation

Clone the repository:

```
git clone https://github.com/naingthet/similar-listings.git
cd similar-listings
```

Create a virtual environment:

```
make venv
```

Activate the virtual environment:

For Windows:
```
venv\Scripts\activate
```
For macOS and Linux:
```
source venv/bin/activate
```

Install the dependencies:

```
make install
```

Usage

Set the Pinecone API key (ask Thet for this key):

```
export PINECONE_API_KEY={PINECONE_API_KEY}
```

Start the FastAPI server (main.py):

```
make start-api
```

The API will be available at http://localhost:8000.

Start the Streamlit app (app.py):

```
make start-app
```

The Streamlit app will open in your default web browser.

How It Works

The app uses instruction-tuned embeddings from the hkunlp/instructor-large model to encode custom text representations of key information about Airbnb listings. These embeddings are generated for all listings and uploaded to a Pinecone index for efficient similarity search.
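As a rough illustration of what the Pinecone index does during a query, the sketch below ranks stored embeddings by cosine similarity to a query embedding using brute force. It assumes the index is configured with the cosine metric (the README does not say which metric is used), and the `top_k_similar` helper and toy vectors are hypothetical. Pinecone performs this search approximately and at scale rather than by scanning every vector.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: list[float], vectors: dict[str, list[float]], k: int
) -> list[tuple[str, float]]:
    """Hypothetical brute-force stand-in for a vector-index query:
    score every stored vector against the query and keep the k best."""
    scored = [(listing_id, cosine_similarity(query, vec)) for listing_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real instructor embeddings are much larger.
vectors = {"a": [1.0, 0.0, 0.0], "b": [0.9, 0.1, 0.0], "c": [0.0, 1.0, 0.0]}
print(top_k_similar([1.0, 0.0, 0.0], vectors, k=2))
```

A vector index exists precisely to avoid this linear scan over every listing on each request.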

When searching for similar listings, the app considers the overall characteristics of the listings, such as the property type, room type, amenities, and location. It also lets users limit the price and the number of people the listing accommodates to refine the search results. For example, setting price_tol to 0.1 keeps only similar listings priced within ±10% of the original listing.
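The price_tol rule reduces to a symmetric band around the original price. The helpers below are a minimal sketch of that arithmetic; the function names are hypothetical and the app's own implementation may differ (for example, in whether the bounds are inclusive).

```python
def price_bounds(price: float, price_tol: float) -> tuple[float, float]:
    """Return the (min, max) price allowed for similar listings.

    A price_tol of 0.1 allows prices within +/-10% of the original listing.
    """
    if price_tol < 0:
        raise ValueError("price_tol must be non-negative")
    delta = price * price_tol
    return price - delta, price + delta


def within_price_tol(candidate_price: float, original_price: float, price_tol: float) -> bool:
    """True if the candidate's price falls inside the (inclusive) band."""
    low, high = price_bounds(original_price, price_tol)
    return low <= candidate_price <= high


print(price_bounds(100.0, 0.1))
print(within_price_tol(95.0, 100.0, 0.1), within_price_tol(120.0, 100.0, 0.1))
```

Computing `price - delta` rather than `price * (1 - price_tol)` keeps the band symmetric and avoids some floating-point noise at the upper bound.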

The FastAPI backend handles the similarity search requests and returns the top similar listings. The Streamlit web app provides a user-friendly interface for inputting the listing ID and search parameters, displaying the search results, and navigating through the similar listings.

Design Decisions and Assumptions

Data Filtering

The app keeps only listings that have a non-null value for every selected attribute, to ensure data quality and consistency.
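A minimal sketch of this completeness filter, assuming listings are plain dicts with `None` for missing values and that the column names match the fields listed under Additional Fields. With a pandas DataFrame, the equivalent would be `df.dropna(subset=list(SELECTED_FIELDS))`.

```python
# Attributes listed under "Additional Fields"; exact column names are assumed.
SELECTED_FIELDS = (
    "name", "description", "host_is_superhost", "price", "accommodates",
    "room_type", "beds", "bathrooms", "review_scores_rating",
)


def has_all_fields(listing: dict, fields: tuple[str, ...] = SELECTED_FIELDS) -> bool:
    """True if every selected attribute is present and not None."""
    return all(listing.get(field) is not None for field in fields)


def filter_complete(listings: list[dict]) -> list[dict]:
    """Keep only listings with a non-null value for every selected attribute."""
    return [listing for listing in listings if has_all_fields(listing)]


complete = {field: 1 for field in SELECTED_FIELDS}
missing_beds = dict(complete, beds=None)
print(len(filter_complete([complete, missing_beds])))
```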

Similarity Criteria

Listings from the same neighborhood are filtered out.
The primary assumption is that people are typically driven by price constraints and necessary headcount when searching for similar listings. Therefore, the app uses these criteria to filter down the similar listings.
After applying the price and headcount filters, the app surfaces listings that are similar in essence, considering numerical features such as beds and review_scores_rating as well as textual features such as name and description.
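The three criteria above map naturally onto a metadata filter applied at query time. The sketch below builds one in Pinecone's metadata-filter syntax (`$and`, `$ne`, `$gte`, `$lte`), assuming each vector was upserted with hypothetical `neighbourhood`, `price`, and `accommodates` metadata fields, and assuming the headcount rule means "accommodates at least as many guests as the original". The resulting dict would be passed as the `filter` argument of a Pinecone `index.query` call; the repository may implement these filters differently.

```python
def build_similarity_filter(original: dict, price_tol: float) -> dict:
    """Build a Pinecone-style metadata filter for a similar-listings query.

    Assumes hypothetical metadata keys: neighbourhood, price, accommodates.
    """
    delta = original["price"] * price_tol
    return {
        "$and": [
            # Exclude listings from the same neighborhood (this also drops the original).
            {"neighbourhood": {"$ne": original["neighbourhood"]}},
            # Keep listings priced within +/- price_tol of the original.
            {"price": {"$gte": original["price"] - delta}},
            {"price": {"$lte": original["price"] + delta}},
            # Assumed headcount rule: must accommodate at least as many guests.
            {"accommodates": {"$gte": original["accommodates"]}},
        ]
    }


original = {"price": 100.0, "neighbourhood": "Harlem", "accommodates": 2}
print(build_similarity_filter(original, price_tol=0.1))
```

Filtering inside the vector query, rather than after it, means the top-k results are drawn only from listings that already satisfy the hard constraints.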

Embedding Model

The app uses instructor embeddings (hkunlp/instructor-large) because they allow encoding multiple fields of different types in the context of the task, while using a relatively small model without the need for fine-tuning or building a complex search and rerank system.
Instructor embeddings provide a fast and efficient way to encode the relevant information about listings for similarity search.

Alternative Approaches

Composite Score

An alternative approach could have been to create a composite score based on multiple features, such as multiple sets of embeddings and numerical features for each listing.
However, this approach would require returning a much larger top-k result set, running multiple encodings, and reranking the results, which can be computationally expensive compared to the implemented approach.
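To make the cost concrete, here is a hypothetical sketch of the composite-score alternative, which this project did not implement. Each candidate needs a similarity score for every feature (for example, separate name and description embeddings plus a numeric-feature score), and the candidate pool pulled from any single index must be large enough that listings scoring well overall are not cut before reranking. The feature names and weights are illustrative assumptions.

```python
def composite_score(sims: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of per-feature similarity scores (missing features count as 0)."""
    return sum(weight * sims.get(name, 0.0) for name, weight in weights.items())


def rerank(candidates: list[dict], weights: dict[str, float], k: int) -> list[dict]:
    """Rerank a (large) candidate pool by composite score and keep the top k.

    Each candidate carries a hypothetical "sims" dict of per-feature similarities.
    """
    ranked = sorted(candidates, key=lambda c: composite_score(c["sims"], weights), reverse=True)
    return ranked[:k]


candidates = [
    {"id": "x", "sims": {"name": 0.9, "description": 0.2, "numeric": 0.1}},
    {"id": "y", "sims": {"name": 0.5, "description": 0.8, "numeric": 0.9}},
]
weights = {"name": 0.4, "description": 0.4, "numeric": 0.2}
print([c["id"] for c in rerank(candidates, weights, k=1)])
```

Every extra feature adds another encoding pass and another score per candidate, which is the overhead the single instructor embedding avoids.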

Additional Fields

The fields used in this project are: name, description, host_is_superhost, price, accommodates, room_type, beds, bathrooms, review_scores_rating
The app could have incorporated many more fields to capture additional aspects of the listings, but it was kept simple for demonstration purposes.
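One way to combine these mixed-type fields into a single instructor input is to serialize them as labeled text and pair the result with a task instruction. The sketch below is hypothetical: the instruction wording, the field layout, and the assumption that price is already numeric are illustrative, not the repository's actual format. The `[instruction, text]` pairs are the input format accepted by `INSTRUCTOR("hkunlp/instructor-large").encode(...)` from the InstructorEmbedding package.

```python
# Hypothetical instruction; the repository's actual wording is not documented here.
INSTRUCTION = "Represent the Airbnb listing for retrieving similar listings:"


def listing_to_text(listing: dict) -> str:
    """Serialize the selected fields into one labeled text block."""
    # Accept either a bool or a "t"/"f" string flag for superhost status.
    superhost = "yes" if listing["host_is_superhost"] in (True, "t") else "no"
    return (
        f"Name: {listing['name']}\n"
        f"Description: {listing['description']}\n"
        f"Room type: {listing['room_type']}\n"
        f"Price: ${listing['price']:.0f} per night\n"  # assumes a numeric price
        f"Accommodates: {listing['accommodates']}\n"
        f"Beds: {listing['beds']}\n"
        f"Bathrooms: {listing['bathrooms']}\n"
        f"Superhost: {superhost}\n"
        f"Review score: {listing['review_scores_rating']}"
    )


def instructor_inputs(listings: list[dict]) -> list[list[str]]:
    """Pair each serialized listing with the instruction as [instruction, text]."""
    return [[INSTRUCTION, listing_to_text(listing)] for listing in listings]


example = {
    "name": "Cozy loft", "description": "Sunny room near the park",
    "host_is_superhost": "t", "price": 120.0, "accommodates": 2,
    "room_type": "Private room", "beds": 1, "bathrooms": 1.0,
    "review_scores_rating": 4.8,
}
print(instructor_inputs([example])[0][1])
```

Adding a field under this scheme is a one-line change to the serializer, with no new model or index required.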

Different Embedding Techniques

Different embedding techniques, such as ColBERT (Contextualized Late Interaction over BERT), could have been explored for generating contextual embeddings.
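For contrast with the single-vector approach used here, ColBERT represents each text as one embedding per token and scores a pair by "late interaction": each query token is matched to its most similar document token, and those maxima are summed. The sketch below shows only that MaxSim scoring step on toy vectors; it is a simplified illustration, not a ColBERT implementation (ColBERT normalizes its token embeddings, so the dot product acts as cosine similarity).

```python
def maxsim_score(query_tokens: list[list[float]], doc_tokens: list[list[float]]) -> float:
    """ColBERT-style late interaction: for each query token embedding, take its
    maximum dot product with any document token embedding, then sum."""

    def dot(a: list[float], b: list[float]) -> float:
        return sum(x * y for x, y in zip(a, b))

    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[1.0, 0.0], [0.5, 0.5]]
print(maxsim_score(query, doc))
```

The trade-off is storage and query cost: every listing would need many token vectors instead of one.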

Limitations and Potential Improvements

The app currently relies on a single embedding model and a limited set of attributes for similarity search. Expanding the set of attributes and exploring additional embedding techniques could potentially enhance the quality of similar listing recommendations.
The app assumes that the selected attributes are sufficient for capturing the essential characteristics of listings. Further analysis and user feedback could help identify additional attributes that are important for similarity search.
The app focuses on similarity search within the same city. Extending the app to support cross-city or even cross-country similar listing recommendations could be a valuable addition.
Incorporating user feedback and learning from user interactions could enable personalized and more accurate similar listing suggestions over time.
