🏠 Similar NYC Airbnb Listings Finder

Note

This is a take-home assignment completed in ~3 hours.

Given a listing ID, this app finds similar NYC Airbnb listings in other neighborhoods. It provides a FastAPI backend that serves the similar listings and a Streamlit web app that offers a user-friendly interface.

Demo

Check out the live demo of the Streamlit app

Features

  • Search for similar Airbnb listings based on a given listing ID (the API and app both provide ways to retrieve a random ID if you don't have one)
  • Filter similar listings by price tolerance and accommodation capacity
  • Access through a FastAPI backend or Streamlit interface

Getting Started

Prerequisites

  • Python 3.10
  • Make (for running setup commands)

Installation

  1. Clone the repository:

    git clone https://github.com/naingthet/similar-listings.git
    cd similar-listings

  2. Create a virtual environment:

    make venv

  3. Activate the virtual environment:

  • For Windows:

    venv\Scripts\activate

  • For macOS and Linux:

    source venv/bin/activate

  4. Install the dependencies:

    make install

Usage

  1. Set the Pinecone API key (ask Thet for this key):

    export PINECONE_API_KEY={PINECONE_API_KEY}

  2. Start the FastAPI server (main.py):

    make start-api

     The API will be available at http://localhost:8000.

  3. Start the Streamlit app (app.py):

    make start-app

     The Streamlit app will open in your default web browser.

How It Works

The app uses instruction-tuned embeddings from the hkunlp/instructor-large model to encode custom text representations of key information about Airbnb listings. These embeddings are generated for all listings and uploaded to a Pinecone index for efficient similarity search.
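As a rough illustration of what a "custom text representation" might look like, the sketch below flattens a listing into a single string and pairs it with an instruction, which is the input shape INSTRUCTOR models expect. The instruction wording, the `field: value` format, and the helper names are assumptions made for this example; the project's actual representation is not shown in this README.

```python
# Hypothetical instruction string; the project's actual wording is not given here.
INSTRUCTION = "Represent the Airbnb listing for retrieving similar listings:"

# Fields named in the "Additional Fields" section of this README.
FIELDS = [
    "name", "description", "host_is_superhost", "price", "accommodates",
    "room_type", "beds", "bathrooms", "review_scores_rating",
]


def listing_to_text(listing: dict) -> str:
    """Flatten a listing into 'field: value' segments (illustrative format)."""
    return "; ".join(
        f"{field}: {listing[field]}" for field in FIELDS if field in listing
    )


def to_instructor_input(listing: dict) -> list[str]:
    """Build the [instruction, text] pair that INSTRUCTOR models take as input."""
    return [INSTRUCTION, listing_to_text(listing)]


example = {"name": "Sunny loft", "room_type": "Entire home/apt", "beds": 2, "price": 150.0}
pair = to_instructor_input(example)
print(pair[1])
# name: Sunny loft; price: 150.0; room_type: Entire home/apt; beds: 2

# Encoding requires the InstructorEmbedding package and a model download:
#   from InstructorEmbedding import INSTRUCTOR
#   model = INSTRUCTOR("hkunlp/instructor-large")
#   embeddings = model.encode([pair])
```

The resulting vectors would then be upserted to the Pinecone index together with whatever metadata the search filters on.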

When searching for similar listings, the app considers the overall characteristics of the listings, such as the property type, room type, amenities, and location. Additionally, it allows users to set limitations on the price and the number of people the listing accommodates to refine the search results. For example, setting price_tol to 0.1 restricts the results to listings priced within ±10% of the original listing's price.
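The price tolerance arithmetic can be sketched as follows. The function names are hypothetical, and the assumption that the bounds are inclusive is mine; only the `price_tol` parameter and its ±10% meaning come from the description above.

```python
def price_bounds(price: float, price_tol: float) -> tuple[float, float]:
    """Return the (low, high) price band for a given relative tolerance."""
    if price_tol < 0:
        raise ValueError("price_tol must be non-negative")
    return price * (1 - price_tol), price * (1 + price_tol)


def within_tolerance(candidate_price: float, price: float, price_tol: float) -> bool:
    """True if candidate_price lies inside the band (inclusive bounds assumed)."""
    low, high = price_bounds(price, price_tol)
    return low <= candidate_price <= high


# A $200 listing with price_tol=0.1 gives a band of roughly $180 to $220.
low, high = price_bounds(200.0, 0.1)
print(round(low, 2), round(high, 2))
```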

The FastAPI backend handles the similarity search requests and returns the top similar listings. The Streamlit web app provides a user-friendly interface for inputting the listing ID and search parameters, displaying the search results, and navigating through the similar listings.

Design Decisions and Assumptions

Data Filtering

  • The app filters for listings that have non-null values for each of the selected attributes to ensure data quality and consistency.
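A minimal sketch of this kind of non-null filter, written over plain dictionaries so it runs without extra dependencies. The function name and the treatment of NaN as missing are assumptions; with a pandas DataFrame, `dropna(subset=...)` would serve the same purpose.

```python
import math


def has_all_fields(listing: dict, required: list[str]) -> bool:
    """True if every required field is present and neither None nor NaN."""
    for field in required:
        value = listing.get(field)
        if value is None:
            return False
        if isinstance(value, float) and math.isnan(value):
            return False
    return True


required = ["name", "price", "beds"]
records = [
    {"name": "A", "price": 100.0, "beds": 1},
    {"name": "B", "price": None, "beds": 2},          # null price
    {"name": "C", "price": float("nan"), "beds": 1},  # NaN price
    {"name": "D", "beds": 1},                         # missing price
]

clean = [r for r in records if has_all_fields(r, required)]
print([r["name"] for r in clean])
# ['A']
```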

Similarity Criteria

  • Listings from the same neighborhood are filtered out.
  • The primary assumption is that people are typically driven by price constraints and necessary headcount when searching for similar listings. Therefore, the app uses these criteria to filter down the similar listings.
  • After applying the price and headcount filters, the app focuses on surfacing listings that are similar in essence, by considering numerical features such as beds and review_scores_rating as well as textual features such as name and description.
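The filtering criteria above could be expressed as a Pinecone metadata filter along these lines. The metadata keys (`neighbourhood`, `price`, `accommodates`) and the choice to require at least the source listing's capacity are assumptions for illustration; the operators `$and`, `$ne`, `$gte`, and `$lte` are part of Pinecone's metadata filter syntax.

```python
def build_filter(source: dict, price_tol: float) -> dict:
    """Build a Pinecone-style metadata filter from a source listing."""
    low = source["price"] * (1 - price_tol)
    high = source["price"] * (1 + price_tol)
    return {
        "$and": [
            # Exclude listings from the source listing's own neighborhood.
            {"neighbourhood": {"$ne": source["neighbourhood"]}},
            # Keep prices within the tolerance band.
            {"price": {"$gte": low}},
            {"price": {"$lte": high}},
            # Assumption: candidates must fit at least as many guests.
            {"accommodates": {"$gte": source["accommodates"]}},
        ]
    }


source = {"neighbourhood": "Williamsburg", "price": 200.0, "accommodates": 4}
metadata_filter = build_filter(source, price_tol=0.1)

# The dict would be passed as the `filter` argument of a Pinecone query, e.g.
#   index.query(vector=source_embedding, top_k=10, filter=metadata_filter)
```

Pushing these constraints into the index query means the vector similarity ranking only runs over listings that already satisfy the price and headcount limits.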

Embedding Model

  • The app uses instructor embeddings (hkunlp/instructor-large) because they allow encoding multiple fields of different types in the context of the task, while using a relatively small model without the need for fine-tuning or building a complex search and rerank system.
  • Instructor embeddings provide a fast and efficient way to encode the relevant information about listings for similarity search.

Alternative Approaches

Composite Score

  • An alternative approach could have been to create a composite score based on multiple features, such as multiple sets of embeddings and numerical features for each listing.
  • However, this approach would require returning a much larger top-k result set, running multiple encodings, and reranking the results, which can be computationally expensive compared to the implemented approach.
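To make the composite-score alternative concrete, here is a small sketch of the reranking step. The weighting scheme, the `text_sim` and `numeric_sim` field names, and the default weight are all hypothetical; they only illustrate how per-feature similarities might be blended once a larger candidate set has been retrieved.

```python
def composite_score(text_sim: float, numeric_sim: float, text_weight: float = 0.7) -> float:
    """Weighted blend of a text similarity and a numeric-feature similarity."""
    return text_weight * text_sim + (1 - text_weight) * numeric_sim


def rerank(candidates: list[dict], text_weight: float = 0.7) -> list[dict]:
    """Sort candidates by composite score, best first."""
    return sorted(
        candidates,
        key=lambda c: composite_score(c["text_sim"], c["numeric_sim"], text_weight),
        reverse=True,
    )


candidates = [
    {"id": "a", "text_sim": 0.90, "numeric_sim": 0.20},
    {"id": "b", "text_sim": 0.80, "numeric_sim": 0.90},
]

# With the default weight, "b" overtakes "a" despite its lower text similarity.
print([c["id"] for c in rerank(candidates)])
# ['b', 'a']
```

The extra cost mentioned above comes from computing each similarity separately for every candidate before this blend can be applied.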

Additional Fields

  • The fields used in this project are: name, description, host_is_superhost, price, accommodates, room_type, beds, bathrooms, review_scores_rating
  • The app could have incorporated many more fields to capture additional aspects of the listings, but it was kept simple for demonstration purposes.

Different Embedding Techniques

  • Different embedding techniques, such as ColBERT (Contextualized Late Interaction over BERT), could have been explored for generating contextual embeddings.
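For context, ColBERT scores a query against a document with late interaction: each query token embedding is matched to its best document token embedding, and those maxima are summed. The sketch below shows only that scoring rule on toy vectors; it is not part of this project, and a real system would use learned token embeddings.

```python
def dot(a: list[float], b: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: list[list[float]], doc_vecs: list[list[float]]) -> float:
    """Sum, over query tokens, of the best-matching document token similarity."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy token embeddings: two query tokens, two document tokens.
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_vecs = [[1.0, 0.0], [0.5, 0.5]]

print(maxsim_score(query_vecs, doc_vecs))
# 1.5
```

Unlike the single-vector approach used in this app, this requires storing one vector per token for every listing.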

Limitations and Potential Improvements

  • The app currently relies on a single embedding model and a limited set of attributes for similarity search. Expanding the set of attributes and exploring additional embedding techniques could potentially enhance the quality of similar listing recommendations.
  • The app assumes that the selected attributes are sufficient for capturing the essential characteristics of listings. Further analysis and user feedback could help identify additional attributes that are important for similarity search.
  • The app focuses on similarity search within the same city. Extending the app to support cross-city or even cross-country similar listing recommendations could be a valuable addition.
  • Incorporating user feedback and learning from user interactions could enable personalized and more accurate similar listing suggestions over time.
