This repository contains Python code to retrieve Steam games with similar store banners, using OpenAI's CLIP.
Image similarity is assessed by the cosine similarity between image features encoded by CLIP.
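As an illustration of the similarity measure, here is a minimal pure-Python sketch of cosine similarity. The function name `cosine_similarity` and the toy 3-dimensional vectors are assumptions made for this example; they are not taken from the repository, and real CLIP features would be much longer vectors.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors standing in for CLIP image features (hypothetical values).
banner_a = [1.0, 2.0, 3.0]
banner_b = [2.0, 4.0, 6.0]   # same direction as banner_a
banner_c = [3.0, 0.0, -1.0]  # orthogonal to banner_a

print(cosine_similarity(banner_a, banner_b))  # close to 1: very similar
print(cosine_similarity(banner_a, banner_c))  # close to 0: dissimilar
```

The score only depends on the direction of the vectors, not on their length, which is why `banner_a` and `banner_b` are considered nearly identical here.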
- Install the latest version of Python 3.X.
- Install the required packages:
python -m pip install --upgrade pip
pip install -r requirements.txt
Data is available in download-steam-banners-data/.
The most recent data snapshot was downloaded with this Colab notebook on January 9, 2021.
This snapshot is shared as an archive (original_vertical_steam_banners.tar, 1.5 GB) on Google Drive.
It consists of vertical Steam banners (300x450 resolution), available for 29982 out of 48792 games, i.e. 61.4% of games.
Resized images are provided in the same repository for resolutions 256, 224, 128, 64, etc.
The list of appIDs (before any potential filtering) is from steam-store-snapshots.
The .txt logs also provide information which can be used to filter out images, based on:
- image size (before resizing images):
  - there is 1 image with resolution 600x900,
  - this is not a big issue, as its aspect ratio is equal to the one expected for 300x450 images,
- image channels (before and after resizing images):
  - most images are 'RGB' (true color images); total: 29642 images,
  - a few images are 'L' ('luminance', for greyscale images); total: 306 images,
  - very few images are 'CMYK' (pre-press images); total: 34 images,
- blank images:
  - there are 2 images which are either totally black (appID: 603280) or totally white (appID: 1076060),
  - these images are not reported separately, since they already appear in the log about image channels.
It is up to the reader to filter the dataset based on these logs. Logs can be reproduced with this Colab notebook.
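The checks described above could be sketched as follows. This is only an illustration under stated assumptions: the helper names `has_expected_ratio` and `keep_rgb_only`, the shape of the mode mapping, and the appIDs in the excerpt are all hypothetical, and the actual format of the .txt logs may differ. Only the per-mode totals are taken from the text above.

```python
from collections import Counter

# Totals per image mode, as reported above.
mode_counts = Counter({"RGB": 29642, "L": 306, "CMYK": 34})
total_images = sum(mode_counts.values())  # all downloaded banners


def has_expected_ratio(width, height, ref_width=300, ref_height=450):
    """Check that an image has the same aspect ratio as a 300x450 banner."""
    # Cross-multiplication avoids floating-point comparisons.
    return width * ref_height == height * ref_width


def keep_rgb_only(mode_by_app_id):
    """Return the appIDs whose banner is a true color ('RGB') image."""
    return [app_id for app_id, mode in mode_by_app_id.items() if mode == "RGB"]


# Made-up excerpt of a mode log: these appIDs are placeholders, not real data.
mode_by_app_id = {10: "RGB", 20: "L", 30: "CMYK"}

print(total_images)
print(has_expected_ratio(600, 900))
print(keep_rgb_only(mode_by_app_id))
```

Keeping only 'RGB' images is one possible policy; converting 'L' and 'CMYK' images to 'RGB' instead would keep more games in the dataset.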
Run match_steam_banners_with_CLIP.ipynb.
This will:
- compute and store the 512 features corresponding to each banner,
- find the 10 most similar store banners for each curated query appID,
- find the single most similar store banner for every appID available on the store, then display the most unique games.
NB: by default, query appIDs consist of:
- the top 100 most played games during the past two weeks, according to SteamSpy,
- a few manually curated games.
NB: unique games are the ones which are the most dissimilar to others, i.e. with a low similarity score to their first neighbor.
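The retrieval and uniqueness steps can be sketched in pure Python as below. This is not the notebook's code: the function names, the toy 2-dimensional features and the identifiers "A", "B", "C" are assumptions made for illustration. For ~30k banners, a vectorized library would be a better fit than these nested loops.

```python
import math


def normalize(vector):
    """Scale a vector to unit length, so that a dot product is a cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def rank_neighbors(query_id, features, top_k=10):
    """Return up to top_k (app_id, similarity) pairs, most similar first, excluding the query."""
    query = features[query_id]
    scores = [
        (other_id, sum(a * b for a, b in zip(query, other)))
        for other_id, other in features.items()
        if other_id != query_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


def rank_by_uniqueness(features):
    """Sort items by the similarity to their first neighbor, lowest (most unique) first."""
    first_neighbor = {
        app_id: rank_neighbors(app_id, features, top_k=1)[0] for app_id in features
    }
    return sorted(first_neighbor.items(), key=lambda item: item[1][1])


# Toy features: "A" and "B" point in nearly the same direction, "C" stands apart.
raw_features = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0]}
features = {app_id: normalize(v) for app_id, v in raw_features.items()}

print(rank_neighbors("A", features, top_k=2))
print(rank_by_uniqueness(features)[0])
```

In this toy example, "C" is the most unique item: even its first neighbor, "B", has a low similarity score.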
Results can be interactively explored with web apps:
The CLIP embedding for the ~30k banners is shared on Google Drive.
Results obtained with OpenAI's CLIP are shown on the Wiki.
The linked pages contain a lot of images and might be slow to load depending on your Internet bandwidth.
Direct links to similarity results are available below:
- for each game, find the 10 most similar games.
Direct links to similarity results are available below:
- for each unique game, display the 1 most similar game,
- a grid of unique games.
- Google's ViT (Vision Transformer):
- OpenAI's CLIP (Contrastive Language-Image Pre-Training):
- My usage of CLIP:
  - steam-CLIP: retrieve games with similar banners, using OpenAI's CLIP (resolution 224),
  - steam-image-search: retrieve games using natural language queries,
  - heroku-flask-api: serve the matching results through an API built with Flask on Heroku,
  - heroku-clip: deploy CLIP on Heroku,
- MobileNet v3:
  - match-steam-banners: retrieve games with similar banners, using MobileNet v3 (resolution 256),
- MobileNet v1:
  - download-steam-banners: retrieve games with similar banners, using MobileNet v1 (resolution 128),
  - download-steam-screenshots: retrieve games with similar screenshots, using MobileNet v1 (resolution 128).