
thingsvision's Introduction


📔 Table of Contents

🌟 About the Project

thingsvision is a Python package for extracting (image) representations from many state-of-the-art computer vision models. Essentially, you provide thingsvision with a directory of images and specify the neural network you're interested in. thingsvision then returns the representation of the selected neural network for each image, resulting in one feature map (vector or matrix, depending on the layer) per image. These features (we use the terms features and image representations interchangeably) can then be used for further analyses.

🚨 NOTE: some function calls mentioned in the original paper have been deprecated. To use this package successfully, exclusively follow this README and the documentation! 🚨

(back to top)

🦾 Functionality

With thingsvision, you can:

  • extract image representations from a large collection of vision models (see the model collection below) at any module of the network,
  • run extractions either from Python or via the command-line interface,
  • save the extracted features as npy, txt, mat, pt, or hdf5 files, and
  • optionally align the extracted features with human similarity judgments (see Human alignment below).

(back to top)

πŸ—„οΈ Model collection

Neural networks come from different sources. With thingsvision, you can extract image representations from all models provided by the following libraries and collections (a short sketch of how these sources map onto the extractor API follows the list):

  • torchvision
  • Keras
  • timm
  • ssl (self-supervised learning models)
    • simclr-rn50, mocov2-rn50, barlowtwins-rn50, pirl-rn50
    • jigsaw-rn50, rotnet-rn50, swav-rn50, vicreg-rn50
    • dino-rn50, dino-xcit-{small/medium}-{12/24}-p{8/16}
    • dino-vit-{tiny/small/base}-p{8/16}
    • dinov2-vit-{small/base/large/giant}-p14
    • mae-vit-{base/large}-p16, mae-vit-huge-p14
  • OpenCLIP models (CLIP trained on LAION-{400M/2B/5B})
  • CLIP models (CLIP trained on WiT)
  • a few custom models (Alexnet, VGG-16, Resnet50, and Inception_v3) trained on Ecoset rather than ImageNet and one Alexnet model pretrained on ImageNet and fine-tuned on SalObjSub
  • each of the many CORnet versions (recurrent vision models)
  • Harmonization models (see Harmonization repo). The default variant is ViT_B16. Other available models are ResNet50, VGG16, EfficientNetB0, tiny_ConvNeXT, tiny_MaxViT, and LeViT_small
  • DreamSim models (see DreamSim repo). The default variant is open_clip_vitb32. Other available models are clip_vitb32, dino_vitb16, and an ensemble. See the docs for more information
  • FAIR's Segment Anything (SAM) model
  • Kakaobrain's ALIGN implementation
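
To give a sense of how these sources map onto the extractor API (the full workflow is shown under Basic usage below), here is a minimal sketch. The 'torchvision' source string is taken from the CLI example in this README; the 'timm' and 'ssl' source strings and the model names used with them are assumptions, so please verify them against the documentation.

from thingsvision import get_extractor

# torchvision model (source string as used in the CLI example below)
alexnet_extractor = get_extractor(
    model_name='alexnet', source='torchvision', device='cpu', pretrained=True
)

# timm and self-supervised models ('timm' / 'ssl' source strings are assumptions; check the docs)
vit_extractor = get_extractor(
    model_name='vit_base_patch16_224', source='timm', device='cpu', pretrained=True
)
simclr_extractor = get_extractor(
    model_name='simclr-rn50', source='ssl', device='cpu', pretrained=True
)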

(back to top)

πŸƒ Getting Started

💻 Setting up your environment

Working locally

First, create and activate a new conda environment with Python 3.8, 3.9, or 3.10:

$ conda create -n thingsvision python=3.9
$ conda activate thingsvision

Then, install thingsvision by running the following pip commands in your terminal:

$ pip install --upgrade thingsvision
$ pip install git+https://github.com/openai/CLIP.git

If you want to extract features for harmonized models from the Harmonization repo, you additionally have to run the following pip commands in your thingsvision environment (FYI: as of now, this seems to work smoothly on Ubuntu only, not on macOS):

$ pip install git+https://github.com/serre-lab/Harmonization.git
$ pip install "keras-cv-attention-models>=1.3.5"

If you want to extract features for DreamSim from the DreamSim repo, you additionally have to run the following pip command in your thingsvision environment:

$ pip install dreamsim==0.1.2

See the docs for which DreamSim models are available in thingsvision.
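
As a hedged example of loading a DreamSim model afterwards (the model name, source, and parameter key below are assumptions based on the variants listed above, so please verify them against the docs):

from thingsvision import get_extractor

extractor = get_extractor(
    model_name='DreamSim',    # assumed model name; check the docs
    source='custom',          # assumed source for DreamSim models
    device='cpu',
    pretrained=True,
    model_parameters={'variant': 'open_clip_vitb32'},  # default variant listed above
)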

Google Colab

Alternatively, you can use Google Colab to play around with thingsvision by uploading your image data to Google Drive (via directory mounting). You can find the Jupyter notebook using PyTorch here and the TensorFlow example here.

(back to top)

πŸ” Basic usage

Command Line Interface (CLI)

thingsvision was designed to simplify feature extraction. If you have a folder of images (e.g., ./images) and want to extract features for each of these images without opening a Jupyter notebook or writing a Python script, it's probably easiest to use our CLI. The interface includes two commands:

  • thingsvision show-model
  • thingsvision extract-features

Example calls might look as follows:

thingsvision show-model --model-name "alexnet" --source "torchvision"
thingsvision extract-features --image-root "./data" --model-name "alexnet" --module-name "features.10" --batch-size 32 --device "cuda" --source "torchvision" --file-format "npy" --out-path "./features"

See thingsvision show-model -h and thingsvision extract-features -h for a list of all possible arguments. Note that the CLI provides just the basic extraction functionality, but it is probably enough for most users who don't want to dive too deep into the various models and modules. If you need more fine-grained control over the extraction itself, we recommend using the Python package directly and writing your own script.

Python commands

To use the Python API, start by importing all the necessary components and instantiating a thingsvision extractor. Here we use CLIP from the official CLIP repo as the model to extract features from, and we load the model onto the GPU for faster inference:

import torch
from thingsvision import get_extractor
from thingsvision.utils.storing import save_features
from thingsvision.utils.data import ImageDataset, DataLoader

model_name = 'clip'
source = 'custom'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_parameters = {
    'variant': 'ViT-L/14'
}

extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True,
  model_parameters=model_parameters,
)

Next, create both a dataset and a dataloader for your images. We assume that all of your images live in a single root directory, which may contain subfolders (e.g., for individual classes); therefore, we leverage the ImageDataset class.

root = 'path/to/your/image/directory' # e.g., './images'
batch_size = 32

dataset = ImageDataset(
    root=root,
    out_path='path/to/features',
    backend=extractor.get_backend(), # backend framework of model
    transforms=extractor.get_transformations(resize_dim=256, crop_dim=224) # set the input dimensionality to whichever values are required for your pretrained model
)

batches = DataLoader(
    dataset=dataset,
    batch_size=batch_size,
    backend=extractor.get_backend() # backend framework of model
)

Now all that is left is to extract the image features and store them on disk! Here we're extracting features from the image encoder module of CLIP (visual), but if you don't know which modules are available for a given model, just call extractor.show_model() to print all the modules.

module_name = 'visual'

features = extractor.extract_features(
    batches=batches,
    module_name=module_name,
    flatten_acts=True,
    output_type="ndarray", # or "tensor" (only applicable to PyTorch models of which CLIP and DINO are ones!)
)

save_features(features, out_path='path/to/features', file_format='npy') # file_format can be set to "npy", "txt", "mat", "pt", or "hdf5"
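
To sanity-check the output, you can load the saved features back into memory. Note that the exact file name written by save_features (assumed below to be features.npy for file_format='npy') is an assumption, so adjust the path to whatever appears in your output directory.

import numpy as np

features = np.load('path/to/features/features.npy')
print(features.shape)  # expected: (num_images, num_features) when flatten_acts=True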

Feature extraction with custom data pipeline

PyTorch
module_name = 'visual'

# your custom dataset and dataloader classes come here (for example, a PyTorch data loader)
my_dataset = ...
my_dataloader = ...

with extractor.batch_extraction(module_name, output_type="tensor") as e: 
  for batch in my_dataloader:
    ... # whatever preprocessing you want to add to the batch
    feature_batch = e.extract_batch(
      batch=batch,
      flatten_acts=True, # flatten 2D feature maps from an early convolutional or attention layer
      )
    ... # whatever post-processing you want to add to the extracted features
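
For illustration, here is a self-contained variant of the loop above that uses a plain torch.utils.data.DataLoader over random tensors as a stand-in for your real data pipeline. The extractor calls mirror the template above; the 3x224x224 input shape and the omission of the model's own preprocessing transforms are simplifying assumptions.

import torch
from torch.utils.data import DataLoader, TensorDataset

module_name = 'visual'

# dummy "images": 16 random tensors with ImageNet-style dimensions
my_dataset = TensorDataset(torch.rand(16, 3, 224, 224))
my_dataloader = DataLoader(my_dataset, batch_size=8)

all_features = []
with extractor.batch_extraction(module_name, output_type="tensor") as e:
    for (batch,) in my_dataloader:
        feature_batch = e.extract_batch(batch=batch, flatten_acts=True)
        all_features.append(feature_batch.cpu())
features = torch.cat(all_features, dim=0)  # (16, num_features)
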
TensorFlow / Keras
module_name = 'visual'

# your custom dataset and dataloader classes come here (for example, TFRecords files)
my_dataset = ...
my_dataloader = ...

for batch in my_dataloader:
  ... # whatever preprocessing you want to add to the batch
  feature_batch = extractor.extract_batch(
    batch=batch,
    module_name=module_name,
    flatten_acts=True, # flatten 2D feature maps from an early convolutional or attention layer
    )
  ... # whatever post-processing you want to add to the extracted features

Human alignment

If you want to align the extracted features with human object similarity, following the approach introduced in Improving neural network representations using human similarity judgments, you can optionally do so with the following method:

aligned_features = extractor.align(
    features=features,
    module_name=module_name,
    alignment_type="gLocal",
)

For more information about the available alignment types and aligned models see the docs.

For more examples of the many models available in thingsvision and explanations of additional functionality, such as how to optionally turn off center cropping, how to use HDF5 datasets (e.g., NSD stimuli), how to perform RSA or CKA, or how to easily extract features for the THINGS image database, please refer to the Documentation.

(back to top)

👋 How to contribute

If you come across problems or have suggestions, please submit an issue!

(back to top)

⚠️ License

This GitHub repository is licensed under the MIT License - see the LICENSE.md file for details.

(back to top)

📃 Citation

If you use this GitHub repository (or any modules associated with it), please cite our paper for the initial version of thingsvision as follows:

@article{Muttenthaler_2021,
	author = {Muttenthaler, Lukas and Hebart, Martin N.},
	title = {THINGSvision: A Python Toolbox for Streamlining the Extraction of Activations From Deep Neural Networks},
	journal = {Frontiers in Neuroinformatics},
	volume = {15},
	pages = {45},
	year = {2021},
	url = {https://www.frontiersin.org/article/10.3389/fninf.2021.679838},
	doi = {10.3389/fninf.2021.679838},
	issn = {1662-5196},
}

(back to top)

💎 Contributions

This is a joint open-source project between the Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, and the Machine Learning Group at Technische Universität Berlin. Correspondence and requests for contributing should be addressed to Lukas Muttenthaler. Feel free to contact us if you want to become a contributor or have any suggestions or feedback. For the latter, you could also just post an issue or engage in discussions. We'll try to respond as fast as we can.

(back to top)


thingsvision's Issues

Restrict input to extractor.show_model()

When executing module_name = extractor.show_model() the user is supposed to input the part of the model for which they would like to extract features. Currently, any value is accepted, even if the value is not a valid module name for the respective model.

Therefore, check whether the user's input actually denotes a valid module name. If not, reject it and prompt the user again. This avoids a confusing error later in the pipeline when the user executes features = extractor.extract_features(...).
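
A minimal sketch of the requested check (the prompt_for_module helper is hypothetical and not part of the current API; it assumes a PyTorch model whose valid module names come from named_modules()):

def prompt_for_module(model):
    # collect all valid module names of the (PyTorch) model
    valid_names = {name for name, _ in model.named_modules() if name}
    while True:
        module_name = input("Please enter the module you want to extract features from: ")
        if module_name in valid_names:
            return module_name
        print(f"'{module_name}' is not a valid module name for this model. Please try again.")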

UserWarning from torchvision/transforms

OS: macOS 10.14.6 (18G9323) (I know... ask my IT why)

I created and activated a fresh conda env using the environment.yml. I did not install any additional packages.

device = 'cuda' if torch.cuda.is_available() else 'cpu'
batch_size = 68
model_name = 'clip-ViT'
module_name = 'visual'
backend = 'pt'
clip = True
out_path = ...

everything works as expected until I execute

dl = vision.load_dl(root,
                    out_path=out_path,
                    batch_size=batch_size,
                    transforms=model.get_transformations(),
                    backend=backend,
                    file_names=file_names)

Which outputs:

    /Users/kaniuth/Desktop/extractivations/conda_env/lib/python3.8/site-packages/torchvision/transforms/transforms.py:332:
    UserWarning: Argument interpolation should be of type InterpolationMode instead of int. 
    Please, use InterpolationMode enum. warnings.warn(

The pipeline seems to continue normally; I can extract and save activations just fine. I just don't know whether that warning has any consequences for the extracted activations.

That warning did not occur with earlier versions of THINGSvision.

Move everything model-/source-specific to custom model file or source-specific extractor

Currently, we have some if-statements that address model-specific exceptions. Since these are exceptions rather than something general, we want to specify them in the custom model file or move them to source-specific extractor classes, if applicable. There's an if-statement that exclusively concerns clip models. This should go into the custom model file or the extractor classes for OpenCLIP.

Custom models

Allow users to add their own (pretrained) models without having to upload them to thingsvision or open a PR.

  • pass a model object to the extractor class rather than a string for the model name (as is required now)

Circular imports in `thingsvision.vision`

Cannot import thingsvision.dataset.ImageDataset as a stand-alone, without a prior import of thingsvision.vision. Code to reproduce:

from thingsvision.dataset import ImageDataset

throws: ImportError: cannot import name 'ImageDataset' from partially initialized module 'thingsvision.dataset' (most likely due to a circular import). Temporary fix is:

import thingsvision.vision
from thingsvision.dataset import ImageDataset

Implement low-memory option for feature extraction

Right now, features are stored in RAM when calling extractor.extract_features(...), which quickly makes memory usage explode. It might be a good idea to enable a low-memory option that just writes features to disk immediately.

Maybe add an output path to the extractor extractor = Extractor(out_path='...') and an extra flag extractor.extract_features(store_to_disk=True)?
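
A rough sketch of the idea (the out_path argument and the per-batch file naming below are hypothetical; the snippet only illustrates writing each batch to disk instead of accumulating everything in RAM):

import os
import numpy as np

def extract_features_to_disk(extractor, batches, module_name, out_path):
    os.makedirs(out_path, exist_ok=True)
    for i, batch in enumerate(batches):
        acts = extractor.extract_batch(
            batch=batch, module_name=module_name, flatten_acts=True
        )
        # write the current batch immediately so it can be freed from memory
        np.save(os.path.join(out_path, f"features_{i:05d}.npy"), np.asarray(acts))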

Add ecoset

  • update download script to get ecoset VGG weights
  • update README with an example

model feature extraction not possible on cuda

If torch.device is cuda, then thingsvision.model_class.Model.extract_features() throws a type error: the collected features are torch.Tensor objects, so features = np.vstack(features) (#L223) is not possible. Suggested fix at #L221: features.append(act.cpu().numpy()).

Update tests

The unit tests should be split up into separate files in a dedicated test directory.

import error - scikit image

Hello there,

I would like to try your tool, but there is an issue during import (please see below).

I have re-installed scikit-image, but this did not help. Any ideas?

Thank you.

ImportError: dlopen(/anaconda3/envs/coactivations/lib/python3.9/site-packages/skimage/_shared/geometry.cpython-39-darwin.so, 2): Symbol not found: ____chkstk_darwin
Referenced from: /anaconda3/envs/coactivations/lib/python3.9/site-packages/skimage/_shared/../.dylibs/libomp.dylib (which was built for Mac OS X 10.15)
Expected in: /usr/lib/libSystem.B.dylib
in /anaconda3/envs/coactivations/lib/python3.9/site-packages/skimage/_shared/../.dylibs/libomp.dylib

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "", line 1, in
import thingsvision.vision as vision

File "/anaconda3/envs/coactivations/lib/python3.9/site-packages/thingsvision/vision.py", line 67, in
from skimage.transform import resize

File "/anaconda3/envs/coactivations/lib/python3.9/site-packages/skimage/init.py", line 124, in
_raise_build_error(e)

File "/anaconda3/envs/coactivations/lib/python3.9/site-packages/skimage/init.py", line 102, in _raise_build_error
raise ImportError("""%s

ImportError: dlopen(/anaconda3/envs/coactivations/lib/python3.9/site-packages/skimage/_shared/geometry.cpython-39-darwin.so, 2): Symbol not found: ____chkstk_darwin
Referenced from: /anaconda3/envs/coactivations/lib/python3.9/site-packages/skimage/_shared/../.dylibs/libomp.dylib (which was built for Mac OS X 10.15)
Expected in: /usr/lib/libSystem.B.dylib
in /anaconda3/envs/coactivations/lib/python3.9/site-packages/skimage/_shared/../.dylibs/libomp.dylib
It seems that scikit-image has not been built correctly.

Your install of scikit-image appears to be broken.
Try re-installing the package following the instructions at:
https://scikit-image.org/docs/stable/install.html

update readme

Add to README that model names and layer names are backend specific

add tests with different batch sizes

  • add a test where different batch sizes are used (with and without remainders)
  • change the simple network architecture in the tests to have two hidden neurons instead of one

source-specific extractor classes that inherit from a base extractor class

  • Currently, one can only use the DEFAULT weights for torchvision models, which is limiting behavior (our goal is flexible behavior).
  • We need to make feature extraction more flexible per source via inheritance from a base extractor class and a backend-specific mixin class (TensorFlowMixin or PyTorchMixin); something along the following lines does the trick:
from dataclasses import dataclass

# BaseExtractor and the backend mixins (e.g., PyTorchMixin) are the to-be-implemented base classes referred to above.
@dataclass(repr=True)
class TimmExtractor(BaseExtractor, PyTorchMixin):

    def __init__(self, config: object) -> None:
        super(TimmExtractor, self).__init__()
        raise NotImplementedError

Memory corruption error `euclidean_matrix`

Hey,

when I call euclidean_matrix (through compute_rdm(x, 'euclidean')), the execution crashes with double free or corruption (!prev) (input: features from alexnet layers). Unfortunately, no more error logs are created. The other distance metrics work fine.
We run Python 3.8 with CUDA 10.2 in a container. The Python packages are up to date.

Can you reproduce that the euclidean_matrix does not work, or is it specific to our setup?

Thanks
David

EDIT: If anyone else has the same problem, I use scipy's squareform + pdist functions as a workaround. I did not benchmark, but it seems pretty fast.
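
For reference, the workaround mentioned in the edit above spelled out (assuming features is an (n_samples, n_features) NumPy array):

from scipy.spatial.distance import pdist, squareform

# pairwise Euclidean distances, reshaped into a square (n_samples x n_samples) RDM
rdm = squareform(pdist(features, metric='euclidean'))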

CORnet inconsistent vector length

When running images through CORnet-Z, the output layer for IT, as far as I understand from the CORnet preprint (https://www.biorxiv.org/content/10.1101/408385v1.full.pdf), is supposed to be 7x7x512, which, when flattened, yields a vector of length 25088. This is indeed what happens when apply_center_crop is set to True. However, when it is set to False, the output vectors for the same images (224 x 224 pixels) and the same layer/module end up being of size 32768.
This behavior does not appear to be layer-specific; for example, I also tested the V1 output layer, and its vector length likewise depends on whether the center crop is applied.

I am not quite sure if this is a bug or "normal" behavior that I then just do not quite understand.
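
A plausible explanation for the two lengths, under two assumptions that would need to be checked: the IT output of CORnet-Z has 512 channels with a total spatial downsampling factor of 32, and with apply_center_crop=False the resized image (e.g., 256 px) is fed to the network instead of the 224 px crop.

# center-cropped 224 x 224 input:  (224 // 32) ** 2 * 512 = 7 * 7 * 512 = 25088
# resized-only   256 x 256 input:  (256 // 32) ** 2 * 512 = 8 * 8 * 512 = 32768
assert (224 // 32) ** 2 * 512 == 25088
assert (256 // 32) ** 2 * 512 == 32768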

Add models from VISSL library

  • add models from the vissl library (in particular SimCLR)
  • we might want to do this in accordance with the fix for issue #69, since vissl will have its own extractor (VisslExtractor) and config (VisslConfig) classes

README improvement

Improve the readability of the README file: make the text more concise and easier to understand.

Deprecate RSA functions

There is a well-maintained reference implementation for Representational Similarity Analysis, the rsatoolbox. I therefore suggest to deprecate THINGSvision's RSA functions.

Advantages:

  1. End users don't get confused as they only have to use a single gold-standard package for DNN feature extraction and one for RSA analyses.
  2. THINGSvision avoids getting issues raised asking for RSA functions currently not implemented in THINGSvision, e.g. other distance functions.
  3. Less code for THINGSvision to maintain as the focus is solely on feature extraction.

Possibly PR THINGSvision's RSA functions to the rsatoolbox if they turn out to be more efficient.

mismatch in count between input files, file_names.txt and output matrix features.npy

Dear Lukas,
first thanks a lot for this great tool!
When extracting the activations of the CLIP-ViT penultimate layer for the whole THINGS image data set, we noticed a mismatch between the count of input images (n=26,111?), the files in file_names.txt that the data loader saves (26,109), and the rows of the feature matrix (26,107). The difference between file_names.txt and the feature matrix seems to be due to two missing activations for images of the object "peppermint".
Do you know what causes the differences?
Thanks in advance!
Jonas

add custom user models

  • Add a directory where users can implement their own models / load weights / ...
  • These models can then be loaded via their model name and a lookup in that directory.

Rename extractor.show_model()

.show_model() currently displays all modules of the model, but also prompts the user to input a module name, which it then checks for validity and returns. It should either be renamed to something more descriptive or actually only show the modules, as the name suggests.

We might also consider moving module name checking into .extract_features(...), as the user is not required to call .show_model() before extracting.

Return copy of activations

I think there's a rather serious (but fortunately easily fixable!) problem with THINGSvision. Namely, in the hook that retrieves model activations, the dictionary activations is given a pointer to the original tensor output rather than a copy of output. This means that a layer that operates in-place, such as ReLU(inplace=True), can modify our activation tensor, even if said layer comes after the layer whose activations we are extracting.

To replicate this, you can use the pytorch Colab notebook given, and extract features for Alexnet on pretty much any image. Examining the features from layer features.0 (a 2D convolution), all entries are non-negative, and many are 0. This is because the following layer, features.1, is an in-place ReLU operation, and overwrote all of the negative entries.

To solve this, the dictionary activations should store a copy of the tensor, rather than the original tensor. I'm happy to make a PR and fix this myself, but I thought I would also bring it to folks' attention here as well.
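
A minimal sketch of the proposed fix (illustrative only, not the exact thingsvision hook): store a detached copy of the output so that later in-place operations such as ReLU(inplace=True) cannot overwrite the cached activation.

def get_activation(name, activations):
    def hook(module, inputs, output):
        # store a copy instead of a reference to the (possibly in-place modified) tensor
        activations[name] = output.detach().clone()
    return hook

# usage (hypothetical): model.features[0].register_forward_hook(get_activation('features.0', activations))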

Support feature extraction for images stored in HDF5 files

Some datasets have images that are stored in HDF5 format (e.g. NSD stimuli). Currently, these have to be written to disk to use the Extractor with an ImageDataset. It would be nice to have an HDF5Dataset that directly uses HDF5 files for extraction.
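
A rough sketch of what such an HDF5Dataset could look like (assumptions: all images live under a single dataset key inside the HDF5 file as uint8 H x W x C arrays, and a PyTorch backend is used; this class is not part of the current thingsvision API):

import h5py
import torch
from torch.utils.data import Dataset

class HDF5Dataset(Dataset):
    def __init__(self, hdf5_path, key, transforms=None):
        self.hdf5_path = hdf5_path
        self.key = key
        self.transforms = transforms
        with h5py.File(hdf5_path, 'r') as f:
            self.num_images = f[self.key].shape[0]

    def __len__(self):
        return self.num_images

    def __getitem__(self, idx):
        # open lazily per item so the dataset also works with multiple dataloader workers
        with h5py.File(self.hdf5_path, 'r') as f:
            img = f[self.key][idx]  # H x W x C, uint8
        img = torch.from_numpy(img).permute(2, 0, 1).float() / 255.0
        if self.transforms is not None:
            img = self.transforms(img)
        return img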
