
thingsvision's Introduction


📔 Table of Contents

🌟 About the Project

thingsvision is a Python package for extracting (image) representations from many state-of-the-art computer vision models. Essentially, you provide thingsvision with a directory of images and specify the neural network you're interested in. thingsvision then returns the representation of the selected neural network for each image, resulting in one feature map (vector or matrix, depending on the layer) per image. These features (we use the terms features and image representations interchangeably) can then be used for further analyses.

🚨 NOTE: some function calls mentioned in the original paper have been deprecated. To use this package successfully, exclusively follow this README and the documentation! 🚨

(back to top)

🦾 Functionality

With thingsvision, you can:

  • extract image representations from a large collection of vision models (see the model collection below) at any module of the network,
  • run extractions either from Python or via the command-line interface,
  • save the extracted features as npy, txt, mat, pt, or hdf5 files, and
  • optionally align the extracted features with human similarity judgments (see Human alignment below).

(back to top)

πŸ—„οΈ Model collection

Neural networks come from different sources. With thingsvision, you can extract image representations from all models provided by the following libraries and collections (a short sketch of how these sources map onto the extractor API follows the list):

  • torchvision
  • Keras
  • timm
  • ssl (self-supervised learning models)
    • simclr-rn50, mocov2-rn50, barlowtwins-rn50, pirl-rn50
    • jigsaw-rn50, rotnet-rn50, swav-rn50, vicreg-rn50
    • dino-rn50, dino-xcit-{small/medium}-{12/24}-p{8/16}
    • dino-vit-{tiny/small/base}-p{8/16}
    • dinov2-vit-{small/base/large/giant}-p14
    • mae-vit-{base/large}-p16, mae-vit-huge-p14
  • OpenCLIP models (CLIP trained on LAION-{400M/2B/5B})
  • CLIP models (CLIP trained on WiT)
  • a few custom models (Alexnet, VGG-16, Resnet50, and Inception_v3) trained on Ecoset rather than ImageNet and one Alexnet model pretrained on ImageNet and fine-tuned on SalObjSub
  • each of the many CORnet versions (recurrent vision models)
  • Harmonization models (see Harmonization repo). The default variant is ViT_B16. Other available models are ResNet50, VGG16, EfficientNetB0, tiny_ConvNeXT, tiny_MaxViT, and LeViT_small
  • DreamSim models (see DreamSim repo). The default variant is open_clip_vitb32. Other available models are clip_vitb32, dino_vitb16, and an ensemble. See the docs for more information
  • FAIR's Segment Anything (SAM) model
  • Kakaobrain's ALIGN implementation
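
To give a sense of how these sources map onto the extractor API (the full workflow is shown under Basic usage below), here is a minimal sketch. The 'torchvision' source string is taken from the CLI example in this README; the 'timm' and 'ssl' source strings and the model names used with them are assumptions, so please verify them against the documentation.

from thingsvision import get_extractor

# torchvision model (source string as used in the CLI example below)
alexnet_extractor = get_extractor(
    model_name='alexnet', source='torchvision', device='cpu', pretrained=True
)

# timm and self-supervised models ('timm' / 'ssl' source strings are assumptions; check the docs)
vit_extractor = get_extractor(
    model_name='vit_base_patch16_224', source='timm', device='cpu', pretrained=True
)
simclr_extractor = get_extractor(
    model_name='simclr-rn50', source='ssl', device='cpu', pretrained=True
)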

(back to top)

πŸƒ Getting Started

💻 Setting up your environment

Working locally

First, create and activate a new conda environment with Python 3.8, 3.9, or 3.10:

$ conda create -n thingsvision python=3.9
$ conda activate thingsvision

Then, install thingsvision by running the following pip commands in your terminal:

$ pip install --upgrade thingsvision
$ pip install git+https://github.com/openai/CLIP.git

If you want to extract features for harmonized models from the Harmonization repo, you additionally have to run the following pip commands in your thingsvision environment (FYI: as of now, this seems to work smoothly on Ubuntu only, not on macOS):

$ pip install git+https://github.com/serre-lab/Harmonization.git
$ pip install "keras-cv-attention-models>=1.3.5"

If you want to extract features for DreamSim from the DreamSim repo, you additionally have to run the following pip command in your thingsvision environment:

$ pip install dreamsim==0.1.2

See the docs for which DreamSim models are available in thingsvision.
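
As a hedged example of loading a DreamSim model afterwards (the model name, source, and parameter key below are assumptions based on the variants listed above, so please verify them against the docs):

from thingsvision import get_extractor

extractor = get_extractor(
    model_name='DreamSim',    # assumed model name; check the docs
    source='custom',          # assumed source for DreamSim models
    device='cpu',
    pretrained=True,
    model_parameters={'variant': 'open_clip_vitb32'},  # default variant listed above
)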

Google Colab

Alternatively, you can use Google Colab to play around with thingsvision by uploading your image data to Google Drive (via directory mounting). You can find the Jupyter notebook using PyTorch here and the TensorFlow example here.

(back to top)

πŸ” Basic usage

Command Line Interface (CLI)

thingsvision was designed to simplify feature extraction. If you have a folder of images (e.g., ./images) and want to extract features for each of these images without opening a Jupyter notebook or writing a Python script, it's probably easiest to use our CLI. The interface includes two commands:

  • thingsvision show-model
  • thingsvision extract-features

Example calls might look as follows:

thingsvision show-model --model-name "alexnet" --source "torchvision"
thingsvision extract-features --image-root "./data" --model-name "alexnet" --module-name "features.10" --batch-size 32 --device "cuda" --source "torchvision" --file-format "npy" --out-path "./features"

See thingsvision show-model -h and thingsvision extract-features -h for a list of all possible arguments. Note that the CLI provides just the basic extraction functionality, but it is probably enough for most users who don't want to dive too deep into the various models and modules. If you need more fine-grained control over the extraction itself, we recommend using the Python package directly and writing your own script.

Python commands

To use the Python API, start by importing all the necessary components and instantiating a thingsvision extractor. Here we use CLIP from the official CLIP repo as the model to extract features from, and we load the model onto the GPU for faster inference:

import torch
from thingsvision import get_extractor
from thingsvision.utils.storing import save_features
from thingsvision.utils.data import ImageDataset, DataLoader

model_name = 'clip'
source = 'custom'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_parameters = {
    'variant': 'ViT-L/14'
}

extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True,
  model_parameters=model_parameters,
)

Next, create both a dataset and a dataloader for your images. We assume that all of your images live in a single root directory, which may contain subfolders (e.g., for individual classes); therefore, we leverage the ImageDataset class.

root = 'path/to/your/image/directory' # e.g., './images'
batch_size = 32

dataset = ImageDataset(
    root=root,
    out_path='path/to/features',
    backend=extractor.get_backend(), # backend framework of model
    transforms=extractor.get_transformations(resize_dim=256, crop_dim=224) # set the input dimensionality to whichever values are required for your pretrained model
)

batches = DataLoader(
    dataset=dataset,
    batch_size=batch_size,
    backend=extractor.get_backend() # backend framework of model
)

Now all that is left is to extract the image features and store them on disk! Here we're extracting features from the image encoder module of CLIP (visual), but if you don't know which modules are available for a given model, just call extractor.show_model() to print all the modules.

module_name = 'visual'

features = extractor.extract_features(
    batches=batches,
    module_name=module_name,
    flatten_acts=True,
    output_type="ndarray", # or "tensor" (only applicable to PyTorch models of which CLIP and DINO are ones!)
)

save_features(features, out_path='path/to/features', file_format='npy') # file_format can be set to "npy", "txt", "mat", "pt", or "hdf5"
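
To sanity-check the output, you can load the saved features back into memory. Note that the exact file name written by save_features (assumed below to be features.npy for file_format='npy') is an assumption, so adjust the path to whatever appears in your output directory.

import numpy as np

features = np.load('path/to/features/features.npy')
print(features.shape)  # expected: (num_images, num_features) when flatten_acts=True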

Feature extraction with custom data pipeline

PyTorch
module_name = 'visual'

# your custom dataset and dataloader classes come here (for example, a PyTorch data loader)
my_dataset = ...
my_dataloader = ...

with extractor.batch_extraction(module_name, output_type="tensor") as e: 
  for batch in my_dataloader:
    ... # whatever preprocessing you want to add to the batch
    feature_batch = e.extract_batch(
      batch=batch,
      flatten_acts=True, # flatten 2D feature maps from an early convolutional or attention layer
      )
    ... # whatever post-processing you want to add to the extracted features
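
For illustration, here is a self-contained variant of the loop above that uses a plain torch.utils.data.DataLoader over random tensors as a stand-in for your real data pipeline. The extractor calls mirror the template above; the 3x224x224 input shape and the omission of the model's own preprocessing transforms are simplifying assumptions.

import torch
from torch.utils.data import DataLoader, TensorDataset

module_name = 'visual'

# dummy "images": 16 random tensors with ImageNet-style dimensions
my_dataset = TensorDataset(torch.rand(16, 3, 224, 224))
my_dataloader = DataLoader(my_dataset, batch_size=8)

all_features = []
with extractor.batch_extraction(module_name, output_type="tensor") as e:
    for (batch,) in my_dataloader:
        feature_batch = e.extract_batch(batch=batch, flatten_acts=True)
        all_features.append(feature_batch.cpu())
features = torch.cat(all_features, dim=0)  # (16, num_features)
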
TensorFlow / Keras
module_name = 'visual'

# your custom dataset and dataloader classes come here (for example, TFRecords files)
my_dataset = ...
my_dataloader = ...

for batch in my_dataloader:
  ... # whatever preprocessing you want to add to the batch
  feature_batch = extractor.extract_batch(
    batch=batch,
    module_name=module_name,
    flatten_acts=True, # flatten 2D feature maps from an early convolutional or attention layer
    )
  ... # whatever post-processing you want to add to the extracted features

Human alignment

If you want to align the extracted features with human object similarity, following the approach introduced in Improving neural network representations using human similarity judgments, you can optionally do so with the following method:

aligned_features = extractor.align(
    features=features,
    module_name=module_name,
    alignment_type="gLocal",
)

For more information about the available alignment types and aligned models see the docs.

For more examples of the many models available in thingsvision and explanations of additional functionality, such as how to optionally turn off center cropping, how to use HDF5 datasets (e.g., NSD stimuli), how to perform RSA or CKA, or how to easily extract features for the THINGS image database, please refer to the Documentation.

(back to top)

👋 How to contribute

If you come across problems or have suggestions, please submit an issue!

(back to top)

⚠️ License

This GitHub repository is licensed under the MIT License - see the LICENSE.md file for details.

(back to top)

📃 Citation

If you use this GitHub repository (or any modules associated with it), please cite our paper for the initial version of thingsvision as follows:

@article{Muttenthaler_2021,
	author = {Muttenthaler, Lukas and Hebart, Martin N.},
	title = {THINGSvision: A Python Toolbox for Streamlining the Extraction of Activations From Deep Neural Networks},
	journal = {Frontiers in Neuroinformatics},
	volume = {15},
	pages = {45},
	year = {2021},
	url = {https://www.frontiersin.org/article/10.3389/fninf.2021.679838},
	doi = {10.3389/fninf.2021.679838},
	issn = {1662-5196},
}

(back to top)

💎 Contributions

This is a joint open-source project between the Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, and the Machine Learning Group at Technische Universität Berlin. Correspondence and requests for contributing should be addressed to Lukas Muttenthaler. Feel free to contact us if you want to become a contributor or have any suggestions or feedback. For the latter, you could also just post an issue or engage in discussions. We'll try to respond as fast as we can.

(back to top)


thingsvision's Issues

Restrict input to extractor.show_model()

When executing module_name = extractor.show_model() the user is supposed to input the part of the model for which they would like to extract features. Currently, any value is accepted, even if the value is not a valid module name for the respective model.

Therefore, check whether the user's input actually denotes a valid module name. If not, reject it and prompt the user again. This avoids a confusing error later in the pipeline when the user executes features = extractor.extract_features(...).
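
A minimal sketch of the requested check (the prompt_for_module helper is hypothetical and not part of the current API; it assumes a PyTorch model whose valid module names come from named_modules()):

def prompt_for_module(model):
    # collect all valid module names of the (PyTorch) model
    valid_names = {name for name, _ in model.named_modules() if name}
    while True:
        module_name = input("Please enter the module you want to extract features from: ")
        if module_name in valid_names:
            return module_name
        print(f"'{module_name}' is not a valid module name for this model. Please try again.")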

UserWarning from torchvision/transforms

OS: macOS 10.14.6 (18G9323) (I know... ask my IT why)

I created and activated a fresh conda env using the environment.yml. I did not install any additional packages.

device = 'cuda' if torch.cuda.is_available() else 'cpu'
batch_size = 68
model_name = 'clip-ViT'
module_name = 'visual'
backend = 'pt'
clip = True
out_path = ...

everything works as expected until I execute

dl = vision.load_dl(root,
                    out_path=out_path,
                    batch_size=batch_size,
                    transforms=model.get_transformations(),
                    backend=backend,
                    file_names=file_names)

Which outputs:

    /Users/kaniuth/Desktop/extractivations/conda_env/lib/python3.8/site-packages/torchvision/transforms/transforms.py:332:
    UserWarning: Argument interpolation should be of type InterpolationMode instead of int. 
    Please, use InterpolationMode enum. warnings.warn(

The pipeline seems to continue normally; I can extract and save activations just fine. I just don't know whether that warning has any consequences for the extracted activations.

That warning did not occur with earlier versions of THINGSvision.

Move everything model-/source-specific to custom model file or source-specific extractor

Currently, we have some if-statements that address model-specific exceptions. Since these are exceptions rather than something general, we want to specify them in the custom model file or move them to source-specific extractor classes, if applicable. There's an if-statement that exclusively concerns clip models. This should go into the custom model file or the extractor classes for OpenCLIP.

Custom models

Allow users to add their own (pretrained) models without having to upload them to thingsvision or open a PR.

  • pass a model object to the extractor class rather than a string for the model name (as is required now)

Circular imports in `thingsvision.vision`

Cannot import thingsvision.dataset.ImageDataset as a stand-alone, without a prior import of thingsvision.vision. Code to reproduce:

from thingsvision.dataset import ImageDataset

throws: ImportError: cannot import name 'ImageDataset' from partially initialized module 'thingsvision.dataset' (most likely due to a circular import). Temporary fix is:

import thingsvision.vision
from thingsvision.dataset import ImageDataset

Implement low-memory option for feature extraction

Right now, features are stored in RAM when calling extractor.extract_features(...), which quickly makes memory usage explode. It might be a good idea to enable a low-memory option that just writes features to disk immediately.

Maybe add an output path to the extractor extractor = Extractor(out_path='...') and an extra flag extractor.extract_features(store_to_disk=True)?
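
A rough sketch of the idea (the out_path argument and the per-batch file naming below are hypothetical; the snippet only illustrates writing each batch to disk instead of accumulating everything in RAM):

import os
import numpy as np

def extract_features_to_disk(extractor, batches, module_name, out_path):
    os.makedirs(out_path, exist_ok=True)
    for i, batch in enumerate(batches):
        acts = extractor.extract_batch(
            batch=batch, module_name=module_name, flatten_acts=True
        )
        # write the current batch immediately so it can be freed from memory
        np.save(os.path.join(out_path, f"features_{i:05d}.npy"), np.asarray(acts))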

Add ecoset

  • update download script to get ecoset VGG weights
  • update README with an example

model feature extraction not possible on cuda

If torch.device is cuda, then thingsvision.model_class.Model.extract_features() throws a type error: the collected features are torch.Tensor objects, so features = np.vstack(features) (#L223) is not possible. Suggested fix at #L221: features.append(act.cpu().numpy()).

Update tests

The unit tests should be split up into separate files in a dedicated test directory.

import error - scikit image

Hello there,

I would like to try your tool, but there is an issue during import (please see below).

I have re-installed scikit-image, but this did not help. Any ideas?

Thank you.

ImportError: dlopen(/anaconda3/envs/coactivations/lib/python3.9/site-packages/skimage/_shared/geometry.cpython-39-darwin.so, 2): Symbol not found: ____chkstk_darwin
Referenced from: /anaconda3/envs/coactivations/lib/python3.9/site-packages/skimage/_shared/../.dylibs/libomp.dylib (which was built for Mac OS X 10.15)
Expected in: /usr/lib/libSystem.B.dylib
in /anaconda3/envs/coactivations/lib/python3.9/site-packages/skimage/_shared/../.dylibs/libomp.dylib

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "", line 1, in
import thingsvision.vision as vision

File "/anaconda3/envs/coactivations/lib/python3.9/site-packages/thingsvision/vision.py", line 67, in
from skimage.transform import resize

File "/anaconda3/envs/coactivations/lib/python3.9/site-packages/skimage/init.py", line 124, in
_raise_build_error(e)

File "/anaconda3/envs/coactivations/lib/python3.9/site-packages/skimage/init.py", line 102, in _raise_build_error
raise ImportError("""%s

ImportError: dlopen(/anaconda3/envs/coactivations/lib/python3.9/site-packages/skimage/_shared/geometry.cpython-39-darwin.so, 2): Symbol not found: ____chkstk_darwin
Referenced from: /anaconda3/envs/coactivations/lib/python3.9/site-packages/skimage/_shared/../.dylibs/libomp.dylib (which was built for Mac OS X 10.15)
Expected in: /usr/lib/libSystem.B.dylib
in /anaconda3/envs/coactivations/lib/python3.9/site-packages/skimage/_shared/../.dylibs/libomp.dylib
It seems that scikit-image has not been built correctly.

Your install of scikit-image appears to be broken.
Try re-installing the package following the instructions at:
https://scikit-image.org/docs/stable/install.html

update readme

Add to README that model names and layer names are backend specific

add tests with different batch sizes

  • add a test where different batch sizes are used (with and without remainders)
  • change the simple network architecture in the tests to have two hidden neurons instead of one

source-specific extractor classes that inherit from a base extractor class

  • Currently, one can only use the DEFAULT weights for torchvision models, which is limiting behavior (our goal is flexible behavior).
  • We need to make feature extraction more flexible per source via inheritance from a base extractor class and a backend-specific mixin class (TensorFlowMixin or PyTorchMixin); something along the following lines does the trick:
from dataclasses import dataclass

# BaseExtractor and the backend mixins (e.g., PyTorchMixin) are the to-be-implemented base classes referred to above.
@dataclass(repr=True)
class TimmExtractor(BaseExtractor, PyTorchMixin):

    def __init__(self, config: object) -> None:
        super(TimmExtractor, self).__init__()
        raise NotImplementedError

Memory corruption error `euclidean_matrix`

Hey,

when I call euclidean_matrix (through compute_rdm(x, 'euclidean')), the execution crashes with double free or corruption (!prev) (input: features from alexnet layers). Unfortunately, no more error logs are created. The other distance metrics work fine.
We run Python 3.8 with CUDA 10.2 in a container. The Python packages are up to date.

Can you reproduce that the euclidean_matrix does not work, or is it specific to our setup?

Thanks
David

EDIT: If anyone else has the same problem, I use scipy's squareform + pdist functions as a workaround. I did not benchmark, but it seems pretty fast.
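
For reference, the workaround mentioned in the edit above spelled out (assuming features is an (n_samples, n_features) NumPy array):

from scipy.spatial.distance import pdist, squareform

# pairwise Euclidean distances, reshaped into a square (n_samples x n_samples) RDM
rdm = squareform(pdist(features, metric='euclidean'))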

CORnet inconsistent vector length

When running images through CORnet-Z, the output layer for IT, as far as I understand from the CORnet preprint (https://www.biorxiv.org/content/10.1101/408385v1.full.pdf), is supposed to be 7x7x512, which, when flattened, yields a vector of length 25088. This is indeed what happens when apply_center_crop is set to True. However, when it is set to False, the output vectors for the same images (224 x 224 pixels) and the same layer/module end up being of size 32768.
This behavior does not appear to be layer-specific; for example, I also tested the V1 output layer, and its vector length likewise depends on whether the center crop is applied.

I am not quite sure if this is a bug or "normal" behavior that I then just do not quite understand.
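
A plausible explanation for the two lengths, under two assumptions that would need to be checked: the IT output of CORnet-Z has 512 channels with a total spatial downsampling factor of 32, and with apply_center_crop=False the resized image (e.g., 256 px) is fed to the network instead of the 224 px crop.

# center-cropped 224 x 224 input:  (224 // 32) ** 2 * 512 = 7 * 7 * 512 = 25088
# resized-only   256 x 256 input:  (256 // 32) ** 2 * 512 = 8 * 8 * 512 = 32768
assert (224 // 32) ** 2 * 512 == 25088
assert (256 // 32) ** 2 * 512 == 32768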

Add models from VISSL library

  • add models from the vissl library (in particular SimCLR)
  • we might want to do this in accordance with the fix for issue #69, since vissl will have its own extractor (VisslExtractor) and config (VisslConfig) classes

README improvement

Improve the readability of the README file: make the text more concise and easier to understand.

Deprecate RSA functions

There is a well-maintained reference implementation for Representational Similarity Analysis, the rsatoolbox. I therefore suggest to deprecate THINGSvision's RSA functions.

Advantages:

  1. End users don't get confused as they only have to use a single gold-standard package for DNN feature extraction and one for RSA analyses.
  2. THINGSvision avoids getting issues raised asking for RSA functions currently not implemented in THINGSvision, e.g. other distance functions.
  3. Less code for THINGSvision to maintain as the focus is solely on feature extraction.

Possibly PR THINGSvision's RSA functions to the rsatoolbox if they turn out to be more efficient.

mismatch in count between input files, file_names.txt and output matrix features.npy

Dear Lukas,
first thanks a lot for this great tool!
When extracting the activations of the CLIP-ViT penultimate layer for the whole THINGS image data set, we noticed a mismatch between the count of input images (n=26,111?), the files in file_names.txt that the data loader saves (26,109), and the rows of the feature matrix (26,107). The difference between file_names.txt and the feature matrix seems to be due to two missing activations for images of the object "peppermint".
Do you know what causes the differences?
Thanks in advance!
Jonas

add custom user models

  • Add a directory where users can implement their own models / load weights / ...
  • These models can then be loaded via their model name and a lookup in that directory.

Rename extractor.show_model()

.show_model() currently displays all modules of the model, but also prompts the user to input a module name, which it then checks for validity and returns. It should either be renamed to something more descriptive or actually only show the modules, as the name suggests.

We might also consider moving module name checking into .extract_features(...), as the user is not required to call .show_model() before extracting.

Return copy of activations

I think there's a rather serious (but fortunately easily fixable!) problem with THINGSvision. Namely, in the hook that retrieves model activations, the dictionary activations is given a pointer to the original tensor output rather than a copy of output. This means that a layer that operates in-place, such as ReLU(inplace=True), can modify our activation tensor, even if said layer comes after the layer whose activations we are extracting.

To replicate this, you can use the pytorch Colab notebook given, and extract features for Alexnet on pretty much any image. Examining the features from layer features.0 (a 2D convolution), all entries are non-negative, and many are 0. This is because the following layer, features.1, is an in-place ReLU operation, and overwrote all of the negative entries.

To solve this, the dictionary activations should store a copy of the tensor, rather than the original tensor. I'm happy to make a PR and fix this myself, but I thought I would also bring it to folks' attention here as well.
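
A minimal sketch of the proposed fix (illustrative only, not the exact thingsvision hook): store a detached copy of the output so that later in-place operations such as ReLU(inplace=True) cannot overwrite the cached activation.

def get_activation(name, activations):
    def hook(module, inputs, output):
        # store a copy instead of a reference to the (possibly in-place modified) tensor
        activations[name] = output.detach().clone()
    return hook

# usage (hypothetical): model.features[0].register_forward_hook(get_activation('features.0', activations))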

Support feature extraction for images stored in HDF5 files

Some datasets have images that are stored in HDF5 format (e.g. NSD stimuli). Currently, these have to be written to disk to use the Extractor with an ImageDataset. It would be nice to have an HDF5Dataset that directly uses HDF5 files for extraction.
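
A rough sketch of what such an HDF5Dataset could look like (assumptions: all images live under a single dataset key inside the HDF5 file as uint8 H x W x C arrays, and a PyTorch backend is used; this class is not part of the current thingsvision API):

import h5py
import torch
from torch.utils.data import Dataset

class HDF5Dataset(Dataset):
    def __init__(self, hdf5_path, key, transforms=None):
        self.hdf5_path = hdf5_path
        self.key = key
        self.transforms = transforms
        with h5py.File(hdf5_path, 'r') as f:
            self.num_images = f[self.key].shape[0]

    def __len__(self):
        return self.num_images

    def __getitem__(self, idx):
        # open lazily per item so the dataset also works with multiple dataloader workers
        with h5py.File(self.hdf5_path, 'r') as f:
            img = f[self.key][idx]  # H x W x C, uint8
        img = torch.from_numpy(img).permute(2, 0, 1).float() / 255.0
        if self.transforms is not None:
            img = self.transforms(img)
        return img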
