Comments (3)

sacmehta commented on May 18, 2024

@sayannath It appears that you are not running from the root folder of this codebase. The easiest way would be to adapt the test_model.py file. I hope this helps.

Inside the tests folder, you can run it like this:

python test_model.py --common.config-file ../config/classification/imagenet/mobilevit.yaml 
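
If you want to launch the adapted script from somewhere other than the tests folder, one option is to put the repository root on the module search path at the top of your copy of test_model.py. A minimal sketch, assuming the standard <repo-root>/tests/test_model.py layout:

import sys
from pathlib import Path

# Add the ml-cvnets repository root to sys.path so that top-level packages
# such as `options` and `data` resolve no matter which directory the script
# is launched from.
REPO_ROOT = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO_ROOT))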

Running the test_model.py command above generates output similar to the following:

2022-08-26 08:08:12 - LOGS    - Model statistics for an input of size torch.Size([1, 3, 256, 256])
=================================================================
                          MobileViT Summary
=================================================================
ConvLayer       	 Params:    0.000 M 	 MACs :    7.078 M
-------------------------------------------------------------------
InvertedResidual 	 Params:    0.004 M 	 MACs :   59.769 M
-------------------------------------------------------------------
InvertedResidual
+InvertedResidual
+InvertedResidual 	 Params:    0.087 M 	 MACs :  392.692 M
-------------------------------------------------------------------
InvertedResidual
+MobileViTBlock 	 Params:    0.657 M 	 MACs :  870.547 M
-------------------------------------------------------------------
InvertedResidual
+MobileViTBlock 	 Params:    1.772 M 	 MACs :  505.577 M
-------------------------------------------------------------------
InvertedResidual
+MobileViTBlock 	 Params:    2.314 M 	 MACs :  161.738 M
-------------------------------------------------------------------
ConvLayer       	 Params:    0.104 M 	 MACs :    6.554 M
-------------------------------------------------------------------
Classifier      	 Params:    0.641 M 	 MACs :    0.641 M
=================================================================
Overall parameters   =    5.579 M
Overall parameters (sanity check) =    5.579 M
Overall MACs (theoretical) = 2004.596 M
Overall MACs (FVCore)** = 2033.383 M

** Theoretical and FVCore MACs may vary as theoretical MACs do not account for certain operations which may or may not be accounted in FVCore
Note: Theoretical MACs depends on user-implementation. Be cautious
=================================================================
Flops computed using FVCore for an input of size=[1, 3, 256, 256] are    2.033 G
MobileViT(
  (conv_1): Conv2d(3, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
  (layer_1): Sequential(
    (0): InvertedResidual(in_channels=16, out_channels=32, stride=1, exp=4, dilation=1, skip_conn=False)
  )
  (layer_2): Sequential(
    (0): InvertedResidual(in_channels=32, out_channels=64, stride=2, exp=4, dilation=1, skip_conn=False)
    (1): InvertedResidual(in_channels=64, out_channels=64, stride=1, exp=4, dilation=1, skip_conn=True)
    (2): InvertedResidual(in_channels=64, out_channels=64, stride=1, exp=4, dilation=1, skip_conn=True)
  )
  (layer_3): Sequential(
    (0): InvertedResidual(in_channels=64, out_channels=96, stride=2, exp=4, dilation=1, skip_conn=False)
    (1): MobileViTBlock(
    	 Local representations
    		 Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
    		 Conv2d(96, 144, kernel_size=(1, 1), stride=(1, 1), bias=False, bias=False)
    	 Global representations with patch size of 2x2
    		 TransformerEncoder(embed_dim=144, ffn_dim=288, dropout=0.1, ffn_dropout=0.0, attn_fn=MultiHeadAttention, act_fn=Swish, norm_fn=layer_norm)
    		 TransformerEncoder(embed_dim=144, ffn_dim=288, dropout=0.1, ffn_dropout=0.0, attn_fn=MultiHeadAttention, act_fn=Swish, norm_fn=layer_norm)
    		 LayerNorm((144,), eps=1e-05, elementwise_affine=True)
    		 Conv2d(144, 96, kernel_size=(1, 1), stride=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
    	 Feature fusion
    		 Conv2d(192, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
    )
  )
  (layer_4): Sequential(
    (0): InvertedResidual(in_channels=96, out_channels=128, stride=2, exp=4, dilation=1, skip_conn=False)
    (1): MobileViTBlock(
    	 Local representations
    		 Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
    		 Conv2d(128, 192, kernel_size=(1, 1), stride=(1, 1), bias=False, bias=False)
    	 Global representations with patch size of 2x2
    		 TransformerEncoder(embed_dim=192, ffn_dim=384, dropout=0.1, ffn_dropout=0.0, attn_fn=MultiHeadAttention, act_fn=Swish, norm_fn=layer_norm)
    		 TransformerEncoder(embed_dim=192, ffn_dim=384, dropout=0.1, ffn_dropout=0.0, attn_fn=MultiHeadAttention, act_fn=Swish, norm_fn=layer_norm)
    		 TransformerEncoder(embed_dim=192, ffn_dim=384, dropout=0.1, ffn_dropout=0.0, attn_fn=MultiHeadAttention, act_fn=Swish, norm_fn=layer_norm)
    		 TransformerEncoder(embed_dim=192, ffn_dim=384, dropout=0.1, ffn_dropout=0.0, attn_fn=MultiHeadAttention, act_fn=Swish, norm_fn=layer_norm)
    		 LayerNorm((192,), eps=1e-05, elementwise_affine=True)
    		 Conv2d(192, 128, kernel_size=(1, 1), stride=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
    	 Feature fusion
    		 Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
    )
  )
  (layer_5): Sequential(
    (0): InvertedResidual(in_channels=128, out_channels=160, stride=2, exp=4, dilation=1, skip_conn=False)
    (1): MobileViTBlock(
    	 Local representations
    		 Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
    		 Conv2d(160, 240, kernel_size=(1, 1), stride=(1, 1), bias=False, bias=False)
    	 Global representations with patch size of 2x2
    		 TransformerEncoder(embed_dim=240, ffn_dim=480, dropout=0.1, ffn_dropout=0.0, attn_fn=MultiHeadAttention, act_fn=Swish, norm_fn=layer_norm)
    		 TransformerEncoder(embed_dim=240, ffn_dim=480, dropout=0.1, ffn_dropout=0.0, attn_fn=MultiHeadAttention, act_fn=Swish, norm_fn=layer_norm)
    		 TransformerEncoder(embed_dim=240, ffn_dim=480, dropout=0.1, ffn_dropout=0.0, attn_fn=MultiHeadAttention, act_fn=Swish, norm_fn=layer_norm)
    		 LayerNorm((240,), eps=1e-05, elementwise_affine=True)
    		 Conv2d(240, 160, kernel_size=(1, 1), stride=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
    	 Feature fusion
    		 Conv2d(320, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
    )
  )
  (conv_1x1_exp): Conv2d(160, 640, kernel_size=(1, 1), stride=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
  (classifier): Sequential(
    (global_pool): GlobalPool(type=mean)
    (dropout): Dropout(p=0.1, inplace=True)
    (fc): LinearLayer(in_features=640, out_features=1000, bias=True, channel_first=False)
  )
)
ClsCrossEntropy(
	ignore_idx=-1
	class_wts=False
	label_smoothing=0.1
)
Random Input : torch.Size([1, 3, 256, 256])
Random Target: torch.Size([1])
Random Output: torch.Size([1, 1000])
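
For reference, the parameter and FVCore numbers in the summary can be cross-checked with plain PyTorch and fvcore. A minimal sketch, assuming model is the MobileViT instance built from the config above:

import torch
from fvcore.nn import FlopCountAnalysis

# Parameter count; should match the "Overall parameters" line (~5.579 M here).
n_params = sum(p.numel() for p in model.parameters())
print(f"Overall parameters = {n_params / 1e6:.3f} M")

# FVCore count for a 1x3x256x256 input (~2.033 G in the log above).
flops = FlopCountAnalysis(model, torch.randn(1, 3, 256, 256))
print(f"FVCore FLOPs = {flops.total() / 1e9:.3f} G")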

sacmehta commented on May 18, 2024

It seems that your issue is resolved, so I am closing it. Feel free to reopen if it is not.

prsbsvrn commented on May 18, 2024

Hello,

I ran the command you provided, "python test_model.py --common.config-file ../config/classification/imagenet/mobilevit.yaml", but encountered the following error:

Traceback (most recent call last):
  File "/home/../test/ml-cvnets/tests/test_model.py", line 19, in <module>
    from tests.configs import get_config
  File "/home/../test/ml-cvnets/tests/../tests/configs.py", line 8, in <module>
    from options.opts import get_training_arguments
  File "/home/../test/ml-cvnets/options/opts.py", line 11, in <module>
    from data.collate_fns import arguments_collate_fn
ImportError: cannot import name 'arguments_collate_fn' from 'data.collate_fns' (unknown location)

Additionally, could you provide code to plot the top-1 and top-5 accuracies, as well as the loss per epoch?
@sacmehta
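
A minimal matplotlib sketch for such a plot, assuming the per-epoch values have already been collected into Python lists (the list names top1_acc, top5_acc, and train_loss are hypothetical):

import matplotlib.pyplot as plt

# Hypothetical per-epoch metrics gathered from the training logs.
epochs = list(range(1, len(top1_acc) + 1))

fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))
ax_acc.plot(epochs, top1_acc, label="top-1")
ax_acc.plot(epochs, top5_acc, label="top-5")
ax_acc.set_xlabel("epoch")
ax_acc.set_ylabel("accuracy (%)")
ax_acc.legend()

ax_loss.plot(epochs, train_loss)
ax_loss.set_xlabel("epoch")
ax_loss.set_ylabel("loss")

plt.tight_layout()
plt.show()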
