Comments (3)

sacmehta commented on May 18, 2024

@sayannath It appears that you are not running from the root folder of this codebase. The easiest way would be to adapt the test_model.py file. I hope this helps.

Inside the tests folder, you can run it like this:

python test_model.py --common.config-file ../config/classification/imagenet/mobilevit.yaml 
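
If you want to launch the adapted script from somewhere other than the tests folder, one option is to put the repository root on the module search path at the top of your copy of test_model.py. A minimal sketch, assuming the standard <repo-root>/tests/test_model.py layout:

import sys
from pathlib import Path

# Add the ml-cvnets repository root to sys.path so that top-level packages
# such as `options` and `data` resolve no matter which directory the script
# is launched from.
REPO_ROOT = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(REPO_ROOT))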

Running the test_model.py command above generates output similar to the following:

2022-08-26 08:08:12 - LOGS    - Model statistics for an input of size torch.Size([1, 3, 256, 256])
=================================================================
                          MobileViT Summary
=================================================================
ConvLayer       	 Params:    0.000 M 	 MACs :    7.078 M
-------------------------------------------------------------------
InvertedResidual 	 Params:    0.004 M 	 MACs :   59.769 M
-------------------------------------------------------------------
InvertedResidual
+InvertedResidual
+InvertedResidual 	 Params:    0.087 M 	 MACs :  392.692 M
-------------------------------------------------------------------
InvertedResidual
+MobileViTBlock 	 Params:    0.657 M 	 MACs :  870.547 M
-------------------------------------------------------------------
InvertedResidual
+MobileViTBlock 	 Params:    1.772 M 	 MACs :  505.577 M
-------------------------------------------------------------------
InvertedResidual
+MobileViTBlock 	 Params:    2.314 M 	 MACs :  161.738 M
-------------------------------------------------------------------
ConvLayer       	 Params:    0.104 M 	 MACs :    6.554 M
-------------------------------------------------------------------
Classifier      	 Params:    0.641 M 	 MACs :    0.641 M
=================================================================
Overall parameters   =    5.579 M
Overall parameters (sanity check) =    5.579 M
Overall MACs (theoretical) = 2004.596 M
Overall MACs (FVCore)** = 2033.383 M

** Theoretical and FVCore MACs may vary as theoretical MACs do not account for certain operations which may or may not be accounted in FVCore
Note: Theoretical MACs depends on user-implementation. Be cautious
=================================================================
Flops computed using FVCore for an input of size=[1, 3, 256, 256] are    2.033 G
MobileViT(
  (conv_1): Conv2d(3, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
  (layer_1): Sequential(
    (0): InvertedResidual(in_channels=16, out_channels=32, stride=1, exp=4, dilation=1, skip_conn=False)
  )
  (layer_2): Sequential(
    (0): InvertedResidual(in_channels=32, out_channels=64, stride=2, exp=4, dilation=1, skip_conn=False)
    (1): InvertedResidual(in_channels=64, out_channels=64, stride=1, exp=4, dilation=1, skip_conn=True)
    (2): InvertedResidual(in_channels=64, out_channels=64, stride=1, exp=4, dilation=1, skip_conn=True)
  )
  (layer_3): Sequential(
    (0): InvertedResidual(in_channels=64, out_channels=96, stride=2, exp=4, dilation=1, skip_conn=False)
    (1): MobileViTBlock(
    	 Local representations
    		 Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
    		 Conv2d(96, 144, kernel_size=(1, 1), stride=(1, 1), bias=False, bias=False)
    	 Global representations with patch size of 2x2
    		 TransformerEncoder(embed_dim=144, ffn_dim=288, dropout=0.1, ffn_dropout=0.0, attn_fn=MultiHeadAttention, act_fn=Swish, norm_fn=layer_norm)
    		 TransformerEncoder(embed_dim=144, ffn_dim=288, dropout=0.1, ffn_dropout=0.0, attn_fn=MultiHeadAttention, act_fn=Swish, norm_fn=layer_norm)
    		 LayerNorm((144,), eps=1e-05, elementwise_affine=True)
    		 Conv2d(144, 96, kernel_size=(1, 1), stride=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
    	 Feature fusion
    		 Conv2d(192, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
    )
  )
  (layer_4): Sequential(
    (0): InvertedResidual(in_channels=96, out_channels=128, stride=2, exp=4, dilation=1, skip_conn=False)
    (1): MobileViTBlock(
    	 Local representations
    		 Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
    		 Conv2d(128, 192, kernel_size=(1, 1), stride=(1, 1), bias=False, bias=False)
    	 Global representations with patch size of 2x2
    		 TransformerEncoder(embed_dim=192, ffn_dim=384, dropout=0.1, ffn_dropout=0.0, attn_fn=MultiHeadAttention, act_fn=Swish, norm_fn=layer_norm)
    		 TransformerEncoder(embed_dim=192, ffn_dim=384, dropout=0.1, ffn_dropout=0.0, attn_fn=MultiHeadAttention, act_fn=Swish, norm_fn=layer_norm)
    		 TransformerEncoder(embed_dim=192, ffn_dim=384, dropout=0.1, ffn_dropout=0.0, attn_fn=MultiHeadAttention, act_fn=Swish, norm_fn=layer_norm)
    		 TransformerEncoder(embed_dim=192, ffn_dim=384, dropout=0.1, ffn_dropout=0.0, attn_fn=MultiHeadAttention, act_fn=Swish, norm_fn=layer_norm)
    		 LayerNorm((192,), eps=1e-05, elementwise_affine=True)
    		 Conv2d(192, 128, kernel_size=(1, 1), stride=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
    	 Feature fusion
    		 Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
    )
  )
  (layer_5): Sequential(
    (0): InvertedResidual(in_channels=128, out_channels=160, stride=2, exp=4, dilation=1, skip_conn=False)
    (1): MobileViTBlock(
    	 Local representations
    		 Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
    		 Conv2d(160, 240, kernel_size=(1, 1), stride=(1, 1), bias=False, bias=False)
    	 Global representations with patch size of 2x2
    		 TransformerEncoder(embed_dim=240, ffn_dim=480, dropout=0.1, ffn_dropout=0.0, attn_fn=MultiHeadAttention, act_fn=Swish, norm_fn=layer_norm)
    		 TransformerEncoder(embed_dim=240, ffn_dim=480, dropout=0.1, ffn_dropout=0.0, attn_fn=MultiHeadAttention, act_fn=Swish, norm_fn=layer_norm)
    		 TransformerEncoder(embed_dim=240, ffn_dim=480, dropout=0.1, ffn_dropout=0.0, attn_fn=MultiHeadAttention, act_fn=Swish, norm_fn=layer_norm)
    		 LayerNorm((240,), eps=1e-05, elementwise_affine=True)
    		 Conv2d(240, 160, kernel_size=(1, 1), stride=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
    	 Feature fusion
    		 Conv2d(320, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
    )
  )
  (conv_1x1_exp): Conv2d(160, 640, kernel_size=(1, 1), stride=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
  (classifier): Sequential(
    (global_pool): GlobalPool(type=mean)
    (dropout): Dropout(p=0.1, inplace=True)
    (fc): LinearLayer(in_features=640, out_features=1000, bias=True, channel_first=False)
  )
)
ClsCrossEntropy(
	ignore_idx=-1
	class_wts=False
	label_smoothing=0.1
)
Random Input : torch.Size([1, 3, 256, 256])
Random Target: torch.Size([1])
Random Output: torch.Size([1, 1000])
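
For reference, the parameter and FVCore numbers in the summary can be cross-checked with plain PyTorch and fvcore. A minimal sketch, assuming model is the MobileViT instance built from the config above:

import torch
from fvcore.nn import FlopCountAnalysis

# Parameter count; should match the "Overall parameters" line (~5.579 M here).
n_params = sum(p.numel() for p in model.parameters())
print(f"Overall parameters = {n_params / 1e6:.3f} M")

# FVCore count for a 1x3x256x256 input (~2.033 G in the log above).
flops = FlopCountAnalysis(model, torch.randn(1, 3, 256, 256))
print(f"FVCore FLOPs = {flops.total() / 1e9:.3f} G")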

sacmehta commented on May 18, 2024

It seems that your issue is resolved, so I am closing it. Feel free to reopen if it is not.

prsbsvrn commented on May 18, 2024

Hello,

I ran the command you provided, "python test_model.py --common.config-file ../config/classification/imagenet/mobilevit.yaml", but encountered the following error:

Traceback (most recent call last):
  File "/home/../test/ml-cvnets/tests/test_model.py", line 19, in <module>
    from tests.configs import get_config
  File "/home/../test/ml-cvnets/tests/../tests/configs.py", line 8, in <module>
    from options.opts import get_training_arguments
  File "/home/../test/ml-cvnets/options/opts.py", line 11, in <module>
    from data.collate_fns import arguments_collate_fn
ImportError: cannot import name 'arguments_collate_fn' from 'data.collate_fns' (unknown location)

Additionally, could you provide code to plot the top-1 and top-5 accuracies, as well as the loss per epoch?
@sacmehta
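
A minimal matplotlib sketch for such a plot, assuming the per-epoch values have already been collected into Python lists (the list names top1_acc, top5_acc, and train_loss are hypothetical):

import matplotlib.pyplot as plt

# Hypothetical per-epoch metrics gathered from the training logs.
epochs = list(range(1, len(top1_acc) + 1))

fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))
ax_acc.plot(epochs, top1_acc, label="top-1")
ax_acc.plot(epochs, top5_acc, label="top-5")
ax_acc.set_xlabel("epoch")
ax_acc.set_ylabel("accuracy (%)")
ax_acc.legend()

ax_loss.plot(epochs, train_loss)
ax_loss.set_xlabel("epoch")
ax_loss.set_ylabel("loss")

plt.tight_layout()
plt.show()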
