Comments (3)
@sayannath It appears that you are not building from the root folder of this codebase. The easiest way would be to adapt the test_model.py file. I hope this helps.
Inside the tests folder, you can run it like this:
python test_model.py --common.config-file ../config/classification/imagenet/mobilevit.yaml
This would generate output similar to the one below:
2022-08-26 08:08:12 - LOGS - Model statistics for an input of size torch.Size([1, 3, 256, 256])
=================================================================
MobileViT Summary
=================================================================
ConvLayer Params: 0.000 M MACs : 7.078 M
-------------------------------------------------------------------
InvertedResidual Params: 0.004 M MACs : 59.769 M
-------------------------------------------------------------------
InvertedResidual
+InvertedResidual
+InvertedResidual Params: 0.087 M MACs : 392.692 M
-------------------------------------------------------------------
InvertedResidual
+MobileViTBlock Params: 0.657 M MACs : 870.547 M
-------------------------------------------------------------------
InvertedResidual
+MobileViTBlock Params: 1.772 M MACs : 505.577 M
-------------------------------------------------------------------
InvertedResidual
+MobileViTBlock Params: 2.314 M MACs : 161.738 M
-------------------------------------------------------------------
ConvLayer Params: 0.104 M MACs : 6.554 M
-------------------------------------------------------------------
Classifier Params: 0.641 M MACs : 0.641 M
=================================================================
Overall parameters = 5.579 M
Overall parameters (sanity check) = 5.579 M
Overall MACs (theoretical) = 2004.596 M
Overall MACs (FVCore)** = 2033.383 M
** Theoretical and FVCore MACs may vary as theoretical MACs do not account for certain operations which may or may not be accounted in FVCore
Note: Theoretical MACs depends on user-implementation. Be cautious
=================================================================
Flops computed using FVCore for an input of size=[1, 3, 256, 256] are 2.033 G
MobileViT(
(conv_1): Conv2d(3, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
(layer_1): Sequential(
(0): InvertedResidual(in_channels=16, out_channels=32, stride=1, exp=4, dilation=1, skip_conn=False)
)
(layer_2): Sequential(
(0): InvertedResidual(in_channels=32, out_channels=64, stride=2, exp=4, dilation=1, skip_conn=False)
(1): InvertedResidual(in_channels=64, out_channels=64, stride=1, exp=4, dilation=1, skip_conn=True)
(2): InvertedResidual(in_channels=64, out_channels=64, stride=1, exp=4, dilation=1, skip_conn=True)
)
(layer_3): Sequential(
(0): InvertedResidual(in_channels=64, out_channels=96, stride=2, exp=4, dilation=1, skip_conn=False)
(1): MobileViTBlock(
Local representations
Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
Conv2d(96, 144, kernel_size=(1, 1), stride=(1, 1), bias=False, bias=False)
Global representations with patch size of 2x2
TransformerEncoder(embed_dim=144, ffn_dim=288, dropout=0.1, ffn_dropout=0.0, attn_fn=MultiHeadAttention, act_fn=Swish, norm_fn=layer_norm)
TransformerEncoder(embed_dim=144, ffn_dim=288, dropout=0.1, ffn_dropout=0.0, attn_fn=MultiHeadAttention, act_fn=Swish, norm_fn=layer_norm)
LayerNorm((144,), eps=1e-05, elementwise_affine=True)
Conv2d(144, 96, kernel_size=(1, 1), stride=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
Feature fusion
Conv2d(192, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
)
)
(layer_4): Sequential(
(0): InvertedResidual(in_channels=96, out_channels=128, stride=2, exp=4, dilation=1, skip_conn=False)
(1): MobileViTBlock(
Local representations
Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
Conv2d(128, 192, kernel_size=(1, 1), stride=(1, 1), bias=False, bias=False)
Global representations with patch size of 2x2
TransformerEncoder(embed_dim=192, ffn_dim=384, dropout=0.1, ffn_dropout=0.0, attn_fn=MultiHeadAttention, act_fn=Swish, norm_fn=layer_norm)
TransformerEncoder(embed_dim=192, ffn_dim=384, dropout=0.1, ffn_dropout=0.0, attn_fn=MultiHeadAttention, act_fn=Swish, norm_fn=layer_norm)
TransformerEncoder(embed_dim=192, ffn_dim=384, dropout=0.1, ffn_dropout=0.0, attn_fn=MultiHeadAttention, act_fn=Swish, norm_fn=layer_norm)
TransformerEncoder(embed_dim=192, ffn_dim=384, dropout=0.1, ffn_dropout=0.0, attn_fn=MultiHeadAttention, act_fn=Swish, norm_fn=layer_norm)
LayerNorm((192,), eps=1e-05, elementwise_affine=True)
Conv2d(192, 128, kernel_size=(1, 1), stride=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
Feature fusion
Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
)
)
(layer_5): Sequential(
(0): InvertedResidual(in_channels=128, out_channels=160, stride=2, exp=4, dilation=1, skip_conn=False)
(1): MobileViTBlock(
Local representations
Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
Conv2d(160, 240, kernel_size=(1, 1), stride=(1, 1), bias=False, bias=False)
Global representations with patch size of 2x2
TransformerEncoder(embed_dim=240, ffn_dim=480, dropout=0.1, ffn_dropout=0.0, attn_fn=MultiHeadAttention, act_fn=Swish, norm_fn=layer_norm)
TransformerEncoder(embed_dim=240, ffn_dim=480, dropout=0.1, ffn_dropout=0.0, attn_fn=MultiHeadAttention, act_fn=Swish, norm_fn=layer_norm)
TransformerEncoder(embed_dim=240, ffn_dim=480, dropout=0.1, ffn_dropout=0.0, attn_fn=MultiHeadAttention, act_fn=Swish, norm_fn=layer_norm)
LayerNorm((240,), eps=1e-05, elementwise_affine=True)
Conv2d(240, 160, kernel_size=(1, 1), stride=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
Feature fusion
Conv2d(320, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
)
)
(conv_1x1_exp): Conv2d(160, 640, kernel_size=(1, 1), stride=(1, 1), bias=False, normalization=BatchNorm2d, activation=Swish, bias=False)
(classifier): Sequential(
(global_pool): GlobalPool(type=mean)
(dropout): Dropout(p=0.1, inplace=True)
(fc): LinearLayer(in_features=640, out_features=1000, bias=True, channel_first=False)
)
)
ClsCrossEntropy(
ignore_idx=-1
class_wts=False
label_smoothing=0.1
)
Random Input : torch.Size([1, 3, 256, 256])
Random Target: torch.Size([1])
Random Output: torch.Size([1, 1000])
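For reference, the "FVCore" figure in the summary above comes from fvcore's FLOP counter, which can be run on any PyTorch module. A minimal, self-contained sketch is below; it uses a toy convolutional model as a stand-in, since the exact MobileViT construction call is not shown here, so treat it as an illustration rather than the repo's own code:

import torch
from fvcore.nn import FlopCountAnalysis

# Toy stand-in model; replace with the actual MobileViT instance.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(16, 1000),
)
model.eval()

# Same input size as in the log above.
dummy_input = torch.randn(1, 3, 256, 256)

# FlopCountAnalysis reports multiply-accumulate counts for supported ops;
# unsupported ops are skipped with a warning, which is one reason the
# theoretical and FVCore numbers in the summary can differ.
flops = FlopCountAnalysis(model, dummy_input)
print(f"FVCore MACs: {flops.total() / 1e6:.3f} M")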
It seems that your issue is resolved, so closing it. Feel free to reopen if it is not resolved.
Hello,
I ran the command you provided, "python test_model.py --common.config-file ../config/classification/imagenet/mobilevit.yaml", but encountered the following error:
Traceback (most recent call last):
  File "/home/../test/ml-cvnets/tests/test_model.py", line 19, in <module>
    from tests.configs import get_config
  File "/home/../test/ml-cvnets/tests/../tests/configs.py", line 8, in <module>
    from options.opts import get_training_arguments
  File "/home/../test/ml-cvnets/options/opts.py", line 11, in <module>
    from data.collate_fns import arguments_collate_fn
ImportError: cannot import name 'arguments_collate_fn' from 'data.collate_fns' (unknown location)
Additionally, could you provide code to plot the top-1 and top-5 accuracies, as well as the loss per epoch?
@sacmehta
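For anyone looking for a quick way to visualize such training curves: a minimal matplotlib sketch, assuming the per-epoch top-1/top-5 accuracies and losses have already been parsed into Python lists (ml-cvnets itself does not ship this exact helper), could look like this:

import matplotlib.pyplot as plt

# Hypothetical per-epoch metrics; in practice, parse these from the
# training logs or a metrics file produced by your run.
epochs = list(range(1, 11))
top1 = [12.3, 25.1, 34.8, 41.2, 46.5, 50.9, 54.3, 57.0, 59.1, 60.8]
top5 = [30.5, 48.7, 59.2, 65.4, 70.1, 73.6, 76.2, 78.1, 79.7, 81.0]
loss = [6.2, 5.1, 4.4, 3.9, 3.6, 3.3, 3.1, 2.9, 2.8, 2.7]

fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: top-1 and top-5 accuracy per epoch.
ax_acc.plot(epochs, top1, label="top-1")
ax_acc.plot(epochs, top5, label="top-5")
ax_acc.set_xlabel("epoch")
ax_acc.set_ylabel("accuracy (%)")
ax_acc.legend()

# Right panel: training loss per epoch.
ax_loss.plot(epochs, loss, color="tab:red")
ax_loss.set_xlabel("epoch")
ax_loss.set_ylabel("loss")

fig.tight_layout()
plt.savefig("training_curves.png")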