The tokenlabeling from zihangjiang

generate_label.py unable to find model lvvit_s

Hi,

When I ran the label generation script for the model lvvit_s, it failed with the error "RuntimeError: Unknown model".

Solution: it worked after I added the line "import tlt.models" to generate_label.py.
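A possible explanation, assuming the repository follows the timm convention of registering model factories through a decorator that runs at import time: until the module defining lvvit_s is imported, the registry has no entry for it and model creation raises "Unknown model". The sketch below is a self-contained stand-in, not timm's actual code; `_model_entrypoints` and `import_models_module` are hypothetical names used only for illustration.

```python
# Illustrative stand-in for a timm-style model registry (not the real timm code).
_model_entrypoints = {}


def register_model(fn):
    """Decorator: record the factory function under its own name."""
    _model_entrypoints[fn.__name__] = fn
    return fn


def create_model(name, **kwargs):
    """Look the factory up by name; fail if nothing registered it."""
    if name not in _model_entrypoints:
        raise RuntimeError(f"Unknown model ({name})")
    return _model_entrypoints[name](**kwargs)


def import_models_module():
    """Hypothetical stand-in for `import tlt.models`: running the module
    body is what executes the decorator and registers lvvit_s."""
    @register_model
    def lvvit_s(**kwargs):
        return {"name": "lvvit_s", **kwargs}  # placeholder for a real model


# Before the "import", the name is unknown.
try:
    create_model("lvvit_s")
    failed_before_import = False
except RuntimeError:
    failed_before_import = True

# After the "import", creation succeeds.
import_models_module()
model = create_model("lvvit_s", img_size=224)
```

If this is indeed the mechanism, the import is needed for its side effect only, which is why a single added line fixes the script.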

Token Label and ground truth

The shape of 'score_map' is [2, 5, H, W], but I'm curious why the image class label is appended at this coordinate.

TokenLabeling/tlt/data/dataset.py

Line 97 in 5cc1461

score_maps[-1,0,0,5]=target

Training without distributed_train.sh

If I only have one GPU, how should I change the command?

The model parameters couldn't be downloaded.

The link to the LV-ViT pre-trained model at resolution 448 is broken. Could you please update it?

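Question about TokenLabel

Hello author,
If I don't pass token_label,
it looks like the default dataset is used,
so it seems to be ordinary training. Is token_label_path therefore required,
or does the code handle plain data somewhere? I couldn't find it.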

Pretrained weights for LV-ViT-T

Hi,

Thanks for sharing your work.
Could you also provide the pre-trained weights for the LV-ViT-T model variant, the one that achieves 79.1% top1-acc. as mentioned in Table 1 of your paper?

All the best,
Marc

In your released lvvit.py code, MixToken is implemented by cutting and mixing the original grid map with the flipped one, with no change needed to the labels, which differs from the description in the paper.
Is this what you actually did during training?
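To make the operation described in this question concrete, here is a toy sketch of "cut and mix with the flipped batch" on plain Python lists. It illustrates the description in the issue only; the function name, the box arguments, and the use of nested lists instead of tensors are assumptions, and it is not taken from lvvit.py.

```python
import copy


def mix_with_flipped(batch, top, left, bottom, right):
    """Replace the [top:bottom, left:right] token region of each sample with
    the same region from the batch reversed along the batch axis."""
    flipped = batch[::-1]          # sample n is paired with sample N-1-n
    mixed = copy.deepcopy(batch)   # leave the input untouched
    for n, grid in enumerate(mixed):
        for r in range(top, bottom):
            grid[r][left:right] = flipped[n][r][left:right]
    return mixed


# Two 3x3 "token grids": sample 0 is all "a", sample 1 is all "b".
batch = [[["a"] * 3 for _ in range(3)], [["b"] * 3 for _ in range(3)]]
mixed = mix_with_flipped(batch, top=0, left=1, bottom=2, right=3)
restored = mix_with_flipped(mixed, top=0, left=1, bottom=2, right=3)
```

In this toy version, applying the same swap a second time returns the original grids, because the swap pairs each sample with its mirror in the batch. Whether the released training code relies on that property is a question for the authors.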

How to apply token labeling to a CNN?

Hello,
I'm interested in your token labeling technique
and would like to apply it to a CNN-based model, because ViT is very heavy to train.

Could I get your code for token labeling with a CNN?
If not, could you give me some details on how to implement it?

Thank you.

Works on Python 3.6, fails on Python 3.8

Test: [ 0/1] Time: 11.293 (11.293) Loss: 0.7043 (0.7043) Acc@1: 42.1875 (42.1875) Acc@5: 100.0000 (100.0000)
Test: [ 1/1] Time: 0.108 (5.701) Loss: 0.5847 (0.6689) Acc@1: 89.8148 (56.3187) Acc@5: 100.0000 (100.0000)
free(): invalid pointer
free(): invalid pointer
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launch.py", line 303, in <module>
    main()
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launch.py", line 294, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/opt/conda/bin/python3.8', '-u', 'main.py', '--local_rank=1', './dataset/c/c', '--model', 'lvvit_s', '-b', '128', '--apex-amp', '--img-size', '224', '--drop-path', '0.1', '--token-label', '--token-label-size', '14', '--dense-weight', '0.0', '--num-classes', '2', '--finetune', './pretrained/lvvit_s-26M-384-84-4.pth.tar']' died with <Signals.SIGABRT: 6>.
root@btq3ajqsfk1cu-0:/puxin_libochao/TokenLabeling# CUDA_VISIBLE_DEVICES=0,1 bash ./distributed_train.sh 2 ./dataset/c/c --model lvvit_s -b 128 --apex-amp --img-size 224 --drop-path 0.1 --token-label --token-label-size 14 --dense-weight 0.0 --num-classes 2 --finetune ./pretrained/lvvit_s-26M-384-84-4.pth.tar

label_map does not do the same augmentation (random crop) as the input image

Hi
Thanks so much for the nice work!
Could you share some insight into how the label_map is processed?
If I understand correctly, after loading the image and the corresponding label map, we should apply the same crop/flip/resize to both, but in

TokenLabeling/tlt/data/label_transforms_factory.py

Lines 58 to 73 in aa438ef

    
    def __call__(self, img, label_map):
        i, j, h, w = self.get_params(img, self.scale, self.ratio)
        coords = (i / img.size[1],
                  j / img.size[0],
                  h / img.size[1],
                  w / img.size[0])
        coords_map = torch.zeros_like(label_map[0:1])
        # trick to store coords_map is label_map
        coords_map[0,0,0,0],coords_map[0,0,0,1],coords_map[0,0,0,2],coords_map[0,0,0,3] = coords
        label_map = torch.cat([label_map, coords_map])
        if isinstance(self.interpolation, (tuple, list)):
            interpolation = random.choice(self.interpolation)
        else:
            interpolation = self.interpolation
        return torchvision_F.resized_crop(img, i, j, h, w, self.size,
                                          interpolation), label_map

It seems that only the image is cropped, while the label map is not cropped in the same way, so the label map no longer matches the image.

Should we instead do

        return torchvision_F.resized_crop(
                img, i, j, h, w, self.size, interpolation
        ), torchvision_F.resized_crop(
                label_map, i / ratio, j / ratio, h / ratio, w / ratio, self.size, interpolation
        )

Thanks
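The quoted `__call__` does not crop the label map, but it does store the crop as normalized coordinates (i and h divided by the image height, j and w divided by the image width). As a hedged illustration of how such stored coordinates could be applied later, the sketch below maps a pixel-space crop onto a label-map grid. The function name and the example sizes are hypothetical, and this is not the repository's own code path.

```python
def crop_to_grid(i, j, h, w, img_h, img_w, grid_h, grid_w):
    """Map a pixel-space crop (top i, left j, height h, width w) onto a
    grid_h x grid_w label map via normalized coordinates."""
    # Same normalization as `coords` in the quoted snippet:
    # rows are divided by the image height, columns by the image width.
    top, left = i / img_h, j / img_w
    height, width = h / img_h, w / img_w
    # Scale to label-map cells. The results are fractional, so a real
    # pipeline would need interpolation or an ROI-align style operation.
    return (top * grid_h, left * grid_w, height * grid_h, width * grid_w)


# Hypothetical example: a 112x224 crop of a 448x448 image, 14x14 label map.
region = crop_to_grid(i=56, j=112, h=112, w=224,
                      img_h=448, img_w=448, grid_h=14, grid_w=14)
```

Here `region` is `(1.75, 3.5, 3.5, 7.0)` in label-map cells. Deferring the crop this way keeps the stored label map at its original resolution until the coordinates are consumed.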

Is there a problem with the target_cls code?

The consequence of writing it this way is: # rely more on target_cls if target_cls is incorrect.

Have you considered combining tokens-to-token (kernel > 1, versus ViT where kernel = 1) with token labeling?

Have you tried this setting?

The accuracy on the validation set is 0, and the loss is always around 13

Hello! I use ILSVRC2012_img_train and ILSVRC2012_img_val, and use the provided label_top5_train_nfnet from Google Drive. I train lv-vit-s with batch_size 64 without apex for one epoch. Thanks for your advice.
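As a reference point when reading such numbers: for a classifier that predicts uniformly over C classes, the cross-entropy is ln(C). The short sketch below computes that baseline for the 1000 ImageNet classes. How the reported total loss is composed (for example, whether it sums a classification term and a token-level term) is not stated in this report, so the comparison is only indicative.

```python
import math


def chance_level_cross_entropy(num_classes):
    """Cross-entropy of a uniform prediction over num_classes classes."""
    return math.log(num_classes)


# ImageNet-1k: roughly 6.9 for a single cross-entropy term.
baseline = chance_level_cross_entropy(1000)
```

A single cross-entropy term near this baseline means the predictions carry almost no information. A value well above it usually points to either several summed loss terms or confidently wrong predictions, for instance from mismatched label indices.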

Where is the training code?

Can token labeling reach higher accuracy than the annotator model?

Greetings,

Thank you for this incredible research.

I would like to know whether token labeling can achieve scores higher than those of the annotator model. I believe this was the case with the VOLO D5 model, which achieved a higher score than NFNet, the model used for annotation.

train error: AttributeError: 'tuple' object has no attribute 'log_softmax'

Hi, thanks for your great work. When I run the training script, the following error occurs: AttributeError: 'tuple' object has no attribute 'log_softmax'

    with amp_autocast():
        output = model(input)
        loss = loss_fn(output, target)  # error occurs

The loss function is train_loss_fn = LabelSmoothingCrossEntropy(smoothing=0.0).cuda()

By the way, could you please tell me why we need to specify smoothing=0.0?
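The error message indicates that `model(input)` returned a tuple while the loss function expected a single tensor. A minimal workaround sketch follows, under the assumption (not confirmed in this thread) that the first element of the tuple holds the classification logits. `primary_output` is a hypothetical helper, and plain lists stand in for tensors so the example runs without PyTorch.

```python
def primary_output(output):
    """Return the main prediction whether the model returned a bare
    tensor or a tuple of (prediction, auxiliary outputs, ...)."""
    if isinstance(output, tuple):
        # Assumption: element 0 is the classification output.
        return output[0]
    return output


logits = [0.1, 0.9]             # placeholder standing in for a logits tensor
aux = [[0.2, 0.8], [0.7, 0.3]]  # placeholder for auxiliary token outputs
from_tuple = primary_output((logits, aux))
from_tensor = primary_output(logits)
```

In a training loop this would be used as `loss = loss_fn(primary_output(output), target)`. Note that discarding the extra outputs also discards any token-level supervision, so a loss function that accepts the full tuple may be the intended pairing.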

Could you please provide the training log?

Specifying the batch size

If I want to train with 1p,
what batch size should I allocate?
Or is there a formula to compute it?
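One common heuristic (not specific to this repository) is to keep the total batch size of a reference setup by accumulating gradients over several steps, or else to scale the learning rate linearly with the total batch size. The sketch below shows both calculations; all numbers are hypothetical and are not the authors' settings, and whether main.py supports gradient accumulation is not established here.

```python
import math


def match_total_batch(ref_gpus, ref_batch_per_gpu, my_gpus, my_batch_per_gpu):
    """Return (accumulation_steps, effective_total_batch) needed to reach
    at least the reference total batch size."""
    ref_total = ref_gpus * ref_batch_per_gpu
    per_step = my_gpus * my_batch_per_gpu
    steps = math.ceil(ref_total / per_step)
    return steps, steps * per_step


def linearly_scaled_lr(ref_lr, ref_total_batch, my_total_batch):
    """Linear scaling rule: learning rate proportional to total batch size."""
    return ref_lr * my_total_batch / ref_total_batch


# Hypothetical reference: 8 GPUs x 128 images. Hypothetical target: 1 GPU x 64.
steps, total = match_total_batch(8, 128, 1, 64)
lr_without_accumulation = linearly_scaled_lr(1e-3, 8 * 128, 1 * 64)
```

With these example numbers, 16 accumulation steps reproduce a total batch of 1024. Without accumulation, the linear rule would reduce a reference learning rate of 1e-3 to 6.25e-5.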

RuntimeError: CUDA error: device-side assert triggered

I am new to deep learning. When I run the VOLO code with tlt on a single GPU or on multiple GPUs, I get the following error:
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [25,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
Traceback (most recent call last):
  File "main.py", line 949, in
    main()
  File "main.py", line 664, in main
    optimizers=optimizers)
  File "main.py", line 773, in train_one_epoch
    label_size=args.token_label_size)
  File "/opt/conda/lib/python3.6/site-packages/tlt/data/mixup.py", line 90, in mixup_target
    y1 = get_labelmaps_with_coords(target, num_classes, on_value=on_value, off_value=off_value, device=device, label_size=label_size)
  File "/opt/conda/lib/python3.6/site-packages/tlt/data/mixup.py", line 64, in get_labelmaps_with_coords
    num_classes=num_classes,device=device)
  File "/opt/conda/lib/python3.6/site-packages/tlt/data/mixup.py", line 16, in get_featuremaps
    _label_topk[1][:, :, :].long(),
RuntimeError: CUDA error: device-side assert triggered.

I can't fix this problem right now.
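An "index out of bounds" assertion in a scatter/gather kernel generally means that some index is negative or not smaller than the size of the dimension being indexed. One hypothesis worth checking, not confirmed for this report, is that the class indices stored in the token labels are not all below the `num_classes` passed to training. The sketch below runs that check on the CPU with plain nested lists; the helper names are hypothetical.

```python
def flatten(nested):
    """Yield the scalar items of arbitrarily nested lists or tuples."""
    for item in nested:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item


def find_out_of_range(indices, num_classes):
    """Return the sorted distinct indices that fall outside [0, num_classes)."""
    return sorted({int(i) for i in flatten(indices)
                   if not 0 <= int(i) < num_classes})


# Hypothetical label indices checked against a 10-class setup.
bad = find_out_of_range([[0, 1], [999, 3]], num_classes=10)
ok = find_out_of_range([[0, 1]], num_classes=2)
```

Running a check like this before the data reaches the GPU gives a readable error instead of a device-side assert. Setting the environment variable `CUDA_LAUNCH_BLOCKING=1` can also help localize the failing call.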

error: download the pretrained model but couldn't be unzipped

tar -xvf lvvit_s-26M-384-84-4.pth.tar
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors
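A `.pth.tar` suffix is a common naming convention for PyTorch checkpoints and does not by itself mean the file is a tar archive. Such files are normally passed to `torch.load` rather than unpacked. The sketch below uses only the standard library to inspect what a file actually is; `describe_checkpoint` is a hypothetical helper, and the demo writes a small placeholder file instead of using the real checkpoint.

```python
import os
import tarfile
import tempfile
import zipfile


def describe_checkpoint(path):
    """Report the container format of a file, regardless of its extension."""
    if tarfile.is_tarfile(path):
        return "tar archive"
    if zipfile.is_zipfile(path):
        # torch.save has written zip-based files by default since PyTorch 1.6.
        return "zip-based file (default torch.save format since PyTorch 1.6)"
    return "neither tar nor zip (legacy pickle checkpoint or incomplete download?)"


# Demo on a placeholder file that merely carries the .pth.tar suffix.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.pth.tar")
    with zipfile.ZipFile(demo_path, "w") as zf:
        zf.writestr("data.pkl", b"placeholder")
    kind = describe_checkpoint(demo_path)
```

If the real file is reported as neither tar nor zip, it may still be a valid legacy checkpoint. Comparing its size with the expected download size is a quick way to rule out a truncated download.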

How can I print the wrong predictions on the validation dataset?
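One generic approach, sketched here without reference to the repository's validation loop, is to compare the arg-max prediction with the target for each sample and collect the mismatches. The function name and the sample data are hypothetical, and plain lists stand in for model outputs.

```python
def misclassified(paths, logits, targets):
    """Return (path, predicted_class, true_class) for every wrong prediction."""
    wrong = []
    for path, row, target in zip(paths, logits, targets):
        # Arg-max over the class scores of this sample.
        pred = max(range(len(row)), key=lambda k: row[k])
        if pred != target:
            wrong.append((path, pred, target))
    return wrong


# Hypothetical two-class outputs for three validation images.
wrong = misclassified(
    ["a.jpg", "b.jpg", "c.jpg"],
    [[2.0, 0.1], [0.3, 1.5], [0.9, 0.2]],
    [0, 0, 0],
)
for path, pred, target in wrong:
    print(f"{path}: predicted {pred}, expected {target}")
```

In a real validation loop, the same comparison would be applied per batch, with file paths taken from the dataset in the order the loader yields them (shuffling disabled).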

Generating label for custom dataset

Hello,

Thank you for sharing your work. I am currently trying to generate token labels for a custom dataset with the model lvvit_s, but the loss stays close to 7 and the accuracy at 0 (not pre-trained, using 1 GPU in Google Colab). I also tried the pre-trained model with --transfer, but got 0 for both loss and accuracy. Which options should I use for a custom dataset?

[ LV-ViT-S pretrained model ]

Hi,

Thanks for the wonderful work.
Could you share the password for unzipping the LV-ViT-S pretrained model?

Thanks !

Model settings for Cifar10

I would like to know whether there is any LV-ViT- model setup you have tested on CIFAR-10, and what the proper configuration of all blocks is when training without pretrained weights.

Models provided via the download link do not work

Hi, can you re-check the models you provided by the download link? I downloaded the first one but it cannot be unzipped.

A Bag of Training Techniques for ViT

Hi, thanks for your wonderful work. Can the training techniques described for LV-ViT be used in other
downstream tasks such as object detection? In your paper, many of these techniques are evaluated on ImageNet. Thanks!

Dimension inconsistency of the token labels

Hi, I am curious about a dimension inconsistency.
(1) The shape of "score_map" generated in generate_label.py is [2, 5, H, W], but the indexing in "score_maps[-1,0,0,5]=target" (line 97 of TokenLabeling/tlt/data/dataset.py) suggests that score_maps has shape [2, H, W, 5].
(2) The shape of "label_maps_topk" in line 54 of TokenLabeling/tlt/data/mixup.py is [batch_size, 3, H, H, 5], but I cannot find where "score_maps" is transformed into "label_maps_topk", or what information is stored at indices 0, 1, and 2 of its second dimension.
This problem was also mentioned in Issue #9.

