
Comments (13)

willprice commented on May 19, 2024

Hi @JieFeng-cse,

It sounds like the issue is probably with data preprocessing.

We use the following data preprocessing pipeline in testing:

from torchvision.transforms import Compose
from transforms import GroupScale, GroupCenterCrop, GroupOverSample, Stack, ToTorchFormatTensor, GroupNormalize

crop_count = 10
net = ...
backbone_arch = ...

if crop_count == 1:
    cropping = Compose([
        GroupScale(net.scale_size),
        GroupCenterCrop(net.input_size),
    ])
elif crop_count == 10:
    cropping = GroupOverSample(net.input_size, net.scale_size)
else:
    raise ValueError("Only 1 and 10 crop_count are supported while we got {}".format(crop_count))

transform = Compose([
    cropping,
    Stack(roll=backbone_arch == 'BNInception'),
    ToTorchFormatTensor(div=backbone_arch != 'BNInception'),
    GroupNormalize(net.input_mean, net.input_std),
])

Here, transforms is the module at https://github.com/yjxiong/tsn-pytorch/blob/master/transforms.py

This transform is applied to the frames, which are loaded as a list of PIL images.

Let me know how you get on with that and whether it resolves the problem. If it does, I'll update the README with these instructions.

from epic-kitchens-55-action-models.

JieFeng-cse commented on May 19, 2024

Thank you very much!
I will report back as soon as I finish testing.

JieFeng-cse commented on May 19, 2024

Actually, I was using this to process the data:

transform = torchvision.transforms.Compose([
    transforms.GroupOverSample(net.input_size, net.scale_size),
    transforms.Stack(roll=(args.arch in ['BNInception', 'InceptionV3'])),
    transforms.ToTorchFormatTensor(div=(args.arch not in ['BNInception', 'InceptionV3'])),
    transforms.GroupNormalize(net.input_mean, net.input_std),
])

I think the effect should be the same, but I do notice that the confidence increases when I switch to your code above.
However, it still does not perform as it should. The results look like this (I also print the class index to make sure I did not get it wrong):

$ python test_video.py --arch BNInception --dataset EPIC --weights pretrain/TRN_arch=BNInception_modality=RGB_segments=8-a770bfbd.pth.tar --frame_folder sample_data/take
125
Multi-Scale Temporal Relation Network Module in use ['8-frame relation', '7-frame relation', '6-frame relation', '5-frame relation', '4-frame relation', '3-frame relation', '2-frame relation']
Freezing BatchNorm2D except the first one.
Loading frames in sample_data/take
RESULT ON sample_data/take
0.429 -> sort
tensor(35, device='cuda:0')
0.198 -> swirl
tensor(112, device='cuda:0')
0.175 -> pull
tensor(54, device='cuda:0')
0.160 -> add
tensor(25, device='cuda:0')
0.028 -> soak
tensor(52, device='cuda:0')

willprice commented on May 19, 2024

Did you put the model in .eval() mode?

JieFeng-cse commented on May 19, 2024

net.cuda().eval()
Like this? Yes.

willprice commented on May 19, 2024

Can you send me a copy of your data loader code and test_models.py script so I can investigate further?

JieFeng-cse commented on May 19, 2024

Most of the code comes directly from the original TRN repository: https://github.com/metalbubble/TRN-pytorch
I am currently using test_video.py to make predictions. The only differences are in the following part:

# Load model.
net = TSN(num_class,
          args.test_segments,
          args.modality,
          base_model=args.arch,
          consensus_type=args.consensus_type,
          img_feature_dim=args.img_feature_dim, print_spec=False)

checkpoint = torch.load(args.weights)

net.load_state_dict(checkpoint['state_dict'], strict=False)
net.cuda().eval()

crop_count = 10
backbone_arch = "BNInception"

if crop_count == 1:
    cropping = Compose([
        GroupScale(net.scale_size),
        GroupCenterCrop(net.input_size),
    ])
elif crop_count == 10:
    cropping = GroupOverSample(net.input_size, net.scale_size)
else:
    raise ValueError("Only 1 and 10 crop_count are supported while we got {}".format(crop_count))

transform = Compose([
    cropping,
    Stack(roll=backbone_arch == 'BNInception'),
    ToTorchFormatTensor(div=backbone_arch != 'BNInception'),
    GroupNormalize(net.input_mean, net.input_std),
])

Apart from that, I made no other changes. If this description is not enough, I can upload my code. Because it does not do well on a single video, I have not started using test_models.py yet.

JieFeng-cse commented on May 19, 2024

test_video.txt

JieFeng-cse commented on May 19, 2024

python test_video.py --arch BNInception --dataset EPIC --weights pretrain/TRN_arch=BNInception_modality=RGB_segments=8-a770bfbd.pth.tar --frame_folder sample_data/cut2
125
Multi-Scale Temporal Relation Network Module in use ['8-frame relation', '7-frame relation', '6-frame relation', '5-frame relation', '4-frame relation', '3-frame relation', '2-frame relation']
Freezing BatchNorm2D except the first one.
Loading frames in sample_data/cut2
RESULT ON sample_data/cut2
0.997 -> flush
tensor(109, device='cuda:0')
0.002 -> attach
tensor(79, device='cuda:0')
0.000 -> flatten
tensor(93, device='cuda:0')
0.000 -> spray
tensor(44, device='cuda:0')
0.000 -> unfreeze
tensor(120, device='cuda:0')

cut2 contains frames 8990 to 9655. The corresponding annotations are:

40,P01,P01_01,still cutting courgette,00:02:10.99,00:02:25.79,7859,8747,cut,5,courgette,69,['courgette'],[69]
41,P01,P01_01,dicing courgette,00:02:30.53,00:02:45.65,9031,9939,dice,5,courgette,69,['courgette'],[69]
42,P01,P01_01,still dicing courgette,00:02:45.75,00:03:00.28,9945,10816,dice,5,courgette,69,['courgette'],[69]
43,P01,P01_01,still dicing courgette,00:03:00.38,00:03:13.90,10822,11634,dice,5,courgette,69,['courgette'],[69]

I hope this gives you a better understanding of my problem. Thank you again, from the bottom of my heart.
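
One way to check which label the clip should receive is to intersect its frame range with the annotated segments. The sketch below is only an illustration: it assumes the seventh and eighth fields of each row are the start and stop frames and the ninth is the verb, and overlapping_segments is a hypothetical helper, not part of the repository.

```python
import csv
import io

# Annotation rows copied from the comment above. The column meanings are an
# assumption: field 7 = start frame, field 8 = stop frame, field 9 = verb.
ANNOTATIONS = """\
40,P01,P01_01,still cutting courgette,00:02:10.99,00:02:25.79,7859,8747,cut,5,courgette,69,['courgette'],[69]
41,P01,P01_01,dicing courgette,00:02:30.53,00:02:45.65,9031,9939,dice,5,courgette,69,['courgette'],[69]
42,P01,P01_01,still dicing courgette,00:02:45.75,00:03:00.28,9945,10816,dice,5,courgette,69,['courgette'],[69]
43,P01,P01_01,still dicing courgette,00:03:00.38,00:03:13.90,10822,11634,dice,5,courgette,69,['courgette'],[69]
"""


def overlapping_segments(rows, clip_start, clip_stop):
    """Return (uid, verb, overlap_in_frames) for every row that overlaps the clip."""
    matches = []
    for row in rows:
        seg_start, seg_stop = int(row[6]), int(row[7])
        # Inclusive frame ranges: a non-positive value means no overlap.
        overlap = min(clip_stop, seg_stop) - max(clip_start, seg_start) + 1
        if overlap > 0:
            matches.append((row[0], row[8], overlap))
    return matches


rows = list(csv.reader(io.StringIO(ANNOTATIONS)))
matches = overlapping_segments(rows, 8990, 9655)
print(matches)
```

Under those assumptions, only row 41 (dice) overlaps frames 8990 to 9655, so a confident "flush" prediction does not match any annotated action in that range.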

willprice commented on May 19, 2024

What happens when you remove strict=False from load_state_dict? You need to use the model definitions provided in this repository; they have been modified to have two output layers, one for verbs and one for nouns. You cannot reuse the existing published code from TSN/TRN/TSM.
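
With strict=False, PyTorch skips parameters whose names do not match instead of raising an error, so a mismatched output layer can silently keep its random initialisation. A minimal sketch of the comparison follows. diff_state_dict_keys is a hypothetical helper, and the key names are invented for illustration; they are not the real names used in either repository.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Report the keys that exist on only one side of a state-dict load."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    return {
        # Expected by the model but absent from the checkpoint.
        "missing_keys": sorted(model_keys - checkpoint_keys),
        # Present in the checkpoint but unknown to the model.
        "unexpected_keys": sorted(checkpoint_keys - model_keys),
    }


# Hypothetical key names, for illustration only: a model with a single
# output layer versus a checkpoint with separate verb and noun layers.
model_keys = ["base_model.conv1.weight", "new_fc.weight", "new_fc.bias"]
checkpoint_keys = [
    "base_model.conv1.weight",
    "fc_verb.weight",
    "fc_verb.bias",
    "fc_noun.weight",
    "fc_noun.bias",
]

report = diff_state_dict_keys(model_keys, checkpoint_keys)
for kind, keys in report.items():
    print(kind, keys)
```

On a real model the two arguments would be net.state_dict().keys() and checkpoint['state_dict'].keys(). In recent PyTorch versions the value returned by load_state_dict also exposes missing_keys and unexpected_keys, so printing it gives the same information.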

JieFeng-cse commented on May 19, 2024

That's the problem? I will try it. Sorry for the inconvenience.

JieFeng-cse commented on May 19, 2024

Yes, I am so sorry, that turned out to be the cause. I switched to your TSN code and the accuracy is now what it should be. Sorry for all the inconvenience; I am very grateful for your help.

willprice commented on May 19, 2024

Hi @JieFeng-cse,
Glad you got it working in the end.
