2d-tan's People

Contributors

penghouwen, sy-zhang

2d-tan's Issues

Having cuda error when applying custom features

Hi, @Sy-Zhang and team.
First of all, thank you for sharing your work!

I have a question about using your code.

I am trying to use your model with my own visual and text features (extracted with CLIP).

For the Charades dataset, it worked well.

However, for TACoS, the following CUDA error occurs:

Traceback (most recent call last):
  File "moment_localization/train.py", line 319, in <module>
    scheduler=scheduler)
  File "/home/jckim/2D-TAN/moment_localization/../lib/core/engine.py", line 42, in train
    state['optimizer'].step(closure)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/optim/adam.py", line 92, in step
    loss = closure()
  File "/home/jckim/2D-TAN/moment_localization/../lib/core/engine.py", line 31, in closure
    loss, output = state['network'](state['sample'])
  File "moment_localization/train.py", line 151, in network
    prediction, map_mask = model(textual_input, textual_mask, visual_input)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/_utils.py", line 434, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jckim/2D-TAN/moment_localization/../lib/models/tan.py", line 22, in forward
    fused_h = self.fusion_layer(textual_input, textual_mask, map_h, map_mask)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jckim/2D-TAN/moment_localization/../lib/models/fusion_modules/base_fusion.py", line 22, in forward
    txt_h = self.tex_linear(txt_h)[:,:,None,None]
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/functional.py", line 1848, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
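
The hint in the last line of the traceback can be followed on the command line or from Python. A minimal sketch of the latter, assuming the variable is set before the process makes its first CUDA call (the helper name `enable_blocking_launches` is hypothetical and not part of 2D-TAN):

```python
import os


def enable_blocking_launches(env=os.environ):
    """Ask CUDA to launch kernels synchronously so errors surface at the failing call.

    Hypothetical helper: it is assumed to run before the first CUDA call,
    for example at the very top of the training script.
    """
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env["CUDA_LAUNCH_BLOCKING"]


enable_blocking_launches()
```

Synchronous launches slow training down, so this setting is usually kept for debugging runs only.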

When I pass CUDA_LAUNCH_BLOCKING=1, the traceback becomes:

File "moment_localization/train.py", line 319, in <module>
    scheduler=scheduler)
  File "/home/jckim/2D-TAN/moment_localization/../lib/core/engine.py", line 42, in train
    state['optimizer'].step(closure)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/optim/adam.py", line 92, in step
    loss = closure()
  File "/home/jckim/2D-TAN/moment_localization/../lib/core/engine.py", line 31, in closure
    loss, output = state['network'](state['sample'])
  File "moment_localization/train.py", line 151, in network
    prediction, map_mask = model(textual_input, textual_mask, visual_input)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jckim/2D-TAN/moment_localization/../lib/models/tan.py", line 22, in forward
    fused_h = self.fusion_layer(textual_input, textual_mask, map_h, map_mask)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jckim/2D-TAN/moment_localization/../lib/models/fusion_modules/base_fusion.py", line 23, in forward
    map_h = self.vis_conv(map_h)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 446, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/jckim/mambaforge/envs/HLTI/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 443, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.

import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([32, 512, 128, 128], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(512, 512, kernel_size=[1, 1], padding=[0, 0], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().float()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()

ConvolutionParams
    data_type = CUDNN_DATA_FLOAT
    padding = [0, 0, 0]
    stride = [1, 1, 0]
    dilation = [1, 1, 0]
    groups = 1
    deterministic = false
    allow_tf32 = true
input: TensorDescriptor 0x5610fdbf3970
    type = CUDNN_DATA_FLOAT
    nbDims = 4
    dimA = 32, 512, 128, 128,
    strideA = 8388608, 16384, 128, 1,
output: TensorDescriptor 0x5610fdaa24f0
    type = CUDNN_DATA_FLOAT
    nbDims = 4
    dimA = 32, 512, 128, 128,
    strideA = 8388608, 16384, 128, 1,
weight: FilterDescriptor 0x5610fdbe9210
    type = CUDNN_DATA_FLOAT
    tensor_format = CUDNN_TENSOR_NCHW
    nbDims = 4
    dimA = 512, 512, 1, 1,
Pointer addresses:
    input: 0x7fac74000000
    output: 0x7face6000000
    weight: 0x7fad8db00000
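
For a sense of scale, the size of the tensors in this descriptor can be worked out from `dimA` alone. A rough sketch, assuming 4 bytes per `CUDNN_DATA_FLOAT` element (the function name `tensor_bytes` is illustrative):

```python
from functools import reduce
from operator import mul


def tensor_bytes(dims, bytes_per_element=4):
    """Return the size in bytes of a dense tensor with the given dimensions."""
    return reduce(mul, dims, 1) * bytes_per_element


# dimA of the input and output descriptors reported by cuDNN above.
activation_dims = (32, 512, 128, 128)
size = tensor_bytes(activation_dims)
print(size / 2**30, "GiB")  # 1.0 GiB
```

The input and output activations of this single 1x1 convolution are therefore about 1 GiB each, before gradients or any cuDNN workspace are counted.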

I changed the config file as shown below:

WORKERS: 16

MODEL_DIR: ./models/conv
RESULT_DIR: ./results/conv
LOG_DIR: ./log
DATA_DIR: ./data/TACoS
FEATURE_DIR: {directory to my visual_features}  # custom key I added; it worked well on Charades

DATASET:
  NAME: TACoS
  VIS_INPUT_TYPE: clip
  NO_VAL: True
  NUM_SAMPLE_CLIPS: 256
  TARGET_STRIDE: 2
  NORMALIZE: True
  RANDOM_SAMPLING: False

TEST:
  BATCH_SIZE: 32
  RECALL: 1,5
  TIOU: 0.1,0.3,0.5,0.7
  EVAL_TRAIN: False
  NMS_THRESH: 0.5

CUDNN:
  DETERMINISTIC: False
  BENCHMARK: True

TRAIN:
  BATCH_SIZE: 32
  LR: 0.0001
  WEIGHT_DECAY: 0.0000
  MAX_EPOCH: 100
  CONTINUE: False

LOSS:
  NAME: bce_rescale_loss
  PARAMS:
    MIN_IOU: 0.3
    MAX_IOU: 0.7
    BIAS: 0.0

TAN:
  FRAME_MODULE:
    NAME: FrameAvgPool
    PARAMS:
      INPUT_SIZE: 512 <<< 
      HIDDEN_SIZE: 512
      KERNEL_SIZE: 2
      STRIDE: 2

  PROP_MODULE:
    NAME: SparsePropConv
    PARAMS:
      HIDDEN_SIZE: 512
      NUM_SCALE_LAYERS: [16, 8, 8, 8]

  FUSION_MODULE:
    NAME: BaseFusion
    PARAMS:
      HIDDEN_SIZE: 512
      TXT_INPUT_SIZE: 512 <<<
      TXT_HIDDEN_SIZE: 512
      LSTM:
        NUM_LAYERS: 3
        BIDIRECTIONAL: False

  MAP_MODULE:
    NAME: MapConv
    PARAMS:
      INPUT_SIZE: 512
      HIDDEN_SIZES: [512, 512, 512, 512, 512, 512, 512, 512]
      KERNEL_SIZES: [5, 5, 5, 5, 5, 5, 5, 5]
      STRIDES: [1, 1, 1, 1, 1, 1, 1, 1]
      PADDINGS: [16, 0, 0, 0, 0, 0, 0, 0]
      DILATIONS: [1, 1, 1, 1, 1, 1, 1, 1]

  PRED_INPUT_SIZE: 512

MODEL:
  NAME: TAN
  CHECKPOINT: ./checkpoints/TACoS/iter016165-0.4644-0.7443.pkl
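
The `MAP_MODULE` settings above can be checked with the standard convolution output-size formula. The sketch below assumes a square 2D map of side 128, which matches the 128 x 128 tensors in the cuDNN descriptor and equals NUM_SAMPLE_CLIPS / TARGET_STRIDE (256 / 2). How 2D-TAN applies the padding internally is not shown in this issue, so treat the result as arithmetic only:

```python
def conv_output_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length of one convolution along a single spatial axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# Values copied from the MAP_MODULE section of the config.
kernel_sizes = [5, 5, 5, 5, 5, 5, 5, 5]
strides = [1, 1, 1, 1, 1, 1, 1, 1]
paddings = [16, 0, 0, 0, 0, 0, 0, 0]
dilations = [1, 1, 1, 1, 1, 1, 1, 1]

size = 128  # assumed side length of the 2D map
sizes = []
for k, s, p, d in zip(kernel_sizes, strides, paddings, dilations):
    size = conv_output_size(size, k, s, p, d)
    sizes.append(size)

print(sizes)  # [156, 152, 148, 144, 140, 136, 132, 128]
```

Under that assumption the eight layers return the map to its original side length, and the first layer produces the largest intermediate map at 156 x 156.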

The same kind of config change also worked well on Charades.

To load the features, I am using code like this in ./lib/dataset/tacos.py:

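# Methods of the dataset class. Assumes os, numpy (as np), torch,
# torch.nn.functional (as F) and config are imported at module level.
def get_word_embedding(self, sentence):
    # Tokenize the sentence and run the CLIP text encoder without tracking gradients.
    inputs = self.clip_tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        features = self.clip_model(**inputs)
        # squeeze() drops all size-1 dimensions (here the batch dimension).
        last_hidden_state_feature = features.last_hidden_state.squeeze()
    return last_hidden_state_feature

def get_video_features(self, vid):
    # Load the pre-extracted visual features for one video from <feature_dir>/<vid>.npz.
    feature_path = os.path.join(self.feature_dir, vid + '.npz')
    features = torch.Tensor(np.load(feature_path)['features'][:]).float()
    if config.DATASET.NORMALIZE:
        # L2-normalize each clip feature along the feature dimension.
        features = F.normalize(features, dim=1)
    # One mask entry per clip; every clip is marked as valid.
    vis_mask = torch.ones((features.shape[0], 1))
    return features, vis_mask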
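
Before the tensors reach the GPU, the loaded arrays can be checked on the CPU. A minimal sketch, assuming each `.npz` file stores a 2D `features` array of shape `(num_clips, 512)` to match `INPUT_SIZE: 512`; the helper `check_features` is hypothetical and not part of 2D-TAN:

```python
import numpy as np


def check_features(features, expected_dim=512):
    """Return a list of problems found in a (num_clips, expected_dim) array."""
    problems = []
    if features.ndim != 2:
        problems.append(f"expected a 2D array, got {features.ndim}D")
        return problems
    if features.shape[0] == 0:
        problems.append("array contains no clips")
    if features.shape[1] != expected_dim:
        problems.append(f"expected feature dim {expected_dim}, got {features.shape[1]}")
    if not np.isfinite(features).all():
        problems.append("contains NaN or Inf values")
    return problems


# Synthetic stand-ins for a loaded feature file.
good = np.random.rand(10, 512).astype(np.float32)
bad = good.copy()
bad[0, 0] = np.nan

print(check_features(good))  # []
print(check_features(bad))   # ['contains NaN or Inf values']
```

Running such a check over every TACoS feature file would show whether any video differs in shape or content from the Charades features that worked.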

This code also worked well when applied to Charades.

The repro snippet given in the error message does not produce any problem when run on its own.

Do you have any idea what might be causing this?

Since Charades works well, I don't think it is a problem with my GPU or system.

Thank you in advance!
