auto_avsr's People

Contributors

jacksalici, maai001, mpc001, orena1


auto_avsr's Issues

Problem about training

Hello, thank you for sharing the code. I would like to reproduce your work first. May I ask what hardware this code was run on, and how long the training took?

How to train an auto-avsr model from scratch through curriculum learning

Thank you for sharing the code.

I am interested in training a visual-only model from scratch on the LRS2 dataset, using curriculum learning.
I want to know the optimal learning rate and the number of epochs for training the model using a subset of LRS2 that includes only short utterances lasting no more than 4 seconds (100 frames).
Could you provide details on how you trained the visual-only model available in the model zoo using only the LRS3 dataset (438 hours)?

Re-implementation error

Does the problem previously raised as a question (#20) affect performance?

I'm re-training with the newly updated code.

One question: I'm performing training on 4 A100 GPUs,

so I'm wondering whether it is still valid to train with 8 times fewer GPUs than the 32 A100 GPUs you used.

Also, we have trained your unmodified code many times, but we cannot reproduce the 96.6% of [vsr_trlrs3_23h_base.pth];
we only get 99.4%, and I need some advice.

cannot import name 'eval_env' from 'torchaudio._internal.module_utils'

When I run:
python train.py exp_dir=D:/pycharmProject/auto_avsr-main/auto_avsr-main/checkpoints exp_name=exp1 data.modality=video data.dataset.root_dir=D:/BaiduNetdiskDownload/pre

I got this error:
Traceback (most recent call last):
  File "train.py", line 10, in <module>
    from datamodule.data_module import DataModule
  File "D:\pycharmProject\auto_avsr-main\auto_avsr-main\datamodule\data_module.py", line 6, in <module>
    from .av_dataset import AVDataset
  File "D:\pycharmProject\auto_avsr-main\auto_avsr-main\datamodule\av_dataset.py", line 4, in <module>
    import torchaudio
  File "D:\anaconda3\envs\auto_avsr\lib\site-packages\torchaudio\__init__.py", line 1, in <module>
    from torchaudio import (  # noqa: F401
  File "D:\anaconda3\envs\auto_avsr\lib\site-packages\torchaudio\_extension\__init__.py", line 5, in <module>
    from torchaudio._internal.module_utils import eval_env, fail_with_message, is_module_available, no_op
ImportError: cannot import name 'eval_env' from 'torchaudio._internal.module_utils' (D:\anaconda3\envs\auto_avsr\lib\site-packages\torchaudio\_internal\module_utils.py)

How to use AUTO-AVSR to train a Chinese AVSR model

How can AUTO-AVSR be used to train a Chinese AVSR model, e.g. on the CMLR dataset? The Visual_Speech_Recognition_for_Multiple_Languages project also came to my attention. What is the relationship between that project and the AUTO-AVSR project?

AVSpeech Dataset

I noticed that there isn't a location to download the videos of the AVSpeech dataset. I understand that you likely took the time to download, trim, and label the videos. Would it be possible to share the processed video dataset itself, similar to how the other datasets are available, or at least to share your code for extracting the AVSpeech dataset in the same manner you did, so that we can reproduce your results?

Something went wrong with hydra and omegaconf

When I run

python main.py exp_dir=exp \
               exp_name=train_24_scratch \
               data.modality=vsr \
               optimizer.lr=3e-4 \

for training, an error occurs with hydra:

Error merging 'config' with schema
Key 'exp_name' not in 'FairseqConfig'
        full_key: exp_name
        reference_type=Optional[Dict[Union[str, Enum], Any]]
        object_type=FairseqConfig

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

So I run

HYDRA_FULL_ERROR=1 python main.py exp_dir=exp \
               exp_name=train_24_scratch \
               data.modality=vsr \
               optimizer.lr=3e-4 \

and it shows:

Traceback (most recent call last):
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/hydra/_internal/config_loader_impl.py", line 618, in _load_config_impl
    merged = OmegaConf.merge(schema.config, ret.config)
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/omegaconf/omegaconf.py", line 321, in merge
    target.merge_with(*others[1:])
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/omegaconf/basecontainer.py", line 331, in merge_with
    self._format_and_raise(key=None, value=None, cause=e)
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/omegaconf/base.py", line 95, in _format_and_raise
    format_and_raise(
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/omegaconf/_utils.py", line 629, in format_and_raise
    _raise(ex, cause)
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/omegaconf/_utils.py", line 610, in _raise
    raise ex  # set end OC_CAUSE=1 for full backtrace
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/omegaconf/basecontainer.py", line 329, in merge_with
    self._merge_with(*others)
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/omegaconf/basecontainer.py", line 347, in _merge_with
    BaseContainer._map_merge(self, other)
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/omegaconf/basecontainer.py", line 314, in _map_merge
    dest[key] = src._get_node(key)
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 258, in __setitem__
    self._format_and_raise(
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/omegaconf/base.py", line 95, in _format_and_raise
    format_and_raise(
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/omegaconf/_utils.py", line 629, in format_and_raise
    _raise(ex, cause)
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/omegaconf/_utils.py", line 610, in _raise
    raise ex  # set end OC_CAUSE=1 for full backtrace
omegaconf.errors.ConfigKeyError: Key 'exp_name' not in 'FairseqConfig'
        full_key: exp_name
        reference_type=Optional[Dict[Union[str, Enum], Any]]
        object_type=FairseqConfig
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "main.py", line 74, in <module>
    main()
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/hydra/main.py", line 32, in decorated_main
    _run_hydra(
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/hydra/_internal/utils.py", line 346, in _run_hydra
    run_and_report(
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/hydra/_internal/utils.py", line 201, in run_and_report
    raise ex
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/hydra/_internal/utils.py", line 347, in <lambda>
    lambda: hydra.run(
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 100, in run
    cfg = self.compose_config(
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 507, in compose_config
    cfg = self.config_loader.load_configuration(
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/hydra/_internal/config_loader_impl.py", line 151, in load_configuration
    return self._load_configuration(
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/hydra/_internal/config_loader_impl.py", line 224, in _load_configuration
    job_cfg, job_cfg_load_trace = self._load_primary_config(
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/hydra/_internal/config_loader_impl.py", line 819, in _load_primary_config
    ret, load_trace = self._load_config_impl(
  File "/home/luosongtao/miniconda3/envs/autoavsr/lib/python3.8/site-packages/hydra/_internal/config_loader_impl.py", line 628, in _load_config_impl
    raise ConfigCompositionException(
hydra.errors.ConfigCompositionException: Error merging 'config' with schema

Any help to resolve this would be greatly appreciated!

`cut_or_pad` function is wrong

The current version is:

def cut_or_pad(data, size, dim=0):
    if data.size(dim) < size:
        padding = size - data.size(dim)
        data = torch.nn.functional.pad(data, (0, padding), "constant")
    elif data.size(dim) > size:
        data = data[:size]
    assert data.size(dim) == size
    return data

The right version should be:

def cut_or_pad(data, size, dim=0):
    if data.size(dim) < size:
        padding = size - data.size(dim)
        data = torch.nn.functional.pad(data, (0, 0, 0, padding), "constant")           # modified
        size = data.size(dim)                                                          # added
    elif data.size(dim) > size:
        data = data[:size]
    assert data.size(dim) == size
    return data
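For context, a sketch of how this helper is presumably invoked (based on the related "A potential bug" issue below, where the audio waveform is aligned to 640 samples per video frame; not verified against the repository):

# Hypothetical call site: align a (T_audio, 1) waveform with a T_video-frame video clip
# (25 fps video and 16 kHz audio give 16000 / 25 = 640 audio samples per frame).
audio = cut_or_pad(audio, size=video.size(0) * 640, dim=0)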

Unicode Decode Error when running the LRS2 data preparation

Thank you for providing the training code for the Auto AVSR.

I am facing an issue when trying to run the preprocess_lrs2lrs3.py file using the LRS2 dataset. I am seeing the below error:

Traceback (most recent call last):
  File "preprocess_lrs2lrs3.py", line 77, in <module>
    text_transform = TextTransform()
  File "A:\Projects\auto_avsr\preparation\transforms.py", line 152, in __init__
    units = open(dict_path).read().splitlines()
  File "C:\Users\Girish\anaconda3\envs\autoavsr\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 4416: character maps to <undefined>

Any help to resolve this would be greatly appreciated!
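A possible workaround, assuming the failure is the open(dict_path) call shown in the traceback: the unit dictionary file appears to be UTF-8, while Windows defaults to cp1252, so passing an explicit encoding may help (a sketch, not a confirmed fix):

# Sketch: in preparation/transforms.py, inside TextTransform.__init__,
# read the unit dictionary with an explicit UTF-8 encoding instead of the
# Windows default (cp1252).
units = open(dict_path, encoding="utf-8").read().splitlines()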

test issue

Dear author, after unimodal training on the video modality, I obtained some models. I want to test whether they match the reported results, but during testing I ran into the following issue. My testing command is as follows:

HYDRA_FULL_ERROR=1 python eval.py data.modality=video \
    data.dataset.root_dir=/media/aa/a4b46d17-0f49-4392-98d6-49a5c9dee8e9/zzs/data/lrs2 \
    data.dataset.test_file=lrs2_test_transcript_lengths_seg24s.csv \
    pretrained_model_path=/media/aa/a4b46d17-0f49-4392-98d6-49a5c9dee8e9/zzs/autoenvs/video/vtrainlrs2/last.ckpt

Want to train VSR model for digit recognition using grid dataset.

Thank you for sharing the code.

I am interested in training a VSR (lip reading) model for digit recognition on the GRID dataset, because the pre-trained weights do not work well for digit recognition. This repo explains how to train on LRS2 and LRS3; how can I use it to train a model on the GRID dataset?

How to get audio from mp4 using torchaudio

Hi, when I run preprocess_lrs2lrs3.py, I get an error at 'audio_data = aud_dataloader.load_data(data_filename)'.
It seems that sox does not support mp4 files. How can I solve this?

Thank you very much.
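One possible workaround, assuming ffmpeg is installed on the system: extract the audio track to a wav file first and then load that with torchaudio. This is a sketch of the idea, not the repository's own loader:

import subprocess
import torchaudio

def load_audio_from_mp4(mp4_path, wav_path="tmp_audio.wav"):
    # Extract a 16 kHz mono wav track from the mp4 with ffmpeg, then load it with torchaudio.
    subprocess.run(
        ["ffmpeg", "-y", "-i", mp4_path, "-vn", "-ac", "1", "-ar", "16000", wav_path],
        check=True,
    )
    waveform, sample_rate = torchaudio.load(wav_path)
    return waveform, sample_rate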

can i know input time?

Hello, thank you for sharing a great model.

I leave an issue with a question.

  1. What is the time dimension that goes into the input of the model? (In the paper, we found that the fps is 25 but not the time dimension.)

  2. What happens if I put a video longer than the input time dimension?

Thank you.

Hydra Conflict Problem

Hi
Thank you for sharing the code.

When installing the additional packages in step 3.4, I got this error:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
ipython 7.34.0 requires jedi>=0.16, which is not installed.
arviz 0.15.1 requires setuptools>=60.0.0, but you have setuptools 59.5.0 which is incompatible.
cvxpy 1.3.2 requires setuptools>65.5.1, but you have setuptools 59.5.0 which is incompatible.
fairseq 0.12.2 requires hydra-core<1.1,>=1.0.7, but you have hydra-core 1.3.0 which is incompatible.
fairseq 0.12.2 requires omegaconf<2.1, but you have omegaconf 2.3.0 which is incompatible.
Successfully installed GitPython-3.1.32 antlr4-python3-runtime-4.9.3 av-10.0.0 docker-pycreds-0.4.0 gitdb-4.0.10 hydra-core-1.3.0 lightning-utilities-0.9.0 omegaconf-2.3.0 pathtools-0.1.2 pyDeprecate-0.3.1 pytorch-lightning-1.5.10 sentencepiece-0.1.99 sentry-sdk-1.29.2 setproctitle-1.3.2 setuptools-59.5.0 smmap-5.0.0 torchmetrics-1.0.3 wandb-0.15.8

Can you help me with this?

Running demo.py in Colab results in AttributeError

Hi!

Thanks a lot for the model.
I'm trying to run the AVSR model in Colab using demo.py. I'm using asr_trlrwlrs2lrs3vox2avsp_base.pth and I've specified the modality as 'audiovisual'. I'm getting this error:

Error executing job with overrides: ['data.modality=[audiovisual]', 'pretrained_model_path=[/content/asr_trlrwlrs2lrs3vox2avsp_base.pth]', 'file_path=[/content/de0fe3b3380fcc9575a8193b43226e51.mp4]']
Traceback (most recent call last):
  File "/content/auto_avsr/demo.py", line 77, in main
    pipeline = InferencePipeline(cfg)
  File "/content/auto_avsr/demo.py", line 30, in __init__
    self.modelmodule = ModelModule(cfg)
  File "/content/auto_avsr/lightning.py", line 29, in __init__
    self.model = E2E(len(self.token_list), self.backbone_args)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'ModelModule' object has no attribute 'backbone_args'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

I also tried specifying ['audio', 'video'], but that doesn't seem right.
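For what it's worth, the overrides in the error message show every value wrapped in square brackets (e.g. data.modality=[audiovisual]), so Hydra parses them as lists rather than plain strings; that would explain why ModelModule never sets backbone_args. Passing the values without brackets may already resolve this (a guess based on the log above, not a confirmed fix):

python demo.py data.modality=audiovisual pretrained_model_path=/content/asr_trlrwlrs2lrs3vox2avsp_base.pth file_path=/content/de0fe3b3380fcc9575a8193b43226e51.mp4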

Running Demo gets ModuleNotFoundError: No module named 'six'

I tried to run the demo on a video for VSR by executing the following line:

(TT) PS D:\auto_avsr> python demo.py data.modality='audio' pretrained_model_path='.\asr_trlrs3vox2_base.pth' file_path='.\avsr_english_1.mp4'

but I got the following error:

Traceback (most recent call last):
  File "demo.py", line 7, in <module>
    from lightning import ModelModule
  File "D:\auto_avsr\lightning.py", line 7, in <module>
    from espnet.nets.batch_beam_search import BatchBeamSearch
  File "D:\auto_avsr\espnet\nets\batch_beam_search.py", line 8, in <module>
    from espnet.nets.beam_search import BeamSearch, Hypothesis
  File "D:\auto_avsr\espnet\nets\beam_search.py", line 9, in <module>
    from espnet.nets.e2e_asr_common import end_detect
  File "D:\auto_avsr\espnet\nets\e2e_asr_common.py", line 16, in <module>
    import six
ModuleNotFoundError: No module named 'six'

I already followed the environment setup step by step and installed the C++ requirements, but I still get the same error.

Conda environment
Python 3.8.18
Windows 11 w/ PowerShell
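If it helps, this particular error usually just means the six package is missing from the environment; installing it should get past this import, though there may be further missing dependencies:

pip install six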

The Audiovisual pretrained model.

Thanks for sharing your work.
And thanks for releasing the pretrained VSR and ASR models.
Is it possible to release the pretrained AVSR models?

Number of gpus / total batch size to reproduce the results in the paper

Hi,

Thanks for sharing your code. If my understanding of the code is correct, the effective batch size depends on the number of GPUs used for training. If I want to get a good result (i.e. reproduce the result in the paper), how many GPUs do I need? How many GPUs were used to obtain the results in the paper?

Thank you!

Number of GPUs for training

Hi,

Thanks for releasing the training code for Auto-AVSR. I was curious to know the number of GPUs used for training the model with different amounts of data, e.g. the 23/438/3448-hour LRS3 settings.

Thanks

Issue with hydra - Error merging data/dataset=cstm Key 'defaults' not in 'FairseqConfig'

Hello,

I left the below comment under issue #3 but since it's closed I am not sure the comment will be seen.

I am trying to run the training on a custom dataset and also experiencing this issue.
The file cstm.yaml is placed in auto_avsr/conf/data/dataset and looks like this:
defaults:
  - _self_

root: "/content/drive/MyDrive/sepedi/data/preprocess_datasets"
label_dir: "labels"
train_file: "train_labels.csv"
val_file: "val_labels.csv"
test_file: "test_labels.csv"

As suggested above, I tried renaming the conf/config.yaml file. However, when I run:

!python main.py exp_dir=exp \
    exp_name=trainaudio \
    data.modality=audio \
    ckpt_path='content/drive/MyDrive/LRS3_A_WER1.0/model.pth' \
    +data/dataset=cstm \
    trainer.num_nodes=1

I get:

Error merging data/dataset=cstm
Key 'defaults' not in 'FairseqConfig'
        full_key: defaults
        reference_type=Optional[FairseqConfig]
        object_type=FairseqConfig

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

When running:

%env HYDRA_FULL_ERROR=1
!python main.py exp_dir=exp \
    exp_name=trainaudio \
    data.modality=audio \
    ckpt_path='content/drive/MyDrive/LRS3_A_WER1.0/model.pth' \
    +data/dataset=cstm \
    trainer.num_nodes=1

I get:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/config_loader_impl.py", line 720, in _merge_config
    ret = OmegaConf.merge(cfg, loaded_cfg)
  File "/usr/local/lib/python3.10/dist-packages/omegaconf/omegaconf.py", line 321, in merge
    target.merge_with(*others[1:])
  File "/usr/local/lib/python3.10/dist-packages/omegaconf/basecontainer.py", line 331, in merge_with
    self._format_and_raise(key=None, value=None, cause=e)
  File "/usr/local/lib/python3.10/dist-packages/omegaconf/base.py", line 95, in _format_and_raise
    format_and_raise(
  File "/usr/local/lib/python3.10/dist-packages/omegaconf/_utils.py", line 629, in format_and_raise
    _raise(ex, cause)
  File "/usr/local/lib/python3.10/dist-packages/omegaconf/_utils.py", line 610, in _raise
    raise ex  # set end OC_CAUSE=1 for full backtrace
  File "/usr/local/lib/python3.10/dist-packages/omegaconf/basecontainer.py", line 329, in merge_with
    self._merge_with(*others)
  File "/usr/local/lib/python3.10/dist-packages/omegaconf/basecontainer.py", line 347, in _merge_with
    BaseContainer._map_merge(self, other)
  File "/usr/local/lib/python3.10/dist-packages/omegaconf/basecontainer.py", line 314, in _map_merge
    dest[key] = src._get_node(key)
  File "/usr/local/lib/python3.10/dist-packages/omegaconf/dictconfig.py", line 258, in __setitem__
    self._format_and_raise(
  File "/usr/local/lib/python3.10/dist-packages/omegaconf/base.py", line 95, in _format_and_raise
    format_and_raise(
  File "/usr/local/lib/python3.10/dist-packages/omegaconf/_utils.py", line 629, in format_and_raise
    _raise(ex, cause)
  File "/usr/local/lib/python3.10/dist-packages/omegaconf/_utils.py", line 610, in _raise
    raise ex  # set end OC_CAUSE=1 for full backtrace
omegaconf.errors.ConfigKeyError: Key 'defaults' not in 'FairseqConfig'
        full_key: defaults
        reference_type=Optional[FairseqConfig]
        object_type=FairseqConfig

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/content/auto_avsr/main.py", line 74, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/hydra/main.py", line 32, in decorated_main
    _run_hydra(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 346, in _run_hydra
    run_and_report(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 201, in run_and_report
    raise ex
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 347, in <lambda>
    lambda: hydra.run(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py", line 100, in run
    cfg = self.compose_config(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py", line 507, in compose_config
    cfg = self.config_loader.load_configuration(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/config_loader_impl.py", line 151, in load_configuration
    return self._load_configuration(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/config_loader_impl.py", line 256, in _load_configuration
    cfg = self._merge_defaults_into_config(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/config_loader_impl.py", line 805, in _merge_defaults_into_config
    hydra_cfg = merge_defaults_list_into_config(hydra_cfg, user_list)
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/config_loader_impl.py", line 777, in merge_defaults_list_into_config
    merged_cfg = self._merge_config(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/config_loader_impl.py", line 724, in _merge_config
    raise ConfigCompositionException(
hydra.errors.ConfigCompositionException: Error merging data/dataset=cstm

I am running this on colab due to issues installing fairseq editable locally.

Thank you in advance!

Asking for how to build the corpus for LRS2

Thank you for sharing your excellent work. I want to train the model on the LRS2 dataset, and I wonder whether the corpus built for LRS3 is also applicable to LRS2. If not, could you provide a recipe for building a new corpus?

VSR Model Training Issues

We are training the VSR model as-is, without any modifications.

The command we run is as follows.

python train.py exp_dir=[exp_dir] \
    exp_name=[exp_name] \
    data.modality="video" \
    data.dataset.root_dir=[root_dir] \
    data.dataset.train_file="lrs3_train_transcript_lengths_seg24s.csv" \
    data.dataset.val_file="lrs3_test_transcript_lengths_seg24s.csv" \
    trainer.num_nodes="1" \
    trainer.gpus="5" \
    data.max_frames="1800" \
    optimizer.lr="0.0002"

However, even after training several times, the values of "decoder_acc_step" and "decoder_acc_val" stop changing once training passes epoch 30.

This means that the loss value does not drop.

Is there anything else in particular that is important to set up when training?

Thank you for your response in advance.

train issue

Hello, I noticed that the text information in the LRS3 dataset contains timestamps. What is the main use of these timestamps? Are they used in training? If I want to train with my own dataset, do I also need to provide timestamp information?

A potential bug

Hi

I used part of your code in my work, and I found a potential bug (though I have not run your original code). Could you please check it? Specifically, this line pads the audio data if its length is smaller than 640 times the length of the corresponding video data. And this line says the variable data has a size of Tx1, so the torch.nn.functional.pad call in this line produces an output of size Tx(1+padding). That seems incorrect to me: the padded result should presumably be (T+padding)x1, so this line may need to be changed to something like torch.nn.functional.pad(data, (0, 0, 0, padding), "constant"). I may be wrong, since I have not run your original code, but could you check it anyway?

Thanks!
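For reference, a minimal sketch (not taken from the repository) that reproduces the shape behaviour described above, assuming data is a T x 1 audio tensor:

import torch
import torch.nn.functional as F

data = torch.zeros(100, 1)          # T = 100 samples, 1 channel
padding = 28

wrong = F.pad(data, (0, padding), "constant")        # pads the last (channel) dim
right = F.pad(data, (0, 0, 0, padding), "constant")  # pads the first (time) dim

print(wrong.shape)  # torch.Size([100, 29]) -- channels grew, time unchanged
print(right.shape)  # torch.Size([128, 1])  -- time grew, as intended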

Question about training on a Chinese dataset created by myself.

As your guidance says (screenshot of the data-preparation instructions omitted):
I'm preparing the .csv file, but I don't understand the last step. The 'tokenize' method's output is a tensor. Should I transform the tensor into a list, or use the tensor itself to create the .csv?
Thanks for your help.
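In case it is useful, here is a minimal sketch of one way to serialise the token-id tensor into a CSV row. The column layout shown (dataset name, relative path, number of frames, space-separated token ids) is an assumption about what the preparation scripts expect, not a confirmed specification:

import csv
import torch

def token_ids_to_str(token_ids):
    # Flatten the 1-D tensor returned by tokenize() into a space-separated string.
    return " ".join(str(i) for i in token_ids.tolist())

# Hypothetical example row; replace the values with those of your own dataset.
token_ids = torch.tensor([12, 7, 431, 9])   # e.g. output of text_transform.tokenize(text)
row = ["my_chinese_dataset", "clips/00001.mp4", 75, token_ids_to_str(token_ids)]

with open("my_dataset_train_labels.csv", "w", newline="") as f:
    csv.writer(f).writerow(row)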

Error when training on LRS2

Thank you for your work. When I try to train the model on LRS2 dataset, an error says

Traceback (most recent call last):
  File "D:\avsr\auto_avsr-main\train.py", line 59, in <module>
    main()
  File "D:\ProgramData\miniforge3\envs\autoavsr\\lib\site-packages\hydra\main.py", line 94, in decorated_main
    _run_hydra(
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\hydra\_internal\utils.py", line 394, in _run_hydra
    _run_app(
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\hydra\_internal\utils.py", line 457, in _run_app
    run_and_report(
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\hydra\_internal\utils.py", line 223, in run_and_report
    raise ex
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\hydra\_internal\utils.py", line 220, in run_and_report
    return func()
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\hydra\_internal\utils.py", line 458, in <lambda>
    lambda: hydra.run(
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\hydra\_internal\hydra.py", line 132, in run
    _ = ret.return_value
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\hydra\core\utils.py", line 260, in return_value
    raise self._return_value
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\hydra\core\utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "D:\avsr\auto_avsr-main\train.py", line 53, in main
    trainer.fit(model=modelmodule,
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 740, in fit
    self._call_and_handle_interrupt(
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\pytorch_lightning\trainerer\trainer.py", line 777, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1199, in _run
    self._dispatch()
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1279, in _dispatch
    self.training_type_plugin.start_training(self)
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py", line 202, in start_training
    self._results = trainer.run_stage()
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1289, in run_stage
    return self._run_train()
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1319, in _run_train
    self.fit_loop.run()
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\pytorch_lightning\loops\base.py", line 145, in run
      self.advance(*args, **kwargs)
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\pytorch_lightning\loops\fit_loop.py", line 234, in advance
    self.epoch_loop.run(data_fetcher)
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\pytorch_lightning\loops\base.py", line 140, in run
    self.on_run_start(*args, **kwargs)
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\pytorch_lightning\loops\epoch\training_epoch_loop.py", line 141, in on_run_start
    self._dataloader_iter = _update_dataloader_iter(data_fetcher, self.batch_idx + 1)
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\pytorch_lightning\loops\utilities.py", line 121, in _update_dataloader_iter
    dataloader_iter = enumerate(data_fetcher, batch_idx)
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\pytorch_lightning\utilities\fetching.py", line 198, in __iter__
    self._apply_patch()
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\pytorch_lightning\utilities\fetching.py", line 133, in _apply_patch
    apply_to_collections(self.loaders, self.loader_iters, (Iterator, DataLoader), _apply_patch_fn)
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\pytorch_lightning\utilities\fetching.py", line 181, in loader_iters
    loader_iters = self.dataloader_iter.loader_iters
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 577, in create_loader_iters
    return apply_to_collection(loaders, Iterable, iter, wrong_dtype=(Sequence, Mapping))
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\pytorch_lightning\utilities\apply_func.py", line 96, in apply_to_collection
     return function(data, *args, **kwargs)
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\torch\utils\data\dataloader.py", line 441, in __iter__
    return self._get_iterator()
   File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\torch\utils\data\dataloader.py", line 388, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\site-packages\torch\utils\data\dataloader.py", line 1042, in __init__
    w.start()
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'AudioTransform.__init__.<locals>.<lambda>'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "D:\ProgramData\miniforge3\envs\autoavsr\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

Can you help me fix this? I followed your guide to install the environment.
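For context, this failure is typical of Windows, where DataLoader workers are started with the 'spawn' method and everything handed to them must be picklable; a lambda defined inside AudioTransform.__init__ cannot be pickled. Setting the DataLoader's num_workers to 0 avoids the issue, or the lambda can be replaced with a named module-level function. Below is a minimal sketch of the latter idea (the function name and the transform are illustrative, not the repository's code):

import torch

def add_gaussian_noise(waveform, scale=0.05):
    # A top-level function is picklable, so it can be shipped to spawned DataLoader workers.
    return waveform + scale * torch.randn_like(waveform)

class AudioTransform:
    def __init__(self):
        # Instead of: self.noise_fn = lambda x: x + 0.05 * torch.randn_like(x)
        self.noise_fn = add_gaussian_noise

    def __call__(self, waveform):
        return self.noise_fn(waveform)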
