felixfuyihui / aishell-4

114.0 114.0 26.0 372.34 MB

License: Apache License 2.0

Python 95.39% Shell 4.61%

aishell-4's Introduction

👋 Hi, I’m 付艺辉/Yihui Fu
👀 I’m interested in Speech processing
🌱 I’m currently learning Quantitative Trading
💞️ I’m looking to collaborate on alcohol
📫 How to reach me [email protected]

aishell-4's Issues

question about the speaker info in test

Hi Yihui, thanks for the awesome system implementation.
I can only find the speaker info for the train set, but none for the test set:
http://aishell-4.oss-cn-hangzhou.aliyuncs.com/spk_info.xlsx
Can you tell me where I can find the test spk_info?

Thanks a lot
@felixfuyihui

reproduce the results in the paper

Hi Yihui! Thank you for your dedication to the code, but I have a few questions. In AISHELL-4/data_preparation/generate_fe_trainingdata.py, no random seed is set at lines 29, 30, etc. Also, the files at path/to/wavlist/of/speaker1, path/to/wavlist/of/speaker2, etc. have to be prepared by the user, which makes it impossible to completely reproduce the results in the paper. How can I solve this problem? Thanks again!
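One common way to make a random selection repeatable is to draw from a private, seeded generator. The sketch below assumes the script picks utterances with Python's random module, which the issue does not confirm. The function name, seed value, and file names are hypothetical.

```python
import random


def sample_wavs(wav_paths, count, seed):
    """Pick `count` paths reproducibly using a private, seeded generator."""
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    # Sort first so the result does not depend on the order of the input list.
    return rng.sample(sorted(wav_paths), count)


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    paths = [f"spk1_utt{i:03d}.wav" for i in range(10)]
    print(sample_wavs(paths, 3, seed=1234))
```

Two runs with the same seed and the same set of paths return the same selection. This alone would not reproduce the paper's numbers unless the original seed and wavlists were also known.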

TextGrid annotation errors in the original data

train_M/TextGrid/20200622_M_R002S07C01.TextGrid
train_M/TextGrid/20200710_M_R002S06C01.TextGrid

In these two files, Item[x].xmax (2104.492) should equal the global xmax (2187.436); otherwise textgrid (version==1.5) fails when reading the file.

Error example:
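A check like the one described can be sketched without the textgrid package by scanning the file text directly. This assumes the long TextGrid text format, with lines such as `xmax = ...` and `item [n]:`. The function name is hypothetical and the file encoding is left to the caller.

```python
import re

ITEM_RE = re.compile(r"^\s*item\s*\[(\d+)\]\s*:")
XMAX_RE = re.compile(r"^\s*xmax\s*=\s*([0-9.]+)")


def find_xmax_mismatches(textgrid_text):
    """Return (global_xmax, [(tier_index, tier_xmax), ...]) for tiers that disagree."""
    global_xmax = None
    current_tier = None  # tier whose own xmax has not been seen yet
    mismatches = []
    for line in textgrid_text.splitlines():
        item = ITEM_RE.match(line)
        if item:
            current_tier = int(item.group(1))
            continue
        xmax = XMAX_RE.match(line)
        if not xmax:
            continue
        value = float(xmax.group(1))
        if global_xmax is None:
            global_xmax = value  # the first xmax in the file is the global one
        elif current_tier is not None:
            if value != global_xmax:
                mismatches.append((current_tier, value))
            current_tier = None  # later xmax lines belong to intervals
    return global_xmax, mismatches
```

Reading one of the two files above into a string and passing it to this function should list the tiers whose xmax is 2104.492 rather than 2187.436, if the files are laid out as assumed.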

Amount of Clean Non-Overlapped data

It looks like the amount of non-overlapped data is much smaller than the overall corpus. I am seeing less than 20 hours. Is this correct?

Thanks
Michael Picheny
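For anyone who wants to check this figure, one way to measure non-overlapped speech is to pool the (start, end) times of all speaker segments in a session and sum the time during which exactly one segment is active. The sketch below assumes the segments have already been extracted from the annotations. The function name is hypothetical, and this is not necessarily how the figure above was obtained.

```python
def non_overlapped_seconds(segments):
    """Total time during which exactly one segment is active.

    `segments` is an iterable of (start, end) times in seconds,
    pooled over all speakers of one recording.
    """
    events = []
    for start, end in segments:
        events.append((start, 1))
        events.append((end, -1))
    events.sort()  # at equal times, an end (-1) sorts before a start (+1)

    total = 0.0
    active = 0
    previous_time = None
    for time, delta in events:
        if active == 1:
            total += time - previous_time
        active += delta
        previous_time = time
    return total
```

With segments (0, 10), (5, 15) and (20, 25), only 0 to 5, 10 to 15 and 20 to 25 are covered by a single speaker, which gives 15 seconds.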

How to prepare each .scp file?

I tried to format all required .scp files as . It worked for the front-end part, but I ran into problems with ASR and evaluation. I wonder what the exact format of each .scp file is, especially wav_nospk_nofe.scp, wav_spk_nofe.scp, wav_nospk_fe.scp and wav_spk_fe.scp for ASR.
One or two example lines would be very helpful.

something wrong with the pretrained asr model

Hi,
There is something wrong with the pre-trained ASR model: it can't be decompressed. Can you fix this problem?

how can I get wavlist for data preparation?

Hi Fu, I notice that the data preparation scripts take several wavlists as input, such as spk1_list, wav_list, and noise_list.
How can I get or generate these lists? Are there any rules or scripts? Thanks!
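Pending an answer, a list file could be generated along these lines. The sketch assumes each list is a plain text file with one wav path per line, which the repository does not confirm. The function name and the directory layout are hypothetical.

```python
from pathlib import Path


def write_wavlist(wav_dir, list_path):
    """Write every .wav file found under `wav_dir` to `list_path`, one absolute path per line."""
    wav_paths = sorted(str(p.resolve()) for p in Path(wav_dir).rglob("*.wav"))
    Path(list_path).write_text("".join(f"{p}\n" for p in wav_paths), encoding="utf-8")
    return wav_paths
```

Calling it once per source directory (for example one directory per speaker and one for noise) would produce one list per input. The scripts may expect a different line format, so this is worth checking against how they read the lists.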

RuntimeError: stack expects each tensor to be equal size, but got [8, 64000] at entry 0 and [8, 205052] at entry 7

I got this error when training the model:

Traceback (most recent call last):
  File "steps/run_realmask.py", line 402, in <module>
    main()
  File "steps/run_realmask.py", line 277, in main
    train(model, device, writer)
  File "steps/run_realmask.py", line 88, in train
    val_loss = validation(model, -1, lr, device)
  File "steps/run_realmask.py", line 209, in validation
    for idx, data in enumerate(dataloader):
  File "/home/pingan/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/home/pingan/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
    return self._process_data(data)
  File "/home/pingan/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
    data.reraise()
  File "/home/pingan/miniconda3/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.

Original Traceback (most recent call last):
  File "/home/pingan/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/pingan/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/home/pingan/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 73, in default_collate
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/home/pingan/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 73, in <dictcomp>
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/home/pingan/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 63, in default_collate
    return default_collate([torch.as_tensor(b) for b in batch])
  File "/home/pingan/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [8, 64000] at entry 0 and [8, 205052] at entry 7
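The final line of the traceback shows default_collate stacking validation samples of different lengths: 64000 samples in one entry, 205052 in another, each with 8 channels. A possible workaround is to crop or zero-pad every sample to one length before batching, for instance in the dataset's __getitem__ or in a custom collate_fn passed to the DataLoader. The sketch below shows only the length-fixing step, on plain Python lists so that it runs without PyTorch. The function names are hypothetical, and this is not a fix confirmed by the repository author.

```python
def crop_or_pad(channels, target_len, pad_value=0.0):
    """Return a copy of a multichannel signal with every channel exactly `target_len` samples long."""
    fixed = []
    for channel in channels:
        channel = list(channel[:target_len])                  # crop long signals
        channel += [pad_value] * (target_len - len(channel))  # zero-pad short ones
        fixed.append(channel)
    return fixed


def same_length_batch(samples, target_len):
    """Bring every sample in a batch to the same [channels, target_len] shape."""
    return [crop_or_pad(sample, target_len) for sample in samples]
```

With a target length of 64000, both entries from the error message would end up with shape [8, 64000] and could be stacked. Whether the validation data was meant to be pre-segmented to that length is a separate question that the traceback does not answer.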

About speaker diarization on AISHELL-4

Thanks for sharing this dataset!

I plan to train and evaluate pyannote speaker diarization pipelines on AISHELL-4.

I'd like to understand the speaker diarization labels better. In particular, I'd like to know if speaker labels are global to the whole dataset or only local to each file. For instance, can we assume that speaker 001-M in file 20200705_M_R002S01C01 is the same as speaker 001-M in file 20200616_M_R001S01C01? Or, are speaker labels recycled and inconsistent across files?
Are you aware of any published speaker diarization results on AISHELL-4?
