
aishell-4's Introduction

  • 👋 Hi, I’m 付艺辉/Yihui Fu
  • 👀 I’m interested in Speech processing
  • 🌱 I’m currently learning Quantitative Trading
  • 💞️ I’m looking to collaborate on alcohol
  • 📫 How to reach me [email protected]

aishell-4's Issues

reproduce the results in the paper

Hi Yihui! Thank you for your work on the code. I have a few questions. In AISHELL-4/data_preparation/generate_fe_trainingdata.py (lines 29, 30, etc.), no random seed is set. Also, the files at path/to/wavlist/of/speaker1, path/to/wavlist/of/speaker2, etc. have to be prepared by the user. Together, these make it impossible to reproduce the results in the paper exactly. How can I solve this problem? Thanks again!
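One possible way to make the random selection repeatable is sketched below. It is not the authors' fix: the actual random calls in generate_fe_trainingdata.py are not shown in this thread, so the function name `seeded_choices` and the file names are hypothetical, and the sketch assumes the script draws files with Python's `random` module.

```python
import random

def seeded_choices(wavlist, n_mixtures, seed=0):
    """Pick one source file per mixture, reproducibly."""
    # A private generator: other code calling random.* cannot disturb it.
    rng = random.Random(seed)
    return [rng.choice(wavlist) for _ in range(n_mixtures)]

# Hypothetical wavlist entries, for illustration only.
speaker1 = ["spk1_a.wav", "spk1_b.wav", "spk1_c.wav"]

first = seeded_choices(speaker1, 5, seed=1234)
second = seeded_choices(speaker1, 5, seed=1234)
print(first == second)  # True: same seed, same selection
```

A fixed seed only repeats a run on the same wavlists in the same order, so it does not by itself recover the exact data used in the paper.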

TextGrid annotation errors in the original data

  • train_M/TextGrid/20200622_M_R002S07C01.TextGrid
  • train_M/TextGrid/20200710_M_R002S06C01.TextGrid

In these two files, Item[x].xmax (2104.492) should equal the global xmax (2187.436); otherwise textgrid (version==1.5) fails when reading the file.

Error example: [screenshot]
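The mismatch described above can be detected without the textgrid package. The following is a minimal sketch that assumes the files use Praat's long text format, where the first `xmax` line is the global one and the first `xmax` after each `item [n]:` line belongs to that tier. The helper name and the sample tier names are made up for illustration.

```python
import re

XMAX = re.compile(r"^\s*xmax\s*=\s*([0-9.]+)\s*$")
ITEM = re.compile(r"^\s*item\s*\[(\d+)\]\s*:\s*$")

def find_tier_xmax_mismatches(text):
    """Return (global_xmax, [(tier_index, tier_xmax), ...]) for tiers that differ."""
    global_xmax = None
    pending_item = None  # tier whose own xmax has not been seen yet
    mismatches = []
    for line in text.splitlines():
        item = ITEM.match(line)
        if item:
            pending_item = int(item.group(1))
            continue
        m = XMAX.match(line)
        if not m:
            continue
        value = float(m.group(1))
        if global_xmax is None:
            global_xmax = value            # first xmax in the file
        elif pending_item is not None:
            if value != global_xmax:       # tier-level xmax
                mismatches.append((pending_item, value))
            pending_item = None            # later xmax lines are intervals
    return global_xmax, mismatches

# Tiny hand-written sample, not taken from the corpus.
sample = """File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 2187.436
tiers? <exists>
size = 2
item []:
    item [1]:
        class = "IntervalTier"
        name = "speaker_a"
        xmin = 0
        xmax = 2104.492
        intervals: size = 1
        intervals [1]:
            xmin = 0
            xmax = 2104.492
            text = ""
    item [2]:
        class = "IntervalTier"
        name = "speaker_b"
        xmin = 0
        xmax = 2187.436
        intervals: size = 1
        intervals [1]:
            xmin = 0
            xmax = 2187.436
            text = ""
"""

print(find_tier_xmax_mismatches(sample))  # (2187.436, [(1, 2104.492)])
```

The sketch only reports the affected tiers; how to correct them is left to whoever maintains the annotations.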

Amount of Clean Non-Overlapped data

It looks like the amount of non-overlapped data is much smaller than the overall corpus. I am seeing less than 20 hours. Is this correct?

Thanks
Michael Picheny
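For anyone who wants to check this figure, one way to measure single-speaker time is a sweep over segment boundaries. This is a sketch under assumptions: segments are given as (start, end) pairs in seconds pooled across all speakers of a session, and a speaker's own segments never overlap each other. The function name is hypothetical and the numbers below are toy values, not corpus statistics.

```python
def single_speaker_duration(segments):
    """Total time (seconds) during which exactly one segment is active."""
    events = []
    for start, end in segments:
        events.append((start, 1))   # a segment opens
        events.append((end, -1))    # a segment closes
    events.sort()  # at equal times, closings (-1) sort before openings (+1)

    total = 0.0
    active = 0
    prev = None
    for time, delta in events:
        if prev is not None and active == 1:
            total += time - prev    # the span since prev had a single talker
        active += delta
        prev = time
    return total

# Toy example: 0-5 and 4-8 overlap during 4-5; 10-12 stands alone.
print(single_speaker_duration([(0, 5), (4, 8), (10, 12)]))  # 9.0
```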

How to prepare each .scp file?

I tried to format all required .scp files as . This worked for the front-end part, but I ran into problems with ASR and evaluation. What is the exact format of each .scp file, especially wav_nospk_nofe.scp, wav_spk_nofe.scp, wav_nospk_fe.scp and wav_spk_fe.scp for ASR?
One or two example lines would be very helpful.

how can I get wavlist for data preparation?

Hi Fu, I notice that the data preparation scripts take several wavlists as input, such as spk1_list, wav_list and noise_list.
How can I get or generate these lists? Are there any rules or scripts? Thanks!
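Until the expected format is confirmed, a wavlist could be generated as below. This is a guess, not the authors' procedure: it assumes each list is a plain text file with one absolute .wav path per line, and the function name, directory layout and file names are hypothetical.

```python
import tempfile
from pathlib import Path

def write_wavlist(wav_dir, list_path):
    """Write every .wav under wav_dir to list_path, one absolute path per line."""
    wavs = sorted(str(p.resolve()) for p in Path(wav_dir).rglob("*.wav"))
    Path(list_path).write_text("".join(w + "\n" for w in wavs), encoding="utf-8")
    return wavs

# Demo on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "spk1").mkdir()
    for name in ("spk1/a.wav", "spk1/b.wav", "notes.txt"):
        (root / name).touch()
    listed = write_wavlist(root / "spk1", root / "spk1_list")
    demo_names = [Path(w).name for w in listed]
    demo_file_lines = (root / "spk1_list").read_text(encoding="utf-8").splitlines()

print(demo_names)  # ['a.wav', 'b.wav']
```

Calling the function once per source directory (speaker 1, speaker 2, noise, and so on) would produce one list per input.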

RuntimeError: stack expects each tensor to be equal size, but got [8, 64000] at entry 0 and [8, 205052] at entry 7

I got this error when training the model:

Traceback (most recent call last):
File "steps/run_realmask.py", line 402, in
main()
File "steps/run_realmask.py", line 277, in main
train(model, device, writer)
File "steps/run_realmask.py", line 88, in train
val_loss = validation(model, -1, lr, device)
File "steps/run_realmask.py", line 209, in validation
for idx, data in enumerate(dataloader):
File "/home/pingan/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in next
data = self._next_data()
File "/home/pingan/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
return self._process_data(data)
File "/home/pingan/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
data.reraise()
File "/home/pingan/miniconda3/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.

Original Traceback (most recent call last):
File "/home/pingan/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
data = fetcher.fetch(index)
File "/home/pingan/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
return self.collate_fn(data)
File "/home/pingan/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 73, in default_collate
return {key: default_collate([d[key] for d in batch]) for key in elem}
File "/home/pingan/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 73, in
return {key: default_collate([d[key] for d in batch]) for key in elem}
File "/home/pingan/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 63, in default_collate
return default_collate([torch.as_tensor(b) for b in batch])
File "/home/pingan/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [8, 64000] at entry 0 and [8, 205052] at entry 7
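The last line of the traceback says that `default_collate` tried to stack tensors of different lengths (64000 and 205052 samples), which suggests the validation set contains utterances that were not cut to a fixed length. One possible workaround, offered as an assumption rather than the authors' fix, is to slice every utterance into 64000-sample chunks before batching. The sketch below only computes the chunk start offsets; the function name and the `keep_tail` option are hypothetical.

```python
def chunk_starts(num_samples, chunk_len=64000, keep_tail=True):
    """Start offsets of fixed-length chunks covering an utterance.

    Utterances shorter than chunk_len yield no chunk and would need
    padding or skipping instead.
    """
    starts = list(range(0, num_samples - chunk_len + 1, chunk_len))
    if keep_tail and num_samples > chunk_len and num_samples % chunk_len:
        # The final chunk is shifted back so it ends at the last sample;
        # it overlaps the previous chunk.
        starts.append(num_samples - chunk_len)
    return starts

print(chunk_starts(205052))  # [0, 64000, 128000, 141052]
print(chunk_starts(64000))   # [0]
```

Each start `s` then selects `wav[:, s:s + 64000]`, so every item handed to the collate step has shape [8, 64000]. Running validation with a batch size of 1 would avoid stacking altogether.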

About speaker diarization on AISHELL-4

Thanks for sharing this dataset!

I plan to train and evaluate pyannote speaker diarization pipelines on AISHELL-4.

  1. I'd like to understand the speaker diarization labels better. In particular, I'd like to know if speaker labels are global to the whole dataset or only local to each file. For instance, can we assume that speaker 001-M in file 20200705_M_R002S01C01 is the same as speaker 001-M in file 20200616_M_R001S01C01? Or, are speaker labels recycled and inconsistent across files?

  2. Are you aware of any published speaker diarization results on AISHELL-4?
