Comments (4)
Looks like they are being saved to the `original` dir, but attempting to read from the dataset dir. Try changing `basepath` to `original` in `wrappers/objects/dataset.py`, lines 43 and 45:
self.additional_word_list_txt = self.basepath.joinpath('additional_word_list.txt')
self.corpus_txt = self.basepath.joinpath('corpus.txt')
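A minimal sketch of the suggested change, assuming uploads land in an `original` subdirectory of the dataset directory. `DatasetPaths` and its `original` attribute are hypothetical stand-ins for illustration, not the actual Elpis class or attribute names:

```python
import tempfile
from pathlib import Path


class DatasetPaths:
    """Hypothetical stand-in for the dataset object, not the real Elpis class."""

    def __init__(self, basepath: Path):
        self.basepath = basepath
        # Assumption: uploaded files are saved into an "original" subdirectory.
        self.original = basepath.joinpath("original")
        self.original.mkdir(parents=True, exist_ok=True)
        # Build the read paths from the directory the files are saved to,
        # rather than from basepath.
        self.additional_word_list_txt = self.original.joinpath("additional_word_list.txt")
        self.corpus_txt = self.original.joinpath("corpus.txt")


dataset = DatasetPaths(Path(tempfile.mkdtemp()) / "dataset")
# Simulate an upload landing in the "original" directory.
dataset.original.joinpath("corpus.txt").write_text("hello world\n", encoding="utf-8")
# The read path now points at the same file.
print(dataset.corpus_txt.read_text(encoding="utf-8"))
```

With both paths built from the same directory, the save and read locations can no longer drift apart.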
`word_count.json` gets generated from the `filtered.json` file, before the wordlist with additional text is created. That's the data that the API returns to the GUI. Do we want to return all words?
Is it best to only show the transcription text? For now, make a note on the interface.
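To illustrate why the additional words do not show up in the counts, here is a rough sketch. It assumes `filtered.json` holds a list of utterances with a `transcript` key; that shape, the sample data, and the variable names are assumptions for illustration, not taken from the Elpis code:

```python
import json
from collections import Counter

# Assumed shape of filtered.json: a list of utterances with a "transcript" key.
filtered = [
    {"transcript": "the cat sat"},
    {"transcript": "the dog"},
]
# Hypothetical contents of the additional word list.
additional_words = ["bird", "the"]

# Counts are built from the transcriptions only.
word_count = Counter(
    word for utterance in filtered for word in utterance["transcript"].split()
)

# Additional words that never appear in a transcription are absent from the counts.
missing = [word for word in additional_words if word not in word_count]

print(json.dumps(word_count, sort_keys=True))
print(missing)
```

Because the counts are computed before the additional text is merged in, a word that only occurs in the additional word list never reaches the data returned to the GUI.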
This has been fixed in l2s PR #42
Ah, it makes sense that the corpus isn't included in the wordlist, which is used to generate the pronunciation dictionary: the wordlist is used for the acoustic model, but the corpus is used for the language model. I'll add a description on the interface to say this.