Comments (47)
I can suggest this algorithm.
- If the last letter of the first word is 's', 'x', 'z', 't' or 'd' and the first letter of the second word is 'h' or any vowel,
- or if the first word ends in a consonant + apostrophe and the first letter of the second word is 'h' or any vowel,
then:
- if the last consonant of the first word is pronounced, just delete the pause between the words;
- if the last consonant of the first word is NOT pronounced, replace the pause between the words with 'z' (for 's', 'x', 'z') or with 't' (for 't', 'd').
I think it covers most cases.
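A minimal sketch of the heuristic above, for illustration only. The function name `liaison_joiner`, the vowel set, and the `final_pronounced` flag are assumptions of this sketch; in practice the flag would have to come from a pronunciation lexicon, which is not shown here.

```python
from typing import Optional

# Letters treated as vowels at the start of the second word (assumed set).
VOWELS = set("aeiouyàâäéèêëîïôöùûüœ")
# Final letter of the first word -> consonant sounded in the liaison.
LIAISON_SOUND = {"s": "z", "x": "z", "z": "z", "t": "t", "d": "t"}
APOSTROPHES = ("'", "’")


def liaison_joiner(first: str, second: str, final_pronounced: bool) -> Optional[str]:
    """Return what replaces the pause between two words.

    None -> keep the pause (the heuristic does not apply)
    ""   -> delete the pause (the final consonant is already pronounced)
    "z" or "t" -> replace the pause with this liaison consonant
    """
    first, second = first.lower(), second.lower()
    if not first or not second:
        return None
    # The second word must start with 'h' or a vowel.
    if second[0] != "h" and second[0] not in VOWELS:
        return None
    if first[-1] in APOSTROPHES:
        # Consonant + apostrophe case, e.g. "l'".
        consonant = first[-2] if len(first) > 1 else ""
        if not consonant.isalpha() or consonant in VOWELS:
            return None
    else:
        consonant = first[-1]
        if consonant not in LIAISON_SOUND:
            return None
    if final_pronounced:
        return ""
    # May be None for an unpronounced consonant outside the table.
    return LIAISON_SOUND.get(consonant)
```

For example, `liaison_joiner("les", "arbres", False)` gives `"z"` and `liaison_joiner("grand", "ami", False)` gives `"t"`. Note that this sketch treats every initial 'h' the same way, exactly as the heuristic is stated.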
from larynx.
I agree that liaison in French is important: it helps a lot in understanding the speaker, and larynx doesn't do it.
At this late hour, I won't pick a solution. It's fairly complex to know when we should make a liaison or not (in fact, I apply it more by feel than by rule).
I found this algorithm yesterday. It has two functions: check_liaison, which checks whether a liaison should be made between two words, and liaison, which applies the liaison.
https://github.com/tjiho/PoemesProfonds/blob/ede1b32df153254e826cd9779f971fe72d6bd3eb/lecture.py#L143
from larynx.
For French speakers, here is an article about when we should make the liaison or not:
https://www.francaisauthentique.com/quand-faire-la-liaison-en-francais/
Tomorrow I could write a summary in English.
from larynx.
200 words?
https://fr.wikipedia.org/wiki/H_aspir%C3%A9
from larynx.
Gruut can do syntax analysis using python-crfsuite. I trained a model for French today on the French Universal Dependencies treebank, and it seems to work quite well.
Here's the result of my first attempt: https://drive.google.com/drive/folders/1a232BIJ_gTfm3wHEKr0F86K8BkepQYay?usp=sharing
I took my example sentences from here and implemented just these few rules (for now):
- "et" = no liaison
- determiner -> anything = liaison
- pronoun -> verb = liaison
- preposition or très -> noun = liaison
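The four rules listed above could be sketched roughly as follows. This is not gruut's actual code: the function name `should_liaise` is hypothetical, the tags are the Universal Dependencies tags seen in the log (DET, PRON, VERB, AUX, ADP, NOUN), and counting AUX as a verb is an assumption of this sketch. The caller is still expected to check that the second word starts with a vowel sound.

```python
def should_liaise(first_word: str, first_pos: str, second_pos: str) -> bool:
    """Decide from part-of-speech tags whether a liaison is made after first_word."""
    word = first_word.lower()
    # "et" = no liaison
    if word == "et":
        return False
    # determiner -> anything = liaison
    if first_pos == "DET":
        return True
    # pronoun -> verb = liaison (AUX included as an assumption)
    if first_pos == "PRON" and second_pos in ("VERB", "AUX"):
        return True
    # preposition or "très" -> noun = liaison
    if (first_pos == "ADP" or word == "très") and second_pos == "NOUN":
        return True
    return False
```

With these rules alone, `should_liaise("les", "DET", "NOUN")` is true and `should_liaise("et", "CCONJ", "NOUN")` is false.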
This is the log with generated phonemes:
Words for 'Un enfant.': ['un/DET', 'enfant/NOUN', './PUNCT']
Phonemes for 'Un enfant.': ['#', 'œ̃', 'n', 'ɑ̃', 'f', 'ɑ̃', '#', '‖', '‖']
Words for 'Les arbres.': ['les/DET', 'arbres/NOUN', './PUNCT']
Phonemes for 'Les arbres.': ['#', 'l', 'e', 'z', 'a', 'ʁ', 'b', 'ʁ', '#', '‖', '‖']
Words for 'Deux amis.': ['deux/NUM', 'amis/NOUN', './PUNCT']
Phonemes for 'Deux amis.': ['#', 'd', 'ø', 'z', 'a', 'm', 'i', '#', '‖', '‖']
Words for 'Ton excellent vin.': ['ton/DET', 'excellent/ADJ', 'vin/NOUN', './PUNCT']
Phonemes for 'Ton excellent vin.': ['#', 't', 'ɔ̃', 'n', 'ɛ', 'k', 's', 'ɛ', 'l', '#', 'v', 'ɛ̃', '#', '‖', '‖']
Words for 'Ces autres voyages.': ['ces/DET', 'autres/ADJ', 'voyages/NOUN', './PUNCT']
Phonemes for 'Ces autres voyages.': ['#', 's', 'ɛ', 'z', 'o', 't', 'ʁ', '#', 'v', 'w', 'a', 'j', 'a', 'ʒ', '#', '‖', '‖']
Words for 'On est là!': ['on/PRON', 'est/AUX', 'là/ADV', '!/PUNCT']
Phonemes for 'On est là!': ['#', 'ɔ̃', 'n', 'ɛ', '#', 'l', 'a', '#', '‖', '‖']
Words for 'Elles ont faim!': ['elles/PRON', 'ont/AUX', 'faim/VERB', '!/PUNCT']
Phonemes for 'Elles ont faim!': ['#', 'ɛ', 'l', 'z', 'ɔ̃', '#', 'f', 'ɛ̃', '#', '‖', '‖']
Words for 'Vous êtes sûrs?': ['vous/PRON', 'êtes/AUX', 'sûrs/ADJ', '?/PUNCT']
Phonemes for 'Vous êtes sûrs?': ['#', 'v', 'u', 'z', 'ɛ', 't', '#', 's', 'y', 'ʁ', '#', '‖', '‖']
Words for 'Tu nous entends.': ['tu/PRON', 'nous/PRON', 'entends/VERB', './PUNCT']
Phonemes for 'Tu nous entends.': ['#', 't', 'y', '#', 'n', 'u', 'z', 'ɑ̃', 't', 'ɑ̃', '#', '‖', '‖']
Words for 'Je les adore.': ['je/PRON', 'les/PRON', 'adore/VERB', './PUNCT']
Phonemes for 'Je les adore.': ['#', 'ʒ', 'ə', '#', 'l', 'e', 'z', '#', 'a', 'd', 'ɔ', 'ʁ', '#', '‖', '‖']
Words for 'J'ai des petites oreilles.': ["j'ai/VERB", 'des/DET', 'petites/ADJ', 'oreilles/NOUN', './PUNCT']
Phonemes for 'J'ai des petites oreilles.': ['#', 'ʒ', 'e', '#', 'd', 'e', '#', 'p', 'ə', 't', 'i', 't', 'z', 'ɔ', 'ʁ', 'ɛ', 'j', '#', '‖', '‖']
Words for 'Michel est un grand ami.': ['michel/PROPN', 'est/AUX', 'un/DET', 'grand/ADJ', 'ami/NOUN', './PUNCT']
Phonemes for 'Michel est un grand ami.': ['#', 'm', 'i', 'ʃ', 'ɛ', 'l', '#', 'ɛ', '#', 'œ̃', 'n', '#', 'ɡ', 'ʁ', 'ɑ̃', 't', 'a', 'm', 'i', '#', '‖', '‖']
Words for 'Je regarde la télé sur un petit écran.': ['je/PRON', 'regarde/VERB', 'la/DET', 'télé/NOUN', 'sur/ADP', 'un/DET', 'petit/ADJ', 'écran/NOUN', './PUNCT']
Phonemes for 'Je regarde la télé sur un petit écran.': ['#', 'ʒ', 'ə', '#', 'ʁ', 'ə', 'ɡ', 'a', 'ʁ', 'd', '#', 'l', 'a', '#', 't', 'e', 'l', 'e', '#', 's', 'y', 'ʁ', '#', 'œ̃', 'n', '#', 'p', 'ə', 't', 'i', 't', 'e', 'k', 'ʁ', 'ɑ̃', '#', '‖', '‖']
Words for 'C'est un ancien élève.': ["c'est/AUX", 'un/DET', 'ancien/ADJ', 'élève/NOUN', './PUNCT']
Phonemes for 'C'est un ancien élève.': ['#', 's', 'ɛ', '#', 'œ̃', 'n', '#', 'ɑ̃', 's', 'j', 'ɛ̃', 'n', 'e', 'l', 'ɛ', 'v', '#', '‖', '‖']
Words for 'C'est très amusant!': ["c'est/AUX", 'très/ADV', 'amusant/ADJ', '!/PUNCT']
Phonemes for 'C'est très amusant!': ['#', 's', 'ɛ', '#', 't', 'ʁ', 'ɛ', 'z', 'a', 'm', 'y', 'z', 'ɑ̃', '#', '‖', '‖']
Words for 'Je vis en Amérique.': ['je/PRON', 'vis/VERB', 'en/ADP', 'amérique/PROPN', './PUNCT']
Phonemes for 'Je vis en Amérique.': ['#', 'ʒ', 'ə', '#', 'v', 'i', '#', 'ɑ̃', 'n', 'a', 'm', 'e', 'ʁ', 'i', 'k', '#', '‖', '‖']
Words for 'Ils sont chez eux.': ['ils/PRON', 'sont/AUX', 'chez/ADP', 'eux/PRON', './PUNCT']
Phonemes for 'Ils sont chez eux.': ['#', 'i', 'l', '#', 's', 'ɔ̃', '#', 'ʃ', 'e', 'z', 'ø', '#', '‖', '‖']
Words for 'J'arrive dans une minute.': ["j'arrive/VERB", 'dans/ADP', 'une/DET', 'minute/NOUN', './PUNCT']
Phonemes for 'J'arrive dans une minute.': ['#', 'ʒ', 'a', 'ʁ', 'i', 'v', '#', 'd', 'ɑ̃', 'z', 'y', 'n', '#', 'm', 'i', 'n', 'y', 't', '#', '‖', '‖']
from larynx.
- adjective + noun: liaison
I believe these are adjectives like grand, petit, gros, long, beaux, bel.
It's possible to add them to a first-word exception list.
It applies to this kind of adjective (qualificatif): un petit_ami
But also to:
- possessive adjectives: mon, ma, mes, ton, ta, tes, etc.: mon_amant
- relative adjectives: Lequel, duquel, auquel, laquelle, Lesquels, desquels, auxquels, Lesquelles, desquelles, auxquelles: Lesquelles_avocats
- and numeral adjectives (numbers): deux_amants
- other adjectives are in fact determiners
- there is a list of typical expressions to know, for which you should make a liaison (todo: find this list).
Can you give some example(s)?
Sure, the article mentions some expressions:
tout_à l’heure
c’est_à dire
plus_ou moins
peut_être
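As a rough illustration, fixed expressions like these could be handled with a small lookup of word pairs. The names `FIXED_LIAISONS` and `is_fixed_liaison` are hypothetical, and the table holds only the four expressions quoted above. Keying on the word pair alone is a simplification, since it ignores the rest of the expression.

```python
# Word pairs taken from the expressions quoted above (not a complete list).
FIXED_LIAISONS = {
    ("tout", "à"),     # tout_à l'heure
    ("c'est", "à"),    # c'est_à dire
    ("plus", "ou"),    # plus_ou moins
    ("peut", "être"),  # peut_être
}


def _normalize(word: str) -> str:
    """Lower-case and map the typographic apostrophe to the plain one."""
    return word.lower().replace("’", "'")


def is_fixed_liaison(first: str, second: str) -> bool:
    """True if the two adjacent words form one of the known fixed liaisons."""
    return (_normalize(first), _normalize(second)) in FIXED_LIAISONS
```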
from larynx.
@synesthesiam All sentences are OK for me.
from larynx.
In the second one, les arbres is better. In the first one, I hear ['#', 'l', 'e', 'z', 'a', 'b', 'ʁ', '#', '‖', '‖'] instead of ['#', 'l', 'e', 'z', 'a', 'ʁ', 'b', 'ʁ', '#', '‖', '‖'] (the 'ʁ' is missing before the 'b').
from larynx.
OK, thank you, I'll check it tomorrow.
I use Windows Subsystem for Linux 1 :)
from larynx.
Liaison looks like it's going to be fairly complex to implement correctly. I'm leaving this link here for future me: https://github.com/juliacarbajal/french_phonologizer/blob/master/phonologize.py
from larynx.
Also, I don't think all the examples at that link are correct in modern French.
For example (from your learning French dataset), "any instances of après that are not part of après-midi":
Cyrus Smith et Gédéon Spilett, après être << liaison
C’est le mot du professeur, qui, après avoir << liaison
but I'm not a native French speaker.
from larynx.
@tjiho, I don't think one universal solution is possible. For example, your article says you shouldn't make a liaison in phrases like “des haricots” (final 's' and initial 'h'), but in the siwis dataset, in the sentence "Voilà donc de quoi dépendent les destins des hommes !", the speaker makes a liaison in "des_hommes", and as far as I know that is the standard pronunciation. Maybe it depends on the region where the speaker lives, but even the best-known self-study guide, Mauger's "Cours de Langue et de Civilisation Françaises", uses the liaison in that case (see page 4: [dezom]).
from larynx.
@alt131 About liaison with h:
I searched the internet and found this: http://www.languefrancaise.net/forum/viewtopic.php?id=180
It says (in summary) that there is no liaison with words beginning with a non-Latin (Germanic or Greek) h. There is a list of those (~200); I will look for it and publish it when I have it.
It's very ugly to say des_haies [dezɛ]. And you're right, it should be des_hommes [dezom].
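A sketch of how such a word list could block the liaison. The set `H_ASPIRE_SAMPLE` is only a small hand-picked sample of h aspiré words, not the list of ~200 mentioned above, and the function name and the naive plural handling are assumptions of this sketch.

```python
# A few well-known h aspiré words; the full list is much longer.
H_ASPIRE_SAMPLE = {
    "haricot", "haie", "héros", "hibou", "honte", "hache", "hasard", "haut",
}


def starts_with_h_aspire(word: str) -> bool:
    """True if the word (or its naive singular) is in the h aspiré sample."""
    w = word.lower()
    if w in H_ASPIRE_SAMPLE:
        return True
    # Naive plural handling: drop one trailing 's' or 'x' and retry.
    return w.endswith(("s", "x")) and w[:-1] in H_ASPIRE_SAMPLE
```

With this sample, "haies" and "haricots" block the liaison, while "hommes" (h muet) does not.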
from larynx.
For French speakers, here is an article about when we should make the liaison or not:
https://www.francaisauthentique.com/quand-faire-la-liaison-en-francais/
Tomorrow I could write a summary in English.
So it lists some rules:
- you should make a liaison only if the first word ends in a consonant and the second begins with a vowel.
- determiner + noun: liaison
- pronoun + verb: liaison
- adjective + noun: liaison
- there is a list of typical expressions to know, for which you should make a liaison (todo: find this list).
- noun + adjective: no liaison
- after et: no liaison (otherwise it sounds like est)
- after the verbs être and avoir, liaison is optional (it sounds more formal with the liaison)
- we could simplify: after a verb, no liaison
- if the second word begins with an h aspiré (non-Latin words): no liaison (todo: find the list)
- if the second word begins with an h muet (Latin words): liaison
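The rules above can be read as a three-way classification (required, optional, forbidden). Below is a sketch under several assumptions: the names `Liaison`, `classify` and `ETRE_AVOIR_FORMS` are hypothetical, the part-of-speech tags are Universal Dependencies tags, PRON + AUX is treated like pronoun + verb, the h aspiré words are supplied by the caller, and any case the rules do not mention defaults to "forbidden" as a conservative choice of this sketch.

```python
from enum import Enum
from typing import FrozenSet

VOWELS = set("aeiouyàâäéèêëîïôöùûüœ")
# Present-tense forms of être and avoir that end in a consonant letter.
ETRE_AVOIR_FORMS = {"suis", "es", "est", "sommes", "êtes", "sont",
                    "as", "avons", "avez", "ont"}
# (first tag, second tag) pairs for which the liaison is made.
REQUIRED_PAIRS = {("DET", "NOUN"), ("PRON", "VERB"), ("PRON", "AUX"),
                  ("ADJ", "NOUN")}


class Liaison(Enum):
    REQUIRED = "required"
    OPTIONAL = "optional"
    FORBIDDEN = "forbidden"


def classify(first: str, first_pos: str, second: str, second_pos: str,
             h_aspire: FrozenSet[str] = frozenset()) -> Liaison:
    """Classify the liaison between two adjacent words."""
    first, second = first.lower(), second.lower()
    if not first or not second:
        return Liaison.FORBIDDEN
    # Spelling precondition: consonant ending, then vowel or h muet.
    ends_in_consonant = first[-1].isalpha() and first[-1] not in VOWELS
    starts_with_vowel_sound = second[0] in VOWELS or (
        second[0] == "h" and second not in h_aspire)
    if not (ends_in_consonant and starts_with_vowel_sound):
        return Liaison.FORBIDDEN
    if first == "et":
        return Liaison.FORBIDDEN
    if (first_pos, second_pos) in REQUIRED_PAIRS:
        return Liaison.REQUIRED
    if (first_pos, second_pos) == ("NOUN", "ADJ"):
        return Liaison.FORBIDDEN
    if first_pos in ("VERB", "AUX") and first in ETRE_AVOIR_FORMS:
        return Liaison.OPTIONAL
    # Other verbs and every unlisted case: no liaison (assumed default).
    return Liaison.FORBIDDEN
```

Returning a three-valued result rather than a boolean leaves the "optional" cases for the caller to decide, for example according to how formal the voice should sound.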
from larynx.
@synesthesiam, did you implement part-of-speech detection for words in gruut?
@tjiho, OK, it's a good list, but do you realize that it's not so easy to detect a word's part of speech in text?
from larynx.
@tjiho, OK, it's a good list, but do you realize that it's not so easy to detect a word's part of speech in text?
@alt131 Yes, I understand.
from larynx.
@tjiho, I doubt that gruut has a syntactic analyzer.
I suggest using my algorithm above and adding some exceptions to it, like 'et' for the first word and 'à', 'il', 'elle', etc. for the second word. It covers 95-98% of cases for TTS and STT, and I believe it takes about 3 statements in Python.
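The exception lists suggested here could look like the sketch below. The names are hypothetical and the sets contain only the words mentioned in the comment; the "etc." is left open.

```python
# Words after which no liaison is made (first-word exceptions).
NO_LIAISON_AFTER = {"et"}
# Words before which no liaison is made (second-word exceptions).
NO_LIAISON_BEFORE = {"à", "il", "elle"}


def is_liaison_exception(first: str, second: str) -> bool:
    """True if the word pair hits one of the exception lists."""
    return first.lower() in NO_LIAISON_AFTER or second.lower() in NO_LIAISON_BEFORE
```

Such a check would run before the letter-based heuristic, so a pair flagged as an exception keeps its pause.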
from larynx.
And I also believe it's not a big problem if we don't add a liaison for "un_ami" etc., at least for a speech-to-text neural net.
from larynx.
200 words?
https://fr.wikipedia.org/wiki/H_aspir%C3%A9
A bit more
from larynx.
- adjective + noun: liaison
I believe these are adjectives like grand, petit, gros, long, beaux, bel.
It's possible to add them to a first-word exception list.
- there is a list of typical expressions to know, for which you should make a liaison (todo: find this list).
Can you give some example(s)?
from larynx.
@tjiho, thanks.
@synesthesiam, I like it, but for "Je vis en Amérique." the phonemes are correct and the pronunciation is not.
'Ton excellent vin.' also has a pronunciation problem. I think both are related to #5.
from larynx.
'Ton excellent vin.' also has a pronunciation problem. I think both are related to #5.
excellent can have multiple pronunciations depending on the context. Here it has been pronounced like the verb, as in: ils excellent en mathématiques
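One way to separate such homographs is to key the pronunciation lexicon on the word together with its part-of-speech tag. The sketch below is illustrative only: `LEXICON` and `pronounce` are hypothetical names, not gruut's API, and the two entries are the usual adjective and verb pronunciations of "excellent".

```python
from typing import List, Optional

# (word, POS tag) -> phonemes. Only the example from this thread is listed.
LEXICON = {
    ("excellent", "ADJ"): ["ɛ", "k", "s", "ɛ", "l", "ɑ̃"],  # un excellent vin
    ("excellent", "VERB"): ["ɛ", "k", "s", "ɛ", "l"],       # ils excellent
}


def pronounce(word: str, pos: str) -> Optional[List[str]]:
    """Look up phonemes for a word in a given part-of-speech role."""
    return LEXICON.get((word.lower(), pos))
```

In the log for 'Ton excellent vin.' the word is tagged ADJ but comes out with the five phonemes of the verb form, which is consistent with a lookup that ignores the tag.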
from larynx.
@tjiho, I don't like the pronunciation of 'ton' :)
It's very different from the original one.
from larynx.
Here are the same sentences with a word break (#) added after the liaison: https://drive.google.com/drive/folders/1U8i14JX_IB2HC-0YlGrTunFkzM9lpAvR?usp=sharing
Do these sound better or worse?
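To make the two variants concrete, here is a sketch of how the phoneme sequence for a word pair could be assembled with or without the word break after the liaison consonant. The function name and signature are assumptions of this sketch; the '#' break symbol and the layout follow the logs in this thread.

```python
from typing import List, Optional


def join_with_liaison(first: List[str], second: List[str],
                      liaison: Optional[str],
                      break_after_liaison: bool) -> List[str]:
    """Join the phonemes of two words, optionally inserting a liaison consonant."""
    out = ["#"] + list(first)
    if liaison is not None:
        out.append(liaison)
    # Without a liaison the word break is always kept.
    if liaison is None or break_after_liaison:
        out.append("#")
    out += list(second) + ["#"]
    return out
```

For "les arbres" with liaison 'z' and no break this gives '#', 'l', 'e', 'z', 'a', 'ʁ', 'b', 'ʁ', '#', matching the first log. With the break enabled, the '#' lands right after the 'z'.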
from larynx.
@synesthesiam, 'Ton excellent vin.' is OK
"Je vis en Amérique." is OK
I'll check the others and write later.
from larynx.
Sorry, in "Je vis en Amérique." the liaison is lost, but the pronunciation is still better))
from larynx.
OK, I'll keep the word breaks in then. This seems like progress at least
Thanks for all your help!
from larynx.
Please generate these sentences also
Un bâtiment est en vue de l’île.
Sa vie n’était pas en danger
let's check them too
from larynx.
I've added them here: https://drive.google.com/drive/folders/1U8i14JX_IB2HC-0YlGrTunFkzM9lpAvR
from larynx.
'Sa vie n’était pas en danger' is OK
'Un bâtiment est en vue de l’île.' I hear "est en" as something 'e' 'd' 'ɑ̃', but it should be 'e' 't' 'ɑ̃'
from larynx.
Check the phonemes for it; if they are OK, then do nothing.
from larynx.
DEBUG:larynx:Words for 'Un bâtiment est en vue de l'île.': ['un/DET', 'bâtiment/NOUN', 'est/AUX', 'en/ADP', 'vue/NOUN', 'de/ADP', "l'île/NOUN", './PUNCT']
DEBUG:larynx:Phonemes for 'Un bâtiment est en vue de l'île.': ['#', 'œ̃', '#', 'b', 'a', 't', 'i', 'm', 'ɑ̃', '#', 'ɛ', '#', 'ɑ̃', '#', 'v', 'y', '#', 'd', 'ə', '#', 'l', 'i', 'l', '#', '‖', '‖']
😕
from larynx.
'#', 'ɛ', '#', 'ɑ̃', '#' should be '#', 'ɛ', 't', '#', 'ɑ̃', '#'. 't' was lost.
from larynx.
Another example for you 'Amalia est en danger.'
from larynx.
Ah, I'm missing the verb -> vowel case. Hang on.
from larynx.
Let's check this too "C`est incroyable!"
from larynx.
Updated the Google Drive directory.
DEBUG:larynx:Words for 'Un bâtiment est en vue de l'île.': ['un/DET', 'bâtiment/NOUN', 'est/AUX', 'en/ADP', 'vue/NOUN', 'de/ADP', "l'île/NOUN", './PUNCT']
DEBUG:larynx:Phonemes for 'Un bâtiment est en vue de l'île.': ['#', 'œ̃', '#', 'b', 'a', 't', 'i', 'm', 'ɑ̃', '#', 'ɛ', 't', '#', 'ɑ̃', '#', 'v', 'y', '#', 'd', 'ə', '#', 'l', 'i', 'l', '#', '‖', '‖']
DEBUG:larynx:Words for 'Amalia est en danger.': ['amalia/PROPN', 'est/AUX', 'en/ADP', 'danger/NOUN', './PUNCT']
DEBUG:larynx:Phonemes for 'Amalia est en danger.': ['#', 'a', 'm', 'a', 'l', 'j', 'a', '#', 'ɛ', 't', '#', 'ɑ̃', '#', 'd', 'ɑ̃', 'ʒ', 'e', '#', '‖', '‖']
DEBUG:larynx:Words for 'C'est incroyable!': ["c'est/AUX", 'incroyable/ADJ', '!/PUNCT']
DEBUG:larynx:Phonemes for 'C'est incroyable!': ['#', 's', 'ɛ', 't', '#', 'ɛ̃', 'k', 'ʁ', 'w', 'a', 'j', 'a', 'b', 'l', '#', '‖', '‖']
from larynx.
They are OK (the phonemes).
from larynx.
The pronunciation is OK too.
from larynx.
Great, thanks!
I've uploaded new code for gruut and larynx as well as the French model with POS tagging. I won't be able to update Docker images until later.
from larynx.
DEBUG:larynx:Words for 'je peux vous aider à le retrouver': ['je', 'peux', 'vous', 'aider', 'à', 'le', 'retrouver']
DEBUG:larynx:Phonemes for 'je peux vous aider à le retrouver': ['#', 'ʒ', 'ə', '#', 'p', 'ø', '#', 'v', 'u', '#', 'e', 'd', 'e', '#', 'a', '#', 'l', 'ə', '#', 'ʁ', 'ə', 't', 'ʁ', 'u', 'v', 'e', '#', '‖', '‖']
There is no liaison in vous_aider, and the 'z' sound was lost in the phonemes.
from larynx.
'Chacun est uni à l`arbre de vie.'
sh: 1: arbre: not found
sh: 1: chacun: not found
And then:
DEBUG:hifi_gan:Initializing denoiser
Traceback (most recent call last):
  File "/usr/local/bin/larynx", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/larynx/main.py", line 165, in main
    line_id, line = line.split(args.id_delimiter, maxsplit=1)
ValueError: not enough values to unpack (expected 2, got 1)
from larynx.
You have at least 2 bugs in the French models.
First bug: the current NNs were trained without the liaison.
Second bug: to compensate for the first bug, you added # in the liaison.
I think you need to delete the # in the liaison and train the Swiss model as a test. I believe the result will be better quality and more accurate.
And bug #5 still exists: for example, for 'livre' or 'homme' the phonemes are OK but the pronunciation is not (inside a word).
from larynx.
Haven't updated the Docker images yet. I had to roll back to push a different fix.
The ValueError you got is likely from leaving the --csv command-line argument on while passing in sentences without an id field (like id|text).
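For context, the failing line in the traceback unpacks the result of `line.split(args.id_delimiter, maxsplit=1)` into two names, which raises exactly this ValueError when the delimiter is absent. A tolerant alternative could look like the sketch below; `parse_line` is a hypothetical helper, not larynx's code, and the "|" default is assumed from the id|text example.

```python
from typing import Optional, Tuple


def parse_line(line: str, delimiter: str = "|") -> Tuple[Optional[str], str]:
    """Split 'id|text' into (id, text); return (None, line) if there is no id."""
    head, sep, tail = line.partition(delimiter)
    if not sep:
        # No delimiter found: treat the whole line as text.
        return None, line
    return head, tail
```

`str.partition` always returns three parts, so a line without the delimiter never fails to unpack.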
from larynx.
The ValueError you got is likely from leaving the --csv command-line argument on while passing in sentences without an id field (like id|text).
No, it's because a non-standard apostrophe was used in " l`arbre". If I change it to a standard ', it works OK.
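The "sh: 1: arbre: not found" lines would be consistent with a shell treating the backticks as command substitution, so text passed on a command line needs quoting in any case. Once the text does reach Python, apostrophe look-alikes could be normalized before tokenizing. The sketch below is an assumption of mine, not something larynx is known to do, and the variant list is illustrative.

```python
# Characters often typed in place of the plain apostrophe (illustrative list).
_APOSTROPHE_TABLE = str.maketrans({
    "`": "'",  # backtick / grave accent
    "´": "'",  # acute accent
    "’": "'",  # right single quotation mark
    "‘": "'",  # left single quotation mark
    "ʼ": "'",  # modifier letter apostrophe
})


def normalize_apostrophes(text: str) -> str:
    """Replace apostrophe look-alikes with the plain ASCII apostrophe."""
    return text.translate(_APOSTROPHE_TABLE)
```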
from larynx.
Just wanted to mention:
Phonemes for 'Michel est un grand ami.': ['#', 'm', 'i', 'ʃ', 'ɛ', 'l', '#', 'ɛ', '#', 'œ̃', 'n', '#', 'ɡ', 'ʁ', 'ɑ̃', 't', 'a', 'm', 'i', '#', '‖', '‖']
On this pronunciation sample, I hear a D-sound rather than the expected T-sound: est “D”un grand instead of est “T”un grand.
from larynx.