Comments (47)

alt131 avatar alt131 commented on June 27, 2024 1

I can suggest this algorithm.

  1. If the last letter of the first word is 's', 'x', 'z', 't' or 'd' and the first letter of the second word is 'h' or any vowel
  2. If the first word ends in a consonant + apostrophe and the first letter of the second word is 'h' or any vowel
    then, if the last consonant of the first word is pronounced, just delete the pause between the words;
    if the last consonant of the first word is NOT pronounced, replace the pause between the words with 'z' for 's', 'x', 'z' and with 't' for 't', 'd'.
    I think it covers most cases.
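
A minimal Python sketch of the heuristic above, for illustration only. The function name, the `final_pronounced` flag (which a caller would have to derive from the lexicon's phonemes) and the vowel set are assumptions, not part of larynx or gruut. Like the heuristic itself, it treats every initial 'h' as allowing liaison.

```python
# Hypothetical helper names; this is a sketch of the spelling-based heuristic only.
VOWELS = set("aeiouyàâäéèêëîïôöùûüœ")
APOSTROPHES = {"'", "’"}
# Silent final letter -> consonant that replaces the pause between the words.
LIAISON_SOUND = {"s": "z", "x": "z", "z": "z", "t": "t", "d": "t"}


def pause_replacement(first, second, final_pronounced):
    """Return what replaces the pause between two words.

    None -> keep the pause
    ""   -> delete the pause (the final consonant is already pronounced)
    "z" or "t" -> insert that liaison consonant instead of the pause
    """
    first, second = first.lower(), second.lower()
    # Both rules need the second word to start with 'h' or a vowel.
    if second[0] != "h" and second[0] not in VOWELS:
        return None
    if (
        len(first) > 1
        and first[-1] in APOSTROPHES
        and first[-2].isalpha()
        and first[-2] not in VOWELS
    ):
        consonant = first[-2]  # rule 2: consonant + apostrophe (l', j', ...)
    elif first[-1] in LIAISON_SOUND:
        consonant = first[-1]  # rule 1: final s, x, z, t or d
    else:
        return None
    if final_pronounced:
        return ""
    # Only s/x/z/t/d have a liaison sound defined; anything else keeps the pause.
    return LIAISON_SOUND.get(consonant)
```

For example, `pause_replacement("les", "arbres", False)` gives `"z"` and `pause_replacement("l'", "arbre", True)` gives `""`.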

from larynx.

tjiho avatar tjiho commented on June 27, 2024 1

I agree, liaison in French is important: it helps a lot in understanding the speaker, and larynx doesn't make it.
At this late hour, I will not pick a solution. It's fairly complex to know when we should make a liaison or not (in fact, it's more a feeling than a rule that I apply).
I found this algorithm yesterday, with two functions: check_liaison, which verifies whether we should make a liaison between two words, and liaison, which applies the liaison.
https://github.com/tjiho/PoemesProfonds/blob/ede1b32df153254e826cd9779f971fe72d6bd3eb/lecture.py#L143

from larynx.

tjiho avatar tjiho commented on June 27, 2024 1

For French speakers, here is an article about when we should make the liaison or not:
https://www.francaisauthentique.com/quand-faire-la-liaison-en-francais/

Tomorrow I could make a summary in English.

from larynx.

alt131 avatar alt131 commented on June 27, 2024 1

200 words?
https://fr.wikipedia.org/wiki/H_aspir%C3%A9

from larynx.

synesthesiam avatar synesthesiam commented on June 27, 2024 1

Gruut can do syntax analysis using python-crfsuite. I trained a model for French today on the French Universal Dependencies treebank, and it seems to work quite well.

Here's the result of my first attempt: https://drive.google.com/drive/folders/1a232BIJ_gTfm3wHEKr0F86K8BkepQYay?usp=sharing

I took my example sentences from here and implemented just these few rules (for now):

  • "et" = no liaison
  • determiner -> anything = liaison
  • pronoun -> verb = liaison
  • preposition or très -> noun = liaison

This is the log with generated phonemes:

Words for 'Un enfant.': ['un/DET', 'enfant/NOUN', './PUNCT']
Phonemes for 'Un enfant.': ['#', 'œ̃', 'n', 'ɑ̃', 'f', 'ɑ̃', '#', '‖', '‖']

Words for 'Les arbres.': ['les/DET', 'arbres/NOUN', './PUNCT']
Phonemes for 'Les arbres.': ['#', 'l', 'e', 'z', 'a', 'ʁ', 'b', 'ʁ', '#', '‖', '‖']

Words for 'Deux amis.': ['deux/NUM', 'amis/NOUN', './PUNCT']
Phonemes for 'Deux amis.': ['#', 'd', 'ø', 'z', 'a', 'm', 'i', '#', '‖', '‖']

Words for 'Ton excellent vin.': ['ton/DET', 'excellent/ADJ', 'vin/NOUN', './PUNCT']
Phonemes for 'Ton excellent vin.': ['#', 't', 'ɔ̃', 'n', 'ɛ', 'k', 's', 'ɛ', 'l', '#', 'v', 'ɛ̃', '#', '‖', '‖']

Words for 'Ces autres voyages.': ['ces/DET', 'autres/ADJ', 'voyages/NOUN', './PUNCT']
Phonemes for 'Ces autres voyages.': ['#', 's', 'ɛ', 'z', 'o', 't', 'ʁ', '#', 'v', 'w', 'a', 'j', 'a', 'ʒ', '#', '‖', '‖']

Words for 'On est là!': ['on/PRON', 'est/AUX', 'là/ADV', '!/PUNCT']
Phonemes for 'On est là!': ['#', 'ɔ̃', 'n', 'ɛ', '#', 'l', 'a', '#', '‖', '‖']

Words for 'Elles ont faim!': ['elles/PRON', 'ont/AUX', 'faim/VERB', '!/PUNCT']
Phonemes for 'Elles ont faim!': ['#', 'ɛ', 'l', 'z', 'ɔ̃', '#', 'f', 'ɛ̃', '#', '‖', '‖']

Words for 'Vous êtes sûrs?': ['vous/PRON', 'êtes/AUX', 'sûrs/ADJ', '?/PUNCT']
Phonemes for 'Vous êtes sûrs?': ['#', 'v', 'u', 'z', 'ɛ', 't', '#', 's', 'y', 'ʁ', '#', '‖', '‖']

Words for 'Tu nous entends.': ['tu/PRON', 'nous/PRON', 'entends/VERB', './PUNCT']
Phonemes for 'Tu nous entends.': ['#', 't', 'y', '#', 'n', 'u', 'z', 'ɑ̃', 't', 'ɑ̃', '#', '‖', '‖']

Words for 'Je les adore.': ['je/PRON', 'les/PRON', 'adore/VERB', './PUNCT']
Phonemes for 'Je les adore.': ['#', 'ʒ', 'ə', '#', 'l', 'e', 'z', '#', 'a', 'd', 'ɔ', 'ʁ', '#', '‖', '‖']

Words for 'J'ai des petites oreilles.': ["j'ai/VERB", 'des/DET', 'petites/ADJ', 'oreilles/NOUN', './PUNCT']
Phonemes for 'J'ai des petites oreilles.': ['#', 'ʒ', 'e', '#', 'd', 'e', '#', 'p', 'ə', 't', 'i', 't', 'z', 'ɔ', 'ʁ', 'ɛ', 'j', '#', '‖', '‖']

Words for 'Michel est un grand ami.': ['michel/PROPN', 'est/AUX', 'un/DET', 'grand/ADJ', 'ami/NOUN', './PUNCT']
Phonemes for 'Michel est un grand ami.': ['#', 'm', 'i', 'ʃ', 'ɛ', 'l', '#', 'ɛ', '#', 'œ̃', 'n', '#', 'ɡ', 'ʁ', 'ɑ̃', 't', 'a', 'm', 'i', '#', '‖', '‖']

Words for 'Je regarde la télé sur un petit écran.': ['je/PRON', 'regarde/VERB', 'la/DET', 'télé/NOUN', 'sur/ADP', 'un/DET', 'petit/ADJ', 'écran/NOUN', './PUNCT']
Phonemes for 'Je regarde la télé sur un petit écran.': ['#', 'ʒ', 'ə', '#', 'ʁ', 'ə', 'ɡ', 'a', 'ʁ', 'd', '#', 'l', 'a', '#', 't', 'e', 'l', 'e', '#', 's', 'y', 'ʁ', '#', 'œ̃', 'n', '#', 'p', 'ə', 't', 'i', 't', 'e', 'k', 'ʁ', 'ɑ̃', '#', '‖', '‖']

Words for 'C'est un ancien élève.': ["c'est/AUX", 'un/DET', 'ancien/ADJ', 'élève/NOUN', './PUNCT']
Phonemes for 'C'est un ancien élève.': ['#', 's', 'ɛ', '#', 'œ̃', 'n', '#', 'ɑ̃', 's', 'j', 'ɛ̃', 'n', 'e', 'l', 'ɛ', 'v', '#', '‖', '‖']

Words for 'C'est très amusant!': ["c'est/AUX", 'très/ADV', 'amusant/ADJ', '!/PUNCT']
Phonemes for 'C'est très amusant!': ['#', 's', 'ɛ', '#', 't', 'ʁ', 'ɛ', 'z', 'a', 'm', 'y', 'z', 'ɑ̃', '#', '‖', '‖']

Words for 'Je vis en Amérique.': ['je/PRON', 'vis/VERB', 'en/ADP', 'amérique/PROPN', './PUNCT']
Phonemes for 'Je vis en Amérique.': ['#', 'ʒ', 'ə', '#', 'v', 'i', '#', 'ɑ̃', 'n', 'a', 'm', 'e', 'ʁ', 'i', 'k', '#', '‖', '‖']

Words for 'Ils sont chez eux.': ['ils/PRON', 'sont/AUX', 'chez/ADP', 'eux/PRON', './PUNCT']
Phonemes for 'Ils sont chez eux.': ['#', 'i', 'l', '#', 's', 'ɔ̃', '#', 'ʃ', 'e', 'z', 'ø', '#', '‖', '‖']

Words for 'J'arrive dans une minute.': ["j'arrive/VERB", 'dans/ADP', 'une/DET', 'minute/NOUN', './PUNCT']
Phonemes for 'J'arrive dans une minute.': ['#', 'ʒ', 'a', 'ʁ', 'i', 'v', '#', 'd', 'ɑ̃', 'z', 'y', 'n', '#', 'm', 'i', 'n', 'y', 't', '#', '‖', '‖']
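
As an illustration, the four rules listed above could be sketched in Python roughly as follows. This is not gruut's actual code: the function name is hypothetical, the tags are the Universal Dependencies tags seen in the log, and counting AUX as a verb is an assumption. The log also shows liaisons these four rules alone would not produce (for example 'grand ami', ADJ -> NOUN), so the real implementation evidently does more.

```python
def liaison_by_pos(first_word, first_pos, second_pos):
    """Decide on a liaison from part-of-speech tags only (sketch of the four rules).

    Whether the first word ends in a latent consonant and the second starts
    with a vowel sound must be checked separately.
    """
    word = first_word.lower()
    if word == "et":
        return False  # "et" = no liaison
    if first_pos == "DET":
        return True  # determiner -> anything
    if first_pos == "PRON" and second_pos in {"VERB", "AUX"}:
        return True  # pronoun -> verb (AUX counted as a verb: assumption)
    if (first_pos == "ADP" or word == "très") and second_pos == "NOUN":
        return True  # preposition or très -> noun
    return False
```

With the tags from the log, `liaison_by_pos("les", "DET", "NOUN")` and `liaison_by_pos("on", "PRON", "AUX")` are both true, while `liaison_by_pos("michel", "PROPN", "AUX")` is false.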

from larynx.

tjiho avatar tjiho commented on June 27, 2024 1
  • adjective + noun : liaison
    I believe there are adjectives like grand, petit, gros, long, beaux, bel
    It's possible to add them to a first word exception.

It applies to this kind of adjective (qualificatif): un petit_ami
But also to:

  • possessive adjectives (mon, ma, mes, ton, ta, tes, etc.): mon_amant
  • relative adjectives (lequel, duquel, auquel, laquelle, lesquels, desquels, auxquels, lesquelles, desquelles, auxquelles): Lesquelles_avocats
  • and numeral adjectives (numbers): deux_amants
  • other adjectives are in fact determiners
  • there is a list of typical expressions for which you should make a liaison (todo: find this list).
    Can you give some example(s)?

Sure, the article talks about some expressions:
tout_à l’heure
c’est_à dire
plus_ou moins
peut_être
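
Fixed expressions like these could be handled with a small lookup table consulted before any general rule. The sketch below is only an illustration: the names are hypothetical, the table holds just the four expressions quoted above, and it assumes the tokenizer yields the two words as separate tokens (a hyphenated "peut-être" may well arrive as a single token).

```python
# Word pairs from the expressions quoted above that always take a liaison.
FIXED_LIAISONS = {
    ("tout", "à"),    # tout_à l'heure
    ("c'est", "à"),   # c'est_à dire
    ("plus", "ou"),   # plus_ou moins
    ("peut", "être"), # peut_être
}


def _normalize(word):
    # Lowercase and map the typographic apostrophe to the ASCII one.
    return word.lower().replace("’", "'")


def is_fixed_liaison(first, second):
    """True if the pair of adjacent words is a known fixed-liaison expression."""
    return (_normalize(first), _normalize(second)) in FIXED_LIAISONS
```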

from larynx.

alt131 avatar alt131 commented on June 27, 2024 1

@synesthesiam All sentences are OK for me.

from larynx.

tjiho avatar tjiho commented on June 27, 2024 1

In the second one, les arbres is better. In the first one, I hear ['#', 'l', 'e', 'z', 'a', 'b', 'ʁ', '#', '‖', '‖'] instead of ['#', 'l', 'e', 'z', 'a', 'ʁ', 'b', 'ʁ', '#', '‖', '‖'] (the 'ʁ' is missing before the 'b').

from larynx.

alt131 avatar alt131 commented on June 27, 2024 1

OK, thank you, I'll check it tomorrow.
I use Windows subsystem for linux 1))

from larynx.

synesthesiam avatar synesthesiam commented on June 27, 2024

Liaison looks like it's going to be fairly complex to implement correctly. I'm leaving this link here for future me: https://github.com/juliacarbajal/french_phonologizer/blob/master/phonologize.py

from larynx.

alt131 avatar alt131 commented on June 27, 2024

Also, I think not all the examples at that link are correct in modern French.
For example (from your learning French dataset), "any instances of après that are not part of après-midi"
Cyrus Smith et Gédéon Spilett, après être << liaison
C’est le mot du professeur, qui, après avoir << liaison

but I'm not a native French speaker.

from larynx.

alt131 avatar alt131 commented on June 27, 2024

@tjiho, I don't think it's possible to make one universal solution. For example, your article says you shouldn't make a liaison in phrases like “des haricots” (final 's' and initial 'h'), but in the siwis dataset, in the sentence "Voilà donc de quoi dépendent les destins des hommes !", the speaker makes a liaison in "des_hommes", and as far as I know that's the standard pronunciation. Maybe it depends on the region where the speaker lives, but even the best-known self-study guide, Mauger's "Cours de Langue et de Civilisation Françaises", uses the liaison in that case (see page 4: [dezom]).

from larynx.

tjiho avatar tjiho commented on June 27, 2024

@alt131 About liaison with h, I searched the internet and found this: http://www.languefrancaise.net/forum/viewtopic.php?id=180
It says (in summary) that there is no liaison with words beginning with a non-Latin h (Germanic or Greek). There is a list of those (~200); I will look for it and publish it once I have it.
It's very ugly to say des_haies [dezɛ]. And you're right, it should be des_hommes [dezom].

from larynx.

tjiho avatar tjiho commented on June 27, 2024

For French speakers, here is an article about when we should make the liaison or not:
https://www.francaisauthentique.com/quand-faire-la-liaison-en-francais/

Tomorrow I could make a summary in English.

So it lists some rules:

  • you should make a liaison only if the first word ends in a consonant and the second begins with a vowel.
  • determiner + noun: liaison
  • pronoun + verb: liaison
  • adjective + noun: liaison
  • there is a list of typical expressions for which you should make a liaison (todo: find this list).
  • noun + adjective: no liaison
  • after et: no liaison (otherwise it sounds like est)
  • after the verbs être and avoir, liaison is optional (it sounds more formal with the liaison) - we could simplify this to: after a verb, no liaison
  • if the second word begins with an h aspiré (non-Latin words): no liaison (todo: find the list)
  • if the second word begins with an h muet (Latin words): liaison
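
The rules summarized above can be sketched as one decision function. This is a rough Python illustration under stated assumptions: the function name is hypothetical, the h aspiré set is a tiny sample standing in for the full list, the tags follow the Universal Dependencies names, treating AUX as a verb is an assumption, and the list of fixed expressions is left out.

```python
VOWELS = set("aeiouyàâäéèêëîïôöùûüœ")
# Tiny sample only; the real list of h aspiré words is much longer.
H_ASPIRE = {"haricot", "haricots", "haie", "haies", "héros", "hibou"}
# (first POS, second POS) pairs that take a liaison.
LIAISON_PAIRS = {
    ("DET", "NOUN"),
    ("PRON", "VERB"),
    ("PRON", "AUX"),  # assumption: auxiliaries count as verbs here
    ("ADJ", "NOUN"),
}


def should_liaise(first, first_pos, second, second_pos, h_aspire=H_ASPIRE):
    """Apply the summarized rules to two adjacent words and their POS tags."""
    first, second = first.lower(), second.lower()
    # The first word must end in a consonant letter.
    if not first[-1].isalpha() or first[-1] in VOWELS:
        return False
    # The second word must begin with a vowel or an h muet.
    if second[0] == "h":
        if second in h_aspire:
            return False
    elif second[0] not in VOWELS:
        return False
    if first == "et":
        return False  # after "et": no liaison
    if first_pos in {"VERB", "AUX"}:
        return False  # simplification: after a verb, no liaison
    # Noun + adjective and every other unlisted pair: no liaison.
    return (first_pos, second_pos) in LIAISON_PAIRS
```

With this sketch, `should_liaise("des", "DET", "hommes", "NOUN")` is true and `should_liaise("des", "DET", "haricots", "NOUN")` is false, matching the des_hommes / des haricots contrast discussed in this thread.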

from larynx.

alt131 avatar alt131 commented on June 27, 2024

@synesthesiam, does gruut determine the part of speech of each word?
@tjiho, OK, it's a good list, but do you understand that it's not so easy to detect the part of speech of a word in text?

from larynx.

tjiho avatar tjiho commented on June 27, 2024

@tjiho, OK, it's a good list, but do you understand that it's not so easy to detect the part of speech of a word in text?

@alt131 Yes, I understand it 😅 It will be a big improvement to have correct liaison.

from larynx.

alt131 avatar alt131 commented on June 27, 2024

@tjiho, I doubt that gruut has a syntactic analyzer.
I suggest using my algorithm above and adding some exceptions to it, like 'et' for the first word and 'à', 'il', 'elle' etc. for the second word. It covers 95-98% of cases for TTS and STT, and I believe it takes 3 lines of Python.

from larynx.

alt131 avatar alt131 commented on June 27, 2024

And I also believe it's not a big problem if we don't add a liaison for "un_ami" etc., at least for a speech-to-text neural net.

from larynx.

tjiho avatar tjiho commented on June 27, 2024

200 words?
https://fr.wikipedia.org/wiki/H_aspir%C3%A9

A bit more 😇 There are 573 words. That's nice, so we have all the words with an h aspiré.

from larynx.

alt131 avatar alt131 commented on June 27, 2024
  • adjective + noun : liaison
    I believe there are adjectives like grand, petit, gros, long, beaux, bel
    It's possible to add them to a first word exception.
  • there is a list of typical expressions for which you should make a liaison (todo: find this list).
    Can you give some example(s)?

from larynx.

alt131 avatar alt131 commented on June 27, 2024

@tjiho, thanks.
@synesthesiam, I like it, but for "Je vis en Amérique." the phonemes are correct while the pronunciation is not.
'Ton excellent vin.' also has a pronunciation problem. I think both are related to #5.

from larynx.

tjiho avatar tjiho commented on June 27, 2024

'Ton excellent vin.' also has a pronunciation problem. I think both are related to #5.

excellent can have multiple pronunciations depending on the context. Here it has been pronounced like the verb: ils excellent en mathématiques

from larynx.

alt131 avatar alt131 commented on June 27, 2024

@tjiho, I don't like the pronunciation of 'ton' :)
It's very different from the original one.

from larynx.

alt131 avatar alt131 commented on June 27, 2024

Tu descends sous ton nez...

from larynx.

synesthesiam avatar synesthesiam commented on June 27, 2024

Here are the same sentences with a word break (#) added after the liaison: https://drive.google.com/drive/folders/1U8i14JX_IB2HC-0YlGrTunFkzM9lpAvR?usp=sharing

Do these sound better or worse?

from larynx.

alt131 avatar alt131 commented on June 27, 2024

@synesthesiam, 'Ton excellent vin.' is OK
"Je vis en Amérique." is OK
I'll check the others and write later.

from larynx.

alt131 avatar alt131 commented on June 27, 2024

Sorry, in "Je vis en Amérique." the liaison is lost, but the pronunciation is still better))

from larynx.

synesthesiam avatar synesthesiam commented on June 27, 2024

OK, I'll keep the word breaks in then. This seems like progress at least 🙂
Thanks for all your help!

from larynx.

alt131 avatar alt131 commented on June 27, 2024

Please generate these sentences also
Un bâtiment est en vue de l’île.
Sa vie n’était pas en danger
let's check them too

from larynx.

synesthesiam avatar synesthesiam commented on June 27, 2024

I've added them here: https://drive.google.com/drive/folders/1U8i14JX_IB2HC-0YlGrTunFkzM9lpAvR

from larynx.

alt131 avatar alt131 commented on June 27, 2024

'Sa vie n’était pas en danger' is OK
'Un bâtiment est en vue de l’île.' I hear "est en" as something like 'e' 'd' 'ɑ̃', but it should be 'e' 't' 'ɑ̃'

from larynx.

alt131 avatar alt131 commented on June 27, 2024

Check the phonemes for it; if they are OK, then do nothing.

from larynx.

synesthesiam avatar synesthesiam commented on June 27, 2024

DEBUG:larynx:Words for 'Un bâtiment est en vue de l'île.': ['un/DET', 'bâtiment/NOUN', 'est/AUX', 'en/ADP', 'vue/NOUN', 'de/ADP', "l'île/NOUN", './PUNCT']
DEBUG:larynx:Phonemes for 'Un bâtiment est en vue de l'île.': ['#', 'œ̃', '#', 'b', 'a', 't', 'i', 'm', 'ɑ̃', '#', 'ɛ', '#', 'ɑ̃', '#', 'v', 'y', '#', 'd', 'ə', '#', 'l', 'i', 'l', '#', '‖', '‖']

😕

from larynx.

alt131 avatar alt131 commented on June 27, 2024

'#', 'ɛ', '#', 'ɑ̃', '#' should be '#', 'ɛ', 't', '#', 'ɑ̃', '#'. 't' was lost.

from larynx.

alt131 avatar alt131 commented on June 27, 2024

Another example for you 'Amalia est en danger.'

from larynx.

synesthesiam avatar synesthesiam commented on June 27, 2024

Ah, I'm missing the verb -> vowel case. Hang on.

from larynx.

alt131 avatar alt131 commented on June 27, 2024

Let's check this too "C`est incroyable!"

from larynx.

synesthesiam avatar synesthesiam commented on June 27, 2024

Updated the Google Drive directory.

DEBUG:larynx:Words for 'Un bâtiment est en vue de l'île.': ['un/DET', 'bâtiment/NOUN', 'est/AUX', 'en/ADP', 'vue/NOUN', 'de/ADP', "l'île/NOUN", './PUNCT']
DEBUG:larynx:Phonemes for 'Un bâtiment est en vue de l'île.': ['#', 'œ̃', '#', 'b', 'a', 't', 'i', 'm', 'ɑ̃', '#', 'ɛ', 't', '#', 'ɑ̃', '#', 'v', 'y', '#', 'd', 'ə', '#', 'l', 'i', 'l', '#', '‖', '‖']

DEBUG:larynx:Words for 'Amalia est en danger.': ['amalia/PROPN', 'est/AUX', 'en/ADP', 'danger/NOUN', './PUNCT']
DEBUG:larynx:Phonemes for 'Amalia est en danger.': ['#', 'a', 'm', 'a', 'l', 'j', 'a', '#', 'ɛ', 't', '#', 'ɑ̃', '#', 'd', 'ɑ̃', 'ʒ', 'e', '#', '‖', '‖']

DEBUG:larynx:Words for 'C'est incroyable!': ["c'est/AUX", 'incroyable/ADJ', '!/PUNCT']
DEBUG:larynx:Phonemes for 'C'est incroyable!': ['#', 's', 'ɛ', 't', '#', 'ɛ̃', 'k', 'ʁ', 'w', 'a', 'j', 'a', 'b', 'l', '#', '‖', '‖']
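
The phoneme lists in this log put the liaison consonant at the end of the first word, followed by the word break. A minimal sketch of that assembly step, assuming per-word phoneme lists and a hypothetical mapping from word index to liaison consonant (this is an illustration, not the gruut code, and it omits the trailing '‖' symbols):

```python
def assemble(words, liaisons):
    """Join per-word phoneme lists with '#' word breaks.

    words:    list of phoneme lists, one per word
    liaisons: dict mapping a word index to the liaison consonant that is
              appended to that word, before the following word break
    """
    out = ["#"]
    for index, phonemes in enumerate(words):
        out.extend(phonemes)
        if index in liaisons:
            out.append(liaisons[index])  # consonant stays before the '#'
        out.append("#")
    return out
```

For "est en", `assemble([["ɛ"], ["ɑ̃"]], {0: "t"})` yields `['#', 'ɛ', 't', '#', 'ɑ̃', '#']`, the same ordering as in the log above.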

from larynx.

alt131 avatar alt131 commented on June 27, 2024

They are OK. (Phonemes)

from larynx.

alt131 avatar alt131 commented on June 27, 2024

The pronunciation is OK too.

from larynx.

synesthesiam avatar synesthesiam commented on June 27, 2024

Great, thanks!

I've uploaded new code for gruut and larynx as well as the French model with POS tagging. I won't be able to update Docker images until later.

from larynx.

alt131 avatar alt131 commented on June 27, 2024

DEBUG:larynx:Words for 'je peux vous aider à le retrouver': ['je', 'peux', 'vous', 'aider', 'à', 'le', 'retrouver']
DEBUG:larynx:Phonemes for 'je peux vous aider à le retrouver': ['#', 'ʒ', 'ə', '#', 'p', 'ø', '#', 'v', 'u', '#', 'e', 'd', 'e', '#', 'a', '#', 'l', 'ə', '#', 'ʁ', 'ə', 't', 'ʁ', 'u', 'v', 'e', '#', '‖', '‖']

No liaison in vous_aider, and the 'z' sound was lost in the phonemes.

from larynx.

alt131 avatar alt131 commented on June 27, 2024

'Chacun est uni à l`arbre de vie.'
sh: 1: arbre: not found
sh: 1: chacun: not found

And then:
DEBUG:hifi_gan:Initializing denoiser
Traceback (most recent call last):
  File "/usr/local/bin/larynx", line 8, in
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/larynx/main.py", line 165, in main
    line_id, line = line.split(args.id_delimiter, maxsplit=1)
ValueError: not enough values to unpack (expected 2, got 1)

from larynx.

alt131 avatar alt131 commented on June 27, 2024

You have at least 2 bugs in the French models.
First bug: the current NNs were trained without liaison.
Second bug: to compensate for the first bug, you added # in the liaison.
I think you need to delete the # in the liaison and train the swiss model as a test. I believe the result will be of better quality and more accurate.
And bug #5 still exists: for example, for 'livre' or 'homme' the phonemes are OK but the pronunciation is not (inside a word).

from larynx.

synesthesiam avatar synesthesiam commented on June 27, 2024

Haven't updated the Docker images yet. I had to roll back to push a different fix.

The ValueError you got is likely from leaving the --csv command-line argument on while passing in sentences without an id field (like id|text).
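
If that is the cause, the failure is easy to reproduce outside larynx: `str.split` with `maxsplit=1` returns a single-element list when the delimiter is absent, so unpacking into two names raises exactly this ValueError. A defensive variant could look like the sketch below; the function name and the `None` fallback are assumptions for illustration, not the larynx code.

```python
def split_id_and_text(line, delimiter="|"):
    """Split an 'id|text' line; return (None, line) when there is no id field."""
    line_id, separator, text = line.partition(delimiter)
    if not separator:
        # No delimiter found: treat the whole line as text without an id.
        return None, line
    return line_id, text
```

`str.partition` always returns three parts, so there is nothing to unpack incorrectly when the delimiter is missing.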

from larynx.

alt131 avatar alt131 commented on June 27, 2024

The ValueError you got is likely from leaving the --csv command-line argument on while passing in sentences without an id field (like id|text).

No, it's because " l`arbre" uses a non-standard apostrophe. If I change it to a standard ', then it works OK.
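
One way to guard against this kind of input is to normalize apostrophe look-alikes before tokenizing. A minimal sketch, assuming the set of variants below is the one worth covering (the name and the set are illustrative, not part of larynx or gruut):

```python
# Map common apostrophe look-alikes to the ASCII apostrophe.
APOSTROPHE_VARIANTS = str.maketrans({"`": "'", "´": "'", "’": "'", "‘": "'"})


def normalize_apostrophes(text):
    """Replace backticks and typographic apostrophes with a plain '."""
    return text.translate(APOSTROPHE_VARIANTS)
```

With this, `normalize_apostrophes("l`arbre")` returns `"l'arbre"`.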

from larynx.

zopieux avatar zopieux commented on June 27, 2024

Just wanted to mention:

Phonemes for 'Michel est un grand ami.': ['#', 'm', 'i', 'ʃ', 'ɛ', 'l', '#', 'ɛ', '#', 'œ̃', 'n', '#', 'ɡ', 'ʁ', 'ɑ̃', 't', 'a', 'm', 'i', '#', '‖', '‖']

On this pronunciation sample, I hear a D-sound rather than the expected T-sound: est “D”un grand instead of est “T”un grand.

from larynx.
