jaekookang / p2fa_py3

Penn Phonetics Lab Forced Aligner Toolkit (P2FA) for Python3

p2fa_py3's Introduction

P2FA for Python3.x

This is a modified version of P2FA for Python3 compatibility. Everything else remains the same as the original P2FA. Forced alignment aligns linguistic units (e.g., phonemes or words) with the corresponding sound file. All you need is a sound file and its transcription file. The output is a .TextGrid file with time-aligned phone, word, and optionally state-level tiers.

This was tested on macOS Ventura and Arch Linux.

1. Install HTK

First, download the HTK source code (http://htk.eng.cam.ac.uk/). This HTK installation guide was retrieved from Link.
(2021-04-13) Installation is based on macOS Sierra.
(2023-05-06) Installation is based on macOS Ventura 13.3.1 (a) - Apple M1 Max.

1.1 For Arch Linux

I couldn't run HTK-3.4.1 on Arch Linux. I switched to 3.4.0 and everything works fine. Installation of HTK is the same as the one described below.

Unzip the HTK-3.4.1.tar.gz file. I unzipped it inside the current repository so that the htk directory sits next to the aligner code.

$ tar -xvf HTK-3.4.1.tar.gz

After extracting the tar file, switch to htk directory.

$ cd htk

Compile HTK in the htk directory.

$ export CPPFLAGS=-UPHNALG
$ ./configure --disable-hlmtools --disable-hslab
$ make clean    # necessary if you're not starting from scratch
$ make all
$ sudo make install # use "sudo" to make htk functions available for all users

1.2 For Ubuntu 20.04.5 LTS

(Tested as of 2023-05-06)

# On a 64-bit platform, you may need to install the 32-bit headers and libraries first (See: https://stackoverflow.com/a/54082790/7170059)
$ sudo apt-get install gcc-multilib

$ ./configure --disable-hlmtools --disable-hslab
$ make clean    # necessary if you're not starting from scratch
$ make all
$ sudo make install # use "sudo" to make htk functions available for all users

# Quick test if HVite works to confirm that htk functions are installed correctly
$ HVite

1.3 For macOS

(Tested as of 2023-05-06) You may need to follow these steps before compiling HTK:

# Add CPPFLAGS, LIBRARY_PATH
$ export CPPFLAGS=-I/opt/X11/include
$ export LIBRARY_PATH=/opt/X11/lib

# If the above doesn't work, symlink the X11 headers into /usr/local/include
$ ln -s /opt/X11/include/X11 /usr/local/include/X11

# Replace line 21 (#include <malloc.h>) of HTKLib/strarr.c as below
#   include <malloc/malloc.h> 

# Replace line 1650 (labid != splabid) of HTKLib/HRec.c as below
#   labpr != splabid
# This step will prevent "ERROR [+8522] LatFromPaths: Align have dur<=0"
# See: https://speechtechie.wordpress.com/2009/06/12/using-htk-3-4-1-on-mac-os-10-5/

# Compile with options if necessary
$ ./configure 
$ make all
$ sudo make install  # use "sudo" to make htk functions available for all users

# Quick test if HVite works to confirm that htk functions are installed correctly
$ HVite

1.4 Troubleshooting

If the "make" command generates errors such as:

HGraf.c:73:10: fatal error: 'X11/Xlib.h' file not found

you may need to install XQuartz. Download XQuartz from this site. You will have /opt/X11 folder generated with necessary files. You need to manually compile HTKLib with the following command:

$ cd HTKLib
# Compile
$ gcc  -ansi -g -O2 -DNO_AUDIO -D'ARCH="darwin"' -I/usr/include/malloc -Wall -Wno-switch -g -O2 -I. -DPHNALG   -c -o HGraf.o HGraf.c -I /opt/X11/include
$ cd ..
# Set the path
$ export LIBRARY_PATH=/opt/X11/lib
# Run it again
$ make all
$ make install
(See: http://unixnme.blogspot.com/2018/01/build-htk-on-macos.html)

If you encounter strarr.c:21:10: fatal error: 'malloc.h' file not found, comment out #include <malloc.h> and add #include <stdlib.h> instead. (See: JoFrhwld/FAVE#48 (comment))
If you see errors like HTrain.c implicitly declaring library function 'finite', replace every call to finite in HTKLib/HTrain.c with isfinite. (See: https://trac.macports.org/ticket/61614)
Architecture errors like esignal.c:1184:25: error: use of undeclared identifier 'ARCH' are fixed by supplying the right architecture specifier. Open HTKLib/esignal.c and, in lines that use ARCH such as (strcmp(architecture, ARCH) == 0), replace ARCH with "darwin", as in (strcmp(architecture, "darwin") == 0). (See: https://wstyler.ucsd.edu/posts/p2fa_mac.html)

2. Install sox

$ sudo apt-get install sox

# or in Arch

$ sudo pacman -S sox

# or using brew

$ brew install sox
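
Before running the aligner, it can help to confirm that the external tools are actually visible on your PATH. The snippet below is a minimal sketch, not part of p2fa itself: the function name find_missing_tools is made up for illustration, and the list of tools is an assumption based on the HCopy and HVite commands the aligner prints, plus sox from this step.

```python
import shutil

# Assumed tool list: HCopy and HVite come from HTK, sox from step 2.
REQUIRED_TOOLS = ("HCopy", "HVite", "sox")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

If a tool is reported missing even though HTK compiled successfully, the install directory is probably not on the PATH of the shell (or notebook kernel) that runs Python.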

3. Run

Standalone

$ python align.py examples/ploppy.wav examples/ploppy.txt examples/ploppy.TextGrid

As part of your code

You can invoke the aligner from your code:

from p2fa import align

phoneme_alignments, word_alignments = align.align('WAV_FILE_PATH', 'TRANSCRIPTION_FILE_PATH')

# or 

phoneme_alignments, word_alignments, state_alignments = align.align('WAV_FILE_PATH', 'TRANSCRIPTION_FILE_PATH', state_align=True)

4. Result

With state-alignments

TODO

Updated installation guide
Refactor align.py

References

http://www.ling.upenn.edu/phonetics/p2fa/
Jiahong Yuan and Mark Liberman. 2008. Speaker identification on the SCOTUS corpus. Proceedings of Acoustics '08.
https://github.com/prosodylab/Prosodylab-Aligner (P2FA seems better than Prosodylab-Aligner based on my qualitative evaluation)
English HMM-state level aligner: Link
Korean Forced Aligner: Link from EMCSLabs.
Installing p2fa on Mac Link

p2fa_py3's Issues

bits/libc-header-start.h issue

"bits/libc-header-start.h: No such file or directory" in ubuntu
solution:
sudo apt install libc6-dev-i386

HCopy: command not found

Hi, I am getting sh: Hcopy: command not found when I run align.align. I am running on mac and have installed htk. Any recommendations would be greatly appreciated.

HTK compatibility during installation

My apologies if the request is misplaced but I am having difficulties installing HTK dependency for the aligner.

I followed the instructions you provide for installing HTK but keep getting an error about lgcc:

/usr/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-linux-gnu/10/libgcc.a when searching for -lgcc
/usr/bin/ld: cannot find -lgcc
/usr/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-linux-gnu/10/libgcc.a when searching for -lgcc
/usr/bin/ld: cannot find -lgcc
collect2: error: ld returned 1 exit status

Our systems administrator says that the gcc library is installed, and the error is likely due to incompatibility issues with the HTK toolkit (3.4.1) requiring an older version of the library.

Could you comment on this and/or provide some suggestions on how to fix the error?
Thank you!

Comparisons between p2fa and montreal-forced-aligner

I am familiar with at least two major options for forced alignment (FA): p2fa and montreal-forced-aligner (https://montreal-forced-aligner.readthedocs.io/). They were developed using different tools and datasets, so it is not easy to make one-to-one comparisons. Still, I found some differences when I visually checked the TextGrid files that both tools produced for the same recording (e.g., 19-198-0001.wav from Librispeech). This is not a thorough comparison; rather, it is meant to share how the two FA tools might differ in their boundary predictions. Please correct me if I'm wrong or add your impressions.

Here are some of my observations:

Fricatives (/sh, f/) tend to be longer in p2fa than in the Montreal aligner.
Stops, especially the stop interval, are sometimes shorter in p2fa than in the Montreal aligner.
Approximants (/l, r/) and r-colored vowels differ, but not consistently. p2fa seems to make conservative boundary judgments, but not always (/w/).
In the Words tier, the "sp" and "sil" labels do not exist in the Montreal aligner.

Example files: 19-198-0001.zip

Long audio suggestion

I'm trying to run the aligner on long audio files (~1 hour long) and this takes a lot of time to complete. I can see that most of the time is spent waiting for HVite to complete. Are there any parameters that I can change to accelerate the process? Do you have any suggestions? Thanks!

Supporting languages other than English

Hi Jaekoo,

Does this forced alignment tool support languages other than English, for example, Chinese?

Thanks,
Bing

skipping word .... .... ...

Thank you @jaekookang for sharing your work. I am trying to generate visemes, but so far the most I can do is generate phonemes with your repo, and it gives me this:
SKIPPING WORD COFFEEMAKER
SKIPPING WORD HARDER”
SKIPPING WORD IRRITATION”
SKIPPING WORD CARRIE”
SKIPPING WORD “
SKIPPING WORD WHAT”
SKIPPING WORD “
SKIPPING WORD DON’T
SKIPPING WORD THERE”
SKIPPING WORD THEM”
SKIPPING WORD I’M
SKIPPING WORD SORRY”
SKIPPING WORD HARD”
SKIPPING WORD IT”
SKIPPING WORD SAID”
SKIPPING WORD I’M
SKIPPING WORD SHOULDN’T
SKIPPING WORD HELP”
SKIPPING WORD FRONT”
SKIPPING WORD OK”
SKIPPING WORD NODDED”
SKIPPING WORD YOU’LL
SKIPPING WORD IT”
SKIPPING WORD “
SKIPPING WORD DON’T
SKIPPING WORD KNOW”
SKIPPING WORD “
SKIPPING WORD DON’T
SKIPPING WORD YOU’VE
SKIPPING WORD DECIDED”
SKIPPING WORD “
SKIPPING WORD GUESS”
SKIPPING WORD DIDN’T
SKIPPING WORD I’D
SKIPPING WORD LAND”
SKIPPING WORD GOD”
SKIPPING WORD QUIETLY”
SKIPPING WORD GOD”
SKIPPING WORD EYE”
SKIPPING WORD TOMORROW”
SKIPPING WORD HEAD”
SKIPPING WORD THAT’S
SKIPPING WORD SEWING”
SKIPPING WORD COULDN’T
SKIPPING WORD COULDN’T
SKIPPING WORD JAMIE—IT
SKIPPING WORD …
SKIPPING WORD I’D
SKIPPING WORD WHATEVER—MY
SKIPPING WORD HEART—AND
SKIPPING WORD SHE’D
SKIPPING WORD THERE’D
SKIPPING WORD DIDN’T
SKIPPING WORD AND-SO
SKIPPING WORD AND-SUCH
SKIPPING WORD SHE’D
SKIPPING WORD DONE—I
SKIPPING WORD DIDN’T
SKIPPING WORD …
SKIPPING WORD SHE’D
SKIPPING WORD BEEN—AND
SKIPPING WORD THERE’D
SKIPPING WORD MOTHER—WHAT
SKIPPING WORD SAY”
SKIPPING WORD WRONG”
And this is a long list.
What do you think could be the problem, and how can I solve it?

Question 2

As I am trying to generate visemes, do you think I can use this table to map phonemes to visemes?

I will really appreciate any help.

Thanks
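
Most of the skipped words above carry typographic punctuation: curly apostrophes (DON’T), closing curly quotes (HARDER”), dashes joining two words, or ellipses. A plausible cause, stated here as an assumption rather than confirmed behavior, is that these tokens do not match any entry in the pronunciation dictionary. The sketch below normalizes a transcript before it is passed to the aligner; normalize_transcript is a hypothetical helper, not part of p2fa.

```python
import re

# Typographic characters mapped to ASCII apostrophes or to spaces.
REPLACEMENTS = {
    "\u2019": "'",  # right single quote used as an apostrophe
    "\u2018": "'",  # left single quote
    "\u201c": " ",  # left double quote
    "\u201d": " ",  # right double quote
    "\u2014": " ",  # em dash joining two words
    "\u2013": " ",  # en dash
    "\u2026": " ",  # ellipsis
}


def normalize_transcript(text):
    """Return text with typographic punctuation removed or made ASCII."""
    for source, target in REPLACEMENTS.items():
        text = text.replace(source, target)
    # Turn any remaining punctuation (including hyphens) into spaces,
    # keeping only letters, digits, apostrophes, and whitespace.
    text = re.sub(r"[^A-Za-z0-9'\s]", " ", text)
    # Drop apostrophes that were acting as quotes around a word.
    words = [word.strip("'") for word in text.split()]
    return " ".join(word for word in words if word)
```

This does not help with words that are simply absent from the dictionary (COFFEEMAKER may be one); those would need a pronunciation added, for example through a custom dictionary.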

How to process multiple files at the same time

Hello. I have 100 different audio files of people pronouncing the exact word. Please, I need to bulk align the different audio with one text file. Is there a way I can achieve that?
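
One way to approach this is a plain loop that calls the aligner once per recording with the same transcript. The sketch below is an illustration only: align_directory is a hypothetical helper, and it assumes the third positional argument of align.align is the output TextGrid path, as the signature align(wavfile, trsfile, outfile, ...) in the traceback quoted in a later issue suggests. The aligner is passed in as align_fn so the loop does not depend on how p2fa is imported.

```python
from pathlib import Path


def align_directory(wav_dir, transcript_path, out_dir, align_fn):
    """Align every .wav file in wav_dir against one shared transcript.

    align_fn is expected to behave like p2fa's align.align and is called as
    align_fn(wav_path, transcript_path, textgrid_path).
    Returns (results, failures), both keyed by the wav file name.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    results, failures = {}, {}
    for wav_path in sorted(Path(wav_dir).glob("*.wav")):
        textgrid_path = out_dir / (wav_path.stem + ".TextGrid")
        try:
            results[wav_path.name] = align_fn(
                str(wav_path), str(transcript_path), str(textgrid_path)
            )
        except Exception as exc:
            # Keep going so one bad recording does not stop the batch.
            failures[wav_path.name] = exc
    return results, failures


# Example call (directory and file names are placeholders):
# from p2fa import align
# results, failures = align_directory("recordings", "word.txt", "textgrids", align.align)
```

Collecting failures instead of raising makes it easier to see afterwards which of the recordings need attention.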

add p2fa_py3 under this project

aligned.mlf not being created by script

Hello! I hope this project is still being maintained, as I'm currently having an issue with the forced alignment in python. I am running this code to try to perform forced alignment on the data file based on the tutorial in the readme:
phoneme_alignments, word_alignments = align.align('filepath/pn_pra_OL7_au.wav', 'test.txt')

However, when I try to run it, I get this error message:

creating plp...
 HCopy -T 1 -C C:\Users\rocco\Documents\1-School_Stuff\ling400\p2fa/model\16000\config -S C:\Users\rocco\AppData\Local\Temp\p2fa\codetr.scp
running viterbi...
 HVite -T 1 -a -m -I C:\Users\rocco\AppData\Local\Temp\p2fa\tmp.mlf -H C:\Users\rocco\Documents\1-School_Stuff\ling400\p2fa/model\16000\macros -H C:\Users\rocco\Documents\1-School_Stuff\ling400\p2fa/model\16000\hmmdefs -S C:\Users\rocco\AppData\Local\Temp\p2fa\test.scp -i C:\Users\rocco\AppData\Local\Temp\p2fa\aligned.mlf -p 0.0 -s 5.0 C:\Users\rocco\AppData\Local\Temp\p2fa\dict C:\Users\rocco\Documents\1-School_Stuff\ling400\p2fa/model\monophones > C:\Users\rocco\AppData\Local\Temp\p2fa\aligned.results
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_38416\805211500.py in <module>
----> 1 phoneme_alignments, word_alignments = align.align('filepath/pn_pra_OL7_au.wav', 'test.txt')

~\Documents\1-School_Stuff\ling400\p2fa\align.py in align(wavfile, trsfile, outfile, wave_start, wave_end, sr_override, model_path, custom_dict, state_align, verbose)
    449         state_alignments = None
    450 
--> 451     _alignments = read_aligned_mlf(output_mlf, sr, float(wave_start))
    452     phoneme_alignments, word_alignments = make_alignment_lists(_alignments)
    453 

~\Documents\1-School_Stuff\ling400\p2fa\align.py in read_aligned_mlf(mlffile, SR, wave_start)
    186     # TODO: extract log-likelihood score
    187 
--> 188     f = open(mlffile, 'r')
    189     lines = [l.rstrip() for l in f.readlines()]
    190     f.close()

FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\rocco\\AppData\\Local\\Temp\\p2fa\\aligned.mlf'

This errors out because it can't find the file "aligned.mlf" in the temp folder. I can't create this file myself, since the program clears this folder upon each run. Hopefully, I've done something simple wrong and you can let me know! Thanks!

Validation

Hello! How did you/do you validate your results? For example, for a given text file how does one know that the transcription is correct? When you were writing this did you just compare the output to other versions of the aligner? Thank you for writing this! It's so much easier to set up and use than other versions.
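
One possible check, offered as a sketch and not as a description of how this repository was validated, is to hand-correct the boundaries for a few files and measure how many aligner boundaries fall within a small tolerance of them. The function name boundary_agreement and the 20 ms default tolerance are assumptions for illustration; both inputs are plain lists of boundary times in seconds, which you would need to extract from the TextGrid files yourself.

```python
def boundary_agreement(reference, predicted, tolerance=0.02):
    """Return the fraction of predicted boundaries close to the reference.

    reference and predicted are equal-length lists of boundary times in
    seconds, in the same order. A boundary counts as a match when it lies
    within `tolerance` seconds of its reference counterpart.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    if not reference:
        raise ValueError("at least one boundary is required")
    hits = sum(
        1 for ref, pred in zip(reference, predicted)
        if abs(ref - pred) <= tolerance
    )
    return hits / len(reference)
```

This only measures boundary placement. It says nothing about whether the transcription itself is correct, since forced alignment takes the transcript as given.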