Speech corpus of Armenian question-answer dialogues

This is a corpus of elicited, controlled speech. The stimuli were a sequence of dialogues with intermittent fillers. This repository contains only the stimuli. The stimuli were designed to elicit intonation patterns for questions and answers in two Armenian dialects: Western Armenian (WA) and Eastern Armenian (EA). The recordings can be used for topics such as intonation and prosody, forced alignment, or automatic speech recognition (ASR).

The dataset is open-access and contains 8,852 dialogues, consisting of 23,711 utterances (individual sound files), for a total of 2.7 GB and 8.5 hours. Each utterance has a sound file, a Praat TextGrid (with full linguistic annotation), and a text file with the orthographic form for easier ASR use. Pronunciation dictionaries are also provided for ASR or forced alignment. We generated a forced alignment for these recordings using cross-language alignment with Interlingual-MFA. See the Alignments folder.

If you use the data in any way, please cite us as:

Chakmakjian, Samuel and Hossep Dolatian. 2022. Speech corpus of Armenian question-answer dialogues.

Stimuli design

Overview

A dialogue is made up of at least a question (Q) and an answer (A). Some dialogues include an interjection (I) and a negated verb (N). We call all these elements (Q, A, I, N) utterances.

The question and answer were SOV sentences. The dialogues were of three types, each with a different position of focus. Focus was either on the subject, object, or verb. Dialogues also varied in the choice of the object word. The object word could have either final stress, penultimate stress, or initial stress.

The file utterance-metadata (in Excel and TSV versions) lists the conditions for each recorded utterance.
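As a minimal sketch of how the TSV version of the metadata could be loaded and summarized, the snippet below reads a tab-separated file with a header row and counts rows per value of one column. The file name `utterance-metadata.tsv` and the column name `focus` in the usage comment are assumptions for illustration; check the actual header row before relying on them.

```python
import csv
from collections import Counter
from pathlib import Path


def load_tsv(path):
    """Read a tab-separated file with a header row into a list of dicts."""
    with Path(path).open(encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def count_by(rows, column):
    """Count how many rows share each value of the given column."""
    return Counter(row[column] for row in rows)


# Hypothetical usage; the file name and column name are assumptions:
# rows = load_tsv("utterance-metadata.tsv")
# print(count_by(rows, "focus"))
```

Because `csv.DictReader` takes the column names from the header row, the loader itself makes no assumption about which columns the file contains.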

Dialogue types and focus type

The following is the template for the dialogues. The actual recordings vary in the TARGET word for the object. Note that our Western Armenian speakers were from Syria and usually did not aspirate their consonants, which is why the WA transcriptions lack the aspiration marks found in the EA ones.

Type		Subject focus dialogue
Question	IPA (WA)	ov	TARGET	əsɑv
	IPA (EA)	ov	TARGET	ɑsɑt͡sʰ
	Gloss	who	TARGET	said
	Translation	Who said TARGET?
	Orthography	Ո՞վ «TARGET» ըսաւ/ասաց։
Answer	IPA (WA)	mɑɾjɑmə	TARGET	əsɑv
	IPA (EA)	mɑɾjɑmə	TARGET	ɑsɑt͡sʰ
	Gloss	Mariam	TARGET	said
	Translation	Mariam said TARGET.
	Orthography	Մարիամը «TARGET» ըսաւ/ասաց։

Type		Object focus dialogue
Question	IPA (WA)	mɑɾjɑmə	int͡ʃ	əsɑv
	IPA (EA)	mɑɾjɑmə	int͡ʃʰ	ɑsɑt͡sʰ
	Gloss	Mariam	what	said
	Translation	What did Mariam say?
	Orthography	Մարիամը ի՞նչ ըսաւ/ասաց։
Answer	IPA (WA)	mɑɾjɑmə	TARGET	əsɑv
	IPA (EA)	mɑɾjɑmə	TARGET	ɑsɑt͡sʰ
	Gloss	Mariam	TARGET	said
	Translation	Mariam said TARGET.
	Orthography	Մարիամը «TARGET» ըսաւ/ասաց։

Type		Verb focus dialogue
Question	IPA (WA)	mɑɾjɑmə	TARGET	ɡɑɾtɑt͡s
	IPA (EA)	mɑɾjɑmə	TARGET	kɑɾtʰɑt͡sʰ
	Gloss	Mariam	TARGET	read
	Translation	Did Mariam read TARGET?
	Orthography	Մարիամը «TARGET» կարդա՞ց։
Interjection	IPA (WA)	vot͡ʃ
	IPA (EA)	vot͡ʃʰ
	Gloss	no
	Translation	No
	Orthography	Ոչ
Answer	IPA (WA)	mɑɾjɑmə	TARGET	əsɑv
	IPA (EA)	mɑɾjɑmə	TARGET	ɑsɑt͡sʰ
	Gloss	Mariam	TARGET	said
	Translation	Mariam said TARGET.
	Orthography	Մարիամը «TARGET» ըսաւ/ասաց։
Negation	IPA (WA)	t͡ʃəɡɑɾtɑt͡s
	IPA (EA)	t͡ʃʰəkɑɾtʰɑt͡sʰ
	Gloss	not.read
	Translation	She didn't read.
	Orthography	չկարդաց։

In the typical case, each type of question and answer sentence had its own special intonational contour, summarized in the following table.

Focus type	Utterance
	Question (q)	Answer (a)
Subject focus (tS)	Pitch-rise on subject; post-focal deaccenting; final rise (WA) or final fall (EA)	Pitch-rise on subject; post-focal deaccenting; final fall
Object focus (tO)	Pitch-rise on object; post-focal deaccenting; final rise (WA) or final fall (EA)	Pitch-rise on object; post-focal deaccenting; final fall
Verb focus (tV)	Pitch-rise on verb (= final rise); optional pre-focal deaccenting	Optional pitch-rise on verb; final fall

Stress type of target word

The TARGET word varies in its stress location. It has one of the following conditions.

Stress type (code)	Subcategory	Example WA	Example EA	Orthography	Translation
Final (s3)		dɑniki	tɑnikʰi	տանիքի	of the roof
Final (s3a)	adverb		sutoɾen	սուտորեն	falsely
Penult (s2)	ends in /-ə/	kid͡zeɾə	ɡit͡seɾə	գիծերը	the lines
Penult (s2s)	ends in /-əs/	bɑdiʒəs	pɑtiʒəs	պատիժս	my punishment
Penult (s2t)	ends in /-ət/	mɑdidət	mɑtitət	մատիտդ	your pencil
Initial (s1o)	ordinal	uteɾoɾt	utʰeɾoɾtʰ	ութերորդ	eighth
Initial (s1a)	adverb	sudoɾen		սուտօրէն	falsely
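The condition codes in the two tables above (focus type, utterance type, stress type) can be kept in small lookup tables. The sketch below only restates the codes listed in this README; how the codes are laid out in file names or metadata columns is not specified here, so the `describe` helper is hypothetical and takes the three codes as separate arguments.

```python
# Codes copied from the tables in this README.
FOCUS_TYPES = {"tS": "subject focus", "tO": "object focus", "tV": "verb focus"}
UTTERANCE_TYPES = {"q": "question", "a": "answer"}
STRESS_TYPES = {
    # code: (stress position, subcategory or None)
    "s3": ("final", None),
    "s3a": ("final", "adverb"),
    "s2": ("penult", "ends in /-ə/"),
    "s2s": ("penult", "ends in /-əs/"),
    "s2t": ("penult", "ends in /-ət/"),
    "s1o": ("initial", "ordinal"),
    "s1a": ("initial", "adverb"),
}


def describe(focus_code, utterance_code, stress_code):
    """Spell out one condition; raises KeyError on an unknown code."""
    position, subcategory = STRESS_TYPES[stress_code]
    target = f"{position} stress"
    if subcategory:
        target += f" ({subcategory})"
    return (
        f"{UTTERANCE_TYPES[utterance_code]}, "
        f"{FOCUS_TYPES[focus_code]}, target with {target}"
    )


print(describe("tO", "a", "s2t"))
```

Codes for the interjection and negation utterances are not given in this README, so they are left out rather than guessed.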

Materials

Recordings were made with 19 speakers: 10 for Eastern Armenian (5 female, 5 male) and 9 for Western Armenian (5 female, 4 male). The Eastern Armenian speakers were from Yerevan, Armenia, while the Western Armenian speakers were from Aleppo, Syria. All 19 speakers were living in Yerevan at the time of recording. Speaker metadata is in the file speaker-metadata (in Excel and TSV versions).

The participants were recorded reading the dialogues on a PowerPoint presentation. In our annotation, we broke up each dialogue into its component utterances (Q, A, I, N) using a Praat script. Each utterance is found in the repository in the form of a sound file .wav, a Praat TextGrid .TextGrid, and a transcript file .txt. Data is in the data folder.
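Since each utterance should come as a .wav, a .TextGrid, and a .txt file, a quick completeness check over a downloaded folder can be useful. The sketch below assumes that the three files of one utterance share the same file name stem and sit in the same directory; that layout is an assumption, not something this README states.

```python
from collections import defaultdict
from pathlib import Path

# The three file types described above (matching is case-sensitive).
EXPECTED = {".wav", ".TextGrid", ".txt"}


def group_by_utterance(folder):
    """Map each path stem to the set of expected extensions found for it."""
    groups = defaultdict(set)
    for path in Path(folder).rglob("*"):
        if path.is_file() and path.suffix in EXPECTED:
            groups[path.with_suffix("")].add(path.suffix)
    return groups


def incomplete_utterances(folder):
    """Return {stem: sorted missing extensions} for incomplete utterances."""
    return {
        str(stem): sorted(EXPECTED - found)
        for stem, found in group_by_utterance(folder).items()
        if found != EXPECTED
    }


# Hypothetical usage; "data" is the folder name mentioned above:
# print(incomplete_utterances("data"))
```

An empty result means every utterance found has all three files.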

We annotated the recordings with information on quality. Most recordings had little to no disfluency or background noise. These are found in the data-few-issues folder.

Some recordings, however, did have such problems. Files were annotated with the symbol _? and placed in data-moderate-issues if they had a mild or moderate issue, and with _0 and placed in data-severe-issues if they had a severe issue. The problems are listed below.

Mild or moderate issues:
- focus-unclear: The intonation is ambiguous.
- laughing: The participant is laughing.
- noise-mild: There is mild background noise.
- pause-mild: There is a small felicitous pause in the middle of the sentence.
- pause-noise-mild: There is both mild background noise and a small pause.
- unclear-segments: A segment was pronounced unclearly.
Severe issues:
- focus-wrong-intonation: The participant used the wrong intonation.
- noise-extreme: There is extreme background noise.
- pause-extreme: There is a long infelicitous pause in the middle of the sentence.
- pause-noise-extreme: There is both extreme noise and a long pause.
- not-template: The utterance was misread in a way that doesn't fit into our templates, such as omitting the subject.
- stutter-or-missing-sound: The participant stuttered in speech or omitted a sound.
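For filtering by quality, the issue labels above can be mapped to a severity level and to the folder that holds the matching recordings. This is a minimal sketch that restates the labels, markers, and folder names from this README; the function names are hypothetical, and where exactly the _? and _0 markers appear in a file name is not specified here, so the markers are stored as data only.

```python
MODERATE_ISSUES = frozenset({
    "focus-unclear", "laughing", "noise-mild",
    "pause-mild", "pause-noise-mild", "unclear-segments",
})
SEVERE_ISSUES = frozenset({
    "focus-wrong-intonation", "noise-extreme", "pause-extreme",
    "pause-noise-extreme", "not-template", "stutter-or-missing-sound",
})
FOLDERS = {
    "few": "data-few-issues",
    "moderate": "data-moderate-issues",
    "severe": "data-severe-issues",
}
MARKERS = {"moderate": "_?", "severe": "_0"}  # annotation symbols per severity


def severity(issue_label=None):
    """Return 'few', 'moderate', or 'severe' for an issue label (None = no issue)."""
    if issue_label is None:
        return "few"
    if issue_label in MODERATE_ISSUES:
        return "moderate"
    if issue_label in SEVERE_ISSUES:
        return "severe"
    raise ValueError(f"unknown issue label: {issue_label!r}")


def folder_for_issue(issue_label=None):
    """Return the folder expected to hold recordings with this issue."""
    return FOLDERS[severity(issue_label)]


print(folder_for_issue("pause-mild"))
```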

We provide forced alignments for the data-few-issues recordings. See the Alignments folder.

Recommendations

The recordings can be used for different purposes. We plan to use them for work on intonation phonetics and forced alignment. For phonetic studies, recordings with no or moderate issues are suitable, but recordings with severe issues are not recommended. For forced alignment, however, the recordings with severe issues might still be useful as a way to prevent overfitting or to accommodate noisy data.

The transcript files .txt are to make forced alignment tasks easier. The pronunciation dictionaries for Western Armenian and Eastern Armenian are for forced alignment purposes.

License

The dataset is made available to the research community under the GNU General Public License v3.0.

Contact

Feel free to contact us at [email protected] if you have any questions or concerns.
