The 'Spotify Subset' contains file names from the Spotify Dataset (Tanaka et al., 2022) selected for classifying language variations in Brazilian Portuguese. The file names were chosen by filtering the original dataset's metadata for idiomatic expressions and for names or acronyms of locations.
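
The metadata filter described above could look roughly like the following. This is a minimal sketch, not the procedure actually used to build the subset: the field names (`file_name`, `show_name`, `episode_description`), the example rows, and both keyword lists are hypothetical placeholders chosen for illustration.

```python
import re

# Hypothetical keyword lists for illustration only; the terms actually used
# to build the subset are not listed in this document.
LOCATION_TERMS = {"Recife", "São Paulo", "SP", "Minas Gerais", "MG"}
IDIOM_TERMS = {"oxente", "uai", "bah"}


def matches_any(text, terms):
    """Return True if any term occurs in text as a whole word (case-insensitive)."""
    for term in terms:
        pattern = rf"(?<!\w){re.escape(term)}(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return True
    return False


def select_file_names(metadata_rows):
    """Keep the file names whose metadata mentions a location or idiom term."""
    selected = []
    for row in metadata_rows:
        # Assumed metadata fields; adjust to the real column names.
        text = f"{row['show_name']} {row['episode_description']}"
        if matches_any(text, LOCATION_TERMS | IDIOM_TERMS):
            selected.append(row["file_name"])
    return selected


# Made-up rows standing in for the original dataset metadata.
rows = [
    {"file_name": "ep_001.ogg", "show_name": "Papo de Recife",
     "episode_description": "Conversa sobre a cidade."},
    {"file_name": "ep_002.ogg", "show_name": "Tech Talk",
     "episode_description": "Notícias de tecnologia."},
    {"file_name": "ep_003.ogg", "show_name": "Causos",
     "episode_description": "Uai, que história boa!"},
]

print(select_file_names(rows))
```

Matching on whole words keeps short acronyms such as "SP" from firing inside longer words, at the cost of missing inflected or misspelled forms.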

| Speakers | Duration | Episodes | Female | Male |
|---|---|---|---|---|
| 92 | ~15hrs 24 min | 52 | 43 | 38 |


| Accent | Speakers | Duration | Female | Male |
|---|---|---|---|---|
| Rio de Janeiro | 5 | 49 min | 2 | 3 |
| Bahia | 4 | 1hr 27 min | 4 | |
| Mato Grosso do Sul | 4 | 18 min | 3 | 1 |
| Maranhão | 7 | 1hr 18 min | 2 | 3 |
| Minas Gerais | ~35 | 5hrs 23 min | ~13 | ~22 |
| Recife | 10 | 3hrs 45 min | | |
| São Paulo | ~25 | 1hr 18 min | ~19 | ~7 |
| Rio Grande do Sul | 2 | ~53 min | | 2 |


| Accent | Train_speakers | Dev_speakers | Test_speakers | Podcasts | Episodes | Duration | Segments |
|---|---|---|---|---|---|---|---|
| RE | 69 | 23 | 11 | 15 | 57 | ~48.23 | 14,008 |
| SP | 52 | 18 | 15 | 11 | 78 | ~30.88 | 11,906 |