Comments (33)
Yes, it runs perfectly now. Thanks again for all your work on this project.
from googleLanguageR.
Kudos Mark! Thanks for your excellent and continued work on this project!
There are many new features that can be configured. For now, to allow more flexibility, I've added a customConfig object you can pass in that can enable any new feature, e.g.:
## Use a custom configuration
my_config <- list(encoding = "LINEAR16",
                  enableSpeakerDiarization = TRUE,
                  diarizationSpeakerCount = 3)

# languageCode is required, so will be added if not in your custom config
gl_speech(my_audio, languageCode = "en-US", customConfig = my_config)
Refer to https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig to see which options you can add.
This will apply to #48 as well, and other features such as enhanced models and recognition metadata.
I would appreciate feedback on whether these should be moved to more formal arguments, or whether the formal arguments should be left for the more standard use cases, relying on custom objects for the advanced ones.
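For anyone curious how the "languageCode is added if missing" behaviour mentioned in the code comment could work, here is a minimal sketch using base R's modifyList(); the build_config() helper is hypothetical and only illustrates the idea, it is not the package's actual code:

```r
# Sketch: merge required arguments into a user-supplied customConfig.
# build_config() is an illustrative helper, not part of googleLanguageR.
build_config <- function(customConfig, languageCode = "en-US") {
  required <- list(languageCode = languageCode)
  # modifyList() overwrites entries in `required` with the user's values,
  # so languageCode is added only when the custom config omits it
  modifyList(required, customConfig)
}

my_config <- list(encoding = "LINEAR16",
                  enableSpeakerDiarization = TRUE,
                  diarizationSpeakerCount = 3)
cfg <- build_config(my_config, languageCode = "en-US")
# cfg now contains languageCode plus the three custom fields
```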
The following runs without the customConfig argument, but not with it:
my_config <- list(encoding = "LINEAR16",
                  enableSpeakerDiarization = TRUE,
                  diarizationSpeakerCount = 2)
transcript <- gl_speech("wavfile.wav", sampleRateHertz = wavfile$sample.rate, languageCode = "en-US", customConfig = my_config)
Error: API returned: Invalid JSON payload received. Unknown name "enable_speaker_diarization" at 'config': Cannot find field.
Invalid JSON payload received. Unknown name "diarization_speaker_count" at 'config': Cannot find field.
I just changed the API endpoint on GitHub (2a86fb0). I will test it myself, but if you are online now, give it another go with the new GitHub version.
That should work now. It's only available in the beta endpoint (currently v1p1beta1), not the v1 endpoint it was defaulting to.
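The version string is the only difference between the two REST endpoints; as a rough sketch of how the URL might be selected (the package's internal URL construction may differ, and the path shape here follows the public REST reference):

```r
# Sketch: Cloud Speech REST endpoint by API version.
# Illustrative only - not the package's actual URL builder.
speech_endpoint <- function(version = c("v1p1beta1", "v1"),
                            method = "longrunningrecognize") {
  version <- match.arg(version)
  sprintf("https://speech.googleapis.com/%s/speech:%s", version, method)
}

speech_endpoint("v1p1beta1")
# -> "https://speech.googleapis.com/v1p1beta1/speech:longrunningrecognize"
```

Fields such as enableSpeakerDiarization exist only in the v1p1beta1 RecognitionConfig, which is why the v1 endpoint rejected them.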
Yes, I have just come back from NEXT, where we got a demo of the service. Very cool, and it will be the next thing to work on.
Thanks Mark.
I look forward to it.
E
I'll keep this open to track the feature.
Awesome -- thanks for your hard work on this.
Ok, that didn't help. I will run through some tests and get it working. The syntax will be as above, though.
- Tests
- Documentation
Thanks for your work on this great package, Mark. I'm now wondering how to pass the metadata field of RecognitionConfig via your package. I'd like to set the metadata, particularly interaction type (https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig#RecognitionMetadata). Is this currently possible?
Thanks!
Untested but this should work:
my_config <- list(metadata = list(interactionType = "DISCUSSION"))
transcript <- gl_speech("wavfile.wav", customConfig = my_config)
You will need the GitHub version.
Hi Mark. I am trying to implement the extended functionality (punctuation, speaker diarization, enhanced speech recognition) when using gl_speech(). I ran the following test code:
gcs_upload(file.choose())
my_config <- list(enableAutomaticPunctuation = TRUE,
                  enableSpeakerDiarization = TRUE,
                  diarizationSpeakerCount = 3)
o1g1k1 <- gl_speech("gs://[bucket]/audio3", encoding = "FLAC", languageCode = "nb-NO",
                    sampleRateHertz = 48000, asynch = TRUE, customConfig = my_config)
This gives the following feedback after the end of the process:
> gl_speech_op(o1g1k1)
2018-11-02 15:54:38 -- Asychronous transcription finished.
Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match
NULL
Warning message:
In call_api() :
API Data failed to parse. Returning parsed from JSON content.
Use this to test against your data_parse_function.
I tried removing the customConfig argument, and the call then returns the transcript. Any idea what may cause the problem? (I am pretty new to R and programming in general, so I have no idea how to interpret this.)
May I check what version you are using? sessionInfo() output would be useful.
Hi again, sessionInfo() output is:
> sessionInfo("googleLanguageR")
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=Norwegian Bokmål_Norway.1252 LC_CTYPE=Norwegian Bokmål_Norway.1252
[3] LC_MONETARY=Norwegian Bokmål_Norway.1252 LC_NUMERIC=C
[5] LC_TIME=Norwegian Bokmål_Norway.1252
attached base packages:
character(0)
other attached packages:
[1] googleLanguageR_0.2.0.9000
loaded via a namespace (and not attached):
[1] Rcpp_0.12.19 rstudioapi_0.8 knitr_1.20
[4] magrittr_1.5 googleCloudStorageR_0.4.0 grDevices_3.5.1
[7] R6_2.3.0 rlang_0.3.0.1 fansi_0.4.0
[10] httr_1.3.1 tools_3.5.1 utils_3.5.1
[13] utf8_1.1.4 cli_1.0.1 googleAuthR_0.6.3
[16] htmltools_0.3.6 stats_3.5.1 datasets_3.5.1
[19] rprojroot_1.3-2 openssl_1.0.2 yaml_2.2.0
[22] assertthat_0.2.0 digest_0.6.18 tibble_1.4.2
[25] base_3.5.1 crayon_1.3.4 zip_1.0.0
[28] purrr_0.2.5 base64enc_0.1-3 graphics_3.5.1
[31] curl_3.2 evaluate_0.12 memoise_1.1.0
[34] mime_0.6 rmarkdown_1.10 compiler_3.5.1
[37] pillar_1.3.0 backports_1.1.2 methods_3.5.1
[40] jsonlite_1.5
Hmm your code works for me. Can you see if you have the same trouble with the test audio file?
test_audio <- system.file(package = "googleLanguageR", "woman1_wb.wav")
library(googleLanguageR)
my_config <- list(enableAutomaticPunctuation = TRUE, enableSpeakerDiarization = TRUE, diarizationSpeakerCount = 3)
o1g1k1 <- gl_speech(test_audio, encoding = c("FLAC"), languageCode = "nb-NO", sampleRateHertz = 48000, asynch = TRUE, customConfig = my_config)
gl_speech_op(o1g1k1)
2018-11-02 16:45:13 -- - started at 2018-11-02T15:45:12.585792Z - last update: 2018-11-02T15:45:13.149485Z
## Send to gl_speech_op() for status
## 8713868634506813015
> gl_speech_op(o1g1k1)
2018-11-02 16:45:25 -- Asynchronous transcription finished.
$transcript
# A tibble: 1 x 2
transcript confidence
<chr> <chr>
1 20 minutter mens du animals is free cookie that it difficult matter and the and sometimes It's necessary to do so 0.82878345
$timings
startTime endTime word speakerTag
1 0s 0.200s 20 1
2 0.200s 0.600s minutter 1
3 0.600s 1s mens 1
4 1s 1.300s du 1
5 1.300s 1.900s animals 1
6 1.900s 2.100s is 1
7 2.100s 2.300s free 1
8 2.300s 2.600s cookie 1
9 2.600s 2.900s that 1
10 2.900s 3s it 1
11 3s 3.200s difficult 1
12 3.200s 3.800s matter 1
13 3.800s 4.200s and 1
14 4.200s 4.300s the 1
15 4.300s 4.500s and 1
16 4.500s 5s sometimes 1
17 5s 5.200s It's 1
18 5.200s 5.400s necessary 1
19 5.400s 5.800s to 1
20 5.800s 6s do 1
21 6s 6.200s so 1
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] googleLanguageR_0.2.0.9000
loaded via a namespace (and not attached):
[1] fansi_0.3.0 utf8_1.1.4 digest_0.6.15 crayon_1.3.4 assertthat_0.2.0 R6_2.2.2 jsonlite_1.5
[8] magrittr_1.5 pillar_1.3.0 httr_1.3.1 cli_1.0.0 rlang_0.2.2 curl_3.2 rstudioapi_0.7
[15] googleAuthR_0.6.3 tools_3.5.0 purrr_0.2.5 yaml_2.1.19 compiler_3.5.0 base64enc_0.1-3 memoise_1.1.0
[22] openssl_1.0.2 tibble_1.4.2
I tried with the test audio file, and this works. (Interestingly, it's transcribed to English even though the language is set to Norwegian.) My audio file is considerably longer (~36 min), but it shouldn't exceed the limit. Also, it was transcribed when I removed the customConfig argument. Somehow that makes a difference.
Curious :) Ok, from your error message I can see the transcription works; it's just the final parsing in R that fails. If it were my file, I would put a browser() call in the file before the parsing to inspect the returned object and see what is failing. I'm happy to do this if you are able to make the file available to me.
Hi again. Sorry for the late reply. Due to some privacy issues regarding the content of this file, I'd have to check whether I can make it available to you. I am pretty new to R. Can you explain how I can use the browser() command in this context?
I understand. In that case, could you try cloning this repo, then loading the package locally by opening it in another RStudio session (Build should be top right)?
You then have a local copy of this library rather than relying on the GitHub version, so you can change the code (located in the R folder). browser() is a debugging tool that stops the execution of a function so you can inspect its state at that point. I suggest adding it at this line:
https://github.com/ropensci/googleLanguageR/blob/master/R/speech-to-text.R#L182
Rebuild the package, then try your API call. It should stop the function at the above line, and you can then inspect the object x to see the raw data as it comes out of the API. I imagine the next lines trigger the error; hopefully you can diagnose why once you have the object x.
If not, just posting the structure of the object x here (str(x)) should help me diagnose it. I suspect it's because there are extra columns in the response that are not expected.
Change the code and reload the package to try again; if it works, open a pull request :)
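As a generic toy illustration of browser() (the function name and the results field below are made up for the demo, not taken from the package):

```r
# Toy illustration of browser(): in an interactive session, execution
# pauses at the browser() call and you get a Browse[1]> prompt where
# you can run str(x), names(x), etc., then type c to continue.
inspect_parse <- function(x) {
  if (interactive()) browser()  # pause here to examine `x`
  length(x$results)             # `results` is a made-up field for the demo
}

fake_response <- list(results = list("a", "b", "c"))
inspect_parse(fake_response)
# -> 3
```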
Ok, calling gl_speech_op() now gives the following output:
> gl_speech_op(o1g1k1)
2018-11-06 14:13:55 -- Asychronous transcription finished.
Called from: parse_speech(x$response)
Browse[1]>
debug at C:/Users/[path]/RTestProject/googleLanguageR/googleLanguageR-master/R/speech-to-text.R#183: transcript <- my_map_df(x$results$alternatives, ~as_tibble(cbind(transcript = ifelse(!is.null(.x$transcript),
.x$transcript, NA), confidence = ifelse(!is.null(.x$confidence),
.x$confidence, NA))))
Browse[2]>
debug at C:/Users/[path]/RTestProject/googleLanguageR/googleLanguageR-master/R/speech-to-text.R#189: timings <- my_map_df(x$results$alternatives, ~.x$words[[1]])
Browse[2]>
Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match
NULL
Warning message:
In call_api() :
API Data failed to parse. Returning parsed from JSON content.
Use this to test against your data_parse_function.
Does this help?
I'm experiencing the same issue.
2019-01-08 17:58:23 -- Asychronous transcription finished.
Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match
NULL
Warning message:
In call_api() :
API Data failed to parse. Returning parsed from JSON content.
Use this to test against your data_parse_function.
The test file works:
test_audio <- system.file(package = "googleLanguageR", "woman1_wb.wav")
my_config <- list(enableAutomaticPunctuation = TRUE, enableSpeakerDiarization = TRUE, diarizationSpeakerCount = 3)
o1g1k1 <- gl_speech(test_audio, encoding = c("FLAC"), languageCode = "nb-NO", sampleRateHertz = 48000, asynch = TRUE, customConfig = my_config)
gl_speech_op(o1g1k1)
but my own file throws the error above:
test_gcs <- "gs://path/to/myfile.wav"
my_config <- list(encoding = "MULAW",
                  sampleRateHertz = 8000,
                  enableSpeakerDiarization = TRUE,
                  diarizationSpeakerCount = 2)
async <- gl_speech(test_gcs, asynch = TRUE, languageCode = "en-US", customConfig = my_config)
sessionInfo("googleLanguageR") output is:
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
character(0)
other attached packages:
[1] googleLanguageR_0.2.0.9000
loaded via a namespace (and not attached):
[1] zip_1.0.0 Rcpp_1.0.0 pillar_1.3.1 compiler_3.5.1
[5] googleAuthR_0.7.0 methods_3.5.1 prettyunits_1.0.2 base64enc_0.1-3
[9] remotes_2.0.2 utils_3.5.1 tools_3.5.1 grDevices_3.5.1
[13] testthat_2.0.1 digest_0.6.18 pkgbuild_1.0.2 pkgload_1.0.2
[17] jsonlite_1.6 memoise_1.1.0 tibble_2.0.0 pkgconfig_2.0.2
[21] rlang_0.3.1.9000 cli_1.0.1 rstudioapi_0.8 curl_3.2
[25] yaml_2.2.0 withr_2.1.2 httr_1.4.0 googleCloudStorageR_0.4.0
[29] fs_1.2.6 desc_1.2.0 graphics_3.5.1 datasets_3.5.1
[33] stats_3.5.1 hms_0.4.2 devtools_2.0.1 rprojroot_1.3-2
[37] glue_1.3.0 base_3.5.1 R6_2.3.0 processx_3.2.0
[41] fansi_0.4.0 sessioninfo_1.1.1 readr_1.3.1 purrr_0.2.5
[45] callr_3.0.0 magrittr_1.5 usethis_1.4.0 backports_1.1.3
[49] ps_1.2.1 assertthat_0.2.0 utf8_1.1.4 openssl_1.1
[53] crayon_1.3.4
Could you install remotes::install_github("MarkEdmondson1234/googleAuthR"), then run the command again, which should leave a gar_parse_error.rds in your working directory. Then call googleAuthR::gar_debug_parsing() so we can examine the object that is coming back from the API and failing to parse.
Ok, I have the gar_parse_error object in memory.
What can I tell you about it?
There should be some output when you put it through the function above; in particular, the contents of the $response are what is needed to diagnose which fields are not parsed.
I can't share the content of $response because the audio is a phone call that contains Protected Health Information (PHI). I imagine that's what you want to see. Can you be more specific about what you are looking for? It looks like everything is there. Is there a function I can run to manually parse it?
Ok, fair enough. I really just need the structure, so a str() on the response object should be sufficient.
Here's the output of str() with all of the word and transcript contents redacted.
List of 3
$ data_parse_args: list()
$ data_parse_func:function (x)
$ content :List of 4
..$ name : chr ""
..$ metadata:List of 4
.. ..$ @type : chr "type.googleapis.com/google.cloud.speech.v1p1beta1.LongRunningRecognizeMetadata"
.. ..$ progressPercent: int 100
.. ..$ startTime : chr "2019-01-09T16:06:00.449507Z"
.. ..$ lastUpdateTime : chr "2019-01-09T16:15:21.973116Z"
..$ done : logi TRUE
..$ response:List of 2
.. ..$ @type : chr "type.googleapis.com/google.cloud.speech.v1p1beta1.LongRunningRecognizeResponse"
.. ..$ results:'data.frame': 117 obs. of 2 variables:
.. .. ..$ alternatives:List of 117
.. .. .. ..$ :'data.frame': 1 obs. of 3 variables:
.. .. .. .. ..$ transcript: chr
.. .. .. .. ..$ confidence: num 0.733
.. .. .. .. ..$ words :List of 1
.. .. .. .. .. ..$ :'data.frame': 13 obs. of 3 variables:
.. .. .. .. .. .. ..$ startTime: chr [1:13] "0.800s" "1.400s" "2.600s" "2.700s" ...
.. .. .. .. .. .. ..$ endTime : chr [1:13] "1.400s" "2.600s" "2.700s" "2.900s" ...
.. .. .. .. .. .. ..$ word : chr [1:13]
.. .. .. ..$ :'data.frame': 1 obs. of 3 variables:
.. .. .. .. ..$ transcript: chr
.. .. .. .. ..$ confidence: num 0.684
.. .. .. .. ..$ words :List of 1
.. .. .. .. .. ..$ :'data.frame': 18 obs. of 3 variables:
.. .. .. .. .. .. ..$ startTime: chr [1:18] "7.700s" "8.300s" "8.600s" "8.700s" ...
.. .. .. .. .. .. ..$ endTime : chr [1:18] "8.300s" "8.600s" "8.700s" "9.100s" ...
.. .. .. .. .. .. ..$ word : chr [1:18]
...etc....
.. .. ..$ languageCode: chr [1:117]
Looks like the word output changes when enabling speaker diarization - from https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/speech/recognize#SpeechRecognitionAlternative:
"Output only. A list of word-specific information for each recognized word. Note: When enableSpeakerDiarization is true, you will see all the words from the beginning of the audio."
If I can get a good sample file of a few people talking (WAV), I will include it within the package to help test and cater for the above.
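The rbind() error above arises because some alternatives gain an extra speakerTag column while others do not. As a sketch of the kind of fix needed (not the package's actual patch), mismatched data.frames can be padded with NA before binding, using only base R:

```r
# Bind a list of data.frames whose column sets may differ, padding
# missing columns with NA - the situation the extra speakerTag
# column creates when speaker diarization is enabled.
rbind_fill <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(df) {
    for (col in setdiff(all_cols, names(df))) df[[col]] <- NA
    df[all_cols]  # same column order everywhere
  })
  do.call(rbind, padded)
}

a <- data.frame(word = "hi",    startTime = "0s")
b <- data.frame(word = "there", startTime = "1s", speakerTag = 1L)
rbind_fill(list(a, b))
# two rows; speakerTag is NA for the first
```

dplyr::bind_rows() and data.table::rbindlist(fill = TRUE) handle this case the same way, if those packages are available.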
To account for the different data structure returned, gl_speech() now returns a list of two data.frames, $transcript and $timings, with one entry per alternative transcript. It seems the speaker diarization appears for only one of them, but perhaps that will be different for your sample. If it is present, it is within a speakerTag column.
## Use a custom configuration
my_config1 <- list(encoding = "LINEAR16",
                   enableSpeakerDiarization = TRUE,
                   diarizationSpeakerCount = 3)
speaker_d_test <- "gs://mark-edmondson-public-read/boring_conversation.wav"
t1 <- gl_speech(speaker_d_test,
languageCode = "en-US",
customConfig = my_config1,
asynch = TRUE)
result3 <- gl_speech_op(t1)
giving output:
str(result3)
List of 2
$ transcript:'data.frame': 2 obs. of 4 variables:
..$ transcript : chr [1:2] "I can't even believe Michigan went for two against OSU it's just play it safe with the field goal is 10,000 yar"| __truncated__ " best wireless food looks amazing thank you let me just tell you yesterday like om freaking GA was crazy I didn"| __truncated__
..$ confidence : chr [1:2] "0.8717281" "0.957514"
..$ languageCode: chr [1:2] "en-us" "en-us"
..$ channelTag : logi [1:2] NA NA
$ timings :List of 2
..$ :'data.frame': 32 obs. of 3 variables:
.. ..$ startTime: chr [1:32] "0s" "0.200s" "0.500s" "0.600s" ...
.. ..$ endTime : chr [1:32] "0.200s" "0.500s" "0.600s" "0.800s" ...
.. ..$ word : chr [1:32] "I" "can't" "even" "believe" ...
..$ :'data.frame': 145 obs. of 4 variables:
.. ..$ startTime : chr [1:145] "0s" "0.200s" "0.500s" "0.600s" ...
.. ..$ endTime : chr [1:145] "0.200s" "0.500s" "0.600s" "0.800s" ...
.. ..$ word : chr [1:145] "I" "can't" "even" "believe" ...
.. ..$ speakerTag: int [1:145] 1 1 1 1 1 1 1 1 1 1 ...
The RecognitionConfig object is different now; this is the best reference:
https://cloud.google.com/speech-to-text/docs/reference/rest/v1/RecognitionConfig#SpeakerDiarizationConfig
In R this is:
my_config <- list(encoding = "LINEAR16",
diarizationConfig = list(
enableSpeakerDiarization = TRUE,
minSpeakerCount = 2,
maxSpeakerCount = 3
))
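A full call using this nested config would then follow the earlier examples. In the sketch below, the gs:// path is a placeholder and the call needs Google Cloud authentication, so it is shown commented out:

```r
# Sketch: passing the nested diarizationConfig through customConfig.
my_config <- list(encoding = "LINEAR16",
                  diarizationConfig = list(
                    enableSpeakerDiarization = TRUE,
                    minSpeakerCount = 2,
                    maxSpeakerCount = 3
                  ))

# With authentication configured, the call would look like
# (bucket path is a placeholder):
# op <- gl_speech("gs://your-bucket/audio.wav",
#                 languageCode = "en-US",
#                 customConfig = my_config,
#                 asynch = TRUE)
# result <- gl_speech_op(op)  # speakerTag then appears in $timings
```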