Comments (33)
Yes, it runs perfectly now. Thanks again for all your work on this project.
from googleLanguageR.
Kudos Mark! Thanks for your excellent and continued work on this project!
There are many new features that can be configured. For now, to allow more flexibility, I've added a customConfig object you can pass in that can enable any new feature, e.g.:
## Use a custom configuration
my_config <- list(encoding = "LINEAR16",
                  enableSpeakerDiarization = TRUE,
                  diarizationSpeakerCount = 3)

# languageCode is required, so will be added if not in your custom config
gl_speech(my_audio, languageCode = "en-US", customConfig = my_config)
Refer to https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig to see which options you can add.
This will apply to #48 as well, and other features such as enhanced models and recognition metadata.
I would appreciate feedback on whether these should be moved to more formal arguments, or whether the formal arguments should be left for the more standard use cases, relying on custom objects for the advanced ones.
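For anyone curious how the "languageCode is added if missing" behaviour mentioned in the code comment could work, here is a minimal sketch using base R's modifyList(); the build_config() helper is hypothetical and only illustrates the idea, it is not the package's actual code:

```r
# Sketch: merge required arguments into a user-supplied customConfig.
# build_config() is an illustrative helper, not part of googleLanguageR.
build_config <- function(customConfig, languageCode = "en-US") {
  required <- list(languageCode = languageCode)
  # modifyList() overwrites entries in `required` with the user's values,
  # so languageCode is added only when the custom config omits it
  modifyList(required, customConfig)
}

my_config <- list(encoding = "LINEAR16",
                  enableSpeakerDiarization = TRUE,
                  diarizationSpeakerCount = 3)
cfg <- build_config(my_config, languageCode = "en-US")
# cfg now contains languageCode plus the three custom fields
```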
The following runs without the customConfig argument, but not with it:
my_config <- list(encoding = "LINEAR16",
                  enableSpeakerDiarization = TRUE,
                  diarizationSpeakerCount = 2)
transcript <- gl_speech("wavfile.wav", sampleRateHertz = wavfile$sample.rate, languageCode = "en-US", customConfig = my_config)
Error: API returned: Invalid JSON payload received. Unknown name "enable_speaker_diarization" at 'config': Cannot find field.
Invalid JSON payload received. Unknown name "diarization_speaker_count" at 'config': Cannot find field.
I just changed the API endpoint on GitHub (2a86fb0). I will test it myself, but if you are online now, give it another go with the new GitHub version.
That should work now. It's only available in the beta endpoint (currently v1p1beta1), not the v1 endpoint it was defaulting to.
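The version string is the only difference between the two REST endpoints; as a rough sketch of how the URL might be selected (the package's internal URL construction may differ, and the path shape here follows the public REST reference):

```r
# Sketch: Cloud Speech REST endpoint by API version.
# Illustrative only - not the package's actual URL builder.
speech_endpoint <- function(version = c("v1p1beta1", "v1"),
                            method = "longrunningrecognize") {
  version <- match.arg(version)
  sprintf("https://speech.googleapis.com/%s/speech:%s", version, method)
}

speech_endpoint("v1p1beta1")
# -> "https://speech.googleapis.com/v1p1beta1/speech:longrunningrecognize"
```

Fields such as enableSpeakerDiarization exist only in the v1p1beta1 RecognitionConfig, which is why the v1 endpoint rejected them.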
Yes, I have just come back from NEXT, where we got a demo of the service. Very cool, and it will be the next thing to work on.
Thanks Mark.
I look forward to it.
E
I'll keep this open to track the feature.
Awesome -- thanks for your hard work on this.
Ok, that didn't help. I will run through some tests and get it working. The syntax will be as above, though.
- Tests
- Documentation
Thanks for your work on this great package, Mark. I'm now wondering how to pass the metadata field of RecognitionConfig via your package. I'd like to set the metadata, particularly interaction type (https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig#RecognitionMetadata). Is this currently possible?
Thanks!
Untested but this should work:
my_config <- list(metadata = list(interactionType = "DISCUSSION"))
transcript <- gl_speech("wavfile.wav", customConfig = my_config)
You will need the GitHub version.
Hi Mark. I am trying to implement the extended functionality (punctuation, speaker diarization, enhanced speech recognition) when using gl_speech(). I ran the following test code:
gcs_upload(file.choose())
my_config <- list(enableAutomaticPunctuation = TRUE,
                  enableSpeakerDiarization = TRUE,
                  diarizationSpeakerCount = 3)
o1g1k1 <- gl_speech("gs://[bucket]/audio3", encoding = "FLAC", languageCode = "nb-NO",
                    sampleRateHertz = 48000, asynch = TRUE, customConfig = my_config)
This gives the following feedback after the end of the process:
> gl_speech_op(o1g1k1)
2018-11-02 15:54:38 -- Asychronous transcription finished.
Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match
NULL
Warning message:
In call_api() :
API Data failed to parse. Returning parsed from JSON content.
Use this to test against your data_parse_function.
I tried removing the customConfig argument, and the call then returns the transcript. Any idea what may cause the problem? (I am pretty new to R and programming in general, so I have no idea how to interpret this.)
May I check what version you are using? sessionInfo() output would be useful.
Hi again, sessionInfo() output is:
> sessionInfo("googleLanguageR")
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=Norwegian Bokmål_Norway.1252 LC_CTYPE=Norwegian Bokmål_Norway.1252
[3] LC_MONETARY=Norwegian Bokmål_Norway.1252 LC_NUMERIC=C
[5] LC_TIME=Norwegian Bokmål_Norway.1252
attached base packages:
character(0)
other attached packages:
[1] googleLanguageR_0.2.0.9000
loaded via a namespace (and not attached):
[1] Rcpp_0.12.19 rstudioapi_0.8 knitr_1.20
[4] magrittr_1.5 googleCloudStorageR_0.4.0 grDevices_3.5.1
[7] R6_2.3.0 rlang_0.3.0.1 fansi_0.4.0
[10] httr_1.3.1 tools_3.5.1 utils_3.5.1
[13] utf8_1.1.4 cli_1.0.1 googleAuthR_0.6.3
[16] htmltools_0.3.6 stats_3.5.1 datasets_3.5.1
[19] rprojroot_1.3-2 openssl_1.0.2 yaml_2.2.0
[22] assertthat_0.2.0 digest_0.6.18 tibble_1.4.2
[25] base_3.5.1 crayon_1.3.4 zip_1.0.0
[28] purrr_0.2.5 base64enc_0.1-3 graphics_3.5.1
[31] curl_3.2 evaluate_0.12 memoise_1.1.0
[34] mime_0.6 rmarkdown_1.10 compiler_3.5.1
[37] pillar_1.3.0 backports_1.1.2 methods_3.5.1
[40] jsonlite_1.5
Hmm your code works for me. Can you see if you have the same trouble with the test audio file?
test_audio <- system.file(package = "googleLanguageR", "woman1_wb.wav")
library(googleLanguageR)
my_config <- list(enableAutomaticPunctuation = TRUE, enableSpeakerDiarization = TRUE, diarizationSpeakerCount = 3)
o1g1k1 <- gl_speech(test_audio, encoding = c("FLAC"), languageCode = "nb-NO", sampleRateHertz = 48000, asynch = TRUE, customConfig = my_config)
gl_speech_op(o1g1k1)
2018-11-02 16:45:13 -- - started at 2018-11-02T15:45:12.585792Z - last update: 2018-11-02T15:45:13.149485Z
## Send to gl_speech_op() for status
## 8713868634506813015
> gl_speech_op(o1g1k1)
2018-11-02 16:45:25 -- Asynchronous transcription finished.
$transcript
# A tibble: 1 x 2
transcript confidence
<chr> <chr>
1 20 minutter mens du animals is free cookie that it difficult matter and the and sometimes It's necessary to do so 0.82878345
$timings
startTime endTime word speakerTag
1 0s 0.200s 20 1
2 0.200s 0.600s minutter 1
3 0.600s 1s mens 1
4 1s 1.300s du 1
5 1.300s 1.900s animals 1
6 1.900s 2.100s is 1
7 2.100s 2.300s free 1
8 2.300s 2.600s cookie 1
9 2.600s 2.900s that 1
10 2.900s 3s it 1
11 3s 3.200s difficult 1
12 3.200s 3.800s matter 1
13 3.800s 4.200s and 1
14 4.200s 4.300s the 1
15 4.300s 4.500s and 1
16 4.500s 5s sometimes 1
17 5s 5.200s It's 1
18 5.200s 5.400s necessary 1
19 5.400s 5.800s to 1
20 5.800s 6s do 1
21 6s 6.200s so 1
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] googleLanguageR_0.2.0.9000
loaded via a namespace (and not attached):
[1] fansi_0.3.0 utf8_1.1.4 digest_0.6.15 crayon_1.3.4 assertthat_0.2.0 R6_2.2.2 jsonlite_1.5
[8] magrittr_1.5 pillar_1.3.0 httr_1.3.1 cli_1.0.0 rlang_0.2.2 curl_3.2 rstudioapi_0.7
[15] googleAuthR_0.6.3 tools_3.5.0 purrr_0.2.5 yaml_2.1.19 compiler_3.5.0 base64enc_0.1-3 memoise_1.1.0
[22] openssl_1.0.2 tibble_1.4.2
I tried with the test audio file, and this works. (Interestingly, it's transcribed to English even though the language is set to Norwegian.) My audio file is considerably longer (~36 min), but it shouldn't exceed the limit. Also, it was transcribed when I removed the customConfig argument. Somehow that makes a difference.
Curious :) Ok, from your error message I can see the transcription works; it's just the final parsing in R that fails. If it were my file, I would put a browser() call in the file before the parsing to inspect the returned object and see what is failing. I'm happy to do this if you are able to make the file available to me.
Hi again. Sorry for the late reply. Due to some privacy issues regarding the content of this file, I'd have to check whether I can make it available to you. I am pretty new to R. Can you explain how I can use the browser() command in this context?
I understand. In that case, could you try cloning this repo, then loading the package locally by opening it in another RStudio session (Build should be top right)?
You then have a local copy of this library rather than relying on the GitHub version, so you can change the code (located in the R folder). browser() is a debugging tool that stops the execution of a function so you can inspect its state at that point. I suggest adding it at this line:
https://github.com/ropensci/googleLanguageR/blob/master/R/speech-to-text.R#L182
Rebuild the package, then try your API call. It should stop the function at the above line, and you can then inspect the object x to see the raw data as it comes out of the API. I imagine the next lines trigger the error; hopefully you can diagnose why once you have the object x.
If not, just posting the structure of the object x here (str(x)) should help me diagnose it. I suspect it's because there are extra columns in the response that are not expected.
Change the code and reload the package to try again; if it works, open a pull request :)
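As a generic toy illustration of browser() (the function name and the results field below are made up for the demo, not taken from the package):

```r
# Toy illustration of browser(): in an interactive session, execution
# pauses at the browser() call and you get a Browse[1]> prompt where
# you can run str(x), names(x), etc., then type c to continue.
inspect_parse <- function(x) {
  if (interactive()) browser()  # pause here to examine `x`
  length(x$results)             # `results` is a made-up field for the demo
}

fake_response <- list(results = list("a", "b", "c"))
inspect_parse(fake_response)
# -> 3
```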
Ok, calling gl_speech_op() now gives the following output:
> gl_speech_op(o1g1k1)
2018-11-06 14:13:55 -- Asychronous transcription finished.
Called from: parse_speech(x$response)
Browse[1]>
debug at C:/Users/[path]/RTestProject/googleLanguageR/googleLanguageR-master/R/speech-to-text.R#183: transcript <- my_map_df(x$results$alternatives, ~as_tibble(cbind(transcript = ifelse(!is.null(.x$transcript),
.x$transcript, NA), confidence = ifelse(!is.null(.x$confidence),
.x$confidence, NA))))
Browse[2]>
debug at C:/Users/[path]/RTestProject/googleLanguageR/googleLanguageR-master/R/speech-to-text.R#189: timings <- my_map_df(x$results$alternatives, ~.x$words[[1]])
Browse[2]>
Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match
NULL
Warning message:
In call_api() :
API Data failed to parse. Returning parsed from JSON content.
Use this to test against your data_parse_function.
Does this help?
I'm experiencing the same issue.
2019-01-08 17:58:23 -- Asychronous transcription finished.
Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match
NULL
Warning message:
In call_api() :
API Data failed to parse. Returning parsed from JSON content.
Use this to test against your data_parse_function.
The test file works:
test_audio <- system.file(package = "googleLanguageR", "woman1_wb.wav")
my_config <- list(enableAutomaticPunctuation = TRUE, enableSpeakerDiarization = TRUE, diarizationSpeakerCount = 3)
o1g1k1 <- gl_speech(test_audio, encoding = c("FLAC"), languageCode = "nb-NO", sampleRateHertz = 48000, asynch = TRUE, customConfig = my_config)
gl_speech_op(o1g1k1)
but my own file throws the error above:
test_gcs <- "gs://path/to/myfile.wav"
my_config <- list(encoding = "MULAW",
                  sampleRateHertz = 8000,
                  enableSpeakerDiarization = TRUE,
                  diarizationSpeakerCount = 2)
async <- gl_speech(test_gcs, asynch = TRUE, languageCode = "en-US", customConfig = my_config)
sessionInfo("googleLanguageR") output is:
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
character(0)
other attached packages:
[1] googleLanguageR_0.2.0.9000
loaded via a namespace (and not attached):
[1] zip_1.0.0 Rcpp_1.0.0 pillar_1.3.1 compiler_3.5.1
[5] googleAuthR_0.7.0 methods_3.5.1 prettyunits_1.0.2 base64enc_0.1-3
[9] remotes_2.0.2 utils_3.5.1 tools_3.5.1 grDevices_3.5.1
[13] testthat_2.0.1 digest_0.6.18 pkgbuild_1.0.2 pkgload_1.0.2
[17] jsonlite_1.6 memoise_1.1.0 tibble_2.0.0 pkgconfig_2.0.2
[21] rlang_0.3.1.9000 cli_1.0.1 rstudioapi_0.8 curl_3.2
[25] yaml_2.2.0 withr_2.1.2 httr_1.4.0 googleCloudStorageR_0.4.0
[29] fs_1.2.6 desc_1.2.0 graphics_3.5.1 datasets_3.5.1
[33] stats_3.5.1 hms_0.4.2 devtools_2.0.1 rprojroot_1.3-2
[37] glue_1.3.0 base_3.5.1 R6_2.3.0 processx_3.2.0
[41] fansi_0.4.0 sessioninfo_1.1.1 readr_1.3.1 purrr_0.2.5
[45] callr_3.0.0 magrittr_1.5 usethis_1.4.0 backports_1.1.3
[49] ps_1.2.1 assertthat_0.2.0 utf8_1.1.4 openssl_1.1
[53] crayon_1.3.4
Could you install remotes::install_github("MarkEdmondson1234/googleAuthR"), then run the command again, which should leave a gar_parse_error.rds in your working directory. Then call googleAuthR::gar_debug_parsing() so we can examine the object that is coming back from the API and failing to parse.
Ok, I have the gar_parse_error object in memory.
What can I tell you about it?
There should be some output when you put it through the function above; in particular, the contents of the $response are what is needed to diagnose which fields are not parsed.
I can't share the content of $response because the audio is a phone call that contains Protected Health Information (PHI). I imagine that's what you want to see. Can you be more specific about what you are looking for? It looks like everything is there. Is there a function I can run to manually parse it?
Ok, fair enough. I really just need the structure, so a str() on the response object should be sufficient.
Here's the output of str() with all of the word and transcript contents redacted.
List of 3
$ data_parse_args: list()
$ data_parse_func:function (x)
$ content :List of 4
..$ name : chr ""
..$ metadata:List of 4
.. ..$ @type : chr "type.googleapis.com/google.cloud.speech.v1p1beta1.LongRunningRecognizeMetadata"
.. ..$ progressPercent: int 100
.. ..$ startTime : chr "2019-01-09T16:06:00.449507Z"
.. ..$ lastUpdateTime : chr "2019-01-09T16:15:21.973116Z"
..$ done : logi TRUE
..$ response:List of 2
.. ..$ @type : chr "type.googleapis.com/google.cloud.speech.v1p1beta1.LongRunningRecognizeResponse"
.. ..$ results:'data.frame': 117 obs. of 2 variables:
.. .. ..$ alternatives:List of 117
.. .. .. ..$ :'data.frame': 1 obs. of 3 variables:
.. .. .. .. ..$ transcript: chr
.. .. .. .. ..$ confidence: num 0.733
.. .. .. .. ..$ words :List of 1
.. .. .. .. .. ..$ :'data.frame': 13 obs. of 3 variables:
.. .. .. .. .. .. ..$ startTime: chr [1:13] "0.800s" "1.400s" "2.600s" "2.700s" ...
.. .. .. .. .. .. ..$ endTime : chr [1:13] "1.400s" "2.600s" "2.700s" "2.900s" ...
.. .. .. .. .. .. ..$ word : chr [1:13]
.. .. .. ..$ :'data.frame': 1 obs. of 3 variables:
.. .. .. .. ..$ transcript: chr
.. .. .. .. ..$ confidence: num 0.684
.. .. .. .. ..$ words :List of 1
.. .. .. .. .. ..$ :'data.frame': 18 obs. of 3 variables:
.. .. .. .. .. .. ..$ startTime: chr [1:18] "7.700s" "8.300s" "8.600s" "8.700s" ...
.. .. .. .. .. .. ..$ endTime : chr [1:18] "8.300s" "8.600s" "8.700s" "9.100s" ...
.. .. .. .. .. .. ..$ word : chr [1:18]
...etc....
.. .. ..$ languageCode: chr [1:117]
Looks like the word output changes when enabling speaker diarization - from https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/speech/recognize#SpeechRecognitionAlternative:
"Output only. A list of word-specific information for each recognized word. Note: When enableSpeakerDiarization is true, you will see all the words from the beginning of the audio."
If I can get a good sample file of a few people talking (WAV), I will include it within the package to help test and cater for the above.
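The rbind() error above arises because some alternatives gain an extra speakerTag column while others do not. As a sketch of the kind of fix needed (not the package's actual patch), mismatched data.frames can be padded with NA before binding, using only base R:

```r
# Bind a list of data.frames whose column sets may differ, padding
# missing columns with NA - the situation the extra speakerTag
# column creates when speaker diarization is enabled.
rbind_fill <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(df) {
    for (col in setdiff(all_cols, names(df))) df[[col]] <- NA
    df[all_cols]  # same column order everywhere
  })
  do.call(rbind, padded)
}

a <- data.frame(word = "hi",    startTime = "0s")
b <- data.frame(word = "there", startTime = "1s", speakerTag = 1L)
rbind_fill(list(a, b))
# two rows; speakerTag is NA for the first
```

dplyr::bind_rows() and data.table::rbindlist(fill = TRUE) handle this case the same way, if those packages are available.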
To account for the different data structure returned, gl_speech() now returns a list of two data.frames, $transcript and $timings, with one entry per alternative transcript. It seems the speaker diarization appears for only one of them, but perhaps that will be different for your sample. If it is present, it is within a speakerTag column.
## Use a custom configuration
my_config1 <- list(encoding = "LINEAR16",
                   enableSpeakerDiarization = TRUE,
                   diarizationSpeakerCount = 3)
speaker_d_test <- "gs://mark-edmondson-public-read/boring_conversation.wav"
t1 <- gl_speech(speaker_d_test,
languageCode = "en-US",
customConfig = my_config1,
asynch = TRUE)
result3 <- gl_speech_op(t1)
giving output:
str(result3)
List of 2
$ transcript:'data.frame': 2 obs. of 4 variables:
..$ transcript : chr [1:2] "I can't even believe Michigan went for two against OSU it's just play it safe with the field goal is 10,000 yar"| __truncated__ " best wireless food looks amazing thank you let me just tell you yesterday like om freaking GA was crazy I didn"| __truncated__
..$ confidence : chr [1:2] "0.8717281" "0.957514"
..$ languageCode: chr [1:2] "en-us" "en-us"
..$ channelTag : logi [1:2] NA NA
$ timings :List of 2
..$ :'data.frame': 32 obs. of 3 variables:
.. ..$ startTime: chr [1:32] "0s" "0.200s" "0.500s" "0.600s" ...
.. ..$ endTime : chr [1:32] "0.200s" "0.500s" "0.600s" "0.800s" ...
.. ..$ word : chr [1:32] "I" "can't" "even" "believe" ...
..$ :'data.frame': 145 obs. of 4 variables:
.. ..$ startTime : chr [1:145] "0s" "0.200s" "0.500s" "0.600s" ...
.. ..$ endTime : chr [1:145] "0.200s" "0.500s" "0.600s" "0.800s" ...
.. ..$ word : chr [1:145] "I" "can't" "even" "believe" ...
.. ..$ speakerTag: int [1:145] 1 1 1 1 1 1 1 1 1 1 ...
The RecognitionConfig object is different now; this is the best reference:
https://cloud.google.com/speech-to-text/docs/reference/rest/v1/RecognitionConfig#SpeakerDiarizationConfig
In R this is:
my_config <- list(encoding = "LINEAR16",
diarizationConfig = list(
enableSpeakerDiarization = TRUE,
minSpeakerCount = 2,
maxSpeakerCount = 3
))
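A full call using this nested config would then follow the earlier examples. In the sketch below, the gs:// path is a placeholder and the call needs Google Cloud authentication, so it is shown commented out:

```r
# Sketch: passing the nested diarizationConfig through customConfig.
my_config <- list(encoding = "LINEAR16",
                  diarizationConfig = list(
                    enableSpeakerDiarization = TRUE,
                    minSpeakerCount = 2,
                    maxSpeakerCount = 3
                  ))

# With authentication configured, the call would look like
# (bucket path is a placeholder):
# op <- gl_speech("gs://your-bucket/audio.wav",
#                 languageCode = "en-US",
#                 customConfig = my_config,
#                 asynch = TRUE)
# result <- gl_speech_op(op)  # speakerTag then appears in $timings
```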