abbyyr's Introduction

Access Abbyy Cloud OCR from R

Easily OCR images, barcodes, forms, documents with machine readable zones, e.g. passports, right from R. Get the results in a wide variety of formats, from text files to detailed XMLs with information about bounding boxes, etc.

The package provides access to the Abbyy Cloud OCR SDK API. Details about results of calls to the API can be found here.

Installation

To get the latest version on CRAN:

install.packages("abbyyR")

To get the current development version from GitHub:

# install.packages("devtools")
devtools::install_github("soodoku/abbyyR", build_vignettes = TRUE)

Using abbyyR

To get acquainted with some of the important functions, read the vignettes:

# Overview of the package
vignette("introduction", package = "abbyyR")
# some functions are used along with output
vignette("example", package = "abbyyR")
# how to scrape text from a folder of images
vignette("wiscads", package = "abbyyR")

The final output quality depends on the complexity of the layout, the image resolution, the font face, and similar factors. To measure OCR quality, you can compute the edit distance between the output and a hand-coded "gold standard" sample using recognize. For quick edit-distance-based search and replace to fix messy data, you can use turbo search and replace.
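As a rough illustration of the edit-distance idea, here is a minimal sketch that uses base R's `adist()` rather than the recognize package, whose interface is not shown here. The two strings are made up for the example and are not real OCR output.

```r
# Hypothetical strings for illustration; not real abbyyR output.
gold <- "The quick brown fox"   # hand-coded reference text
ocr  <- "The qu1ck brovvn fox"  # text as an OCR engine might return it

# utils::adist() computes the generalized Levenshtein (edit) distance.
# It returns a matrix, so drop() reduces the 1x1 result to a scalar.
edits <- drop(adist(ocr, gold))

# Character error rate: edits per character of the reference text.
cer <- edits / nchar(gold)

cat("Edits:", edits, "\n")
cat("Character error rate:", round(cer, 3), "\n")
```

A lower character error rate means the OCR output is closer to the reference. For whole documents you would typically compute this per page or per file and then summarize.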

License

Scripts are released under the MIT License.

Contributor Code of Conduct

The project welcomes contributions from everyone! In fact, it depends on it. To maintain this welcoming atmosphere, and to collaborate in a fun and productive way, we expect contributors to the project to abide by the Contributor Code of Conduct.

abbyyr's People

Contributors

michaelchirico, mjlassila, saudiwin, soodoku, wosiu

abbyyr's Issues

ocrFile() error

Hi.

When trying to use the ocrFile() function, I enter the below syntax:

ocrFile(file_path="~/#R_Projects/officeR/ImageOnly.pdf", output_dir="~/#R_Projects/officeR/abbyyR/", exportFormat = "pdfa", save_to_file = TRUE)

However, I get the following error: Error in curl_download(finishedlist$resultUrl[res$id == finishedlist$id], : Argument 'url' must be string.

I found a response to a similar issue at stackoverflow, but running that syntax didn't help either.

Not sure what is going wrong. Any help is greatly appreciated.

trouble using ocrFile()

I'm attempting to use the ocrFile() function to read a PDF image of a table. If I upload my PDFs to ABBYY's "finreaderonline.com", I get a nice .csv table back.

When I use ocrFile() I get the following error message:
Error in Ops.factor(res$id, listFinishedTasks()$id) : level sets of factors are different

If I run getAppInfo():

getAppInfo()
Name of Application: 1
No. of Pages Remaining: 1
No. of Fields Remaining: 1
Application Credits Expire on: 1
Type: 1

Any ideas about what I could check?
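For context on the message above: base R raises "level sets of factors are different" when `==` compares two factors whose levels differ. The following is a minimal sketch with made-up task IDs (not real API output). It assumes the ID columns were stored as factors, which the report does not confirm.

```r
# Made-up task IDs for illustration only.
submitted <- factor("task-b")
finished  <- factor(c("task-a", "task-b", "task-c"))

# Comparing factors with different level sets fails inside Ops.factor().
msg <- tryCatch(
  {
    submitted == finished
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Comparing the underlying strings avoids the error.
is_match <- as.character(finished) == as.character(submitted)
print(which(is_match))
```

If this is the cause, converting both ID vectors with `as.character()` before comparing them is one possible check.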

Error: HTTP failure: 403

I've got the error below.


Brief description of the problem

library(abbyyR)
setapp(c("tidyverse", "password"))

getAppInfo()

Error: HTTP failure: 403

Region option is not enabled for processImage() function

In processImage():

querylist <- list(language = language, letterSet = letterSet, 
    regExp = regExp, textType = textType, oneTextLine = oneTextLine, 
    oneWordPerTextLine = oneWordPerTextLine, markingType = markingType, 
    placeholdersCount = placeholdersCount, writingStyle = writingStyle, 
    description = description, pdfPassword = pdfPassword)

This omits the "region" parameter described at https://ocrsdk.com/documentation/api-reference/process-text-field-method/

Hence OCR on a region cannot be done.
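To make the request concrete, here is a sketch of how a region value could be added to such a query list. The helper `format_region()` is hypothetical and not part of abbyyR, and the "left,top,right,bottom" pixel format is an assumption based on the linked API page; check it there before relying on it.

```r
# Hypothetical helper, not part of abbyyR.
# Assumes the API expects "left,top,right,bottom" in pixels.
format_region <- function(left, top, right, bottom) {
  coords <- c(left, top, right, bottom)
  stopifnot(
    is.numeric(coords),
    length(coords) == 4,
    left < right,
    top < bottom
  )
  paste(as.integer(coords), collapse = ",")
}

# Query list with the extra (assumed) region entry next to an existing one.
querylist <- list(
  language = "English",
  region   = format_region(10, 20, 300, 80)
)

str(querylist)
```

Keeping the validation in a small helper means a malformed rectangle fails locally instead of producing an API error.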

ocrFile and processImage return 'Error: HTTP failure: 450'

Hello,

I'm getting the 450 error when working with both the processImage and ocrFile functions. As far as I can tell, it does not depend on my app limitations or my code repository connections, since I'm able to run all other functions that do not provide an immediate file download (such as submitImage, processDocument, etc.).

Also, I think these errors depend on some recent update in the abbyyR package or in the Abbyy cloud service, since I was able to run those functions up until April/May, and I found the same issue posted on the Abbyy forum in June (but no proper answer was given there).

Any idea of what may be going on?

Thanks
Luca

New HTTP error 450....

Hello,

Thanks for making your package available to test Abbyy OCR.
I am processing a batch of images with OCR. I was able to submit around 100 images following the example you include in this vignette:

https://cloud.r-project.org/web/packages/abbyyR/vignettes/wiscads.html

But it suddenly stops and responds with Error: HTTP failure: 450.

From that moment on, I cannot process any other image, and I even get the same error with deleteTask(), though not with listTasks(). I switched to a mobile Internet connection and it seemed to work again, but after submitting another 100 images, the same problem came back with the same error.

I saw your response in this issue https://github.com/soodoku/abbyyR/issues/1 and replaced the CRAN version with the one you have here on GitHub, but the problem persists.

If you know any workaround or alternative....

client error: (450) Blocked by Windows Parental Controls (Microsoft)

> library('EBImage')
> library('abbyyR')

> lnk <- 'http://www.theage.com.au/ffximage/2005/07/22/id_card1_gallery__502x329,0.jpg'
> pic <- readImage(lnk)
> display(pic)

> download.file(lnk,destfile=paste0(getwd(),'/pic.jpg'))
--2015-11-03 18:23:39--  http://www.theage.com.au/ffximage/2005/07/22/id_card1_gallery__502x329,0.jpg
Resolving www.theage.com.au (www.theage.com.au)... 104.86.110.66, 104.86.110.27
Connecting to www.theage.com.au (www.theage.com.au)|104.86.110.66|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 23154 (23K) [image/jpeg]
Saving to: ‘/home/ryoeng/pic.jpg’
     0K .......... .......... ..                              100% 19.2M=0.001s
2015-11-03 18:23:39 (19.2 MB/s) - ‘/home/ryoeng/pic.jpg’ saved [23154/23154]
> setapp(c('cloud_ocr', 'PgUbJEeFzlKMjeX/668puVFZ'))
> processPhotoId(file_path=paste0(getwd(),'/pic.jpg'), idType='auto', imageSource='auto')
- Error in processPhotoId(file_path = paste0(getwd(), "/pic.jpg"), idType = "auto",  : 
-   client error: (450) Blocked by Windows Parental Controls (Microsoft)
> 
> file.remove(paste0(getwd(),'/pic.jpg'))
[1] TRUE

Just curious: is your app connected to Microsoft in some way? I thought abbyyR was a web-based application. Is there any solution?
