Thanks @mekarpeles for getting the ball rolling on this. I've copied over relevant portions of your spec here.
Context
There are tens of millions of semantic entities in books stored on the Internet Archive, many of them with corresponding web resources. This spike aims to enable interaction with URL entities that have a Wayback Machine resource available.
Proposal & Constraints
The proposed solution (version 1) is to extend the Internet Archive BookReader with a new "Semantic" plugin which, on page-load:
Pulls a page of region-labeled OCR’d text using the https://api.archivelab.org/books/{identifier}/pages/{page}/ocr?mode=words API
Hits a new entities endpoint that identifies URLs (and, later, other semantic entities) and returns a list of entities, each with:
type: e.g. url
location: (x, y, w, h)
value: e.g. https://archive.org
Highlights the corresponding region on the book page containing the link and makes the region clickable, pointing to a Save Page Now version of the link
I.e., once clicked, capture the web page if we don’t already have it and, in either case, bring the patron to a viewable version of this URL
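To make the entities step more concrete, here is a rough Python sketch of what the endpoint logic could look like. It is only an illustration under assumptions: the `Word` shape (text plus a bounding box) is hypothetical, since the actual response format of the OCR API isn't described above, and `find_url_entities`, `wayback_link`, and `save_page_now_link` are made-up names. The URL regex is deliberately naive and the two link helpers rely on the public `web.archive.org/web/` and `web.archive.org/save/` URL prefixes; whether the plugin should use those directly is an open question.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Word:
    """Hypothetical OCR word: text plus its bounding box on the page."""
    text: str
    x: int
    y: int
    w: int
    h: int


@dataclass
class Entity:
    """Mirrors the fields proposed above: type, location, value."""
    type: str
    location: Tuple[int, int, int, int]  # (x, y, w, h)
    value: str


# Naive pattern (an assumption): a single OCR token that starts with
# http://, https://, or www. and has at least one character after it.
URL_RE = re.compile(r"^(?:https?://|www\.)\S+$", re.IGNORECASE)


def find_url_entities(words: Iterable[Word]) -> List[Entity]:
    """Return one 'url' entity for each OCR word that looks like a URL."""
    entities = []
    for word in words:
        # Drop trailing sentence punctuation that OCR attaches to the token.
        token = word.text.rstrip(".,;:)]'\"")
        if not URL_RE.match(token):
            continue
        # Give scheme-less "www." tokens a scheme so they are clickable.
        value = token if token.lower().startswith("http") else "http://" + token
        entities.append(Entity("url", (word.x, word.y, word.w, word.h), value))
    return entities


def wayback_link(url: str) -> str:
    """Link to a Wayback Machine view of the URL."""
    return "https://web.archive.org/web/" + url


def save_page_now_link(url: str) -> str:
    """Link that asks Save Page Now to capture the URL."""
    return "https://web.archive.org/save/" + url
```

For example, feeding it the words `Visit`, `https://archive.org.`, and `www.example.com,` would yield two entities, with the trailing punctuation stripped and the bounding boxes carried over unchanged for the highlight overlay. URLs that OCR splits across several words or lines are not handled here and would need to be addressed separately.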