translation-server's Introduction

Zotero Translation Server

The Zotero translation server lets you use Zotero translators without the Zotero client.

Installation

Running via Docker

The easiest way to run a local instance of translation-server is via Docker.

docker pull zotero/translation-server
docker run -d -p 1969:1969 --rm --name translation-server zotero/translation-server

This will pull the latest image from Docker Hub and run it as a background process on port 1969. Use docker kill translation-server to stop it.

Running from source

First, fetch the source code and install Node dependencies:

  1. git clone --recurse-submodules https://github.com/zotero/translation-server

  2. cd translation-server

  3. npm install

Once you've set up a local copy of the repo, you can run the server in various ways:

Node.js

npm start

Docker (development)

Build from the local repo and run in foreground:

docker build -t translation-server .
docker run -ti -p 1969:1969 --rm translation-server

AWS Lambda

translation-server can also run on AWS Lambda and be accessed through API Gateway. You will need the AWS SAM CLI to deploy the server.

Copy and configure config file:

cp lambda_config.env-sample lambda_config.env

Test locally:

./lambda_local_test lambda_config.env

Deploy:

./lambda_deploy lambda_config.env

You can view the API Gateway endpoint in the Outputs section of the console output.

User-Agent

By default, translation-server uses a standard Chrome User-Agent string to maximize compatibility. This is fine for personal usage, but for a deployed service, it’s polite to customize User-Agent so that sites can identify requests and contact you in case of abuse.

You can do this by setting the USER_AGENT environment variable:

USER_AGENT='my-custom-translation-server/2.0 ([email protected])' npm start

If you find that regular requests are being blocked with a fully custom user-agent string, you can also add an identifier and contact information to the end of a standard browser UA string:

export USER_AGENT='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36 my-custom-translation-server/2.0 ([email protected])'
npm start

Proxy Support

You can configure translation-server to use a proxy server by setting the HTTP_PROXY and HTTPS_PROXY environment variables:

HTTP_PROXY=http://proxy.example.com:8080 HTTPS_PROXY=http://proxy.example.com:8080 npm start

If your proxy server uses a self-signed certificate, you can set NODE_TLS_REJECT_UNAUTHORIZED=0 to force Node to ignore certificate errors.

It’s also possible to opt out of proxying for specific hosts by using the NO_PROXY variable. See the Node request library documentation for more details.
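
As a rough sketch of how the User-Agent and proxy settings above might be combined when launching the server from a script, the following Python helper builds the environment for the child process. The function name `server_env`, the example contact string, and joining `NO_PROXY` hosts with commas are assumptions made for illustration; only the variable names USER_AGENT, HTTP_PROXY, HTTPS_PROXY and NO_PROXY come from this README.

```python
import os


def server_env(user_agent=None, proxy=None, no_proxy=None, base=None):
    """Return a copy of an environment with translation-server settings applied.

    `base` defaults to the current process environment. Nothing is launched
    here; the returned dict is meant to be passed to a process runner.
    """
    env = dict(os.environ if base is None else base)
    if user_agent:
        env["USER_AGENT"] = user_agent
    if proxy:
        # The README example points both variables at the same proxy URL.
        env["HTTP_PROXY"] = proxy
        env["HTTPS_PROXY"] = proxy
    if no_proxy:
        # Assumption: hosts are given as a comma-separated list.
        env["NO_PROXY"] = ",".join(no_proxy)
    return env
```

The result could then be handed to something like `subprocess.run(["npm", "start"], env=server_env(...))` from the repository root.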

Running tests

npm test

Endpoints

Web Translation

Retrieve metadata for a webpage:

$ curl -d 'https://www.nytimes.com/2018/06/11/technology/net-neutrality-repeal.html' \
   -H 'Content-Type: text/plain' http://127.0.0.1:1969/web

Returns an array of translated items in Zotero API JSON format.

Retrieve metadata for a webpage with multiple results:

$ curl -d 'https://www.ncbi.nlm.nih.gov/pubmed/?term=crispr' \
   -H 'Content-Type: text/plain' http://127.0.0.1:1969/web

Returns 300 Multiple Choices with a JSON object:

{
	"url": "https://www.ncbi.nlm.nih.gov/pubmed/?term=crispr",
	"session": "9y5s0EW6m5GgLm0",
	"items": {
		"u30044970": {
			"title": "RNA Binding and HEPN-Nuclease Activation Are Decoupled in CRISPR-Cas13a."
		},
		"u30044923": {
			"title": "Knockout of tnni1b in zebrafish causes defects in atrioventricular valve development via the inhibition of the myocardial wnt signaling pathway."
		},
		// more results
	}
}

To make a selection, delete unwanted results from the items object and POST the returned data back to the server as application/json.
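
The selection step can be sketched in Python as follows. This is a minimal illustration, not part of translation-server: the helper names `select_items` and `selection_body` are hypothetical, and it assumes the trimmed object is posted back to the same /web endpoint with the `url` and `session` fields left untouched.

```python
import json


def select_items(multiple_response, keep):
    """Return a copy of a 300 response containing only the wanted items.

    `keep` is an iterable of keys from the response's "items" object.
    The original response is not modified.
    """
    wanted = set(keep)
    selected = dict(multiple_response)
    selected["items"] = {
        key: value
        for key, value in multiple_response["items"].items()
        if key in wanted
    }
    return selected


def selection_body(multiple_response, keep):
    """Encode the trimmed response as bytes for an application/json POST."""
    return json.dumps(select_items(multiple_response, keep)).encode("utf-8")
```

The bytes from `selection_body` would be sent with `Content-Type: application/json`, in place of the plain-text URL used in the first request.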

Search Translation

Retrieve metadata from an identifier (DOI, ISBN, PMID, arXiv ID):

$ curl -d 10.2307/4486062 -H 'Content-Type: text/plain' http://127.0.0.1:1969/search

Export Translation

Convert items in Zotero API JSON format to a supported export format (RIS, BibTeX, etc.):

$ curl -d @items.json -H 'Content-Type: application/json' 'http://127.0.0.1:1969/export?format=bibtex'

Import Translation

Convert items in any import format to the Zotero API JSON format:

$ curl --data-binary @data.bib -H 'Content-Type: text/plain' http://127.0.0.1:1969/import
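
For callers who prefer Python to curl, the requests above share one shape: a POST with a body, a Content-Type header and, for /export, a `format` query parameter. The sketch below only builds the request objects with the standard library and sends nothing. The helper names are hypothetical, and the base URL assumes a local instance on port 1969 as in the examples.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:1969"  # assumption: local instance, default port


def build_request(endpoint, body, content_type="text/plain", params=None,
                  base_url=BASE_URL):
    """Build (but do not send) a POST request for a translation-server endpoint."""
    url = f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    if isinstance(body, str):
        body = body.encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )


def search_request(identifier):
    """Request for /search with a DOI, ISBN, PMID or arXiv ID as plain text."""
    return build_request("search", identifier)


def export_request(items, export_format):
    """Request for /export with Zotero API JSON items and a target format."""
    return build_request(
        "export", json.dumps(items), "application/json", {"format": export_format}
    )
```

A request built this way can be sent with `urllib.request.urlopen(request)`; note that `urlopen` raises `urllib.error.HTTPError` for non-2xx statuses, so a 300 Multiple Choices response from /web would need to be caught and read from the exception object.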

translation-server's People

Contributors

adomasven, dependabot[bot], dhimmel, dstillman, kerphi, mrtcode, mvolz, retorquere, wetneb, zuphilip

translation-server's Issues

Support 'format' parameter for /web and /search

Requested in #51 (comment)

The main problem with this is that if there are no results, the export translators could fail or produce something unexpected. Generally there either should be results or it should return a 501 (e.g., if we don't recognize an identifier), but at least when text search is enabled (which we use internally for ZoteroBib, but isn't used elsewhere) a text search can result in a 200 with no results. So I'm not sure what we would do in that case, but if we're not using this parameter that could probably stay undefined. We'd need to verify that there aren't other cases (web translation that doesn't throw an error but also doesn't return results?) where an empty array could be returned.

Automatic translator updates

Pull from the translators repo at startup and connect to the streaming server for immediate updates. These should be optional.

"ReferenceError: XPathResult is not defined" in multiple metadata translators (unAPI, doi, coins etc)

curl -d '{ "query": "https://www.wikidata.org/wiki/Q1771279" }' -H 'Content-Type: application/json' http://127.0.0.1:1969/search

(3)(+0000000): Translators initialized with 523 loaded

(3)(+0000005): Listening on 0.0.0.0:1969

(3)(+0025449): HTTP GET https://www.wikidata.org/wiki/Q1771279

(3)(+0000945): Translators: Looking for translators for https://www.wikidata.org/wiki/Q1771279

(4)(+0000013): Translate: Binding sandbox to https://www.wikidata.org/wiki/Q1771279

(4)(+0000001): Translate: Parsing code for Wikidata (eaef8d43-2f17-45b3-a5cb-affb49bc5e81, 2017-07-06 12:03:00)

(4)(+0000029): Translate: Parsing code for unAPI (e7e01cac-1e37-4da6-b078-a0e8343b0e98, 2018-05-12 15:58:17)

(2)(+0000002): Translate: Detect using unAPI failed:
ReferenceError: XPathResult is not defined

ReferenceError: XPathResult is not defined
at getUnAPIIDs (eval at <anonymous> (/home/marielle/Code/translation-server-v2/src/translate/sandboxManager.js:65:4), <anonymous>:218:88)
at detectWeb (eval at <anonymous> (/home/marielle/Code/translation-server-v2/src/translate/sandboxManager.js:65:4), <anonymous>:301:12)
at Zotero.Translate.Web._detectTranslatorLoaded (/home/marielle/Code/translation-server-v2/src/translate.js:1731:47)
at Zotero.Translate.Web.<anonymous> (/home/marielle/Code/translation-server-v2/src/translate.js:1715:16)
at <anonymous>
at process._tickCallback (internal/process/next_tick.js:188:7)
url => https://www.wikidata.org/wiki/Q1771279

(4)(+0000000): Translate: Parsing code for COinS (05d07af9-105a-4572-99f6-a8e231c0daef, 2015-06-04 03:25:10)

(2)(+0000009): Translate: Detect using COinS failed:
ReferenceError: XPathResult is not defined

ReferenceError: XPathResult is not defined
at detectWeb (eval at <anonymous> (/home/marielle/Code/translation-server-v2/src/translate/sandboxManager.js:65:4), <anonymous>:28:140)
at Zotero.Translate.Web._detectTranslatorLoaded (/home/marielle/Code/translation-server-v2/src/translate.js:1731:47)
at Zotero.Translate.Web.<anonymous> (/home/marielle/Code/translation-server-v2/src/translate.js:1715:16)
at <anonymous>
at process._tickCallback (internal/process/next_tick.js:188:7)
url => https://www.wikidata.org/wiki/Q1771279

(4)(+0000000): Translate: Parsing code for Embedded Metadata (951c027d-74ac-47d4-a107-9c3069ab7b48, 2018-02-13 19:20:46)

(3)(+0000020): Translate: Embedded Metadata: found 11 meta tags.

(3)(+0000001): Translate: Creating translate instance of type import in sandbox

(4)(+0000001): Translate: Binding sandbox to https://www.wikidata.org/wiki/Q1771279

(4)(+0000002): Translate: Parsing code for RDF (5e3ad958-ac79-463d-812b-a86a9235c28f, 2018-05-08 19:39:38)

(3)(+0000003): Translate: Initializing RDF data store

(4)(+0000004): Translate: Parsing code for DOI (c159dcfe-8a53-4301-a499-30f6549c340d, 2016-11-05 10:57:01)

(2)(+0000001): Translate: Detect using DOI failed:
ReferenceError: XPathResult is not defined

ReferenceError: XPathResult is not defined
at getDOIs (eval at <anonymous> (/home/marielle/Code/translation-server-v2/src/translate/sandboxManager.js:65:4), <anonymous>:43:50)
at detectWeb (eval at <anonymous> (/home/marielle/Code/translation-server-v2/src/translate/sandboxManager.js:65:4), <anonymous>:72:14)
at Zotero.Translate.Web._detectTranslatorLoaded (/home/marielle/Code/translation-server-v2/src/translate.js:1731:47)
at Zotero.Translate.Web.<anonymous> (/home/marielle/Code/translation-server-v2/src/translate.js:1715:16)
at <anonymous>
at process._tickCallback (internal/process/next_tick.js:188:7)
url => https://www.wikidata.org/wiki/Q1771279

(3)(+0000000): Translate: All translator detect calls and RPC calls complete:

(3)(+0000000): Embedded Metadata: 320

(5)(+0000000): Translate: Running handler 0 for translators

(5)(+0000000): Translate: Running handler 1 for translators

(4)(+0000001): Translate: Parsing code for Embedded Metadata (951c027d-74ac-47d4-a107-9c3069ab7b48, 2018-02-13 19:20:46)

(3)(+0000002): Translate: Beginning translation with Embedded Metadata

(3)(+0000001): Translate: Embedded Metadata: found 11 meta tags.

(3)(+0000000): Translate: Creating translate instance of type import in sandbox

(4)(+0000001): Translate: Binding sandbox to https://www.wikidata.org/wiki/Q1771279

(4)(+0000001): Translate: Parsing code for RDF (5e3ad958-ac79-463d-812b-a86a9235c28f, 2018-05-08 19:39:38)

(3)(+0000002): Translate: Initializing RDF data store

(3)(+0000011): Translate: Promise not available in sandbox in _itemDone()

(3)(+0000000): Translate: Saving item

(5)(+0000000): Translate: Running handler 0 for itemDone

(3)(+0000532): Translate: Looking for authors in byline, vcard

(3)(+0000038): Translate: Found 0 elements with 'byline' class

(3)(+0000007): Translate: Found 0 elements with 'vcard' class

(3)(+0000001): Translate: No byline found.

(3)(+0000619): Translate: Promise not available in sandbox in _itemDone()

(3)(+0000000): Translate: Saving item

(3)(+0000000): Translate: Translation successful

(5)(+0000000): Translate: Running handler 0 for done

(3)(+0000001): itemToAPIJSON: Discarded field libraryCatalog: field not valid for type webpage

Google book link only gets partial data

curl -d '{ "query": "http://books.google.de/books?hl=en&lr=&id=Ct6FKwHhBSQC&oi=fnd&pg=PP9" }' -H 'Content-Type: application/json' http://127.0.0.1:1969/search
Internal Server Error

node src/server.js

(3)(+0000000): Translators initialized with 523 loaded

(3)(+0000006): Listening on 0.0.0.0:1969

(3)(+0052583): HTTP GET http://books.google.de/books?hl=en&lr=&id=Ct6FKwHhBSQC&oi=fnd&pg=PP9

(1)(+0000203): Error: read ECONNRESET

Error: read ECONNRESET
    at _errnoException (util.js:1024:11)
    at TCP.onread (net.js:615:25)

InternalServerError: An error occurred retrieving the document

  at Object.throw (/home/marielle/Code/translation-server-v2/node_modules/koa/lib/context.js:93:11)
  at Object.handleURL (/home/marielle/Code/translation-server-v2/src/endpoints.js:164:23)
  at <anonymous>
  at process._tickCallback (internal/process/next_tick.js:188:7)

http://books.google.de/books?hl=en&lr=&id=Ct6FKwHhBSQC&oi=fnd&pg=PP9&dq=%2522Peggy+Eaton%2522&ots=KN-Z0-HAcv&sig=snBNf7bilHi9GFH4-6-3s1ySI9Q&redir_esc=y#v=onepage&q=%2522Peggy%2520Eaton%2522&f=false

Query parameter to force non-multiple save

In some contexts, we'll always want to save the page itself instead of offering multiple results, so /web should accept a query parameter that makes it go through translators until it finds one that returns a non-multiple item type, or falls back to generic webpage saving.

single=1?

Wikidata translation failing "No title specified for item"

The error looks almost like a translator issue, but the request works in translation-server and fails in translation-server-v2 with the most recent version of the Wikidata translator.

curl -d '{ "query": "https://www.wikidata.org/wiki/Q33415777" }' -H 'Content-Type: application/json' http://127.0.0.1:1969/search

[{"key":"E4MTYBL4","version":0,"itemType":"webpage","creators":[],"tags":[],"title":"Growth of Weil-Petersson Volumes and Random Hyperbolic Surface of Large Genus","url":"https://www.wikidata.org/wiki/Q33415777","language":"en","accessDate"

(3)(+0000371): Error: No title specified for item

Error: No title specified for item
    at Object._itemDone (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:682:32)
    at Object._itemDone (/home/marielle/Code/translation-server-v2/src/translation/sandboxManager.js:91:17)
    at Zotero.Item.complete (eval at <anonymous> (/home/marielle/Code/translation-server-v2/src/translation/sandboxManager.js:65:4), <anonymous>:1:306)
    at eval (eval at <anonymous> (/home/marielle/Code/translation-server-v2/src/translation/sandboxManager.js:65:4), <anonymous>:237:8)
    at /home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/utilities_translate.js:332:5
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7)

(2)(+0000000): Translate: Translation using Wikidata failed:
Error: No title specified for item

Error: No title specified for item
at Object._itemDone (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:682:32)
at Object._itemDone (/home/marielle/Code/translation-server-v2/src/translation/sandboxManager.js:91:17)
at Zotero.Item.complete (eval at <anonymous> (/home/marielle/Code/translation-server-v2/src/translation/sandboxManager.js:65:4), <anonymous>:1:306)
at eval (eval at <anonymous> (/home/marielle/Code/translation-server-v2/src/translation/sandboxManager.js:65:4), <anonymous>:237:8)
at /home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/utilities_translate.js:332:5
at <anonymous>
at process._tickCallback (internal/process/next_tick.js:188:7)
url => https://www.wikidata.org/wiki/Q33415777

(5)(+0000000): Translate: Running handler 0 for error

(1)(+0000000): Translation using Wikidata failed

(1)(+0000000): Error: No title specified for item

Error: No title specified for item
    at Object._itemDone (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:682:32)
    at Object._itemDone (/home/marielle/Code/translation-server-v2/src/translation/sandboxManager.js:91:17)
    at Zotero.Item.complete (eval at <anonymous> (/home/marielle/Code/translation-server-v2/src/translation/sandboxManager.js:65:4), <anonymous>:1:306)
    at eval (eval at <anonymous> (/home/marielle/Code/translation-server-v2/src/translation/sandboxManager.js:65:4), <anonymous>:237:8)
    at /home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/utilities_translate.js:332:5
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7)

(4)(+0000000): Translate: Parsing code for Embedded Metadata (951c027d-74ac-47d4-a107-9c3069ab7b48, 2018-07-01 18:02:27)

(3)(+0000002): Translate: Beginning translation with Embedded Metadata

(3)(+0000001): Translate: Embedded Metadata: found 7 meta tags.

(3)(+0000000): Translate: Creating translate instance of type import in sandbox

(4)(+0000000): Translate: Binding sandbox to https://www.wikidata.org/wiki/Q33415777

(4)(+0000001): Translate: Parsing code for RDF (5e3ad958-ac79-463d-812b-a86a9235c28f, 2018-05-08 19:39:38)

(3)(+0000001): Translate: Initializing RDF data store

(3)(+0000005): Translate: Promise not available in sandbox in _itemDone()

(3)(+0000001): Translate: Saving item

(5)(+0000000): Translate: Running handler 0 for itemDone

(3)(+0000037): Translate: Looking for authors in byline, vcard

(3)(+0000004): Translate: Found 0 elements with 'byline' class

(3)(+0000001): Translate: Found 0 elements with 'vcard' class

(3)(+0000000): Translate: No byline found.

(3)(+0000009): Translate: Promise not available in sandbox in _itemDone()

(3)(+0000000): Translate: Saving item

(3)(+0000001): Translate: Translation successful

(5)(+0000000): Translate: Running handler 0 for done

(3)(+0000000): itemToAPIJSON: Discarded field libraryCatalog: field not valid for type webpage

Should /web and /search support GET requests?

We were curious why the /web and /search endpoints take the query as POST data rather than as GET URL parameters. The benefits of GET would be:

  1. Easier querying for casual users. For example, queries could be single URLs that would work in a browser as opposed to requiring curl.
  2. Easier caching via a reverse proxy. We are running translation-server behind nginx and would like to cache repetitive queries, but caching responses based on POST data appears to be non-standard.

Would it be possible to support both POST data and GET URL parameters? Alternatively, is there a built-in caching solution to deal with 2?
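
One possible workaround for point 2, independent of any server change, is to cache in the calling application, keyed on the endpoint plus a hash of the POST body. The Python sketch below illustrates that idea only; it is not a built-in translation-server feature, and the names `cache_key` and `ResponseCache` are hypothetical. The `fetch` callable stands in for whatever code actually performs the HTTP request.

```python
import hashlib


def cache_key(endpoint, body):
    """Derive a stable key from the endpoint and the POST body."""
    if isinstance(body, str):
        body = body.encode("utf-8")
    # Strip surrounding whitespace so "x" and "x\n" share one cache entry.
    digest = hashlib.sha256(body.strip()).hexdigest()
    return f"{endpoint}:{digest}"


class ResponseCache:
    """In-memory cache in front of a fetch(endpoint, body) callable."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._store = {}

    def get(self, endpoint, body):
        key = cache_key(endpoint, body)
        if key not in self._store:
            self._store[key] = self._fetch(endpoint, body)
        return self._store[key]
```

A real deployment would also need an expiry policy and a size bound, which this sketch leaves out.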

More XPath weirdness

If I do npm start and then ./translate_search 9781421402833, I get this:

(3)(+0000000): Translate: Could not find a result using Library of Congress ISBN -- trying next translator

…and a small amount of metadata.

If I run translate_search again without restarting the server, the LoC ISBN translator succeeds and I get much more data. Same for subsequent runs.

This line in MARCXML.js is returning different results for the different runs — a length of 0 and 1, respectively — despite the exact same XML going into the parseFromString() above.

Failing test on master for search endpoint

As per comment here: https://gerrit.wikimedia.org/r/#/c/mediawiki/services/zotero/+/479020/

1) /search
       should perform a text search:

      AssertionError: expected 501 to equal 300
      + expected - actual

      -501
      +300
      
      at Context.<anonymous> (test/search_test.js:59:10)
      at process._tickCallback (internal/process/next_tick.js:68:7)

"The failing test is at https://github.com/zotero/translation-server/blob/master/test/search_test.js#L54. That should help narrow down what exactly is failing and how to fix it"

Open WorldCat ISBN search has extra authors

I queried metadata for my PhD thesis using its ISBN:

curl --silent \
  --data 9781339919881 \
   --header 'Content-Type: text/plain' \
  http://127.0.0.1:1969/search | jq

This returned:

[
  {
    "key": "AFYL2BGB",
    "version": 0,
    "itemType": "book",
    "creators": [
      {
        "firstName": "Daniel S",
        "lastName": "Himmelstein",
        "creatorType": "author"
      },
      {
        "firstName": "San Francisco",
        "lastName": "University of California",
        "creatorType": "author"
      },
      {
        "name": "Biological and Medical Informatics",
        "creatorType": "author"
      },
      {
        "firstName": "San Francisco",
        "lastName": "University of California",
        "creatorType": "author"
      }
    ],
    "tags": [],
    "libraryCatalog": "Open WorldCat",
    "language": "English",
    "title": "The hetnet awakens: understanding complex diseases through data integration and open science.",
    "date": "2016",
    "ISBN": "9781339919881",
    "abstractNote": "Human disease is complex. However, the explosion of biomedical data is providing new opportunities to improve our understanding. My dissertation focused on how to harness the biodata revolution. Broadly, I addressed three questions: how to integrate data, how to extract insights from data, and how to make science more open. To integrate data, we pioneered the hetnet---a network with multiple node and relationship types. After several preludes, we released Hetionet v1.0, which contains 2,250,197 relationships of 24 types. Hetionet encodes the collective knowledge produced by millions of studies over the last half century. To extract insights from data, we developed a machine learning approach for hetnets. In order to predict the probability that an unknown relationship exists, our algorithm identifies influential network patterns. We used the approach to prioritize disease---gene associations and drug repurposing opportunities. By evaluating our predictions on withheld knowledge, we demonstrated the systematic success of our method. After encountering friction that interfered with data integration and rapid communication, I began looking at how to make science more open. The quest led me to explore realtime open notebook science and expose publishing delays at journals as well as the problematic licensing of publicly-funded research data.",
    "extra": "OCLC: 970819555",
    "shortTitle": "The hetnet awakens"
  }
]

Notice the three extra creator objects with a creatorType of author:

      {
        "firstName": "San Francisco",
        "lastName": "University of California",
        "creatorType": "author"
      },
      {
        "name": "Biological and Medical Informatics",
        "creatorType": "author"
      },
      {
        "firstName": "San Francisco",
        "lastName": "University of California",
        "creatorType": "author"
      }

Is this an upstream issue or are these attributes misinterpreted by translation-server?
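
Whichever side the problem turns out to be on, a caller could drop exact duplicate creators as a stopgap. The Python sketch below shows that workaround only; `dedupe_creators` is a hypothetical helper, it assumes every creator field value is a string as in the output above, and it does nothing about an institution being parsed as a personal author.

```python
def dedupe_creators(creators):
    """Return creators with exact duplicates removed, keeping first occurrences."""
    seen = set()
    unique = []
    for creator in creators:
        # A sorted tuple of (field, value) pairs is hashable and order-independent.
        key = tuple(sorted(creator.items()))
        if key not in seen:
            seen.add(key)
            unique.append(creator)
    return unique
```

Applied to the `creators` array above, this would remove the second "University of California" entry and leave the other three in their original order.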

Web query returns multiple results unexpectedly

The following query:

curl --silent \
  --data 'https://zietzm.github.io/Vagelos2017/' \
  --header 'Content-Type: text/plain' \
  https://translate.manubot.org/web

Returns multiple results:

{
    "url": "https://zietzm.github.io/Vagelos2017/",
    "session": "kSPz1b6essGWbyC",
    "items": {
        "10.1038/nbt.2786": "Clinical development success rates for investigational drugs",
        "10.1038/534314a": "Can you teach old drugs new tricks?",
        "10.1016/j.jhealeco.2016.01.012": "Innovation in the pharmaceutical industry: New estimates of R&D costs",
        "10.1038/nrd3681": "Diagnosing the decline in pharmaceutical R&D efficiency",
        "10.1016/S0167-6296(02)00126-1": "The price of innovation: new estimates of drug development costs",
        "10.1038/nrd3405": "The productivity crisis in pharmaceutical R&D",
        "10.1016/0167-6296(91)90001-4": "Cost of innovation in the pharmaceutical industry",
        "10.1021/acs.jcim.7b00028": "DeepPPI: Boosting Prediction of Protein\u2013Protein Interactions with Deep Neural Networks",
        "10.1371/journal.pcbi.1004259": "Heterogeneous Network Edge Prediction: A Data Integration Approach to Prioritize Disease-Associated Genes"
    }
}

It looks like https://zietzm.github.io/Vagelos2017/ is being interpreted as a page of search results rather than a citeable work. Interestingly, translation-server does return metadata for https://greenelab.github.io/meta-review/.

So what causes a web URL to be interpreted as containing multiple choices? For our use case, we never want multiple choices. Is there an option to disable multiple choices, so that every web query is treated as a single citeable work?

PDF recognition

Tentative plan:

  1. When downloading a URL, either make a HEAD request first to see if the URL is a PDF or, if possible, gracefully handle PDF downloads in Zotero.HTTP.request() with a maximum download size.

  2. Add another endpoint that accepts PDF data.

  3. Once we have the PDF data, upload that to a new recognizer-server endpoint.

  4. recognizer-server might send the PDF data to a Lambda for pdftotext processing, or it might be in Lambda itself if we move the DB from SQLite to MySQL

  5. translation-server gets back identifiers from recognizer-server, runs translation on them, and returns metadata
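
The HEAD check in step 1 could look roughly like the sketch below. It is written in Python purely for illustration: the server itself is Node.js and would go through Zotero.HTTP.request(), whose API is not shown here. The helper names are hypothetical, and treating only `application/pdf` as a PDF is an assumption, since some servers send other media types for PDF files.

```python
import urllib.request


def is_pdf(content_type):
    """Return True if a Content-Type header value names a PDF."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=binary" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/pdf"


def head_request(url):
    """Build (but do not send) a HEAD request for the given URL."""
    return urllib.request.Request(url, method="HEAD")
```

After sending the request with `urllib.request.urlopen`, the check would be `is_pdf(response.headers.get("Content-Type"))`.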

Issues in production - memory exhaustion, segfaults

We've been having some issues with this in production, namely memory exhaustion and segfaults. Unfortunately, I can't provide much more detail than that at present. Have you been having similar issues?

We had memory exhaustion issues with the older version as well, it just filled up more slowly.

Addressing issue #2 would probably be a good first step toward diagnosing this.

Translate: TypeError: Cannot read property 'items' of undefined in handler 0 for select

I get the following error trying to request the URL https://httpbin.org/redirect-to?url=https://en.wikipedia.org/wiki/Zotero

(3)(+0000000): TypeError: Cannot read property 'items' of undefined

TypeError: Cannot read property 'items' of undefined
    at Object.select (/home/marielle/Code/translation-server-v2/src/endpoints.js:324:18)
    at translate.setHandler (/home/marielle/Code/translation-server-v2/src/endpoints.js:134:19)
    at Zotero.Translate.Web._runHandler (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1113:32)
    at Object.selectItems (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:594:34)
    at Object.selectItems (/home/marielle/Code/translation-server-v2/src/translation/sandboxManager.js:91:17)
    at completeCOinS (eval at <anonymous> (/home/marielle/Code/translation-server-v2/src/translation/sandboxManager.js:65:4), <anonymous>:120:10)
    at doWeb (eval at <anonymous> (/home/marielle/Code/translation-server-v2/src/translation/sandboxManager.js:65:4), <anonymous>:231:3)
    at Zotero.Translate.Web.rest (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1389:49)
    at loadPromise.then (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1379:39)
    at <anonymous>

Unfortunately, this causes Zotero to never return a response, so we have to wait for the connection to Zotero to time out. This error should be caught.

Setting up a translation-server for CSL JSON metadata generation?

Greetings, I'm a developer of the Manubot project for writing scholarly papers on GitHub. Our tool supports citation by persistent identifier where users directly write citations into their manuscript source like [doi:10.1098/rsif.2017.0387; @pmid:29424689; @pmcid:PMC5640425; @arxiv:1806.05726]. As such, we're always looking for the most reliable ways to retrieve metadata for various sources of persistent identifiers and convert it into CSL JSON format.

@adam3smith suggested zotero/translation-server to us as per manubot/manubot#70 and recently @zuphilip mentioned it again in aurimasv/z2csl#19.

Manubot is a Python package combined with a continuous integration workflow for building and deploying manuscripts, so we're looking for a way to use translation-server from Python. With that in mind, I've got the following questions:

  1. Can translation-server produce CSL JSON for a wide variety of citation sources?
  2. Is there a public API endpoint that provides access to translation-server?
  3. If no to 2, is it easy to set up translation-server locally? Is there a Docker image? Does it require secrets that would make every instance have setup overhead?

Thanks ahead of time for your time!

**Question** Search Functionality

I have been using this translation server for websites and it works great! I am having trouble searching for anything other than a web address though.

Is it even possible to supply the server a DOI number, or say an ISBN number?

Thanks,

"TypeError: ZU.XRegExp is not a constructor" when trying to export bibtex


(3)(+0000004): Listening on 0.0.0.0:1969

(4)(+0055140): Translate: Binding sandbox to http://www.example.com/

(4)(+0000001): Translate: Parsing code for BibTeX (9cb70025-a888-4a29-a210-93ec52da40d4, 2018-03-03 13:10:16)

(3)(+0000004): Translate: Beginning translation with BibTeX

(3)(+0000003): TypeError: ZU.XRegExp is not a constructor

    TypeError: ZU.XRegExp is not a constructor
        at doExport (eval at <anonymous> (/home/marielle/code/translation-server-v2/src/translation/sandboxManager.js:68:4), <anonymous>:1234:19)
        at Zotero.Translate.Export.rest (/home/marielle/code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1389:49)
        at loadPromise.then (/home/marielle/code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1379:39)
        at <anonymous>
        at process._tickCallback (internal/process/next_tick.js:188:7)

(2)(+0000000): Translate: Translation using BibTeX failed: 
TypeError: ZU.XRegExp is not a constructor

TypeError: ZU.XRegExp is not a constructor
    at doExport (eval at <anonymous> (/home/marielle/code/translation-server-v2/src/translation/sandboxManager.js:68:4), <anonymous>:1234:19)
    at Zotero.Translate.Export.rest (/home/marielle/code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1389:49)
    at loadPromise.then (/home/marielle/code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1379:39)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7)

(5)(+0000001): Translate: Running handler 0 for error

(5)(+0000000): Translate: Running handler 0 for done

(node:2997) UnhandledPromiseRejectionWarning: TypeError: ZU.XRegExp is not a constructor
    at doExport (eval at <anonymous> (/home/marielle/code/translation-server-v2/src/translation/sandboxManager.js:68:4), <anonymous>:1234:19)
    at Zotero.Translate.Export.rest (/home/marielle/code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1389:49)
    at loadPromise.then (/home/marielle/code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1379:39)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7)
(node:2997) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 85)
(node:2997) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
(4)(+0003467): Translate: Binding sandbox to http://www.example.com/

(4)(+0000001): Translate: Parsing code for BibTeX (9cb70025-a888-4a29-a210-93ec52da40d4, 2018-03-03 13:10:16)

(3)(+0000005): Translate: Beginning translation with BibTeX

(3)(+0000001): TypeError: ZU.XRegExp is not a constructor

    TypeError: ZU.XRegExp is not a constructor
        at doExport (eval at <anonymous> (/home/marielle/code/translation-server-v2/src/translation/sandboxManager.js:68:4), <anonymous>:1234:19)
        at Zotero.Translate.Export.rest (/home/marielle/code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1389:49)
        at loadPromise.then (/home/marielle/code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1379:39)
        at <anonymous>
        at process._tickCallback (internal/process/next_tick.js:188:7)

(2)(+0000000): Translate: Translation using BibTeX failed: 
TypeError: ZU.XRegExp is not a constructor

TypeError: ZU.XRegExp is not a constructor
    at doExport (eval at <anonymous> (/home/marielle/code/translation-server-v2/src/translation/sandboxManager.js:68:4), <anonymous>:1234:19)
    at Zotero.Translate.Export.rest (/home/marielle/code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1389:49)
    at loadPromise.then (/home/marielle/code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1379:39)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7)

(5)(+0000000): Translate: Running handler 0 for error

(5)(+0000001): Translate: Running handler 0 for done

(node:2997) UnhandledPromiseRejectionWarning: TypeError: ZU.XRegExp is not a constructor
    at doExport (eval at <anonymous> (/home/marielle/code/translation-server-v2/src/translation/sandboxManager.js:68:4), <anonymous>:1234:19)
    at Zotero.Translate.Export.rest (/home/marielle/code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1389:49)
    at loadPromise.then (/home/marielle/code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1379:39)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7)
(node:2997) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 90)

TypeError: Cannot read property 'replace' of undefined

I get this error for the URL http://pediatrics.aappublications.org/cgi/doi/10.1542/peds.2007-2362. Fortunately, translation is able to continue after encountering it.

(2)(+0000003): Translate: Detect using COinS failed:
TypeError: Cannot read property 'replace' of undefined

TypeError: Cannot read property 'replace' of undefined
    at Zotero.Utilities.Translate.parseContextObject (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/openurl.js:312:45)
    at Function.parseContextObject (/home/marielle/Code/translation-server-v2/src/translation/sandboxManager.js:91:17)
    at detectWeb (eval at <anonymous> (/home/marielle/Code/translation-server-v2/src/translation/sandboxManager.js:65:4), <anonymous>:33:34)
    at Zotero.Translate.Web._detectTranslatorLoaded (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1741:47)
    at Zotero.Translate.Web.<anonymous> (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1725:16)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7)
url => http://rspb.royalsocietypublishing.org/content/267/1453/1627

Inconsistent results in web endpoint

Downloading the same URL 10 times in a row results in about 2 correct and 8 empty results

#!/bin/bash
#
for i in `seq 1 10`;
do
        curl -d 'http://journals.sagepub.com/doi/abs/10.1177/0004865818786763?ai=2b4&mi=ehikzz&af=R' -H 'Content-Type: text/plain' http://127.0.0.1:1969/web
done;

2 times

[{"key":"BF48MMAQ","version":0,"itemType":"journalArticle","creators":[{"firstName":"Suzanna","lastName":"Fay","creatorType":"author"},{"firstName":"Robert","lastName":"Crutchfield","creatorType":"author"}],"tags":[],"title":"Perceptions of “others,” risk, and counter terrorism-related informal social control","date":"August 8, 2018","DOI":"10.1177/0004865818786763","publicationTitle":"Australian & New Zealand Journal of Criminology","journalAbbreviation":"Australian & New Zealand Journal of Criminology","pages":"0004865818786763","abstractNote":"Anti-terrorism messages associate immigration and minorities with terrorism even if this link is not explicit. The consequence is the potential for racial profiling of minorities as threats to national security. Recent experiences or threats of domestic terrorism, in Australia, the US, and other industrialized countries, have led policy makers to encourage informal social control in terrorism prevention efforts by appealing to citizens to report suspicious behavior to authorities. The linking of ethnically different “others”—members of Australia’s population who are or are perceived to be outsiders to the mainstream—and terrorism is important because who is seen as threatening effects whether individuals engage in informal social control; the willingness of residents to recognize, intervene, and report suspicious behavior. However, the concept of “others” in relation to informal social control is more complicated than just immigration status and ethnic identity alone. This study examines whether perceptions of “others” are related to perceptions of terrorism risk and perceptions of informal social control in reporting national security threats.","ISSN":"0004-8658","url":"https://doi.org/10.1177/0004865818786763","language":"en","libraryCatalog":"SAGE Journals","accessDate":"2018-08-24T11:37:45Z"}]

8 times

[]

Even with different URLs, we noticed that occasionally some results are empty and some are not, but I think the same URL makes a good test case.
The sequence is not always the same: sometimes only attempts 3 and 7 are correct, sometimes only attempt 5, and so on.

Edit: This might be URL-specific. Using e.g. https://www.nytimes.com/2018/06/11/technology/net-neutrality-repeal.html instead leads to 10 out of 10 successful results.
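To put numbers on how often the empty result occurs, the raw response bodies collected from a loop like the one above can be tallied. The following is a minimal sketch: `tallyResponses` is a hypothetical helper, not part of translation-server, and it assumes each body is the text returned by the `/web` endpoint.

```javascript
// Hypothetical helper: count empty vs. non-empty JSON array responses.
// `bodies` is an array of raw response strings from repeated requests.
function tallyResponses(bodies) {
  const tally = { nonEmpty: 0, empty: 0, invalid: 0 };
  for (const body of bodies) {
    let parsed;
    try {
      parsed = JSON.parse(body);
    } catch (e) {
      // Not JSON at all, e.g. a plain-text error message
      tally.invalid++;
      continue;
    }
    if (!Array.isArray(parsed)) {
      tally.invalid++;
    } else if (parsed.length > 0) {
      tally.nonEmpty++;
    } else {
      tally.empty++;
    }
  }
  return tally;
}

// Stand-in bodies for illustration only (not real server output)
const sampleBodies = ['[]', '[{"itemType":"journalArticle"}]', '[]', 'Internal Server Error'];
console.log(tallyResponses(sampleBodies));
```

Keeping a separate `invalid` counter distinguishes a genuinely empty array from a response that failed outright, which matters when comparing URLs that behave differently.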

Redirected URLs aren't used for target comparison or detectWeb()/doWeb()

This doesn't get the right metadata from the URL below, probably because of the redirect. It does get partial metadata (unlike translation-server, which won't get any), so the behaviour differs between the two.

curl -d '{ "query": "http://www.journals.cambridge.org/abstract_S0305004100013554" }' -H 'Content-Type: application/json' http://127.0.0.1:1969/search

[{"key":"R7QFA7AQ","version":0,"itemType":"webpage","creators":[{"firstName":"E.","lastName":"Schrödinger","creatorType":"author"}],"tags":[],"title":"Discussion of Probability Relations between Separated Systems","websiteTitle":"Mathematical Proceedings of the Cambridge Philosophical Society","url":"/core/journals/mathematical-proceedings-of-the-cambridge-philosophical-society/article/discussion-of-probability-relations-between-separated-systems/C1C71E1AA5BA56EBE6588AAACB9A222D","abstractNote":"<div class="abstract" data-abstract-type="normal">

The probability relations which can occur between two separated physical systems are discussed, on the assumption that their state is known by a representative in common. The two families of observables, relating to the first and to the second system respectively, are linked by at least one match between two definite members, one of either family. The word match is short for stating that the values of the two observables in question determine each other uniquely and therefore (since the actual labelling is irrelevant) can be taken to be equal. In general there is but one match, but there can be more. If, in addition to the first match, there is a second one between canonical conjugates of the first mates, then there are infinitely many matches, every function of the first canonical pair matching with the same function of the second canonical pair. Thus there is a complete one-to-one correspondence between those two branches (of the two families of observables) which relate to the two degrees of freedom in question. If there are no others, the one-to-one correspondence persists as time advances, but the observables of the first system (say) change their mates in the way that the latter, i.e. the observables of the second system, undergo a certain continuous contact-transformation.

","date":"1935/10","language":"en","accessDate":"CURRENT_TIMESTAMP","extra":"DOI: 10.1017/S0305004100013554"}]

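The `url` field in the response above is a site-relative path, which suggests it was not resolved against the page that was actually loaded after the redirect. A minimal sketch of that resolution with the standard `URL` constructor follows. `resolveItemURL` is a hypothetical helper, and the final URL in the example is an assumed placeholder for wherever the redirect lands, not a value reported by the server.

```javascript
// Hypothetical helper: resolve a possibly relative item URL against the
// URL of the document that was actually loaded (i.e. after redirects).
function resolveItemURL(itemURL, finalURL) {
  // Absolute inputs are returned unchanged (apart from normalization);
  // relative ones are resolved against finalURL.
  return new URL(itemURL, finalURL).href;
}

// Placeholder values for illustration only
const finalURL = 'https://redirect-target.example/some/article/page';
console.log(resolveItemURL('/core/journals/example-article', finalURL));
```

Resolving against the post-redirect URL rather than the originally requested one is the point of the sketch: the two can have different hosts, so the same relative path would otherwise resolve to the wrong site.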
Problem in starting server

I am getting the following message when I run the npm start command. Please help me with this.


Here it is in text format, in case someone needs to copy-paste it somewhere.

ubuntu@ip-172-31-8-27:~/translation-server$ npm start

> [email protected] start /home/ubuntu/translation-server
> node src/server.js

/home/ubuntu/translation-server/node_modules/koa/lib/application.js:62
  listen(...args) {
         ^^^

SyntaxError: Unexpected token ...
    at exports.runInThisContext (vm.js:53:16)
    at Module._compile (module.js:374:25)
    at Object.Module._extensions..js (module.js:417:10)
    at Module.load (module.js:344:32)
    at Function.Module._load (module.js:301:12)
    at Module.require (module.js:354:17)
    at require (internal/module.js:12:17)
    at Object.<anonymous> (/home/ubuntu/translation-server/src/server.js:28:13)
    at Module._compile (module.js:410:26)
    at Object.Module._extensions..js (module.js:417:10)

npm ERR! Linux 4.4.0-1052-aws
npm ERR! argv "/usr/bin/nodejs" "/usr/bin/npm" "start"
npm ERR! node v4.2.6
npm ERR! npm  v3.5.2
npm ERR! code ELIFECYCLE
npm ERR! [email protected] start: `node src/server.js`
npm ERR! Exit status 1
npm ERR! 
npm ERR! Failed at the [email protected] start script 'node src/server.js'.
npm ERR! Make sure you have the latest version of node.js and npm installed.
npm ERR! If you do, this is most likely a problem with the translation-server package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR!     node src/server.js
npm ERR! You can get information on how to open an issue for this project with:
npm ERR!     npm bugs translation-server
npm ERR! Or if that isn't available, you can get their info via:
npm ERR!     npm owner ls translation-server
npm ERR! There is likely additional logging output above.

npm ERR! Please include the following file with any support request:
npm ERR!     /home/ubuntu/translation-server/npm-debug.log
ubuntu@ip-172-31-8-27:~/translation-server$ 
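The `SyntaxError: Unexpected token ...` above points at rest-parameter syntax (`...args`) inside Koa, which the Node v4.2.6 shown in the log does not parse by default. A startup check along these lines could turn that into a clearer message. This is only a sketch: `isSupportedNode` is a hypothetical helper, and the minimum of 7.6 is an assumption based on Koa 2's documented requirement, not a figure taken from translation-server itself.

```javascript
// Hypothetical helper: compare a Node version string such as "v4.2.6"
// against a required major.minor pair. Written with older syntax (var,
// string concatenation) so that an old runtime can still parse it.
function isSupportedNode(version, minMajor, minMinor) {
  var match = /^v?(\d+)\.(\d+)/.exec(version);
  if (!match) {
    return false;
  }
  var major = Number(match[1]);
  var minor = Number(match[2]);
  return major > minMajor || (major === minMajor && minor >= minMinor);
}

// Assumed minimum (7.6); adjust to whatever the project actually requires
if (!isSupportedNode(process.version, 7, 6)) {
  console.error('Node ' + process.version + ' is too old; please upgrade before running npm start');
}
```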

Responds with "Internal Server Error" when trying google books search link with no results

This might not be appropriate to put here because it causes issues with translation-server as well, but the error should be handled because it causes the server to return "internal server error".

curl -d 'https://www.google.co.uk/search?tbm=bks&hl=en&q=isbn%253A0596554141' -H 'Content-Type: text/plain' http://127.0.0.1:1969/web

Internal Server Error

(3)(+0060254): HTTP GET https://www.google.co.uk/search?tbm=bks&hl=en&q=isbn%253A0596554141

(3)(+0000538): Translators: Looking for translators for https://www.google.co.uk/search?tbm=bks&hl=en&q=isbn%253A0596554141

(4)(+0000000): Translate: Binding sandbox to https://www.google.co.uk/search?tbm=bks&hl=en&q=isbn%253A0596554141

(4)(+0000001): Translate: Parsing code for Google Books (3e684d82-73a3-9a34-095f-19b112d88bbf, 2017-12-03 04:20:33)

(4)(+0000001): Translate: Parsing code for Library Catalog (InnoPAC) (4fd6b89b-2316-2dc4-fd87-61a97dd941e8, 2017-09-26 22:26:13)

(4)(+0000004): Translate: Parsing code for unAPI (e7e01cac-1e37-4da6-b078-a0e8343b0e98, 2018-05-12 15:58:17)

(4)(+0000001): Translate: Parsing code for COinS (05d07af9-105a-4572-99f6-a8e231c0daef, 2015-06-04 03:25:10)

(4)(+0000003): Translate: Parsing code for Embedded Metadata (951c027d-74ac-47d4-a107-9c3069ab7b48, 2018-07-01 18:02:27)

(3)(+0000003): Translate: Embedded Metadata: found 2 meta tags.

(4)(+0000000): Translate: Parsing code for DOI (c159dcfe-8a53-4301-a499-30f6549c340d, 2016-11-05 10:57:01)

(3)(+0000003): Translate: All translator detect calls and RPC calls complete:

(3)(+0000000): Google Books: 100

(5)(+0000000): Translate: Running handler 0 for translators

(5)(+0000000): Translate: Running handler 1 for translators

(4)(+0000000): Translate: Parsing code for Google Books (3e684d82-73a3-9a34-095f-19b112d88bbf, 2017-12-03 04:20:33)

(3)(+0000001): Translate: Beginning translation with Google Books

(3)(+0000032): Error: Translator called select items with no items

Error: Translator called select items with no items
    at Object.selectItems (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:541:11)
    at Object.selectItems (/home/marielle/Code/translation-server-v2/src/translation/sandboxManager.js:91:17)
    at doWeb (eval at <anonymous> (/home/marielle/Code/translation-server-v2/src/translation/sandboxManager.js:65:4), <anonymous>:75:10)
    at Zotero.Translate.Web.rest (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1389:49)
    at loadPromise.then (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1379:39)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7)

(2)(+0000000): Translate: Translation using Google Books failed:
Error: Translator called select items with no items

Error: Translator called select items with no items
    at Object.selectItems (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:541:11)
    at Object.selectItems (/home/marielle/Code/translation-server-v2/src/translation/sandboxManager.js:91:17)
    at doWeb (eval at <anonymous> (/home/marielle/Code/translation-server-v2/src/translation/sandboxManager.js:65:4), <anonymous>:75:10)
    at Zotero.Translate.Web.rest (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1389:49)
    at loadPromise.then (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1379:39)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7)
url => https://www.google.co.uk/search?tbm=bks&hl=en&q=isbn%253A0596554141

(5)(+0000001): Translate: Running handler 0 for error

(1)(+0000000): Translation using Google Books failed

(1)(+0000000): Error: Translator called select items with no items

Error: Translator called select items with no items
    at Object.selectItems (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:541:11)
    at Object.selectItems (/home/marielle/Code/translation-server-v2/src/translation/sandboxManager.js:91:17)
    at doWeb (eval at <anonymous> (/home/marielle/Code/translation-server-v2/src/translation/sandboxManager.js:65:4), <anonymous>:75:10)
    at Zotero.Translate.Web.rest (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1389:49)
    at loadPromise.then (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1379:39)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7)

TypeError: Cannot read property 'documentElement' of undefined
    at module.exports.SearchSession.saveWebpage (/home/marielle/Code/translation-server-v2/src/webSession.js:233:32)
    at module.exports.SearchSession.translate (/home/marielle/Code/translation-server-v2/src/webSession.js:210:10)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7)

(3)(+0000236): HTTP GET https://www.google.co.uk/search?tbm=bks&hl=en&q=isbn%253A0596554141

(3)(+0000428): Translators: Looking for translators for https://www.google.co.uk/search?tbm=bks&hl=en&q=isbn%253A0596554141

(4)(+0000000): Translate: Binding sandbox to https://www.google.co.uk/search?tbm=bks&hl=en&q=isbn%253A0596554141

(4)(+0000001): Translate: Parsing code for Google Books (3e684d82-73a3-9a34-095f-19b112d88bbf, 2017-12-03 04:20:33)

(4)(+0000000): Translate: Parsing code for Library Catalog (InnoPAC) (4fd6b89b-2316-2dc4-fd87-61a97dd941e8, 2017-09-26 22:26:13)

(4)(+0000004): Translate: Parsing code for unAPI (e7e01cac-1e37-4da6-b078-a0e8343b0e98, 2018-05-12 15:58:17)

(4)(+0000001): Translate: Parsing code for COinS (05d07af9-105a-4572-99f6-a8e231c0daef, 2015-06-04 03:25:10)

(4)(+0000004): Translate: Parsing code for Embedded Metadata (951c027d-74ac-47d4-a107-9c3069ab7b48, 2018-07-01 18:02:27)

(3)(+0000003): Translate: Embedded Metadata: found 2 meta tags.

(4)(+0000000): Translate: Parsing code for DOI (c159dcfe-8a53-4301-a499-30f6549c340d, 2016-11-05 10:57:01)

(3)(+0000003): Translate: All translator detect calls and RPC calls complete:

(3)(+0000000): Google Books: 100

(5)(+0000000): Translate: Running handler 0 for translators

(5)(+0000001): Translate: Running handler 1 for translators

(4)(+0000000): Translate: Parsing code for Google Books (3e684d82-73a3-9a34-095f-19b112d88bbf, 2017-12-03 04:20:33)

(3)(+0000000): Translate: Beginning translation with Google Books

(3)(+0000026): Error: Translator called select items with no items

Error: Translator called select items with no items
    at Object.selectItems (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:541:11)
    at Object.selectItems (/home/marielle/Code/translation-server-v2/src/translation/sandboxManager.js:91:17)
    at doWeb (eval at <anonymous> (/home/marielle/Code/translation-server-v2/src/translation/sandboxManager.js:65:4), <anonymous>:75:10)
    at Zotero.Translate.Web.rest (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1389:49)
    at loadPromise.then (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1379:39)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7)

(2)(+0000000): Translate: Translation using Google Books failed:
Error: Translator called select items with no items

Error: Translator called select items with no items
    at Object.selectItems (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:541:11)
    at Object.selectItems (/home/marielle/Code/translation-server-v2/src/translation/sandboxManager.js:91:17)
    at doWeb (eval at <anonymous> (/home/marielle/Code/translation-server-v2/src/translation/sandboxManager.js:65:4), <anonymous>:75:10)
    at Zotero.Translate.Web.rest (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1389:49)
    at loadPromise.then (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1379:39)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7)
url => https://www.google.co.uk/search?tbm=bks&hl=en&q=isbn%253A0596554141

(5)(+0000000): Translate: Running handler 0 for error

(1)(+0000000): Translation using Google Books failed

(1)(+0000000): Error: Translator called select items with no items

Error: Translator called select items with no items
    at Object.selectItems (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:541:11)
    at Object.selectItems (/home/marielle/Code/translation-server-v2/src/translation/sandboxManager.js:91:17)
    at doWeb (eval at <anonymous> (/home/marielle/Code/translation-server-v2/src/translation/sandboxManager.js:65:4), <anonymous>:75:10)
    at Zotero.Translate.Web.rest (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1389:49)
    at loadPromise.then (/home/marielle/Code/translation-server-v2/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1379:39)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7)

TypeError: Cannot read property 'documentElement' of undefined
    at module.exports.SearchSession.saveWebpage (/home/marielle/Code/translation-server-v2/src/webSession.js:233:32)
    at module.exports.SearchSession.translate (/home/marielle/Code/translation-server-v2/src/webSession.js:210:10)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7)

Remove support for `multiple`?

The follow-up request to a 300 for multiple results (i.e., search results, generally) needs to hit the same instance or else it gets 409 due to the session not existing.

In Lambda, we need to call another Lambda that stores the state, due to Amazon VPC restrictions. We should probably also support a config option for non-Lambda installations that takes a Redis host to store in directly.
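One way to read the proposal above is as a small storage interface that either a state-holding Lambda or a Redis host could sit behind. The sketch below uses an in-memory `Map` as a stand-in. `MemorySessionStore`, its method names, and the expiry handling are all hypothetical and not taken from translation-server; the methods are synchronous only to keep the example short, whereas a networked backend would need promise-returning versions.

```javascript
// Hypothetical session store: keeps the state needed to answer the
// follow-up request after a 300 response. A shared backend would
// implement the same get/set pair.
class MemorySessionStore {
  // ttlMs: how long a session stays valid; now: injectable clock for testing
  constructor(ttlMs, now = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.sessions = new Map();
  }

  set(id, data) {
    this.sessions.set(id, { data, expires: this.now() + this.ttlMs });
  }

  // Returns the stored data, or null if the session is unknown or expired
  // (the case that currently surfaces as a 409).
  get(id) {
    const entry = this.sessions.get(id);
    if (!entry) {
      return null;
    }
    if (entry.expires <= this.now()) {
      this.sessions.delete(id);
      return null;
    }
    return entry.data;
  }
}

// Illustrative usage with placeholder values
const store = new MemorySessionStore(60 * 1000);
store.set('session-1', { url: 'https://www.example.com/search?q=test' });
console.log(store.get('session-1'));
console.log(store.get('missing'));
```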

XMLSerializer is not defined

./translate_search 10.2307/4486062 gets data from DataCite instead of Crossref, which fails with XMLSerializer is not defined.

Translation error exporting csljson

I ran the following using aed6c83:

curl --silent \
  --data 'https://www.nytimes.com/2018/06/11/technology/net-neutrality-repeal.html' \
   --header 'Content-Type: text/plain' \
  http://127.0.0.1:1969/web | \
curl --silent \
  --data @- \
  --header 'Content-Type: application/json' \
  'http://127.0.0.1:1969/export?format=csljson'

The following was returned:

An error occurred during translation. Please check translation with the Zotero client.

HTTP GET for BigThink URLs of web endpoint is excessively slow

We noticed that we were getting intermittent 504 Gateway Time-out errors for some translation-server queries in manubot/manubot#87 (comment). We've configured translation-server behind an nginx reverse proxy. When the error occurs, the response from https://translate.manubot.org is:

<html>
<head><title>504 Gateway Time-out</title></head>
<body bgcolor="white">
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx/1.14.0 (Ubuntu)</center>
</body>
</html>

This can be triggered by running the following command many times:

curl \
  --header "Content-Type: text/plain" \
  --data 'https://bigthink.com/neurobonkers/a-pirate-bay-for-science' \
  'https://translate.manubot.org/web'

Here is the log output from translation-server (the comment indicates where the delay happens when we watch live with tail --follow):

(3)(+0038384): HTTP GET https://bigthink.com/neurobonkers/a-pirate-bay-for-science
## Big time delay here, node using 99% of CPU during this delay
(3)(+0025766): Translators: Looking for translators for https://bigthink.com/neurobonkers/a-pirate-bay-for-science

If I run curl https://bigthink.com/neurobonkers/a-pirate-bay-for-science locally or on our translation-server's system, there is no delay getting the response. Therefore, I don't know why HTTP GET is taking so long when performed by translation-server. Perhaps something else is going on that is causing the delay? CCing @dongbohu from our team.

"ReferenceError: TLDS not defined" when trying to get pubmed article

curl -d '{"query":"https://www.ncbi.nlm.nih.gov/pubmed/14656957"}' --header "Content-Type: application/json" localhost:1969/search

Get response:
[{"key":"YY373U6A","version":0,"itemType":"webpage","url":"https://www.ncbi.nlm.nih.gov/pubmed/14656957","title":"Seventh report of the Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure. - PubMed - NCBI","abstractNote":"Hypertension. 2003 Dec;42(6):1206-52. Epub 2003 Dec 1. Guideline; Practice Guideline; Research Support, U.S. Gov't, P.H.S.","accessDate":"2018-06-14T09:34:10Z"}]

So this error is causing translation to fall back on basic data. Here's the output from translation-server-2:

node src/server.js

(3)(+0000000): Translators initialized with 523 loaded

(3)(+0000010): Listening on 0.0.0.0:1969

(3)(+0372220): HTTP GET https://www.ncbi.nlm.nih.gov/pubmed/14656957

(3)(+0001623): ReferenceError: TLDS is not defined

ReferenceError: TLDS is not defined
    at Object.getPotentialProxies (/home/marielle/Code/translation-server-v2/src/proxy.js:94:6)
    at Object.getWebTranslatorsForLocation (/home/marielle/Code/translation-server-v2/src/translators.js:185:39)
    at Zotero.Translate.Web._getTranslatorsGetPotentialTranslators (/home/marielle/Code/translation-server-v2/src/translate.js:2040:28)
    at Zotero.Translate.Web.<anonymous> (/home/marielle/Code/translation-server-v2/src/translate.js:1167:32)
    at Zotero.Translate.Web.getTranslators (/home/marielle/Code/translation-server-v2/src/promise.js:38:17)
    at HTTP.processDocuments (/home/marielle/Code/translation-server-v2/src/endpoints.js:146:25)
    at Zotero.HTTP.request.then (/home/marielle/Code/translation-server-v2/src/http.js:167:12)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7)

(3)(+0000001): Translate: All translator detect calls and RPC calls complete:

(3)(+0000000): No suitable translators found

(5)(+0000000): Translate: Running handler 0 for translators

Redirect / Translator selection problem (JavaScript, www.redi-bw.de => ebscohost)

URL:
http://www.redi-bw.de/db/ebsco.php/search.ebscohost.com/login.aspx%3fdirect%3dtrue%26db%3dreh%26AN%3dATLAiGFE171113003879%26site%3dehost-live

Result:
[{"key":"8VGWTYLW","version":0,"itemType":"webpage","url":"http://web.b.ebscohost.com/plink?key=10.81.11.197_8000_1381733013&scope=site&db=reh&AN=ATLAiGFE171113003879&site=ehost-live","title":"","accessDate":"2018-11-07T14:46:22Z"}]

If you open the above URL in a browser, it redirects to this URL:
http://web.a.ebscohost.com/ehost/detail/detail?vid=0&sid=210b94b5-d4c1-41a3-bc71-a267c3a20ce2%40sessionmgr4007&bdata=JnNpdGU9ZWhvc3QtbGl2ZQ%3d%3d#AN=ATLAiGFE171113003879&db=reh

If you use the web endpoint on this URL directly, you get the following result:
[{"key":"M9FHFV2Q","version":0,"itemType":"journalArticle","creators":[{"firstName":"Evgenia","lastName":"Fotiou","creatorType":"author"},{"firstName":"Diana","lastName":"Riboli","creatorType":"author"},{"firstName":"Davide","lastName":"Torri","creatorType":"author"},{"firstName":"Dimitra Mari","lastName":"Varvarezou","creatorType":"author"}],"tags":[{"tag":"International Society for Academic Research on Shamanism","type":1},{"tag":"Shamanism -- Study and teaching","type":1},{"tag":"Animism","type":1},{"tag":"Peer reviewed","type":1}],"title":"The First Conference of the International Society for Academic Research on Shamanism (ISARS), Delphi, Greece, in 2015","date":"2017","journalAbbreviation":"Shaman","volume":"25","issue":"1-2","pages":"5-14","ISSN":"1216-7827","libraryCatalog":"EBSCOhost","publicationTitle":"Shaman"}]

Is there a redirect problem?

Configuration of proxy server

I am trying to figure out how to configure an outgoing proxy for translation-server, and I'll admit I am a bit confused.

Assuming I want to access https://www.nytimes.com/2018/06/11/technology/net-neutrality-repeal.html and my proxy is myproxy.example.org:8080, how should I go about that?

ISBN search returns extra note object causing CSL export to fail

The following search query is for an ISBN:

curl \
  --header "Content-Type: text/plain" \
  --data 'isbn:9780262517638' \
  'https://translate.manubot.org/search'

The output is:

[
    {
        "key": "ZGT6YNIL",
        "version": 0,
        "itemType": "book",
        "creators": [
            {
                "firstName": "Peter",
                "lastName": "Suber",
                "creatorType": "author"
            }
        ],
        "tags": [
            {
                "tag": "Open access publishing",
                "type": 1
            }
        ],
        "ISBN": "9780262517638",
        "title": "Open access",
        "place": "Cambridge, Mass",
        "publisher": "MIT Press",
        "date": "2012",
        "numPages": "242",
        "series": "MIT Press essential knowledge series",
        "callNumber": "Z286.O63 S83 2012",
        "extra": "OCLC: ocn754518563",
        "libraryCatalog": "Library of Congress ISBN"
    },
    {
        "itemType": "note",
        "note": "What is open access? -- Motivation -- Varieties -- Policies -- Scope -- Copyright -- Economics -- Casualties -- Future -- Self-help"
    }
]

Notice that the output array contains two objects. The second one is bizarre:

    {
        "itemType": "note",
        "note": "What is open access? -- Motivation -- Varieties -- Policies -- Scope -- Copyright -- Economics -- Casualties -- Future -- Self-help"
    }

If we pass the full JSON output to /export with the following command:

curl \
 --header "Content-Type: application/json" \
 --data @zotero-data.json \
 'https://translate.manubot.org/export?format=csljson'

We get notified that "An error occurred during translation." The translation-server stdout log contains:

(4)(+0000000): Translate: Parsing code for CSL JSON (bc03b4fe-436d-4a1f-ba59-de4d2d7a63f7, 2017-07-05 19:32:38)

(3)(+0000001): Translate: Beginning translation with CSL JSON

(3)(+0000000): TypeError: Cannot read property 'noteToTitle' of undefined

    TypeError: Cannot read property 'noteToTitle' of undefined
        at Zotero.Utilities.Translate.itemToCSLJSON (/home/translate/translation-server/modules/zotero/chrome/content/zotero/xpcom/utilities.js:1706:33)
        at Function.itemToCSLJSON (/home/translate/translation-server/src/translation/sandboxManager.js:94:17)
        at doExport (eval at <anonymous> (/home/translate/translation-server/src/translation/sandboxManager.js:68:4), <anonymous>:111:43)
        at Zotero.Translate.Export.rest (/home/translate/translation-server/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1406:49)
        at loadPromise.then (/home/translate/translation-server/modules/zotero/chrome/content/zotero/xpcom/translation/translate.js:1396:39)
        at <anonymous>
        at process._tickDomainCallback (internal/process/next_tick.js:228:7)

What's the best solution? Should the ISBN search omit the second note object? Should the CSL exporter ignore this note?

Are there other cases where the /search endpoint returns multiple results? Should we always take the first one on our end, before passing to /export?
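As a client-side workaround while those questions are open, note items can be dropped before the array is sent to `/export`. This is a minimal sketch, assuming the only problematic entries are top-level objects whose `itemType` is `note`, as in the output above; `dropNotes` is a hypothetical helper and not part of translation-server.

```javascript
// Hypothetical helper: remove standalone note objects from a /search
// result so that only regular items are passed on to /export.
function dropNotes(items) {
  return items.filter(item => item.itemType !== 'note');
}

// Abbreviated version of the ISBN search output shown above
const searchResult = [
  { itemType: 'book', title: 'Open access', ISBN: '9780262517638' },
  { itemType: 'note', note: 'What is open access? -- Motivation' }
];

console.log(JSON.stringify(dropNotes(searchResult)));
```

Filtering by type keeps every regular item, which is safer than always taking the first element if `/search` can legitimately return more than one item.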

PubMed page not detected

curl -d '{"query":"https://www.ncbi.nlm.nih.gov/pubmed/14656957"}' --header "Content-Type: application/json" localhost:1969/search

returns

[{"key":"ZJUJ2U6S","version":0,"itemType":"webpage","creators":[{"firstName":"National Center for Biotechnology","lastName":"Information","creatorType":"author"},{"firstName":"U. S. National Library of Medicine 8600 Rockville","lastName":"Pike","creatorType":"author"},{"firstName":"Bethesda","lastName":"MD","creatorType":"author"},{"firstName":"20894","lastName":"Usa","creatorType":"author"}],"tags":[],"url":"https://www.ncbi.nlm.nih.gov/pubmed/14656957","title":"Seventh report of the Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure. - PubMed - NCBI","language":"en","accessDate":"CURRENT_TIMESTAMP"}]

It looks like the PubMed translator is failing and translation is falling back on Embedded Metadata, which doesn't do a great job here.

Trace:

(3)(+0000000): Translators initialized with 523 loaded

(3)(+0000006): Listening on 0.0.0.0:1969

(3)(+0018420): HTTP GET https://www.ncbi.nlm.nih.gov/pubmed/14656957

(3)(+0001502): Translators: Looking for translators for https://www.ncbi.nlm.nih.gov/pubmed/14656957

(4)(+0000011): Translate: Binding sandbox to https://www.ncbi.nlm.nih.gov/pubmed/14656957

(4)(+0000002): Translate: Parsing code for PubMed (3d0231ce-fd4b-478c-b1d3-840389e5b68c, 2015-09-07 18:20:45)

(4)(+0000018): Translate: Parsing code for unAPI (e7e01cac-1e37-4da6-b078-a0e8343b0e98, 2018-05-12 15:58:17)

(4)(+0000001): Translate: Parsing code for COinS (05d07af9-105a-4572-99f6-a8e231c0daef, 2015-06-04 03:25:10)

(4)(+0000002): Translate: Parsing code for Embedded Metadata (951c027d-74ac-47d4-a107-9c3069ab7b48, 2018-02-13 19:20:46)

(3)(+0000003): Translate: Embedded Metadata: found 24 meta tags.

(3)(+0000001): Translate: Creating translate instance of type import in sandbox

(4)(+0000001): Translate: Binding sandbox to https://www.ncbi.nlm.nih.gov/pubmed/14656957

(4)(+0000000): Translate: Parsing code for RDF (5e3ad958-ac79-463d-812b-a86a9235c28f, 2018-05-08 19:39:38)

(3)(+0000006): Translate: Initializing RDF data store

(4)(+0000003): Translate: Parsing code for DOI (c159dcfe-8a53-4301-a499-30f6549c340d, 2016-11-05 10:57:01)

(3)(+0000008): Translate: All translator detect calls and RPC calls complete:

(3)(+0000000): Embedded Metadata: 320

(3)(+0000000): DOI: 400

(5)(+0000000): Translate: Running handler 0 for translators

(5)(+0000001): Translate: Running handler 1 for translators

(4)(+0000001): Translate: Parsing code for Embedded Metadata (951c027d-74ac-47d4-a107-9c3069ab7b48, 2018-02-13 19:20:46)

(3)(+0000002): Translate: Beginning translation with Embedded Metadata

(3)(+0000000): Translate: Embedded Metadata: found 24 meta tags.

(3)(+0000001): Translate: Creating translate instance of type import in sandbox

(4)(+0000000): Translate: Binding sandbox to https://www.ncbi.nlm.nih.gov/pubmed/14656957

(4)(+0000000): Translate: Parsing code for RDF (5e3ad958-ac79-463d-812b-a86a9235c28f, 2018-05-08 19:39:38)

(3)(+0000002): Translate: Initializing RDF data store

(3)(+0000005): Translate: Promise not available in sandbox in _itemDone()

(3)(+0000001): Translate: Saving item

(5)(+0000000): Translate: Running handler 0 for itemDone

(3)(+0000007): Translate: Title was not found in meta tags. Using document title as title

(3)(+0000001): Translate: Looking for authors in byline, vcard

(3)(+0000003): Translate: Found 0 elements with 'byline' class

(3)(+0000001): Translate: Found 1 elements with 'vcard' class

(3)(+0000001): Translate: Extracting author(s) from byline: National Center for Biotechnology Information, U.S. National Library of Medicine 8600 Rockville Pike, Bethesda MD, 20894 USA

(3)(+0000002): Translate: Promise not available in sandbox in _itemDone()

(3)(+0000000): Translate: Saving item

(3)(+0000001): Translate: Translation successful

(5)(+0000000): Translate: Running handler 0 for done

(3)(+0000000): itemToAPIJSON: Discarded field libraryCatalog: field not valid for type webpage

The response should look more like this:

[{"itemType":"journalArticle","creators":[{"firstName":"Aram V.","lastName":"Chobanian","creatorType":"author"},{"firstName":"George L.","lastName":"Bakris","creatorType":"author"},{"firstName":"Henry R.","lastName":"Black","creatorType":"author"},{"firstName":"William C.","lastName":"Cushman","creatorType":"author"},{"firstName":"Lee A.","lastName":"Green","creatorType":"author"},{"firstName":"Joseph L.","lastName":"Izzo","creatorType":"author"},{"firstName":"Daniel W.","lastName":"Jones","creatorType":"author"},{"firstName":"Barry J.","lastName":"Materson","creatorType":"author"},{"firstName":"Suzanne","lastName":"Oparil","creatorType":"author"},{"firstName":"Jackson T.","lastName":"Wright","creatorType":"author"},{"firstName":"Edward J.","lastName":"Roccella","creatorType":"author"},{"name":"Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure. National Heart, Lung, and Blood Institute","creatorType":"author"},{"name":"National High Blood Pressure Education Program Coordinating Committee","creatorType":"author"}],"notes":[],"tags":[{"tag":"Adult","type":1},{"tag":"Aged","type":1},{"tag":"Antihypertensive Agents","type":1},{"tag":"Blood Pressure","type":1},{"tag":"Blood Pressure Determination","type":1},{"tag":"Cardiovascular Diseases","type":1},{"tag":"Female","type":1},{"tag":"Humans","type":1},{"tag":"Hypertension","type":1},{"tag":"Male","type":1},{"tag":"Middle Aged","type":1},{"tag":"Risk Factors","type":1}],"title":"Seventh report of the Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure","pages":"1206-1252","ISSN":"1524-4563","journalAbbreviation":"Hypertension","publicationTitle":"Hypertension (Dallas, Tex.: 1979)","volume":"42","issue":"6","date":"Dec 2003","language":"eng","abstractNote":"The National High Blood Pressure Education Program presents the complete Seventh Report of the Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure. Like its predecessors, the purpose is to provide an evidence-based approach to the prevention and management of hypertension. The key messages of this report are these: in those older than age 50, systolic blood pressure (BP) of greater than 140 mm Hg is a more important cardiovascular disease (CVD) risk factor than diastolic BP; beginning at 115/75 mm Hg, CVD risk doubles for each increment of 20/10 mm Hg; those who are normotensive at 55 years of age will have a 90% lifetime risk of developing hypertension; prehypertensive individuals (systolic BP 120-139 mm Hg or diastolic BP 80-89 mm Hg) require health-promoting lifestyle modifications to prevent the progressive rise in blood pressure and CVD; for uncomplicated hypertension, thiazide diuretic should be used in drug treatment for most, either alone or combined with drugs from other classes; this report delineates specific high-risk conditions that are compelling indications for the use of other antihypertensive drug classes (angiotensin-converting enzyme inhibitors, angiotensin-receptor blockers, beta-blockers, calcium channel blockers); two or more antihypertensive medications will be required to achieve goal BP (<140/90 mm Hg, or <130/80 mm Hg) for patients with diabetes and chronic kidney disease; for patients whose BP is more than 20 mm Hg above the systolic BP goal or more than 10 mm Hg above the diastolic BP goal, initiation of therapy using two agents, one of which usually will be a thiazide diuretic, should be considered; regardless of therapy or care, hypertension will be controlled only if patients are motivated to stay on their treatment plan. Positive experiences, trust in the clinician, and empathy improve patient motivation and satisfaction. This report serves as a guide, and the committee continues to recognize that the responsible physician's judgment remains paramount.","DOI":"10.1161/01.HYP.0000107251.49515.c2","extra":"PMID: 14656957","libraryCatalog":"PubMed"}]
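Going back to the trace: the PubMed translator's code is parsed, but PubMed does not appear in the list printed after "All translator detect calls and RPC calls complete", which suggests its detection step did not match the page rather than that it crashed. The sketch below pulls that summary out of a trace automatically. `summarize_trace` is a hypothetical helper, the line format is inferred only from the lines shown in this report, and the number after each detected translator is kept as-is without interpreting it.

```python
import re

# Matches lines such as "(3)(+0000002): Translate: Saving item".
LINE_RE = re.compile(r"^\((\d)\)\(\+(\d+)\): (.*)$")
PARSE_RE = re.compile(r"^Translate: Parsing code for (.+?) \(")
DETECT_RE = re.compile(r"^(.+): (\d+)$")
BEGIN = "Translate: Beginning translation with "


def summarize_trace(text):
    """Return translators that were parsed, detected, and finally used."""
    parsed, detected, used = [], [], None
    in_detect_list = False
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip blank or unrecognized lines
        message = match.group(3)
        if in_detect_list:
            hit = DETECT_RE.match(message)
            if hit:
                detected.append((hit.group(1), int(hit.group(2))))
                continue
            in_detect_list = False  # the detect list has ended
        parse_hit = PARSE_RE.match(message)
        if parse_hit:
            if parse_hit.group(1) not in parsed:
                parsed.append(parse_hit.group(1))
        elif message.endswith("detect calls and RPC calls complete:"):
            in_detect_list = True
        elif message.startswith(BEGIN):
            used = message[len(BEGIN):]
    return {"parsed": parsed, "detected": detected, "used": used}


# Excerpt of the trace above.
excerpt = """
(4)(+0000002): Translate: Parsing code for PubMed (3d0231ce-fd4b-478c-b1d3-840389e5b68c, 2015-09-07 18:20:45)
(4)(+0000002): Translate: Parsing code for Embedded Metadata (951c027d-74ac-47d4-a107-9c3069ab7b48, 2018-02-13 19:20:46)
(4)(+0000003): Translate: Parsing code for DOI (c159dcfe-8a53-4301-a499-30f6549c340d, 2016-11-05 10:57:01)
(3)(+0000008): Translate: All translator detect calls and RPC calls complete:
(3)(+0000000): Embedded Metadata: 320
(3)(+0000000): DOI: 400
(5)(+0000000): Translate: Running handler 0 for translators
(3)(+0000002): Translate: Beginning translation with Embedded Metadata
"""

summary = summarize_trace(excerpt)
detected_names = {name for name, _ in summary["detected"]}
not_detected = [name for name in summary["parsed"] if name not in detected_names]
print("used:", summary["used"])
print("parsed but not detected:", not_detected)
```

Listing the translators that were parsed but never detected narrows the problem to the detection step of those translators, which is a smaller surface to debug than the whole translation run.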
