pyspotlight's Introduction

pyspotlight is a thin Python wrapper around DBpedia Spotlight's REST interface.

This package is tested against DBpedia Spotlight version 0.7. As long as there are no major API overhauls, this wrapper might also work with future versions. If you encounter a bug with a newer DBpedia Spotlight version, feel free to create an issue here on GitHub.

Note that we're trying to track DBpedia Spotlight release version numbers, so you can easily see which pyspotlight version has been tested with which Spotlight release. For example, all pyspotlight 0.6.x releases are compatible with Spotlight 0.6.x, etc. While we aim for backwards-compatibility with older Spotlight releases, it is not guaranteed. If you're using an older Spotlight version, you may need to use an older pyspotlight version as well.

Installation

The newest stable release can be found on the Python Package Index (PyPI).

Therefore installation is as easy as:

pip install pyspotlight

Older releases can be installed by specifying a version:

pip install pyspotlight~=0.6.1

Requirements for installation from source/github

This module has been tested with Python 2.7 and Python 3.5.

As long as you use setup.py for the installation (python setup.py install), the dependencies are installed for you.

If you decide not to use setup.py, you will need the requests library.

All required packages can be found on the Python Package Index and installed via either easy_install or, preferably, pip.

Using pip it is especially easy because you can just do this:

pip install -r requirements.txt

and it will install all package dependencies listed in that file.

Usage

Usage is simple and easy, just as the API is:

>>> import spotlight
>>> annotations = spotlight.annotate('http://localhost/rest/annotate',
...                                  'Your test text',
...                                  confidence=0.4, support=20)

This should return a list of all resources found within the given text. Assuming we did this for the following text:

President Obama on Monday will call for a new minimum tax rate for individuals making more than $1 million a year to ensure that they pay at least the same percentage of their earnings as other taxpayers, according to administration officials.

We might get this back:

>>> spotlight.annotate('http://localhost/rest/annotate', sample_txt)
[
  {
    'URI': 'http://dbpedia.org/resource/Presidency_of_Barack_Obama',
    'offset': 0,
    'percentageOfSecondRank': -1.0,
    'similarityScore': 0.10031112283468246,
    'support': 134,
    'surfaceForm': 'President Obama',
    'types': 'DBpedia:OfficeHolder,DBpedia:Person,Schema:Person,Freebase:/book/book_subject,Freebase:/book,Freebase:/book/periodical_subject,Freebase:/media_common/quotation_subject,Freebase:/media_common'
  },
  …(truncated remaining elements)…
]
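Because the result is a list of plain dictionaries, it can be post-processed with ordinary Python. The following is only a sketch and not part of pyspotlight: `summarize` is a hypothetical helper, and the sample data is copied from the output above with the `types` string shortened. It keeps the surface form and URI and splits the comma-separated `types` string into a list.

```python
def summarize(annotations):
    """Reduce each annotation dict to surface form, URI, and a list of types.

    Hypothetical helper for illustration; not part of pyspotlight.
    """
    summary = []
    for resource in annotations:
        # 'types' is one comma-separated string; an empty string is
        # handled defensively and yields an empty list.
        types = [t for t in resource.get('types', '').split(',') if t]
        summary.append({
            'surfaceForm': resource['surfaceForm'],
            'URI': resource['URI'],
            'types': types,
        })
    return summary


# Sample annotation taken from the output shown above (types shortened).
sample = [{
    'URI': 'http://dbpedia.org/resource/Presidency_of_Barack_Obama',
    'offset': 0,
    'similarityScore': 0.10031112283468246,
    'support': 134,
    'surfaceForm': 'President Obama',
    'types': 'DBpedia:OfficeHolder,DBpedia:Person,Schema:Person',
}]

for entry in summarize(sample):
    print(entry['surfaceForm'], '->', entry['URI'], entry['types'])
```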

Any additional filter parameters that are supported by the Spotlight API can be passed to the filters argument in a dictionary.

For example:

>>> only_person_filter = {
...     'policy': "whitelist",
...     'types': "DBpedia:Person",
...     'coreferenceResolution': False
... }

>>> spotlight.annotate(
...     "http://localhost/rest/annotate",
...     "Any collaboration between Shakira and Metallica seems highly unlikely.",
...     filters=only_person_filter
... )

[{
    'URI': 'http://dbpedia.org/resource/Shakira',
    'offset': 26,
    'percentageOfSecondRank': 1.511934771738109e-09,
    'similarityScore': 0.9999999984880361,
    'support': 2587,
    'surfaceForm': 'Shakira',
    'types': 'Schema:MusicGroup,DBpedia:Agent,Schema:Person,DBpedia:Person,DBpedia:Artist,DBpedia:MusicalArtist'
}]

The same parameters apply to the spotlight.candidates function, which returns a list of all matching candidate entities rather than only the top candidate.

Note that the Spotlight API may support other interfaces that have not been implemented in pyspotlight. Feel free to contribute :-)!

Running DBpedia Spotlight

If you just want to play around with Spotlight, there is an interactive demo available at demo.dbpedia-spotlight.org. To submit pyspotlight requests to the demo servers, you may use the endpoints found in sites.xml.

For any significant Spotlight usage, it is strongly recommended to run your own server. Please follow the installation instructions.

Exceptions

The following exceptions can occur:

  • ValueError when:

    • the JSON response could not be decoded.
  • SpotlightException when:

    • the JSON response did not contain the needed fields or was not formed as expected.
    • you forgot to explicitly specify a protocol (http/https) in the API URL.

    Usually the exception's message tells you exactly what is wrong. If not, we might have forgotten some error handling, so please open an issue on GitHub if you encounter unexpected exceptions.

  • requests.exceptions.HTTPError

    Raised when the response's HTTP status code is not 200. This can happen if you have a load balancer such as nginx in front of your Spotlight cluster and no server is available, in which case nginx returns a 502 Bad Gateway.
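The missing-protocol case listed above can be caught before a request is made. The sketch below is an assumption about how such a check might look, not pyspotlight's own validation: `has_explicit_protocol` is a hypothetical helper built on the standard library's `urllib.parse` (Python 3).

```python
from urllib.parse import urlparse


def has_explicit_protocol(address):
    """Return True if the address starts with an http or https scheme.

    Hypothetical helper for illustration; not part of pyspotlight.
    """
    return urlparse(address).scheme.lower() in ('http', 'https')


for address in ('http://localhost/rest/annotate',
                'localhost/rest/annotate',
                'localhost:2222/rest/annotate'):
    print(address, has_explicit_protocol(address))
```

Comparing the scheme against an allow-list, rather than only testing that it is non-empty, also rejects addresses such as `localhost:2222/rest/annotate`, where some Python versions parse `localhost` as the scheme.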

Tips

We highly recommend playing around with the confidence and support values. Furthermore, it might be preferable to filter out more annotations by looking at their similarityScore (read: contextual score).
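Filtering on the similarity score can be done on the returned list itself. A minimal sketch, assuming the annotation dictionaries look like the ones in the Usage section: `filter_by_similarity` is a hypothetical helper, and the 0.5 threshold is an arbitrary value chosen for illustration.

```python
def filter_by_similarity(annotations, min_score):
    """Keep only annotations whose similarityScore is at least min_score.

    Hypothetical helper for illustration; not part of pyspotlight.
    """
    return [a for a in annotations
            if float(a['similarityScore']) >= min_score]


# Scores and surface forms copied from the two examples in the Usage section.
annotations = [
    {'surfaceForm': 'President Obama', 'similarityScore': 0.10031112283468246},
    {'surfaceForm': 'Shakira', 'similarityScore': 0.9999999984880361},
]

confident = filter_by_similarity(annotations, 0.5)
print([a['surfaceForm'] for a in confident])
```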

If you want to change the default values, feel free to use functools.partial to create a small wrapper with a simplified signature:

>>> from spotlight import annotate
>>> from functools import partial
>>> api = partial(annotate, 'http://localhost/rest/annotate',
...               confidence=0.4, support=20,
...               spotter='SpotXmlParser')
>>> api('This is your test text. This function uses a non-default '
...     'confidence, support, and spotter. Furthermore all calls go '
...     'directly to localhost/rest/annotate.')

As you can see this reduces the function's complexity greatly. Pyspotlight provides an interface based on functions rather than classes, to avoid an unnecessary layer of indirection.

Tests

If you want to run the tests, you will have to install nose2 (~0.6) from PyPI. Then you can simply run nose2 from the command line in this or the spotlight/ directory.

All development and regular dependencies can be installed with a single command:

pip install -r requirements-dev.txt

Bugs

In case you spot a bug, please open an issue and attach the raw response you received. Have a look at ubergrape/pyspotlight#3 for an example of how to file a good bug report.

pyspotlight's People

Contributors

aolieman, originell, pablomendes, shomyliu, sk7

pyspotlight's Issues

connection problem

Dear @aolieman,
Could you please help me solve this issue?


  File "/project/6008168/tamouze/Python_directory/pyspotlight/test.py", line 2, in <module>
    annotations = spotlight.annotate('http://localhost/rest/annotate', 'Your test text',  confidence=0.4, support=20)
  File "/project/6008168/tamouze/Python_directory/pyspotlight/spotlight/__init__.py", line 192, in annotate
    pydict = _post_request(address, payload, filters, headers)
  File "/project/6008168/tamouze/Python_directory/pyspotlight/spotlight/__init__.py", line 48, in _post_request
    response = requests.post(address, data=payload, headers=reqheaders)
  File "/home/tamouze/.local/easybuild/software/2017/Core/miniconda2/4.3.27/lib/python2.7/site-packages/requests/api.py", line 112, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/home/tamouze/.local/easybuild/software/2017/Core/miniconda2/4.3.27/lib/python2.7/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/tamouze/.local/easybuild/software/2017/Core/miniconda2/4.3.27/lib/python2.7/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/tamouze/.local/easybuild/software/2017/Core/miniconda2/4.3.27/lib/python2.7/site-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/home/tamouze/.local/easybuild/software/2017/Core/miniconda2/4.3.27/lib/python2.7/site-packages/requests/adapters.py", line 508, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=80): Max retries exceeded with url: /rest/annotate (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2b70b3e99850>: Failed to establish a new connection: [Errno 111] Connection refused',))
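The last line of the traceback reports a refused connection to localhost on port 80, which is the port used when an http:// URL does not name one. That typically means nothing is listening at that address. A quick way to check, offered as a sketch using only the standard library (`is_port_open` is a hypothetical helper, and the host and port are taken from the error message):

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper for illustration; not part of pyspotlight.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


print(is_port_open('localhost', 80))
```

If this prints False, the address passed to spotlight.annotate needs to point at the host and port where the Spotlight server is actually running.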

Is there any way to ignore the SpotlightException in your tool?

Hi, I came across a problem when using your tool.
I have to process a list of sentences. Every sentence has to be linked through the API, so I don't want a SpotlightException to interrupt the whole process.
Is there any way to ignore the SpotlightException?

Here is my code:

import json

import spotlight

doc_list = []
entity_list = []

with open('/home/tfang/data/corpus/mr.clean.txt', "r") as f:
	for line in f:
		text = line.strip()
		doc_list.append(text)

		entity_current = []

		annotations = spotlight.annotate(
			'http://localhost:2222/rest/annotate',
			text, confidence=0.5, support=0)

		if annotations is not None:
			for resource_i in annotations:
				# Drop the 28-character 'http://dbpedia.org/resource/' prefix.
				entity_current.append(resource_i["URI"][28:])
			print(entity_current)
		entity_list.append(entity_current)

# json.dump writes str, so on Python 3 the file is opened in text mode.
with open("/home/tfang/data/process/entity_mr", "w") as f:
	json.dump(entity_list, f)

here is the output in the console:
Traceback (most recent call last):
  File "entity_link.py", line 21, in <module>
    text, confidence=0.5, support=0)
  File "/home/tfang/anaconda3/envs/torch1.6/lib/python3.6/site-packages/spotlight/__init__.py", line 196, in annotate
    'No Resources found in spotlight response: %s' % pydict
spotlight.SpotlightException: No Resources found in spotlight response: {'@text': 'idiotic and ugly', '@confidence': '0.5', '@support': '0', '@types': '', '@sparql': '', '@policy': 'whitelist'}
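One way to keep such a loop running is to catch the exception for each sentence and record an empty result. The sketch below is a suggestion, not a pyspotlight feature: `annotate_all` is a hypothetical helper that receives the annotate function and the exception type as arguments, and `fake_annotate` and `FakeSpotlightException` are stand-ins so the example runs without a Spotlight server.

```python
def annotate_all(annotate, address, texts, skip_errors, **kwargs):
    """Annotate each text; texts that raise skip_errors yield an empty list.

    Hypothetical helper for illustration; not part of pyspotlight.
    """
    results = []
    for text in texts:
        try:
            results.append(annotate(address, text, **kwargs))
        except skip_errors as exc:
            # Report the problem and continue with the next text.
            print('Skipping %r: %s' % (text, exc))
            results.append([])
    return results


# Stand-ins so the sketch runs without a server.
class FakeSpotlightException(Exception):
    pass


def fake_annotate(address, text, **kwargs):
    if 'Obama' not in text:
        raise FakeSpotlightException('No Resources found in spotlight response')
    return [{'URI': 'http://dbpedia.org/resource/Presidency_of_Barack_Obama'}]


results = annotate_all(fake_annotate, 'http://localhost:2222/rest/annotate',
                       ['idiotic and ugly', 'President Obama spoke.'],
                       FakeSpotlightException, confidence=0.5, support=0)
print(results)
```

With the real library, the same helper would be called with spotlight.annotate and spotlight.SpotlightException in place of the stand-ins. Other exceptions, such as connection errors, still propagate, so genuine failures are not silently hidden.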

Using pyspotlight fails

Hello,

I tried pyspotlight. First I installed it:

sudo pip3 install pyspotlight

I also tried just using pip, not pip3.

Then I ran the example code. It says

ModuleNotFoundError: No module named 'spotlight'

Can you give me a hint as to what's wrong? I tried Python 2.7 and Python 3.7.
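A frequent cause of this error is that pip installed the package for a different interpreter than the one running the script; the report does not contain enough detail to tell whether that is the case here. The sketch below, which uses only the standard library (`module_available` is a hypothetical helper), prints which interpreter is running and whether it can find the spotlight module.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the running interpreter can locate the named module.

    Hypothetical helper for illustration; not part of pyspotlight.
    """
    return importlib.util.find_spec(name) is not None


print('Interpreter:', sys.executable)
print('spotlight importable:', module_available('spotlight'))
```

If the second line prints False, installing through the same interpreter, for example with `python3 -m pip install pyspotlight`, places the package where that interpreter looks for it.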
