opensanctions / opensanctions
An open database of international sanctions data, persons of interest and politically exposed persons
Home Page: https://www.opensanctions.org
License: MIT License
The CSV files are all 403, and the Scraper Status and Code Repo links all 404.
Dependabot couldn't authenticate with https://pypi.python.org/simple/.
You can provide authentication details in your Dependabot dashboard by clicking into the account menu (in the top right) and selecting 'Config variables'.
I imagine this is related to the previous issue that was posted here regarding scraper status. At this point, the master files are unavailable for download (for example, the master entities.csv file used to be found at http://data.opensanctions.org/v1/sources/master/latest/entities.csv), and all currently listed files generate a 403 (access forbidden) error when clicked for download.
We were just getting ready to launch a site that utilizes the data you provide, but this has forced us to stop the rollout. Is there any update on what is happening with your system? Can we depend on this site for reliable data access in the future, or should we go back to building and consolidating our own lists?
By the way, when it is working, this is an excellent resource. Thanks for any information you can provide.
John
When running the us_ofac crawler, it breaks along the way with a KeyError during the parse stage. The message is '345'.
I have looked into this file: https://www.treasury.gov/ofac/downloads/sanctions/1.0/sdn_advanced.xml. It now contains a new FeatureType with the ID 345, crashing the crawler (I suspect 344 is new too).
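The crash suggests the parser maps each FeatureType ID through a fixed lookup table that raises KeyError on unknown IDs. A tolerant lookup along these lines (a sketch; the table contents and function name are hypothetical, not the actual us_ofac code) would log and skip unknown feature types instead of crashing:

```python
import logging

log = logging.getLogger(__name__)

# Hypothetical subset of the crawler's FeatureType mapping; the real table
# lives in the us_ofac crawler and does not yet know the new IDs 344/345.
FEATURE_TYPES = {
    "8": "Website",
    "25": "Title",
}

def lookup_feature(feature_id):
    """Return the mapped feature name, or None (with a warning) if unknown."""
    name = FEATURE_TYPES.get(feature_id)
    if name is None:
        # Instead of raising KeyError: '345', record the gap and move on.
        log.warning("Unknown FeatureType: %s", feature_id)
    return name
```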
I'm filing an issue instead of a PR since I'm a bit of a newbie when it comes to debugging and fixing the crawlers. Maybe you can help me? What I've done:
clone repo
docker-compose up
docker-compose run worker ash
memorious --debug run us_ofac
Now, at the last step, I do NOT see output from the us_ofac crawler, but a snippet of the other crawlers running. In the terminal in which I ran docker-compose up, I DID see some OFAC-related output.
It feels like I'm not doing it right.
This could be a possible way of getting access to family and close associates.
Also have to check out what Dima did: https://github.com/dchaplinsky/pep.org.ua/blob/8633a65fb657d7f04dbdb12eb8ae705fa6be67e3/pepdb/tasks/management/commands/match_to_wiki.py
Where do they exist? So far, I've seen the World Bank, Moldova.
The Interpol red notice search does not return the full set of data for each query; results are limited to 160 entries per search.
The aleph_emit method does not seem to be available where it is used in the eu_eeas_sanctions source:
worker_1 | INFO:eu_eeas_sanctions.emit_doc:[eu_eeas_sanctions->emit_doc(aleph_emit)]: 0a37407ce59d44d6bf7648fb11ab312c
worker_1 | ERROR:eu_eeas_sanctions.emit_doc:('Unknown method: %s', 'aleph_emit')
worker_1 | Traceback (most recent call last):
worker_1 | File "/memorious/memorious/logic/context.py", line 77, in execute
worker_1 | return self.stage.method(self, data)
worker_1 | File "/memorious/memorious/logic/stage.py", line 26, in method
worker_1 | raise ValueError("Unknown method: %s", self.method_name)
worker_1 | ValueError: ('Unknown method: %s', 'aleph_emit')
You've got nice CSVs either here (#4) or on archive.pudo.org - why not publish them as a data package?
memorious run us_ofac works, but it seems to be producing some invalid data, because afterwards balkhash iterate -d $crawler fails with the error:
Traceback (most recent call last):
File "/usr/bin/balkhash", line 10, in <module>
sys.exit(cli())
File "/usr/lib/python3.7/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/usr/lib/python3.7/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/usr/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/lib/python3.7/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/lib/python3.7/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/usr/lib/python3.7/site-packages/balkhash/cli.py", line 55, in iterate
for entity in dataset.iterate(entity_id=entity):
File "/usr/lib/python3.7/site-packages/balkhash/dataset.py", line 50, in iterate
entity.merge(partial)
File "/usr/lib/python3.7/site-packages/followthemoney/proxy.py", line 267, in merge
self.schema = model.common_schema(self.schema, other.schema)
File "/usr/lib/python3.7/site-packages/followthemoney/model.py", line 89, in common_schema
raise InvalidData(msg % (left, right))
followthemoney.exc.InvalidData: No common ancestor: <Schema('Ownership')> and <Schema('Person')>
The error goes away if I prevent the crawler from emitting any Ownership objects.
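A workaround sketch for collisions like this, under the assumption that the crawler reuses one source ID for both a Person and an Ownership fragment: namespace entity IDs by schema, so fragments with incompatible schemas can never end up under the same ID (the helper below is hypothetical, not the actual crawler code):

```python
import hashlib

def make_entity_id(schema: str, source_id: str) -> str:
    """Derive a deterministic, schema-scoped entity ID.

    Fragments emitted with different schemas (e.g. Person vs. Ownership)
    receive distinct IDs, so iterate() never tries to merge them.
    """
    digest = hashlib.sha1(f"{schema}:{source_id}".encode("utf-8")).hexdigest()
    return f"{schema.lower()}-{digest[:16]}"
```

With this, an Ownership fragment for a given source record gets a different ID than the Person fragment derived from the same record.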
It's not too big after all ;-)
Makes it nice to track over time etc.
Hi,
Thank you for all the fantastic work that you've done. It is a really great tool, and I look forward to working with it.
I have managed to get it to run most of the sources that are available, but it keeps failing on an "Identity unknown" subject during the scrape of the zz_interpol red list.
Here is the response:
make -C zz_interpol scrape
make[2]: Entering directory '/var/www/wiirkspace/beta/opennames-master/sources/zz_interpol'
mkdir -p /20170304
python scrape.py /20170304/zz_interpol.json
Storing to JSON: /20170304/zz_interpol.json
Wanted: RICHARD, NICHOLAS T.
Wanted: FERNANDEZ TORRES, PEDRO RAMON
Wanted: ANTUNEZ, JUAN ALBERTO
Wanted: BENITEZ, CARMELO JUAN
Wanted: BENITEZ, FABIAN CARMELO
Traceback (most recent call last):
File "scrape.py", line 55, in <module>
scrape(sys.argv[1])
File "scrape.py", line 48, in scrape
cases.append(scrape_case(case_url))
File "scrape.py", line 28, in scrape_case
if len(name):
TypeError: object of type 'NoneType' has no len()
Makefile:7: recipe for target '/20170304/zz_interpol.json' failed
make[2]: *** [/20170304/zz_interpol.json] Error 1
make[2]: Leaving directory '/var/www/wiirkspace/beta/opennames-master/sources/zz_interpol'
Makefile:15: recipe for target 'zz_interpol.scrape' failed
make[1]: *** [zz_interpol.scrape] Error 2
make[1]: Leaving directory '/var/www/wiirkspace/beta/opennames-master/sources'
Makefile:5: recipe for target 'scrape' failed
make: *** [scrape] Error 2
It seems to work fine on full records, but fails with the 'NoneType' error when the name isn't a normal name.
I have tried changing line 28 from "if len(name):" to "if not len(name):", which sometimes helps when len() chokes on 'NoneType', but that didn't work.
Does anyone have thoughts on how we could get the scraper to tolerate this?
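For what it's worth, checking for None before calling len() (a sketch; the surrounding scrape_case logic is assumed, not copied) tolerates "Identity unknown" records where no name element is found:

```python
def has_name(name):
    """True only when the scraped name yielded actual text."""
    # `if len(name):` raises TypeError when name is None; testing for None
    # first (or simply `if name:`) lets nameless notices pass through.
    return name is not None and len(name) > 0
```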
Thanks.
We tried to load the EveryPolitician file.
Hello,
How can I contribute to integrating the French sanctions list?
It can be downloaded from this URL in a zip that contains an Excel file.
Right now, crawler metadata is spread out between the crawlers' YAML files and the YAML file in docs; this should be consolidated into one location.
None of the links in the Developer sections for each data source listed at https://www.opensanctions.org are currently working.
The GitHub links all take you to repositories that were in the https://github.com/opensanctions/ organisation, which seems to have been removed or closed (404).
The scraper links all take you to scrapers under https://morph.io/opensanctions/, which now shows as having no scrapers.
When we first started looking at this project (with a view to using and contributing) all these links worked, and it was very easy to audit how the data was being collected/produced (open source code), and how up to date it was (scraper logs and status). Without this level of transparency, we're back to relying on commercial vendors.
ValueError: time data '1965' does not match format '%Y/%m/%d'
at _strptime (/usr/lib/python3.7/_strptime.py:359)
at _strptime_datetime (/usr/lib/python3.7/_strptime.py:577)
at parse_date (/memorious/src/opensanctions/opensanctions/crawlers/interpol_red_notices.py:18)
at parse_notice (/memorious/src/opensanctions/opensanctions/crawlers/interpol_red_notices.py:78)
at execute (/memorious/memorious/logic/context.py:77)
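Since some notices carry only a year of birth, a fallback chain over progressively coarser formats would accept '1965' as well. This is a sketch: only '%Y/%m/%d' is confirmed by the traceback, the other formats are assumptions.

```python
from datetime import datetime

# Only the first format appears in the traceback; the rest are guesses at
# the coarser values Interpol publishes (year-month, bare year).
DATE_FORMATS = ("%Y/%m/%d", "%Y/%m", "%Y")

def parse_date(text):
    """Try each format in turn; return None if none of them match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None
```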
The URL for eu_meps appears to have changed. Update the URL in config/eu_meps.yml to the following: http://www.europarl.europa.eu/meps/en/full-list/xml/a
Updated config/eu_meps.yml
name: eu_meps
description: "[OSANC] Members of the European Parliament"
schedule: weekly
pipeline:
  init:
    method: seed
    params:
      url: 'http://www.europarl.europa.eu/meps/en/full-list/xml/a'
    handle:
      pass: fetch
  fetch:
    method: fetch
    handle:
      pass: parse
  parse:
    method: opensanctions.crawlers.eu_meps:parse
    handle:
      pass: store
  store:
    method: opensanctions.helpers:store_entity
When I run sudo docker-compose up, I get the following error for each list:
ModuleNotFoundError: No module named 'ftmstore'
Traceback is the following:
Traceback (most recent call last):
worker_1 | File "/memorious/memorious/logic/context.py", line 77, in execute
worker_1 | return self.stage.method(self, data)
worker_1 | File "/memorious/memorious/logic/stage.py", line 28, in method
worker_1 | module = import_module(package)
worker_1 | File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
worker_1 | return _bootstrap._gcd_import(name[level:], package, level)
worker_1 | File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
worker_1 | File "<frozen importlib._bootstrap>", line 991, in _find_and_load
worker_1 | File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
worker_1 | File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
worker_1 | File "<frozen importlib._bootstrap_external>", line 783, in exec_module
worker_1 | File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
worker_1 | File "/opensanctions/opensanctions/crawlers/au_dfat_sanctions.py", line 8, in <module>
worker_1 | from ftmstore.memorious import EntityEmitter
worker_1 | ModuleNotFoundError: No module named 'ftmstore'
I've recently had a discussion with @sunu about whether we should drop the use of memorious in this project. We've basically adopted it because all of OCCRP's crawlers are written in it. Yet for this project, there seems to be real interest from the community to contribute, and memorious constitutes a bit of a barrier to entry.
So I'm wondering if we should just switch to writing some plain scrapers, maybe with a bit of a custom utility toolchain for caching stuff a bit, and uploading it to a bucket at the end.
When exporting the results of the eu_eeas_sanctions crawler, an exception is thrown and the process crashes. This is with the latest master (8ec6faf). No changes have been made to the code. The export process was taken from the README:
/ # ftm store iterate -d eu_eeas_sanctions | alephclient write-entities -f eu_eeas_sanctions
INFO:alephclient.cli:[eu_eeas_sanctions] Bulk load entities: 1000...
INFO:alephclient.cli:[eu_eeas_sanctions] Bulk load entities: 2000...
INFO:alephclient.cli:[eu_eeas_sanctions] Bulk load entities: 3000...
Traceback (most recent call last):
File "/usr/lib/python3.8/site-packages/followthemoney/proxy.py", line 247, in merge
self.schema = model.common_schema(self.schema, other.schema)
File "/usr/lib/python3.8/site-packages/followthemoney/model.py", line 94, in common_schema
raise InvalidData(msg % (left, right))
followthemoney.exc.InvalidData: No common schema: <Schema('Organization')> and <Schema('Person')>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/bin/ftm", line 8, in <module>
sys.exit(cli())
File "/usr/lib/python3.8/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/usr/lib/python3.8/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/usr/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/lib/python3.8/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/usr/lib/python3.8/site-packages/ftmstore/cli.py", line 66, in iterate
iterate_stream(dataset, outfile)
File "/usr/lib/python3.8/site-packages/ftmstore/cli.py", line 31, in iterate_stream
for entity in dataset.iterate(entity_id=entity_id):
File "/usr/lib/python3.8/site-packages/ftmstore/dataset.py", line 134, in iterate
entity.merge(partial)
File "/usr/lib/python3.8/site-packages/followthemoney/proxy.py", line 250, in merge
raise InvalidData(msg % (self.id, e))
followthemoney.exc.InvalidData: Cannot merge entities with id eeas-116106: No common schema: <Schema('Organization')> and <Schema('Person')>
Hi,
Unfortunately, EveryPolitician list does not feature the current world leaders.
To complete it, I wrote a scraper based on the following list:
http://www.worldpresidentsdb.com/list/current/
You can find the scraper code under my GitHub account:
https://github.com/joeyinbox/worldpresidentsdb
Here is the scraper working on morph.io:
https://morph.io/joeyinbox/worldpresidentsdb
I would be more than happy if you could consider adding it to the other OpenSanctions sources.
Best regards,
Joey
Hi
I noticed that the "entity" field, which allows linking an entity to a sanction, is now missing.
The same goes for the "holder" field, which links a person to a passport, and for the organisation/owner link.
Is this a bug? Otherwise, does that mean that the link between these entities is now broken for some reason?
Thanks
Perhaps a package got removed from the memorious build that now needs to be added to opensanctions?
➜ opensanctions git:(master) git log | head
commit e1dae92b2cd22e2f6249b0f68cd88405719e3df7
Merge: acc4b55 8865880
Author: Friedrich Lindenberg <[email protected]>
Date: Thu Jul 11 15:59:38 2019 +0200
Merge pull request #39 from BusinessOptics/master
Cache installing dependencies in docker build
commit 886588022f1f3bb255411db0c124c8c5cb3fa7bd
➜ opensanctions git:(master) docker pull alephdata/memorious:latest
latest: Pulling from alephdata/memorious
Digest: sha256:355da68db845b9325213b612ddf3db1aba96eda1e4612d6e6eb8deba6ccaf0aa
Status: Image is up to date for alephdata/memorious:latest
➜ opensanctions git:(master) docker-compose build --no-cache ui
Building ui
Step 1/7 : FROM alephdata/memorious
---> 8768724c668a
Step 2/7 : RUN apk add --no-cache --virtual .build-deps gcc python3-dev postgresql-dev
---> Running in 931795a880d1
fetch http://dl-cdn.alpinelinux.org/alpine/v3.10/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.10/community/x86_64/APKINDEX.tar.gz
(1/13) Installing binutils (2.32-r0)
(2/13) Installing gmp (6.1.2-r1)
(3/13) Installing isl (0.18-r0)
(4/13) Installing libatomic (8.3.0-r0)
(5/13) Installing mpfr3 (3.1.5-r1)
(6/13) Installing mpc1 (1.1.0-r0)
(7/13) Installing gcc (8.3.0-r0)
(8/13) Installing pkgconf (1.6.1-r1)
(9/13) Installing python3-dev (3.7.3-r0)
(10/13) Installing openssl-dev (1.1.1c-r0)
(11/13) Installing postgresql-libs (11.4-r0)
(12/13) Installing postgresql-dev (11.4-r0)
(13/13) Installing .build-deps (20190715.092734)
Executing busybox-1.30.1-r2.trigger
OK: 338 MiB in 94 packages
Removing intermediate container 931795a880d1
---> fb44b610b36b
Step 3/7 : COPY setup.py /opensanctions/setup.py
---> 363c0fff12de
Step 4/7 : RUN pip install -e /opensanctions
---> Running in 5ea4957bf273
Obtaining file:///opensanctions
Collecting followthemoney>=1.9.2 (from opensanctions==1.99)
Downloading https://files.pythonhosted.org/packages/87/82/e2c866f9ee9aa9b1148361df4dabe040e6d3b27089f213e647874adf5509/followthemoney-1.14.1-py2.py3-none-any.whl (228kB)
Collecting balkhash[sql]>=0.3.0 (from opensanctions==1.99)
Downloading https://files.pythonhosted.org/packages/6d/68/97d981072a03cbd8f14416756ce5aaaf248e16be7f91e7fbb838c5d383e7/balkhash-1.0.0-py2.py3-none-any.whl
Requirement already satisfied: memorious>=0.8 in /memorious (from opensanctions==1.99) (0.14.2)
Collecting countrynames (from opensanctions==1.99)
Downloading https://files.pythonhosted.org/packages/ab/ee/4778c4c59c2a65fead8e77c28978c5f5819702acba2acbb677be3e882b51/countrynames-1.6.0-py3-none-any.whl (228kB)
Collecting xlrd (from opensanctions==1.99)
Downloading https://files.pythonhosted.org/packages/b0/16/63576a1a001752e34bf8ea62e367997530dc553b689356b9879339cf45a4/xlrd-1.2.0-py2.py3-none-any.whl (103kB)
Requirement already satisfied: sqlalchemy>=1.2.0 in /usr/lib/python3.7/site-packages (from followthemoney>=1.9.2->opensanctions==1.99) (1.3.5)
Requirement already satisfied: pytz>=2018.5 in /usr/lib/python3.7/site-packages (from followthemoney>=1.9.2->opensanctions==1.99) (2019.1)
Requirement already satisfied: normality>=1.0.0 in /usr/lib/python3.7/site-packages (from followthemoney>=1.9.2->opensanctions==1.99) (1.0.0)
Collecting python-levenshtein>=0.12.0 (from followthemoney>=1.9.2->opensanctions==1.99)
Downloading https://files.pythonhosted.org/packages/42/a9/d1785c85ebf9b7dfacd08938dd028209c34a0ea3b1bcdb895208bd40a67d/python-Levenshtein-0.12.0.tar.gz (48kB)
Requirement already satisfied: stringcase>=1.2.0 in /usr/lib/python3.7/site-packages (from followthemoney>=1.9.2->opensanctions==1.99) (1.2.0)
Requirement already satisfied: pyyaml>=5.1 in /usr/lib/python3.7/site-packages (from followthemoney>=1.9.2->opensanctions==1.99) (5.1.1)
Collecting python-stdnum>=1.10 (from followthemoney>=1.9.2->opensanctions==1.99)
Downloading https://files.pythonhosted.org/packages/ad/df/07a3d26fa0a7e02d98b1a207a684ec611af19b8e15a9a0d63a06472b590b/python_stdnum-1.11-py2.py3-none-any.whl (778kB)
Collecting pantomime>=0.3.2 (from followthemoney>=1.9.2->opensanctions==1.99)
Downloading https://files.pythonhosted.org/packages/8a/0a/123fdbddc14f1cc316183f072d3444c454d170050eec9ec5c83b4fbf2135/pantomime-0.3.3-py2.py3-none-any.whl
Requirement already satisfied: requests[security]>=2.21.0 in /usr/lib/python3.7/site-packages (from followthemoney>=1.9.2->opensanctions==1.99) (2.22.0)
Requirement already satisfied: banal>=0.4.2 in /usr/lib/python3.7/site-packages (from followthemoney>=1.9.2->opensanctions==1.99) (0.4.2)
Requirement already satisfied: click>=7.0 in /usr/lib/python3.7/site-packages (from followthemoney>=1.9.2->opensanctions==1.99) (7.0)
Collecting openpyxl>=2.6.0 (from followthemoney>=1.9.2->opensanctions==1.99)
Downloading https://files.pythonhosted.org/packages/ba/06/b899c8867518df19e242d8cbc82d4ba210f5ffbeebb7704c695e687ab59c/openpyxl-2.6.2.tar.gz (173kB)
Collecting phonenumbers>=8.9.11 (from followthemoney>=1.9.2->opensanctions==1.99)
Downloading https://files.pythonhosted.org/packages/34/3f/ca2193d6e93074b78cebdbd73c31815c82a50c8975d0517a7ff0b0337430/phonenumbers-8.10.14-py2.py3-none-any.whl (2.6MB)
Requirement already satisfied: urlnormalizer>=1.2.0 in /usr/lib/python3.7/site-packages (from followthemoney>=1.9.2->opensanctions==1.99) (1.2.3)
Collecting rdflib>=4.2.2 (from followthemoney>=1.9.2->opensanctions==1.99)
Downloading https://files.pythonhosted.org/packages/3c/fe/630bacb652680f6d481b9febbb3e2c3869194a1a5fc3401a4a41195a2f8f/rdflib-4.2.2-py3-none-any.whl (344kB)
Collecting languagecodes>=1.0.4 (from followthemoney>=1.9.2->opensanctions==1.99)
Downloading https://files.pythonhosted.org/packages/d7/82/0273d2d101ca48fa6f3db0b9a9b83a1b8b8f73a3f243877cc1dc06696e23/languagecodes-1.0.4-py3-none-any.whl (88kB)
Requirement already satisfied: babel in /usr/lib/python3.7/site-packages (from followthemoney>=1.9.2->opensanctions==1.99) (2.7.0)
Collecting networkx>=2.3 (from followthemoney>=1.9.2->opensanctions==1.99)
Downloading https://files.pythonhosted.org/packages/85/08/f20aef11d4c343b557e5de6b9548761811eb16e438cee3d32b1c66c8566b/networkx-2.3.zip (1.7MB)
Collecting psycopg2-binary>=2.7; extra == "sql" (from balkhash[sql]>=0.3.0->opensanctions==1.99)
Downloading https://files.pythonhosted.org/packages/80/91/91911be01869fa877135946f928ed0004e62044bdd876c1e0f12e1b5fb90/psycopg2-binary-2.8.3.tar.gz (378kB)
Requirement already satisfied: requests_ftp in /usr/lib/python3.7/site-packages (from memorious>=0.8->opensanctions==1.99) (0.3.1)
Requirement already satisfied: lxml>=3 in /usr/lib/python3.7/site-packages (from memorious>=0.8->opensanctions==1.99) (4.3.4)
Requirement already satisfied: tabulate in /usr/lib/python3.7/site-packages (from memorious>=0.8->opensanctions==1.99) (0.8.3)
Requirement already satisfied: dataset>=1.0.8 in /usr/lib/python3.7/site-packages (from memorious>=0.8->opensanctions==1.99) (1.1.2)
Requirement already satisfied: servicelayer>=1.5.3 in /usr/lib/python3.7/site-packages (from memorious>=0.8->opensanctions==1.99) (1.5.3)
Requirement already satisfied: celestial>=0.2.0 in /usr/lib/python3.7/site-packages (from memorious>=0.8->opensanctions==1.99) (0.2.3)
Requirement already satisfied: dateparser in /usr/lib/python3.7/site-packages (from memorious>=0.8->opensanctions==1.99) (0.7.1)
Requirement already satisfied: python-redis-rate-limit>=0.0.5 in /usr/lib/python3.7/site-packages (from memorious>=0.8->opensanctions==1.99) (0.0.7)
Requirement already satisfied: blinker>=1.4 in /usr/lib/python3.7/site-packages (from memorious>=0.8->opensanctions==1.99) (1.4)
Requirement already satisfied: flask in /usr/lib/python3.7/site-packages (from memorious>=0.8->opensanctions==1.99) (1.1.1)
Requirement already satisfied: six in /usr/lib/python3.7/site-packages (from countrynames->opensanctions==1.99) (1.12.0)
Requirement already satisfied: chardet in /usr/lib/python3.7/site-packages (from normality>=1.0.0->followthemoney>=1.9.2->opensanctions==1.99) (3.0.4)
Requirement already satisfied: pyicu>=1.9.3 in /usr/lib/python3.7/site-packages (from normality>=1.0.0->followthemoney>=1.9.2->opensanctions==1.99) (2.3.1)
Requirement already satisfied: setuptools in /usr/lib/python3.7/site-packages (from python-levenshtein>=0.12.0->followthemoney>=1.9.2->opensanctions==1.99) (41.0.1)
Requirement already satisfied: idna<2.9,>=2.5 in /usr/lib/python3.7/site-packages (from requests[security]>=2.21.0->followthemoney>=1.9.2->opensanctions==1.99) (2.8)
Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3.7/site-packages (from requests[security]>=2.21.0->followthemoney>=1.9.2->opensanctions==1.99) (2019.6.16)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/lib/python3.7/site-packages (from requests[security]>=2.21.0->followthemoney>=1.9.2->opensanctions==1.99) (1.25.3)
Requirement already satisfied: cryptography>=1.3.4; extra == "security" in /usr/lib/python3.7/site-packages (from requests[security]>=2.21.0->followthemoney>=1.9.2->opensanctions==1.99) (2.6.1)
Requirement already satisfied: pyOpenSSL>=0.14; extra == "security" in /usr/lib/python3.7/site-packages (from requests[security]>=2.21.0->followthemoney>=1.9.2->opensanctions==1.99) (19.0.0)
Collecting jdcal (from openpyxl>=2.6.0->followthemoney>=1.9.2->opensanctions==1.99)
Downloading https://files.pythonhosted.org/packages/f0/da/572cbc0bc582390480bbd7c4e93d14dc46079778ed915b505dc494b37c57/jdcal-1.4.1-py2.py3-none-any.whl
Collecting et_xmlfile (from openpyxl>=2.6.0->followthemoney>=1.9.2->opensanctions==1.99)
Downloading https://files.pythonhosted.org/packages/22/28/a99c42aea746e18382ad9fb36f64c1c1f04216f41797f2f0fa567da11388/et_xmlfile-1.0.1.tar.gz
Collecting pyparsing (from rdflib>=4.2.2->followthemoney>=1.9.2->opensanctions==1.99)
Downloading https://files.pythonhosted.org/packages/dd/d9/3ec19e966301a6e25769976999bd7bbe552016f0d32b577dc9d63d2e0c49/pyparsing-2.4.0-py2.py3-none-any.whl (62kB)
Collecting isodate (from rdflib>=4.2.2->followthemoney>=1.9.2->opensanctions==1.99)
Downloading https://files.pythonhosted.org/packages/9b/9f/b36f7774ff5ea8e428fdcfc4bb332c39ee5b9362ddd3d40d9516a55221b2/isodate-0.6.0-py2.py3-none-any.whl (45kB)
Collecting decorator>=4.3.0 (from networkx>=2.3->followthemoney>=1.9.2->opensanctions==1.99)
Downloading https://files.pythonhosted.org/packages/5f/88/0075e461560a1e750a0dcbf77f1d9de775028c37a19a346a6c565a257399/decorator-4.4.0-py2.py3-none-any.whl
Requirement already satisfied: alembic>=0.6.2 in /usr/lib/python3.7/site-packages (from dataset>=1.0.8->memorious>=0.8->opensanctions==1.99) (1.0.11)
Requirement already satisfied: redis>=3.2.1 in /usr/lib/python3.7/site-packages (from servicelayer>=1.5.3->memorious>=0.8->opensanctions==1.99) (3.2.1)
Requirement already satisfied: fakeredis>=1.0.3 in /usr/lib/python3.7/site-packages (from servicelayer>=1.5.3->memorious>=0.8->opensanctions==1.99) (1.0.3)
Requirement already satisfied: regex in /usr/lib/python3.7/site-packages (from dateparser->memorious>=0.8->opensanctions==1.99) (2019.6.8)
Requirement already satisfied: python-dateutil in /usr/lib/python3.7/site-packages (from dateparser->memorious>=0.8->opensanctions==1.99) (2.8.0)
Requirement already satisfied: tzlocal in /usr/lib/python3.7/site-packages (from dateparser->memorious>=0.8->opensanctions==1.99) (1.5.1)
Requirement already satisfied: Jinja2>=2.10.1 in /usr/lib/python3.7/site-packages (from flask->memorious>=0.8->opensanctions==1.99) (2.10.1)
Requirement already satisfied: Werkzeug>=0.15 in /usr/lib/python3.7/site-packages (from flask->memorious>=0.8->opensanctions==1.99) (0.15.4)
Requirement already satisfied: itsdangerous>=0.24 in /usr/lib/python3.7/site-packages (from flask->memorious>=0.8->opensanctions==1.99) (1.1.0)
Requirement already satisfied: asn1crypto>=0.21.0 in /usr/lib/python3.7/site-packages (from cryptography>=1.3.4; extra == "security"->requests[security]>=2.21.0->followthemoney>=1.9.2->opensanctions==1.99) (0.24.0)
Requirement already satisfied: cffi!=1.11.3,>=1.8 in /usr/lib/python3.7/site-packages (from cryptography>=1.3.4; extra == "security"->requests[security]>=2.21.0->followthemoney>=1.9.2->opensanctions==1.99) (1.11.5)
Requirement already satisfied: Mako in /usr/lib/python3.7/site-packages (from alembic>=0.6.2->dataset>=1.0.8->memorious>=0.8->opensanctions==1.99) (1.0.13)
Requirement already satisfied: python-editor>=0.3 in /usr/lib/python3.7/site-packages (from alembic>=0.6.2->dataset>=1.0.8->memorious>=0.8->opensanctions==1.99) (1.0.4)
Requirement already satisfied: sortedcontainers in /usr/lib/python3.7/site-packages (from fakeredis>=1.0.3->servicelayer>=1.5.3->memorious>=0.8->opensanctions==1.99) (2.1.0)
Requirement already satisfied: MarkupSafe>=0.23 in /usr/lib/python3.7/site-packages (from Jinja2>=2.10.1->flask->memorious>=0.8->opensanctions==1.99) (1.1.1)
Requirement already satisfied: pycparser in /usr/lib/python3.7/site-packages (from cffi!=1.11.3,>=1.8->cryptography>=1.3.4; extra == "security"->requests[security]>=2.21.0->followthemoney>=1.9.2->opensanctions==1.99) (2.19)
Installing collected packages: python-levenshtein, python-stdnum, pantomime, jdcal, et-xmlfile, openpyxl, phonenumbers, countrynames, pyparsing, isodate, rdflib, languagecodes, decorator, networkx, followthemoney, psycopg2-binary, balkhash, xlrd, opensanctions
Running setup.py install for python-levenshtein: started
Running setup.py install for python-levenshtein: finished with status 'error'
ERROR: Complete output from command /usr/bin/python3.7 -u -c 'import setuptools, tokenize;__file__='"'"'/tmp/pip-install-pwxewkfe/python-levenshtein/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-zgzbi8nq/install-record.txt --single-version-externally-managed --compile:
ERROR: running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.7
creating build/lib.linux-x86_64-3.7/Levenshtein
copying Levenshtein/__init__.py -> build/lib.linux-x86_64-3.7/Levenshtein
copying Levenshtein/StringMatcher.py -> build/lib.linux-x86_64-3.7/Levenshtein
running egg_info
writing python_Levenshtein.egg-info/PKG-INFO
writing dependency_links to python_Levenshtein.egg-info/dependency_links.txt
writing entry points to python_Levenshtein.egg-info/entry_points.txt
writing namespace_packages to python_Levenshtein.egg-info/namespace_packages.txt
writing requirements to python_Levenshtein.egg-info/requires.txt
writing top-level names to python_Levenshtein.egg-info/top_level.txt
reading manifest file 'python_Levenshtein.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files matching '*pyc' found anywhere in distribution
warning: no previously-included files matching '*so' found anywhere in distribution
warning: no previously-included files matching '.project' found anywhere in distribution
warning: no previously-included files matching '.pydevproject' found anywhere in distribution
writing manifest file 'python_Levenshtein.egg-info/SOURCES.txt'
copying Levenshtein/_levenshtein.c -> build/lib.linux-x86_64-3.7/Levenshtein
copying Levenshtein/_levenshtein.h -> build/lib.linux-x86_64-3.7/Levenshtein
running build_ext
building 'Levenshtein._levenshtein' extension
creating build/temp.linux-x86_64-3.7
creating build/temp.linux-x86_64-3.7/Levenshtein
gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Os -fomit-frame-pointer -g -Os -fomit-frame-pointer -g -Os -fomit-frame-pointer -g -DTHREAD_STACK_SIZE=0x100000 -fPIC -I/usr/include/python3.7m -c Levenshtein/_levenshtein.c -o build/temp.linux-x86_64-3.7/Levenshtein/_levenshtein.o
In file included from Levenshtein/_levenshtein.c:99:
/usr/include/python3.7m/Python.h:11:10: fatal error: limits.h: No such file or directory
#include <limits.h>
^~~~~~~~~~
compilation terminated.
error: command 'gcc' failed with exit status 1
----------------------------------------
ERROR: Command "/usr/bin/python3.7 -u -c 'import setuptools, tokenize;__file__='"'"'/tmp/pip-install-pwxewkfe/python-levenshtein/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-zgzbi8nq/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-install-pwxewkfe/python-levenshtein/
ERROR: Service 'ui' failed to build: The command '/bin/sh -c pip install -e /opensanctions' returned a non-zero code: 1
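For reference, a "limits.h: No such file or directory" failure on Alpine usually means musl-dev is absent from the build dependencies. A likely fix (an assumption based on the apk output above, not a confirmed patch) is to add it to the .build-deps line in the Dockerfile:

```dockerfile
# musl-dev provides limits.h and the other C headers needed to compile
# extensions such as python-Levenshtein.
RUN apk add --no-cache --virtual .build-deps gcc musl-dev python3-dev postgresql-dev
```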
When you do use the patronymic, is it useful to include it in the generated SECO names?
Hello.
As I'm switching from the exploded lists (person.csv, membership.csv...) to the single list (eu_meps.json), I noticed that the downloader I developed to automate the update process fails due to a 403 error from Google accounts, which does not happen with the exploded lists. Is this intentional? Could you explain why?
Also, I've been looking for ways to log in to Google with my user account using Java, but to no avail :(
Thank you!
I would like to push the results of the scrapers to Aleph. However, I do not know what the values of the ALEPHCLIENT_API_KEY and ALEPHCLIENT_HOST variables should be.
Is ALEPHCLIENT_API_KEY the same as the ALEPH_SECRET_KEY set in aleph.env (from here)?
And where should ALEPHCLIENT_HOST point to? I'm running both Aleph and OpenSanctions in their respective Docker environments.
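For what it's worth, my understanding (an assumption from general alephclient usage, not confirmed in this thread) is that ALEPHCLIENT_HOST is the base URL where your Aleph instance answers, and ALEPHCLIENT_API_KEY is the per-user API key shown on your Aleph settings page, which is not the same thing as the server-side ALEPH_SECRET_KEY. Something like:

```shell
# Hypothetical values: adjust the host to wherever your Aleph UI is reachable,
# and paste the API key from your Aleph user settings page.
export ALEPHCLIENT_HOST=http://localhost:8080
export ALEPHCLIENT_API_KEY=your-api-key-here
```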
I'm getting the following error when installing.
error: Setup script exited with error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": http://landinghub.visualstudio.com/visual-cpp-build-tools
I'm using a windows 10 OS, and Python 3.6.4
I'd appreciate it if someone could help with this.
Do the join at the SQL level instead of passing around lists of UIDs, which results in an IN query with a huge list.
The problem starts in dump.py at line 115.
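The idea above can be sketched with plain SQLite (hypothetical table names, not the actual dump.py schema): let the database join the two tables instead of interpolating a huge Python list into an IN clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entities (uid TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE matches (uid TEXT);
    INSERT INTO entities VALUES ('a', 'Alice'), ('b', 'Bob');
    INSERT INTO matches VALUES ('a');
""")

# Instead of building "... WHERE uid IN (?, ?, ..., ?)" from a huge list of
# UIDs held in Python, let the database do the join server-side:
rows = conn.execute(
    "SELECT e.name FROM entities e JOIN matches m ON m.uid = e.uid"
).fetchall()
```

The IN-list version has to serialize every UID into the statement; the join keeps the whole operation inside the database.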
Hello,
When running most crawlers, e.g. memorious run us_ofac, we get the following error:
Any idea how to solve it?
Error:
INFO:ca_dfatd_sema_sanctions.store:[ca_dfatd_sema_sanctions->store(balkhash_put)]: 393ae9b89d8e11e98d580242ac110002
ERROR:ca_dfatd_sema_sanctions.store:'NoneType' object has no attribute 'upper'
Traceback (most recent call last):
File "/memorious/memorious/logic/context.py", line 75, in execute
return self.stage.method(self, data)
File "/usr/lib/python3.7/site-packages/balkhash/memorious.py", line 21, in balkhash_put
writer = get_dataset(context)
File "/usr/lib/python3.7/site-packages/balkhash/memorious.py", line 15, in get_dataset
return init(**config)
File "/usr/lib/python3.7/site-packages/balkhash/init.py", line 5, in init
backend = backend.upper()
AttributeError: 'NoneType' object has no attribute 'upper'
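Judging from the traceback, balkhash's init() calls .upper() on a backend value that was never configured. A defensive guard could look like the sketch below; the function body and the "leveldb" default are assumptions for illustration, not balkhash's actual code.

```python
# Hypothetical guard for balkhash's init(): fall back to a default backend
# name when none is configured, instead of crashing on None.upper().
# "leveldb" is an assumed default, not necessarily balkhash's real one.
def init(backend=None, **config):
    backend = (backend or "leveldb").upper()
    # ... the real init() would dispatch to a storage backend class here
    return backend

assert init() == "LEVELDB"                 # no backend configured: default applies
assert init("postgresql") == "POSTGRESQL"  # explicit backend passes through
```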
Hi, I'm having trouble building the Docker image. It's failing with the following pip error when collecting pandas:
ModuleNotFoundError: No module named 'numpy'
Also, are the CSV files on opensanctions.org no longer being updated? What's the best current way to get the data?
Thanks.
Hello,
It seems the data files have not been updated since May 2018. Is there an alternate repository to get the latest data?
https://archive.org/download/opensanctions
Thanks,
Each dataset now has its own folder with the source data in it; the exports of FtM entities should go there, too.
The ftm_load_aleph method was not working (ftm_store version 2.2.3) until I changed the aggregator method from:
aggregator:
  method: ftm_load_aleph
to:
aggregator:
  method: ftmstore.memorious:ftm_load_aleph
since the aggregator_method property in memorious/logic/crawler.py expects a colon:
memorious/logic/crawler.py
@property
def aggregator_method(self):
    if self.aggregator_config:
        method = self.aggregator_config.get("method")
        if not method:
            return
        if ':' in method:
            package, method = method.rsplit(':', 1)
            module = import_module(package)
            return getattr(module, method)
Or is there something else I missed in the setup?
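The quoted resolution logic can be demonstrated in isolation. The resolve() helper below mirrors it, with os.path:join standing in for ftmstore.memorious:ftm_load_aleph so the sketch stays self-contained:

```python
from importlib import import_module

# Mirrors the aggregator_method property quoted above: only the
# "package:method" form is split and imported; a bare name falls
# through, so the aggregator is silently never resolved.
def resolve(method):
    if not method:
        return None
    if ":" in method:
        package, name = method.rsplit(":", 1)
        return getattr(import_module(package), name)
    return None

# "os.path:join" stands in for "ftmstore.memorious:ftm_load_aleph".
assert resolve("os.path:join") is not None  # colon form resolves
assert resolve("ftm_load_aleph") is None    # bare name is ignored
```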
It seems the EU EEAS Sanctions list has been moved from
http://ec.europa.eu/external_relations/cfsp/sanctions/list/version4/global/global.xml
to
https://data.europa.eu/euodp/es/data/dataset/consolidated-list-of-persons-groups-and-entities-subject-to-eu-financial-sanctions
The list at the old URL has not been updated in recent months. Also, the XML file structure has been modified.
This is a fairly new data set that may be worth adding: http://lib.law.virginia.edu/Garrett/corporate-prosecution-registry/index.html
While running on docker I get the following error.
ERROR:coe_assembly.init:(psycopg2.OperationalError) could not translate host name "postgres" to address: Name does not resolve
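This error usually means the container cannot resolve the postgres service name, i.e. it is not attached to the same Docker network as a service called postgres. A hypothetical docker-compose fragment where the hostname would resolve (service names, image tags, and the DATABASE_URI variable name are all assumptions, not the project's actual compose file):

```yaml
# Hypothetical compose sketch: both services share the default network,
# so the hostname "postgres" resolves inside the worker container.
version: "3"
services:
  postgres:
    image: postgres:11
  worker:                  # assumed crawler service name
    image: opensanctions   # assumed image name
    depends_on:
      - postgres
    environment:
      # assumed variable name for illustration only
      DATABASE_URI: postgresql://postgres@postgres/opensanctions
```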
Using my DNS, I cannot access http://opensanctions.org (as linked in the header of the project), only http://www.opensanctions.org.
After docker-compose up:
ui_1 | [2019-07-09 14:55:03 +0000] [10] [ERROR] Exception in worker process
ui_1 | Traceback (most recent call last):
ui_1 | File "/usr/lib/python3.7/site-packages/gunicorn/arbiter.py", line 578, in spawn_worker
ui_1 | worker.init_process()
ui_1 | File "/usr/lib/python3.7/site-packages/gunicorn/workers/base.py", line 126, in init_process
ui_1 | self.load_wsgi()
ui_1 | File "/usr/lib/python3.7/site-packages/gunicorn/workers/base.py", line 135, in load_wsgi
ui_1 | self.wsgi = self.app.wsgi()
ui_1 | File "/usr/lib/python3.7/site-packages/gunicorn/app/base.py", line 67, in wsgi
ui_1 | self.callable = self.load()
ui_1 | File "/usr/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 65, in load
ui_1 | return self.load_wsgiapp()
ui_1 | File "/usr/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 52, in load_wsgiapp
ui_1 | return util.import_app(self.app_uri)
ui_1 | File "/usr/lib/python3.7/site-packages/gunicorn/util.py", line 352, in import_app
ui_1 | __import__(module)
ui_1 | File "/memorious/memorious/ui/__init__.py", line 1, in <module>
ui_1 | from memorious.ui.views import app
ui_1 | File "/memorious/memorious/ui/views.py", line 8, in <module>
ui_1 | from memorious.core import settings, manager, init_memorious
ui_1 | File "/memorious/memorious/core.py", line 54, in <module>
ui_1 | storage = init_archive()
ui_1 | File "/usr/lib/python3.7/site-packages/servicelayer/archive/__init__.py", line 21, in init_archive
ui_1 | return FileArchive(path=path)
ui_1 | File "/usr/lib/python3.7/site-packages/servicelayer/archive/file.py", line 17, in __init__
ui_1 | raise ValueError('No archive path is set.')
ui_1 | ValueError: No archive path is set.
ui_1 | [2019-07-09 14:55:03 +0000] [10] [INFO] Worker exiting (pid: 10)
opensanctions_ui_1 exited with code 1
At some point I thought I had fixed this with the env file, particularly:
ARCHIVE_TYPE: file
ARCHIVE_PATH: /data/archive
Not sure what changed.
CIA has redone their site. The URL currently specified in the CIA World Leaders scraper config, https://www.cia.gov/library/publications/resources/world-leaders-1/index.html, returns a 404. The lists now reside at https://www.cia.gov/resources/world-leaders/foreign-governments/.
http://www.international.gc.ca/sanctions/assets/office_docs/sema-lmes.xml has 797 persons, for example. Why does this project only return 19 persons?
Hi,
I wrote a scraper based on the following list:
http://www.international.gc.ca/sanctions/consolidated-recapitulative.aspx?lang=eng
You can find the scraper code under my GitHub account:
https://github.com/joeyinbox/dfatd-sema
Here is the scraper working on morph.io:
https://morph.io/joeyinbox/dfatd-sema
I would appreciate it if you could consider adding it to the other OpenSanctions sources.
Best regards,
Joey
The current url specified in the EU EEAS Sanctions config file, https://data.europa.eu/euodp/es/data/dataset/consolidated-list-of-persons-groups-and-entities-subject-to-eu-financial-sanctions, is no longer available, returning a 404. It seems the datasets have been moved behind a login, as per https://eeas.europa.eu/headquarters/headquarters-homepage_en/8442/Consolidated%20list%20of%20sanctions.
I found two errors; us_ofac.py has been modified in the last three months.
One problem is that there is no Vessel Owner entry in the REGISTRATIONS dictionary.
The other is a NoneType problem: the if len(attr): check raises an error because attr is None for some entries, e.g.:
"UN/LOCODE": (None, 'LegalEntity'),
"MICEX Code": (None, 'Company'),
"Digital Currency Address - XBT": (None, 'LegalEntity')
I fixed it by changing the check to if attr:.
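A minimal illustration of why the change helps: len() cannot be applied to None, while a plain truthiness test handles it gracefully. The helper names below are just for this demo, not names from us_ofac.py.

```python
# broken() mirrors the original "if len(attr):" check; fixed() mirrors
# the patched "if attr:" check. attr is None for entries such as
# "UN/LOCODE": (None, 'LegalEntity') quoted above.
def broken(attr):
    return bool(len(attr))   # raises TypeError when attr is None

def fixed(attr):
    return bool(attr)        # None (and "") are simply falsy

assert fixed(None) is False
assert fixed("LegalEntity") is True
try:
    broken(None)
except TypeError:
    pass  # this is the crash the issue describes
```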
Hello,
A few crawlers (eu_cor_members, gb_coh_disqualified, interpol_yellow_notices) give this error:
root@b07be929e868:/opt/followthemoney/docs# opensanctions crawl eu_cor_members
2021-03-10 06:07:28 [info ] Begin crawl [eu_cor_members]
2021-03-10 06:07:28 [error ] Crawl failed [eu_cor_members]
Traceback (most recent call last):
File "/opensanctions/opensanctions/core/context.py", line 69, in crawl
self.dataset.method(self)
File "/opensanctions/opensanctions/core/dataset.py", line 59, in method
raise RuntimeError("The dataset has no entry point!")
RuntimeError: The dataset has no entry point!
Should I add something like entry_point: opensanctions.crawlers.eu_cor_members to each of the /metadata/*.yml files where entry_point is missing?
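Going by the error message, a dataset metadata file presumably needs an explicit entry point. A hypothetical sketch of /metadata/eu_cor_members.yml — only the entry_point key comes from the question above; the title and overall layout are assumptions:

```yaml
# Hypothetical metadata sketch; only entry_point is taken from the
# question above, the rest of the layout is assumed.
title: EU Committee of the Regions members
entry_point: opensanctions.crawlers.eu_cor_members
```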
Hello,
I was trying to download the United Kingdom Consolidated List of Targets from the website, but the link is down.
Could you please restore it?
Hello,
running a crawler gives this error:
root@b07be929e868:/opt/followthemoney/docs# opensanctions crawl eu_cor_members
2021-03-10 06:20:21 [info ] Begin crawl [eu_cor_members]
2021-03-10 06:20:21 [error ] Crawl failed [eu_cor_members]
Traceback (most recent call last):
File "/opensanctions/opensanctions/core/context.py", line 69, in crawl
self.dataset.method(self)
File "/opensanctions/opensanctions/core/dataset.py", line 62, in method
module = import_module(package)
File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 783, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/opensanctions/opensanctions/crawlers/eu_cor_members.py", line 2, in <module>
from ftmstore.memorious import EntityEmitter
ModuleNotFoundError: No module named 'ftmstore.memorious'
The pip freeze output:
alephclient==2.2.2
Babel==2.9.0
banal==1.0.5
certifi==2020.12.5
cffi==1.14.5
chardet==4.0.0
click==7.1.2
colorama==0.4.4
countrynames==1.7.1
cryptography==3.4.6
decorator==4.4.2
et-xmlfile==1.0.1
fingerprints==1.0.3
# Editable install with no version control (followthemoney==2.3.2)
-e /opt/followthemoney
# Editable install with no version control (followthemoney-enrich==2.3.2)
-e /opt/followthemoney/enrich
followthemoney-store==3.0.2
idna==2.10
isodate==0.6.0
jdcal==1.4.1
languagecodes==1.0.9
lxml==4.6.2
networkx==2.5
normality==2.1.3
openpyxl==3.0.6
# Editable install with no version control (opensanctions==3.0.0)
-e /opensanctions
pantomime==0.4.1
phonenumbers==8.12.18
psycopg2==2.8.4
psycopg2-binary==2.8.6
pycparser==2.20
pycrypto==2.6.1
PyICU==2.4.2
pyOpenSSL==20.0.1
pyparsing==2.4.7
python-Levenshtein==0.12.2
python-stdnum==1.16
pytz==2021.1
PyYAML==5.4.1
rdflib==5.0.0
redis==3.5.3
requests==2.25.1
requests-cache==0.5.2
requests-toolbelt==0.9.1
six==1.15.0
SQLAlchemy==1.3.23
stringcase==1.2.0
structlog==21.1.0
text-unidecode==1.3
urllib3==1.26.3
urlnormalizer==1.2.5
xlrd==2.0.1
Is something wrong with the followthemoney versions?
Before the system reorganization I was making use of the master lists that were provided. The master lists are still referenced in your FAQs but are no longer available on the site. Is there any plan to bring them back or do I just need to start working with the individual lists on the site? Thanks.
John