
sodapy's Introduction


sodapy - UNMAINTAINED

🚨 NOTICE: sodapy still works well, but is unmaintained as of Aug 31, 2022. No new features or bugfixes will be added. Use at your own risk.

sodapy is a python client for the Socrata Open Data API.

Installation

You can install with pip install sodapy.

If you want to install from source, then clone this repository and run python setup.py install from the project root.

Requirements

At its core, this library depends heavily on the Requests package. All other requirements can be found in requirements.txt. sodapy is currently compatible with Python 3.5, 3.6, 3.7, 3.8, 3.9, and 3.10.

Documentation

The official Socrata Open Data API docs provide thorough documentation of the available methods, as well as other client libraries. A quick list of eligible domains to use with this API is available via the Socrata Discovery API or Socrata's Open Data Network.

This library supports writing directly to datasets with the Socrata Open Data API. For write operations that use data transformations in the Socrata Data Management Experience (the user interface for creating datasets), use the Socrata Data Management API. For more details on when to use SODA vs the Data Management API, see the Data Management API documentation. A Python SDK for the Socrata Data Management API can be found at socrata-py.

Examples

There are some Jupyter notebooks in the examples directory with usage examples of sodapy in action.

Interface

client

Import the library and set up a connection to get started.

>>> from sodapy import Socrata
>>> client = Socrata(
...     "sandbox.demo.socrata.com",
...     "FakeAppToken",
...     username="[email protected]",
...     password="mypassword",
...     timeout=10,
... )

username and password are only required for creating or modifying data. An application token isn't strictly required (it can be None), but queries executed from a client without an application token are subject to strict throttling limits. You may want to increase the timeout (in seconds) when making large requests. To create a bare-bones client:

>>> client = Socrata("sandbox.demo.socrata.com", None)

A client can also be created with a context manager to obviate the need for teardown:

>>> with Socrata("sandbox.demo.socrata.com", None) as client:
...     # do some stuff

The client, by default, makes requests over HTTPS. To modify this behavior, or to make requests through a proxy, take a look here.
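
As a minimal sketch, assuming the client exposes its underlying requests.Session as client.session (sodapy issues its requests through a session object), a proxy can be configured on that session directly; the proxy URL below is a hypothetical placeholder:

>>> client = Socrata("sandbox.demo.socrata.com", None)
>>> # Route all HTTPS traffic through a (hypothetical) proxy.
>>> client.session.proxies.update({"https": "http://proxy.example.com:8080"})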

datasets(limit=0, offset=0)

Retrieve datasets associated with a particular domain. The optional limit and offset keyword args can be used to retrieve a subset of the datasets. By default, all datasets are returned.

>>> client.datasets()
[{"resource" : {"name" : "Approved Building Permits", "id" : "msk6-43c6", "parent_fxf" : null, "description" : "Data of approved building/construction permits",...}, {resource : {...}}, ...]

get(dataset_identifier, content_type="json", **kwargs)

Retrieve data from the requested resources. Filter and query data by field name, id, or using SoQL keywords.

>>> client.get("nimj-3ivp", limit=2)
[{u'geolocation': {u'latitude': u'41.1085', u'needs_recoding': False, u'longitude': u'-117.6135'}, u'version': u'9', u'source': u'nn', u'region': u'Nevada', u'occurred_at': u'2012-09-14T22:38:01', u'number_of_stations': u'15', u'depth': u'7.60', u'magnitude': u'2.7', u'earthquake_id': u'00388610'}, {...}]

>>> client.get("nimj-3ivp", where="depth > 300", order="magnitude DESC", exclude_system_fields=False)
[{u'geolocation': {u'latitude': u'-15.563', u'needs_recoding': False, u'longitude': u'-175.6104'}, u'version': u'9', u':updated_at': 1348778988, u'number_of_stations': u'275', u'region': u'Tonga', u':created_meta': u'21484', u'occurred_at': u'2012-09-13T21:16:43', u':id': 132, u'source': u'us', u'depth': u'328.30', u'magnitude': u'4.8', u':meta': u'{\n}', u':updated_meta': u'21484', u'earthquake_id': u'c000cnb5', u':created_at': 1348778988}, {...}]

>>> client.get("nimj-3ivp/193", exclude_system_fields=False)
{u'geolocation': {u'latitude': u'21.6711', u'needs_recoding': False, u'longitude': u'142.9236'}, u'version': u'C', u':updated_at': 1348778988, u'number_of_stations': u'136', u'region': u'Mariana Islands region', u':created_meta': u'21484', u'occurred_at': u'2012-09-13T11:19:07', u':id': 193, u'source': u'us', u'depth': u'300.70', u'magnitude': u'4.4', u':meta': u'{\n}', u':updated_meta': u'21484', u':position': 193, u'earthquake_id': u'c000cmsq', u':created_at': 1348778988}

>>> client.get("nimj-3ivp", region="Kansas")
[{u'geolocation': {u'latitude': u'38.10', u'needs_recoding': False, u'longitude': u'-100.6135'}, u'version': u'9', u'source': u'nn', u'region': u'Kansas', u'occurred_at': u'2010-09-19T20:52:09', u'number_of_stations': u'15', u'depth': u'300.0', u'magnitude': u'1.9', u'earthquake_id': u'00189621'}, {...}]
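
SoQL keyword arguments such as select, where, order, and limit can also be combined in a single call; a sketch:

>>> client.get("nimj-3ivp", select="region, magnitude", where="magnitude > 4.0", order="magnitude DESC", limit=10)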

get_all(dataset_identifier, content_type="json", **kwargs)

Read data from the requested resource, paginating over all results. Accepts the same arguments as get(). Returns a generator.

>>> client.get_all("nimj-3ivp")
<generator object Socrata.get_all at 0x7fa0dc8be7b0>

>>> for item in client.get_all("nimj-3ivp"):
...     print(item)
...
{'geolocation': {'latitude': '-15.563', 'needs_recoding': False, 'longitude': '-175.6104'}, 'version': '9', ':updated_at': 1348778988, 'number_of_stations': '275', 'region': 'Tonga', ':created_meta': '21484', 'occurred_at': '2012-09-13T21:16:43', ':id': 132, 'source': 'us', 'depth': '328.30', 'magnitude': '4.8', ':meta': '{\n}', ':updated_meta': '21484', 'earthquake_id': 'c000cnb5', ':created_at': 1348778988}
...

>>> import itertools
>>> items = client.get_all("nimj-3ivp")
>>> first_five = list(itertools.islice(items, 5))
>>> len(first_five)
5

get_metadata(dataset_identifier, content_type="json")

Retrieve the metadata associated with a particular dataset.

>>> client.get_metadata("nimj-3ivp")
{"newBackend": false, "licenseId": "CC0_10", "publicationDate": 1436655117, "viewLastModified": 1451289003, "owner": {"roleName": "administrator", "rights": [], "displayName": "Brett", "id": "cdqe-xcn5", "screenName": "Brett"}, "query": {}, "id": "songs", "createdAt": 1398014181, "category": "Public Safety", "publicationAppendEnabled": true, "publicationStage": "published", "rowsUpdatedBy": "cdqe-xcn5", "publicationGroup": 1552205, "displayType": "table", "state": "normal", "attributionLink": "http://foo.bar.com", "tableId": 3523378, "columns": [], "metadata": {"rdfSubject": "0", "renderTypeConfig": {"visible": {"table": true}}, "availableDisplayTypes": ["table", "fatrow", "page"], "attachments": ... }}

update_metadata(dataset_identifier, update_fields, content_type="json")

Update the metadata for a particular dataset. update_fields should be a dictionary containing only the metadata keys that you wish to overwrite.

Note: Invalid payloads to this method could corrupt the dataset or visualization. See this comment for more information.

>>> client.update_metadata("nimj-3ivp", {"attributionLink": "https://anothertest.com"})
{"newBackend": false, "licenseId": "CC0_10", "publicationDate": 1436655117, "viewLastModified": 1451289003, "owner": {"roleName": "administrator", "rights": [], "displayName": "Brett", "id": "cdqe-xcn5", "screenName": "Brett"}, "query": {}, "id": "songs", "createdAt": 1398014181, "category": "Public Safety", "publicationAppendEnabled": true, "publicationStage": "published", "rowsUpdatedBy": "cdqe-xcn5", "publicationGroup": 1552205, "displayType": "table", "state": "normal", "attributionLink": "https://anothertest.com", "tableId": 3523378, "columns": [], "metadata": {"rdfSubject": "0", "renderTypeConfig": {"visible": {"table": true}}, "availableDisplayTypes": ["table", "fatrow", "page"], "attachments": ... }}

download_attachments(dataset_identifier, content_type="json", download_dir="~/sodapy_downloads")

Download all attachments associated with a dataset. Return a list of paths to the downloaded files.

>>> client.download_attachments("nimj-3ivp", download_dir="~/Desktop")
    ['/Users/xmunoz/Desktop/nimj-3ivp/FireIncident_Codes.PDF', '/Users/xmunoz/Desktop/nimj-3ivp/AccidentReport.jpg']

create(name, **kwargs)

Create a new dataset. Optionally, specify keyword args such as:

  • description description of the dataset
  • columns list of fields
  • category dataset category (must exist in /admin/metadata)
  • tags list of tag strings
  • row_identifier field name of primary key
  • new_backend whether to create the dataset in the new backend

Example usage:

>>> columns = [{"fieldName": "delegation", "name": "Delegation", "dataTypeName": "text"}, {"fieldName": "members", "name": "Members", "dataTypeName": "number"}]
>>> tags = ["politics", "geography"]
>>> client.create("Delegates", description="List of delegates", columns=columns, row_identifier="delegation", tags=tags, category="Transparency")
{u'id': u'2frc-hyvj', u'name': u'Foo Bar', u'description': u'test dataset', u'publicationStage': u'unpublished', u'columns': [ { u'name': u'Foo', u'dataTypeName': u'text', u'fieldName': u'foo', ... }, { u'name': u'Bar', u'dataTypeName': u'number', u'fieldName': u'bar', ... } ], u'metadata': { u'rowIdentifier': 230641051 }, ... }

publish(dataset_identifier, content_type="json")

Publish a dataset after creating it, i.e. take it out of 'working copy' mode. The dataset id returned from create will be used to publish it.

>>> client.publish("2frc-hyvj")
{u'id': u'2frc-hyvj', u'name': u'Foo Bar', u'description': u'test dataset', u'publicationStage': u'unpublished', u'columns': [ { u'name': u'Foo', u'dataTypeName': u'text', u'fieldName': u'foo', ... }, { u'name': u'Bar', u'dataTypeName': u'number', u'fieldName': u'bar', ... } ], u'metadata': { u'rowIdentifier': 230641051 }, ... }

set_permission(dataset_identifier, permission="private", content_type="json")

Set the permissions of a dataset to public or private.

>>> client.set_permission("2frc-hyvj", "public")
<Response [200]>

upsert(dataset_identifier, payload, content_type="json")

Create a new row in an existing dataset.

>>> data = [{'Delegation': 'AJU', 'Name': 'Alaska', 'Key': 'AL', 'Entity': 'Juneau'}]
>>> client.upsert("eb9n-hr43", data)
{u'Errors': 0, u'Rows Deleted': 0, u'Rows Updated': 0, u'By SID': 0, u'Rows Created': 1, u'By RowIdentifier': 0}

Update/Delete rows in a dataset.

>>> data = [{'Delegation': 'sfa', ':id': 8, 'Name': 'bar', 'Key': 'doo', 'Entity': 'dsfsd'}, {':id': 7, ':deleted': True}]
>>> client.upsert("eb9n-hr43", data)
{u'Errors': 0, u'Rows Deleted': 1, u'Rows Updated': 1, u'By SID': 2, u'Rows Created': 0, u'By RowIdentifier': 0}

Upserts can even be performed with a CSV file.

>>> data = open("upsert_test.csv")
>>> client.upsert("eb9n-hr43", data)
{u'Errors': 0, u'Rows Deleted': 0, u'Rows Updated': 1, u'By SID': 1, u'Rows Created': 0, u'By RowIdentifier': 0}

replace(dataset_identifier, payload, content_type="json")

Similar in usage to upsert, but overwrites existing data.

>>> data = open("replace_test.csv")
>>> client.replace("eb9n-hr43", data)
{u'Errors': 0, u'Rows Deleted': 0, u'Rows Updated': 0, u'By SID': 0, u'Rows Created': 12, u'By RowIdentifier': 0}

create_non_data_file(params, file_obj)

Creates a new file-based dataset with the name provided in the files argument. A valid file input would be:

files = (
    {'file': ("gtfs2", open('myfile.zip', 'rb'))}
)
>>> with open(nondatafile_path, 'rb') as f:
...     files = (
...         {'file': ("nondatafile.zip", f)}
...     )
...     response = client.create_non_data_file(params, files)

replace_non_data_file(dataset_identifier, params, file_obj)

Same as create_non_data_file, but replaces a file that already exists in a file-based dataset.

Note: a table-based dataset cannot be replaced by a file-based dataset. Use create_non_data_file in order to replace.

>>> with open(nondatafile_path, 'rb') as f:
...     files = (
...         {'file': ("nondatafile.zip", f)}
...     )
...     response = client.replace_non_data_file(DATASET_IDENTIFIER, {}, files)

delete(dataset_identifier, row_id=None, content_type="json")

Delete an individual row.

>>> client.delete("nimj-3ivp", row_id=2)
<Response [200]>

Delete the entire dataset.

>>> client.delete("nimj-3ivp")
<Response [200]>

close()

Close the session when you're finished.

>>> client.close()

Run tests

$ pytest

Contributing

See CONTRIBUTING.md.

Meta

This package uses semantic versioning.

Source and wheel distributions are available on PyPI. Here is how I create those releases.

python3 setup.py bdist_wheel
python3 setup.py sdist
twine upload dist/*

sodapy's People

Contributors

afeld, chrismetcalf, dependabot-preview[bot], dogrdon, james-ohara, johnclary, matthewwritter, mrkriss, nathanhilbert, remram44, ryan-hall, timwis, xmunoz


sodapy's Issues

Needs update: Example 01: Basic Queries

Hi there.

It seems like the domain has moved away from https://opendata.socrata.com to Evergreen or Tyler Tech, which results in most of the links in this notebook returning 404s.

Could the contributor look into this?

A lot of links on the Socrata webpage are also 404, which makes me wonder whether Socrata has moved to another platform. If so, could someone share where I can get the data?

Many thanks!!

Update metadata

Just noticed there's a create method and a get_metadata method using the same endpoint, but no update method. Here's some relevant info from Socrata on the /api/views.json endpoint:

PUT to /api/views/unique_id.json with a subset of what you want to update. For instance, if you want to add tags to the previously created dataset (currently this is the only way to add tags), you can simply PUT to the endpoint with:

{
  "tags": ["these", "words", "are", "tags"]
}

You'll probably also want to make this dataset public and publish it, so you'll want to perform the two following operations as well:
PUT /views/unique_id?method=setPermission&value=public.read
POST /api/views/unique_id/publication.json

One important thing to note about updates is that if you want to update something within the "custom metadata", you will have to copy the entire metadata object since it does a replace.

Note that I've heard over the year(s) that the views endpoint is/will be deprecated, but until there's a clear alternative it probably makes sense to continue to use it.

Anyway, I may get a chance to work on this, but wanted to post the details here in case anyone else is interested.
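
For what it's worth, sodapy's update_metadata (documented above) already wraps this PUT, so the tags example could be written as the following sketch ("unique_id" is a placeholder identifier):

>>> client.update_metadata("unique_id", {"tags": ["these", "words", "are", "tags"]})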

Feature request: Option to disable the throttling limit warning

Hi,

When no app_token is provided a Warning is logged:

Requests made without an app_token will be subject to strict throttling limits.

Would it be possible to disable that warning? I know that working without an app_token limits the amount of requests to an OpenData API, and I don't want my logfile to fill up with Warnings that I already know about.
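
The WARNING:root: prefix shown in logs elsewhere in these issues suggests the warning goes through the root logger of Python's standard logging module, so one workaround (a sketch, not an official sodapy option) is to attach a filter:

import logging

class ThrottleWarningFilter(logging.Filter):
    def filter(self, record):
        # Drop only the throttling warning; let everything else through.
        return "strict throttling limits" not in record.getMessage()

logging.getLogger().addFilter(ThrottleWarningFilter())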

Missing tag - 1.5.3

Congrats on the release, and thank you for your continued work on this!

It seems that no tag exists on GitHub for 1.5.3, although I see that you pushed it to PyPI. 1.2.0 seems to be missing as well (looks like it should be 433c946).

KeyError: 'attachments' when calling 'download_attachments'

When calling download_attachments I run into a KeyError. Could someone help, please?

>>> from sodapy import Socrata
>>> client = Socrata("sandbox.demo.socrata.com", None)
Warning: requests made without an app_token will be subject to strict throttling limits.

>>> client.get("nimj-3ivp", limit=2)
[{u'geolocation': {u'latitude': u'41.1085', u'needs_recoding': False, u'longitude': u'-117.6135'}, u'version': u'9', u'source': u'nn', u'region': u'Nevada', u'occurred_at': u'2012-09-14T22:38:01', u'number_of_stations': u'15', u'depth': u'7.60', u'magnitude': u'2.7', u'earthquake_id': u'00388610'}, {u'geolocation': {u'latitude': u'34.525', u'needs_recoding': False, u'longitude': u'-118.1527'}, u'version': u'0', u'source': u'ci', u'region': u'Southern California', u'occurred_at': u'2012-09-14T22:14:45', u'number_of_stations': u'35', u'depth': u'10.60', u'magnitude': u'1.5', u'earthquake_id': u'15215753'}]

>>> client.download_attachments(dataset_identifier="nimj-3ivp", download_dir="~/Desktop")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\sodapy\__init__.py", line 137, in download_attachments
    attachments = metadata['metadata']['attachments']
KeyError: 'attachments'

Can't access from behind proxy

I'm trying to use python behind proxy and I get this error:

HTTPSConnectionPool(host='hostname', port=443): Max retries exceeded with url: /api/views/hdsc-ubkz.json (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7fc509240110>: Failed to establish a new connection: [Errno 113] No route to host',))

Is there a way to specify proxy settings with sodapy?

Python 3 install instructions incompatible with build tools

Thanks for this library! It will greatly simplify my workflow if I can just resolve one thing. I'm trying to use this library in Python 3 inside of a Docker container. Because errors during the Docker build process cause the build to stop, having to run python setup.py install twice isn't a solution. I imagine this is a problem with any build tool similar to Docker. Is it possible to eliminate the need for errors in the install process?

UnicodeEncodeError on exception

While uploading some data to Socrata with sodapy 1.2.0 and the upsert method, an Exception was raised due to the status response, but the code failed because the text coming from the server included UTF-8 characters.

Below is the traceback

  File "/Users/secastro/miniconda/lib/python2.7/site-packages/sodapy/__init__.py", line 221, in upsert
    return self._perform_update("post", resource, payload)
  File "/Users/secastro/miniconda/lib/python2.7/site-packages/sodapy/__init__.py", line 238, in _perform_update
    data=json.dumps(payload))
  File "/Users/secastro/miniconda/lib/python2.7/site-packages/sodapy/__init__.py", line 284, in _perform_request
    _raise_for_status(response)
  File "/Users/secastro/miniconda/lib/python2.7/site-packages/sodapy/__init__.py", line 332, in _raise_for_status
    http_error_msg += ".\n\t{0}".format(more_info)
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 90: ordinal not in range(128)

I'm not uploading UTF-8 data, so I haven't tested specifically for that.

Unable to get data on map

I am running into an issue when trying to pull points from a map.

Here is the dataset:
https://data.austintexas.gov/dataset/Address-Points/mkjr-t5r2

Code (python 3.x)
from sodapy import Socrata
client = Socrata("data.austintexas.gov", None)
addresspoints = client.get("mkjr-t5r2")

Response:
WARNING:root:Requests made without an app_token will be subject to strict throttling limits.
Traceback (most recent call last):
File "C:\Scripts\Socrata\getSocrataData.py", line 3, in <module>
addresspoints = client.get("mkjr-t5r2")
File "C:\Python\Python36\lib\site-packages\sodapy\__init__.py", line 237, in get
params=params)
File "C:\Python\Python36\lib\site-packages\sodapy\__init__.py", line 352, in _perform_request
_raise_for_status(response)
File "C:\Python\Python36\lib\site-packages\sodapy\__init__.py", line 406, in _raise_for_status
raise requests.exceptions.HTTPError(http_error_msg, response=response)
requests.exceptions.HTTPError: 404 Client Error: Not Found.
Dataset '_mkjr-t5r2' was not found

I've tested other datasets without issue.

Read timeout error

When I run results = client.get("s6ew-h6mp") in Python, I get the error below. Any idea how I can fix it?

ReadTimeout: HTTPSConnectionPool(host='data.consumerfinance.gov', port=443): Read timed out. (read timeout=10)

Query to return all dataset identifiers from an endpoint?

Hi,

I'm not sure if I'm missing something, but can this library perform a query to return all dataset identifiers from a given endpoint, so that I can then loop through them and perform get requests?

I could obviously just make a request to https://data.edmonton.ca/api/catalog/v1?domains=data.edmonton.ca and parse the JSON manually to get the resource ID from every top-level object, but it seems like there should be an easier way...
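
The datasets() method documented above can do this without parsing the catalog JSON by hand; a minimal sketch:

>>> from sodapy import Socrata
>>> with Socrata("data.edmonton.ca", None) as client:
...     identifiers = [d["resource"]["id"] for d in client.datasets()]
...     for identifier in identifiers:
...         rows = client.get(identifier, limit=1)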

Importing blob files

I am attempting to replace a zip file in socrata. This file may be generated on a nightly basis. The Google Transit Feed System file format has a number of csv files of different schemas, so it doesn't make sense for a direct CSV upsert.

Safe FME has provided a method for pushing blob files to Socrata. It seems to hit the not-so-documented API endpoint of POST https://data.texas.gov/api/imports2?method=blob&fileUploaderfile=['C:', 'myfile.zip'] with the usual headers.

I was thinking about providing a PR that incorporates this functionality in sodapy. Do you think it would make sense to put a blob parameter in replace and upsert? If blob is true, then use the endpoint above to push the file.

I was curious if there is a reason why api/imports2 is not used and if there are potentially other methods.

Fails installation via pip

Was trying to install the API in my virtual env, and got the following failure, other packages install and work just fine. Any suggestions?

(datamanager) Zeitgeist:datamanager spector$ pip install sodapy
Collecting sodapy
Using cached sodapy-0.1.6.tar.gz
Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 20, in <module>
      File "/private/var/folders/kh/q9217zqs1wdf6g7yzczz09br0000gn/T/pip-build-jkmfj12g/sodapy/setup.py", line 4, in <module>
        execfile("sodapy/version.py")
    NameError: name 'execfile' is not defined

----------------------------------------

Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/kh/q9217zqs1wdf6g7yzczz09br0000gn/T/pip-build-jkmfj12g/sodapy

Loosen requirements.txt versions

Typically a pip installable package should not have pinned version requirements. If possible, please remove the version declarations from requirements.txt. Users may want to use different versions of the requirements.

Support to pass in variables to query.

I am not sure if this is possible but it would be really nice to have this functionality:

client = Socrata(...)
x = ...
result = client.get(some_identifier, query="SELECT * WHERE something = %s" % (x,))

Is there a way to pass in variables to the sql query?
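
sodapy itself does not offer parameter binding, but ordinary Python string formatting works if you escape single quotes in the value yourself; a hedged sketch:

>>> region = "Kansas"  # value to interpolate
>>> safe = region.replace("'", "''")  # escape quotes so the SoQL string literal stays intact
>>> result = client.get("nimj-3ivp", query="SELECT * WHERE region = '{}'".format(safe))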

Tutorials and Usage Examples

New users, particularly those without experience with Socrata, might benefit from a tutorial, plus a few template projects.

I'd be happy to start a PR if this sounds interesting! Let me know if there are any data sources or pieces of functionality that you think would be particularly good. Otherwise I'll dive in and figure something out.

Delete By Row ID Not Working

I'm trying to delete individual rows by row_id using the following:

client.delete("data-set", row_id=2)

but am getting the following result:

HTTPError: 400 Client Error: Bad Request.
	Column is maltyped; for field ':id' we expected 'row_identifier' but got '"2"'

I don't have an explicit Row Identifier on the table. Unfortunately it's a private data set, as it's in a Tyler instantiation for criminal justice data, so I can't share the URL for reproduction. I've verified that delete works (removes the entire table), so I don't think it's a permission issue. Any help is appreciated.

Support for Socrata Headers?

Sorry if this isn't the appropriate place to ask about this. Please close it if not. I'm working on my very first python script and I've been looking for a way to get the field names and data types for just the record set returned by my query.

client.get_metadata() returns all fields, but I want to build a dictionary using just the fields in my results.

Response headers X-SODA2-Fields (and X-SODA2-Types) should return a set limited to just the fields returned by the get() request.

Is this something that could be supported? I can see in utils.py the response header is processed for response codes, but cannot find anything that processes these fields. Or maybe I should just use requests instead?
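
In the meantime, those headers can be read with Requests directly; a sketch, assuming the standard /resource/<id>.json URL pattern for the domain:

>>> import requests
>>> resp = requests.get(
...     "https://sandbox.demo.socrata.com/resource/nimj-3ivp.json",
...     params={"$limit": 1, "$select": "region, magnitude"},
... )
>>> resp.headers.get("X-SODA2-Fields")
>>> resp.headers.get("X-SODA2-Types")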

content-type fails

I am having trouble with the content-type in sodapy. Specifically - this is for the RDW dataset.

Anyway, running

from sodapy import Socrata
client = Socrata("opendata.rdw.nl", "token1234", "[email protected]", password="XXX")
client.get("/resource/m9d7-ebf2.csv", limit=4)

From this I get the response:

Exception: Unknown response format: text/csv;charset=utf-8

There seems to be a missing space here in the header response so that the if statement fails on line 205 in __init__.py

text/csv;charset=utf-8

Instead of

text/csv; charset=utf-8

The same happens for .json instead of .csv

Is this an RDW issue for excluding a space from the header? Is there any workaround for this?

get_all() has inconsistent pagination

After running:

from csv import DictWriter

results = client.get_all("y8as-bmzj")
with open("coc test.csv", "a", newline="", encoding="utf-8") as line_file:
    csv_writer = DictWriter(line_file, fieldnames=["id", "sample_site", "sample_date",
                                                   "parameter", "numeric_result", "result_qualifier",
                                                   "formatted_result", "result_units", "latitude_degrees",
                                                   "longitude_degrees", "site_key"])
    for row in results:
        csv_writer.writerow(row)

I was receiving different final files each time. Sometimes these would include duplicate rows; other times I would have rows missing. I reran my code using pagination over the OData API instead with no issues. From my debugging, the only thing I could figure was that the get_all() function wasn't returning the data correctly.

support automatic pagination

Hey, nice library! I was thinking, it would be useful for the package to support pagination for get() (maybe other methods), to make it easier to retrieve large datasets. Maybe an optional paginate=True keyword argument? Thanks!

Confusing error for csv upsert

I'm using sodapy to do a simple upsert of a data file to an empty dataset (contains columns but no data). However, I'm getting a confusing error message that Socrata cannot recognize the payload type file, even though that's supposed to be supported. This is a properly formatted comma-separated file with Unix line endings. The traceback of the error in Python 2.7 is below. The error in Python 3 is different and attached here.

Traceback (most recent call last):
  File "dtg_data_upload.py", line 21, in <module>
    conn.upsert("8ect-6jqj ", uploadcsv)
  File "C:\Python27\lib\site-packages\sodapy\__init__.py", line 249, in upsert
    return self._perform_update("post", resource, payload)
  File "C:\Python27\lib\site-packages\sodapy\__init__.py", line 307, in _perform_update
    " and file-types are supported.".format(type(payload)))
Exception: Unrecognized payload <type 'file'>. Currently only list-, dictionary-, and file-types are supported.

Support for querying more than 50000 results

If I try to retrieve more than 50,000 records I get the following error:

WARNING:root:Requests made without an app_token will be subject to strict throttling limits.
Traceback (most recent call last):
  File "somesoda.py", line 23, in <module>
    result = client.get(medic_identifier, query=finalQuery)
  File "C:\Users\aprabhakar\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sodapy\__init__.py", line 291, in get
    params=params)
  File "C:\Users\aprabhakar\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sodapy\__init__.py", line 406, in _perform_request
    _raise_for_status(response)
  File "C:\Users\aprabhakar\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sodapy\__init__.py", line 460, in _raise_for_status
    raise requests.exceptions.HTTPError(http_error_msg, response=response)
requests.exceptions.HTTPError: 400 Client Error: Bad Request.
        length must be <= 50000

The snippet of code I am using is:


finalQuery = 'SELECT * WHERE year = ' + maxyear +' AND quarter = '+maxQuarter+' ORDER BY ndc DESC LIMIT 50001'
finalQuery2 = 'SELECT COUNT(*) WHERE year = ' + maxyear +' AND quarter = '+maxQuarter
result = client.get(medic_identifier, query=finalQuery)

The dataset I am using has 600,000+ results. Is there a way to get all of them?
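
One workaround is to page with the where/order/limit/offset keyword arguments instead of a single oversized query, ordering by :id so pages don't shift between requests; a sketch reusing the names from the snippet above:

>>> rows = []
>>> chunk = 50000
>>> offset = 0
>>> where_clause = "year = {} AND quarter = {}".format(maxyear, maxQuarter)
>>> while True:
...     batch = client.get(medic_identifier, where=where_clause, order=":id", limit=chunk, offset=offset)
...     rows.extend(batch)
...     if len(batch) < chunk:
...         break
...     offset += chunk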

SoQL query support for get_all()

While I was trying to submit a SoQL query using the get_all() method, it kept throwing an error:

HTTPError: 400 Client Error: Bad Request.
        If $query is used, all options - [$offset] should not be specified in $query.

The official documentation isn't super detailed. It looks like the problem is that all options must be in the SoQL query if you use a SoQL query.

My code using get_all is below. But you can reproduce this error with the get() method by including a query parameter with any other SoQL clause.

I came up with a functional solution, but wasn't sure how to update the test or documentation.

However, I thought I would open an issue before making a pull request. For most cases, filtering with the other API parameters is ok. Perhaps the simplest solution is changing the get_all() docstring to note SoQL queries aren't supported.

import json
from sodapy import Socrata

client_params = {'domain': 'health.data.ny.gov',
                'app_token': None,
                'timeout': 10} 

request_params = {'dataset_identifier': 'vn5v-hh5r',
                  'content_type': 'json',
                  'query': """SELECT facility_name, fac_desc_short
                              WHERE fac_desc_short like '%HOSP%'"""}

with Socrata(**client_params) as client:
    response = client.get_all(**request_params)

# Iterating over the generator throws HTTPError error
resp_list = [r for r in response]

# If it actually works, put data here
with open('query_test.json', 'w') as f:
    json.dump(resp_list, f, indent=4)

replace() method using CSV

Hi
I'm trying to replace a dataset using the replace() method. I use a CSV that contains headers. Does the replace() method accept a CSV with headers or not?
Thanks

`get_all()` does not paginate correctly, returning duplicate rows

When using get_all() on large datasets, the results are not paginated correctly. The returned response has the correct total amount of rows but approximately 10% of the rows are duplicates of other rows. If the API call does not explicitly order rows, there is no guarantee that each page of results is a unique chunk of the total rows in the dataset. This could be resolved by creating an API call with limit greater than or equal to the total number of rows in the dataset.
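
Until that is fixed upstream, passing an explicit order (get_all() accepts the same keyword arguments as get()) makes the pagination deterministic; a sketch:

>>> rows = client.get_all("y8as-bmzj", order=":id")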

Support discovery API

The discovery API has more filtering methods than just listing all the datasets in the domain, for example categories, tags, types, license, column name, and even full-text query terms.

It would be nice if this was exposed in the .datasets() method.

I can take a crack at adding this if there is interest.

Documentation: Include `timeout` param in client example

I'd like to suggest highlighting the timeout param in the client example. As a data publisher I find that I almost always need to increase the timeout when upserting more than 100 records, particularly when I'm chunking my payloads and hitting the API in quick succession.

add linting

Currently, there is no way for contributors to know what the style guidelines are for this project. Find a way for contributors to easily lint their contributions before opening a PR, which will save reviewers and contributors time in the end. Some ideas:

  1. A local git hook that contributors could install
  2. Something integrated into travis-ci that runs on every commit

Also, CONTRIBUTING.md should be updated with information about how/where/when to run linter.

[documentation] List of Eligible Domains is Out of Date

As of today, the link in the README.md returns a 404 error: https://opendata.socrata.com/dataset/Socrata-Customer-Spotlights/6wk3-4ija

I suspect an equivalent list of list of eligible domains might be in the response for this endpoint: https://socratadiscovery.docs.apiary.io/#reference/0/count-by-domain/count-by-domain?console=1

Suggested next steps

  1. Find out if there's an updated Socrata view to link to instead
  2. Update the README to point to somewhere in the Apiary docs instead of to the old link

Apply date filter in where field

I am trying to apply a date filter in my call like this:

client = Socrata("data.sfgov.org", None, timeout=60)

results = client.get("i98e-djp9", where="YEAR(DATE_FORMAT(permit_creation_date, \"%Y\")) > 2020")

This throws an HTTPError: 400 Client Error: Bad Request. How do I apply a date filter when the columns are not stored as a date type?

`Query coordinator error: query.soql.no-such-function; No such function 'DATE_FORMAT'; arity=2; position: Map(row -> 1, column -> 889, line -> "SELECT `permit_number`, `permit_type`, `permit_type_definition`, `permit_creation_date`, `block`, `lot`, `street_number`, `street_number_suffix`, `street_name`, `street_suffix`, `unit`, `unit_suffix`, `description`, `status`, `status_date`, `filed_date`, `issued_date`, `completed_date`, `first_construction_document_date`, `structural_notification`, `number_of_existing_stories`, `number_of_proposed_stories`, `voluntary_soft_story_retrofit`, `fire_only_permit`, `permit_expiration_date`, `estimated_cost`, `revised_cost`, `existing_use`, `existing_units`, `proposed_use`, `proposed_units`, `plansets`, `tidf_compliance`, `existing_construction_type`, `existing_construction_type_description`, `proposed_construction_type`, `proposed_construction_type_description`, `site_permit`, `supervisor_district`, `neighborhoods_analysis_boundaries`, `zipcode`, `location`, `record_id` WHERE YEAR(DATE_FORMAT(`permit_creation_date`, '%Y')) > 2020")`
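
SoQL has its own date functions, so one workaround for the error above is date_extract_y instead of the SQL-style YEAR/DATE_FORMAT; a sketch, assuming permit_creation_date is a timestamp column:

>>> results = client.get("i98e-djp9", where="date_extract_y(permit_creation_date) > 2020")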

This repository will be archived on Aug 31

Hello friends,

7 years ago I started developing this project. Since then, it has become the most popular client library for the SODA API. At that time, I was just beginning my journey as a software developer. 7 years later, I am a busy consultant juggling many different clients. As such, I no longer have the time or energy to maintain this project as a volunteer.

I have reached out to Tyler Technologies to see if they would like to continue maintaining sodapy, or pay me to do so via Github Sponsors. The reply they sent me indicates that they do not want to do either of those things, so I will be locking this repository to any changes in the next two weeks. Before I do that, I would be happy to review and merge Pull Requests that address any existing issues. Additionally, I will also cut one final release.

Feel free to comment below if you have any concerns. If you have contributed to or used this project in any capacity over the last 7 years, thank you!

-Cristina

error when running examples

Hi there!
I'm trying to run the examples provided in the README, as follows:

Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 26 2016, 10:47:25)
>>> from sodapy import Socrata
>>> client = Socrata("sandbox.demo.socrata.com", None)
>>> client.get("nimj-3ivp", limit=2)

But it keeps on erroring

>>> client.get("nimj-3ivp", limit=2)
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 594, in urlopen
    chunked=chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 350, in _make_request
    self._validate_conn(conn)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 835, in _validate_conn
    conn.connect()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/connection.py", line 323, in connect
    ssl_context=context)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/util/ssl_.py", line 324, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 377, in wrap_socket
    _context=self)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 752, in __init__
    self.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 988, in do_handshake
    self._sslobj.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 633, in do_handshake
    self._sslobj.do_handshake()
ConnectionResetError: [Errno 54] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/adapters.py", line 423, in send
    timeout=timeout
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 643, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/util/retry.py", line 334, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 594, in urlopen
    chunked=chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 350, in _make_request
    self._validate_conn(conn)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 835, in _validate_conn
    conn.connect()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/connection.py", line 323, in connect
    ssl_context=context)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/util/ssl_.py", line 324, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 377, in wrap_socket
    _context=self)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 752, in __init__
    self.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 988, in do_handshake
    self._sslobj.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 633, in do_handshake
    self._sslobj.do_handshake()
requests.packages.urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/sodapy/__init__.py", line 237, in get
    params=params)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/sodapy/__init__.py", line 341, in _perform_request
    response = getattr(self.session, request_type)(uri, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/sessions.py", line 501, in get
    return self.request('GET', url, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/adapters.py", line 473, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))

Seems like an issue with Requests. I have searched around but haven't found anything helpful online. Have you run into it before?

Thanks!
Bowen

Exception: Unknown response format: application/json

I deployed an application three months ago using sodapy. It was fully functional up until Monday, November 28. On my deployed instance, I get "502 Bad Gateway". Running locally, I get the following error:

File "/Users/File/Path/Crime.py", line 12, in get_crimes
crimes = client.get("cuks-n6tp",limit=3000)

File "/usr/local/lib/python2.7/site-packages/sodapy/__init__.py", line 237, in get
params=params)

File "/usr/local/lib/python2.7/site-packages/sodapy/__init__.py", line 368, in _perform_request
.format(content_type))

Exception: Unknown response format: application/json

I don't understand the exception. The data set is here and clearly json: https://data.sfgov.org/resource/cuks-n6tp.json

Let me know if this is the wrong place to post this question.

Thanks,
Alex

Issue with upsert/replace

I keep getting the same error when using upsert or replace:

Traceback (most recent call last):
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 384, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 380, in _make_request
    httplib_response = conn.getresponse()
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\http\client.py", line 1321, in getresponse
    response.begin()
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\http\client.py", line 296, in begin
    version, status, reason = self._read_status()
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\http\client.py", line 257, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\contrib\pyopenssl.py", line 307, in recv_into
    raise timeout('The read operation timed out')
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\adapters.py", line 449, in send
    timeout=timeout
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\util\retry.py", line 367, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\packages\six.py", line 686, in reraise
    raise value
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 386, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "C:\Users\tristanya\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 306, in _raise_timeout
    raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='mydata.iadb.org', port=443): Read timed out. (read timeout=10)

I have tried using csv and json as the data format, neither work. Any idea what is going on?

Downloading Attachments fails for a small number of datasets

This is only tested on the NYC open data portal (https://nycopendata.socrata.com), but there are a handful of datasets for which the attachment metadata either has no assetId property (see: 3av7-txd8) or has an empty assetId property (see: 7isb-wh4c).

In either case, the path to the asset comes in the form {base url}/api/assets/{blobId}?download=true instead of {base url}/api/views/{identifier}/files/{assetId}?download=true&filename={filename}

If it has no assetId property, there is a Key Error and the client fails. If the assetId property is present, but empty, it appears that the blobId contains the asset ID, but since it is formatting the URL to a nonexistent resource, it is downloading an empty file.
