reagentx / purple_air_api
Python package to get and transform PurpleAir data
Home Page: https://pypi.org/project/purpleair/
License: GNU General Public License v3.0
Hello,
Using 1.2.1 and trying to run the demo code to import all the available sensors:
from purpleair.network import SensorList
p = SensorList() # Initialized 11,220 sensors!
df = p.to_dataframe(sensor_filter='all',
channel='parent')
And it returns:
ValueError: No sensor data returned from PurpleAir: An empty querystring is not permitted. Please contact PurpleAir at [email protected] for assistance.
Is there another way to do this?
Thanks!
Right now the FAQ is quite unorganized.
Right now we use a dict to pass in parameters, but in the future we should require a more formal data structure to prevent issues.
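A minimal sketch of what a more formal structure could look like, assuming the parameters in question are the filter and channel names that appear elsewhere on this page. The class `DataFrameParams`, its fields, and the `'child'` channel value are hypothetical and not part of the library:

```python
from dataclasses import dataclass

# Filter names come from the to_dataframe() discussion on this page;
# the 'child' channel value is an assumption.
VALID_FILTERS = {'useful', 'outside', 'all'}
VALID_CHANNELS = {'parent', 'child'}


@dataclass(frozen=True)
class DataFrameParams:
    """Hypothetical validated replacement for a free-form parameter dict."""
    sensor_filter: str = 'all'
    channel: str = 'parent'

    def __post_init__(self):
        # Fail early on typos instead of passing a bad value downstream.
        if self.sensor_filter not in VALID_FILTERS:
            raise ValueError(f'Unknown sensor_filter: {self.sensor_filter!r}')
        if self.channel not in VALID_CHANNELS:
            raise ValueError(f'Unknown channel: {self.channel!r}')
```

Compared with a dict, a misspelled key or value fails at construction time rather than somewhere inside the request.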
Common problems so far:
requests_cache
sqlite file unreadable
Hi, the error "ValueError: Invalid JSON data returned from network!" is not resolved by re-running, nor by removing "cache.sqlite". Is there any other solution for this error?
Thanks
Right now, the parameter sensor_filter of to_dataframe() only filters on the options {'useful', 'outside', 'all'}. We could include more filters here:
None in the given column name
Stats
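As a rough illustration of the "None in the given column" filter idea, here is a sketch that works on plain dicts rather than on the library's own objects. The function name `drop_missing` and the sample rows are hypothetical:

```python
def drop_missing(rows, column):
    """Keep only rows (dicts) whose value for `column` is present and not None."""
    return [row for row in rows if row.get(column) is not None]


# Hypothetical sample rows; only the first has a usable 'temp_f' value.
sensors = [
    {'id': 1, 'temp_f': 91},
    {'id': 2, 'temp_f': None},
    {'id': 3},
]
kept = drop_missing(sensors, 'temp_f')
print([s['id'] for s in kept])
```

On a pandas DataFrame the equivalent selection would be `df[df[column].notna()]`.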
The column names of the DataFrame disagree with the documentation. The fields actually refer to the column names that were in use until 20 October 2019. For example, PM1.0_CF_ATM_ug/m3 is actually PM1.0_CF1_ug/m3 according to the documentation, and field5 of PARENT_PRIMARY_COLS is RSSI instead of ADC.
purple_air_api/purpleair/api_data.py
Lines 20 to 70 in 031f7cc
Per the docs, the key is now an integer, e.g. 3, instead of the previous string, e.g. field3.
It would be nice to still be able to cite/reference this library, and to have a DOI entry. Instructions for doing so are linked here: https://guides.github.com/activities/citable-code/
Currently, all sample code is just in the readme, but it is incomplete and does not actually document anything.
I've run into the "Child 5843 lists parent 5842, but parent does not exist!" error and I read through the FAQ and found that deleting the cache.sqlite file might help, but unfortunately it did not. I also see that this is an issue on PurpleAir's side, not this program, but it entirely prevents SensorList from working while the issue persists. Would it be viable for SensorList to accept a flag that would tell it to ignore these kinds of mismatches when constructing the sensor network?
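The flag requested above could behave roughly like this. This is a sketch only: the function `pair_channels`, the `strict` flag, and the assumption that child records carry a 'ParentID' key are illustrative and do not describe how SensorList is actually implemented:

```python
def pair_channels(records, strict=True):
    """Group child records under their parents.

    `records` is a list of dicts. A child is assumed to carry a 'ParentID'
    key. With strict=False, orphaned children are collected and skipped
    instead of aborting the whole network build.
    """
    parents = {r['ID']: {'parent': r, 'child': None}
               for r in records if r.get('ParentID') is None}
    orphans = []
    for r in records:
        parent_id = r.get('ParentID')
        if parent_id is None:
            continue  # parents were handled above
        if parent_id in parents:
            parents[parent_id]['child'] = r
        elif strict:
            raise ValueError(
                f"Child {r['ID']} lists parent {parent_id}, "
                "but parent does not exist!")
        else:
            orphans.append(r['ID'])
    return parents, orphans
```

Returning the orphaned IDs alongside the network keeps the mismatch visible to the caller even when it is tolerated.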
Hi Christopher,
Thanks for a useful package!
I've noticed that the information that the package pulls when you specify the B sensor in Sensor.get_historical() is actually the secondary dataset for the A sensor.
Something like this in Sensor.get_data() would get the B sensor.
response = requests.get(f'{API_ROOT}?show={self.identifier}')
data = json.loads(response.content)
a_channel = data['results'][0]
b_channel = data['results'][1]
If you check the 'Label' key for a_channel and b_channel, you'll see that the B sensor's label has "B" appended to it. The data returned from the keys in the B sensor in this way are also consistent with the column headers. The thingspeak csv requested using data['results'][1]['THINGSPEAK_PRIMARY_ID'] and data['results'][1]['THINGSPEAK_PRIMARY_ID_READ_KEY'] lines up nicely with the primary B sensor header from https://www2.purpleair.com/community/faq#!hc-primary-and-secondary-data-header, but the "unused" column for Sensor.get_historical(sensor_channel='b', weeks_to_get=1) contains data at the moment, because the function is pulling data from the wrong source.
Thanks again. I'm happy to collaborate on this if you'd like.
Charlie
Thank you for putting together a great Python tool that works with pandas DataFrames. When I load the library with "from purpleair.network import Sensor" or "from purpleair.network import SensorList", I get the following error at the end of a long traceback: sqlite3.OperationalError: unable to open database file. Do I need to use an API key or something to gain access? Thank you.
Hi,
this code
from purpleair.network import SensorList
p = SensorList() # Initialized 11,220 sensors!
print(len(p.useful_sensors)) # 10047, List of sensors with no defects
should be
from purpleair.network import SensorList
p = SensorList() # Initialized 11,220 sensors!
print(len(p.all_sensors))
release branch is for releases only
develop branch is for development and PRs
master branch
Currently, get_historical collects some number of weeks of data relative to "today". There could be special events which would motivate collecting data from specific sensors on/around specific dates. The following would support this use case:
Channel.get_historical should support either a relative time frame, or a custom start and end period.
Thank you for all your work on this wrapper, it has saved a lot of time.
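For the get_historical request above (a relative time frame or a custom start and end), the window arithmetic might look like the following sketch. The function `weekly_windows` and its parameters are hypothetical, and splitting the range into one-week chunks is an assumption based on the "weeks of data" wording, not a statement about how the library fetches data:

```python
from datetime import datetime, timedelta


def weekly_windows(weeks_to_get=None, start=None, end=None, now=None):
    """Return (window_start, window_end) pairs, oldest first.

    Pass either `weeks_to_get` (relative to `now`) or both `start` and `end`.
    """
    if weeks_to_get is not None and (start or end):
        raise ValueError('Pass weeks_to_get or start/end, not both')
    if weeks_to_get is not None:
        end = now or datetime.now()
        start = end - timedelta(weeks=weeks_to_get)
    elif start is None or end is None:
        raise ValueError('Both start and end are required')
    if start >= end:
        raise ValueError('start must be before end')

    windows = []
    cursor = start
    while cursor < end:
        # Clamp the last window so it never runs past `end`.
        window_end = min(cursor + timedelta(weeks=1), end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows
```

A call such as `weekly_windows(start=datetime(2020, 1, 1), end=datetime(2020, 1, 18))` yields two full weeks followed by a shorter final window.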
User report: https://twitter.com/Frank_Hayes/status/1529034318785421312
I just started using it to look at data from se7488. The parent.get_historical() method with weeks=1 stops on May 10th.
{
"ID": 60333,
"Label": "Burbank",
"DEVICE_LOCATIONTYPE": "outside",
"THINGSPEAK_PRIMARY_ID": "1108161",
"THINGSPEAK_PRIMARY_ID_READ_KEY": "Y9HNPG5JGYXR5Q3A",
"THINGSPEAK_SECONDARY_ID": "1108162",
"THINGSPEAK_SECONDARY_ID_READ_KEY": "NJZ5F2M4NH9CCD6Q",
"Lat": 37.769549,
"Lon": -122.271873,
"PM2_5Value": "4.07",
"LastSeen": 1600291467,
"Type": "PMS5003+PMS5003+BME280",
"Hidden": "false",
"isOwner": 0,
"humidity": "35",
"temp_f": "91",
"pressure": "1017.05",
"AGE": 1,
"Stats": "{\"v\":4.07,\"v1\":4.25,\"v2\":4.29,\"v3\":3.95,\"v4\":11.08,\"v5\":58.25,\"v6\":39.72,\"pm\":4.07,\"lastModified\":1600291467183,\"timeSinceModified\":119995}"
}
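Note that in the record above the "Stats" value is itself a JSON document stored as a string, and several numeric readings are strings too. A small sketch of decoding them, using a trimmed copy of the record (the variable names are illustrative):

```python
import json

# Trimmed copy of the record above; "Stats" is a JSON string nested in JSON.
record = {
    "ID": 60333,
    "PM2_5Value": "4.07",
    "Stats": "{\"v\":4.07,\"v1\":4.25,\"pm\":4.07,\"lastModified\":1600291467183}",
}

stats = json.loads(record["Stats"])   # second decoding pass for the nested string
pm2_5 = float(record["PM2_5Value"])   # numeric readings also arrive as strings

print(stats["v1"], pm2_5)
```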
This is an artifact of not previously abstracting channels.
Hey @ReagentX
Heads up that the fine folks at PurpleAir deprecated the use of the purpleair.com/json endpoint and developed their own API. https://api.purpleair.com/
My job requires me to process a large amount of PurpleAir data quickly, and at a high frequency. I was going to spike out on a new package.
Per the Nominatim guidelines, the maximum speed is 1 request per second.
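One way to stay within a limit of 1 request per second is a small client-side throttle. This is a sketch, not the library's implementation; the class `RateLimiter` is hypothetical, and the injectable `clock` and `sleep` arguments exist only to make it testable:

```python
import time


class RateLimiter:
    """Block so that calls are spaced at least `min_interval` seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Sleep just long enough to respect the interval, then record the call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

Calling `limiter.wait()` immediately before each geocoding request would keep consecutive requests at least one second apart.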
I am trying to pull data for the first six months of 2020 for all of the Bay Area, and it appears that I am not able to query that far back. I checked with PurpleAir and the data should be accessible for that time period. Is there a restriction applied through this API that could be creating this issue?
Thanks for the help!
Hi,
Using this package, I can only download data for a specific time period by specifying individual sensors. But I want to look at all the sensors present in a particular geography and use their data for specific time periods. In short, I need to subset PA data both spatially and temporally.
Thanks for your help,
Praful
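A sketch of the two subsetting steps described in this request, assuming sensor records carry 'Lat' and 'Lon' keys as in the JSON record shown earlier on this page, and that readings are available as (timestamp, value) pairs. Both function names are hypothetical and neither is part of the package:

```python
from datetime import datetime


def sensors_in_box(sensors, south, west, north, east):
    """Spatial subset: keep sensors whose 'Lat'/'Lon' fall inside the box.

    Does not handle boxes that cross the antimeridian.
    """
    return [s for s in sensors
            if south <= s['Lat'] <= north and west <= s['Lon'] <= east]


def readings_between(readings, start, end):
    """Temporal subset: keep (timestamp, value) pairs with start <= timestamp < end."""
    return [(ts, value) for ts, value in readings if start <= ts < end]
```

The spatial step would run once against the sensor list, and the temporal step once per selected sensor.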
A new update in the package requests_cache (https://github.com/reclosedev/requests-cache/blob/master/requests_cache/core.py) is throwing up an error "AttributeError: module 'requests_cache' has no attribute 'core'"
What is the workaround?
develop
develop
This will remove the requirement to manually install this package through git.
>>> cse = Sensor(6643)
Child sensor requested, acquiring parent instead.
>>> cse.identifier
6643
The identifier should be the parent's ID (6642), because cse is filled with the data from 6642:
purple_air_api/purpleair/sensor.py
Lines 66 to 69 in bac87df
However, the library naively assigns the identifier from the Sensor.__init__() construction call, regardless of what data it gets filled with.
purple_air_api/purpleair/sensor.py
Line 25 in bac87df
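The behaviour asked for here could be sketched as follows. The helper `resolve_identifier` is hypothetical, and it assumes that a child record carries a 'ParentID' key pointing at its parent; it is not the code at the referenced lines:

```python
def resolve_identifier(requested_id, records):
    """Return the ID the sensor object should report.

    `records` are the dicts returned for `requested_id`. If the requested
    record is a child, the parent's ID is returned, since that is the
    sensor whose data actually gets loaded.
    """
    for record in records:
        if record['ID'] == requested_id and record.get('ParentID') is not None:
            return record['ParentID']
    return requested_id
```

With this, `resolve_identifier(6643, [{'ID': 6643, 'ParentID': 6642}])` reports the parent's ID rather than the child's.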
I think they discontinued the "/json" and "/data.json" URLs a few days ago, and that's why your "purpleair" library is not working.
Link: https://community.purpleair.com/t/discontinuation-of-the-json-and-data-json-urls/713
"After a few years of grace period, we are now redirecting these two URLs (www.purpleair.com/json and www.purpleair.com/data.json ) to a server that will not respond.
Please contact us if you have any questions or need any help getting going on our new API, at https://api.purpleair.com ."
hitting a ValueError with JSON return:
reproduced with
from purpleair.network import SensorList
p = SensorList()
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 5999909 (char 5999908)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/bmosley/code/py_aqi/env/lib/python3.8/site-packages/purpleair/network.py", line 23, in __init__
self.get_all_data()
File "/Users/bmosley/code/py_aqi/env/lib/python3.8/site-packages/purpleair/network.py", line 38, in get_all_data
raise ValueError(
ValueError: Invalid JSON data returned from network!
Using
from purpleair.purpleair import PurpleAir
feels repetitive. Perhaps we should call it network, although
from purpleair.network import Network
does not seem much better.
Installing this package in an environment that uses Python 3.9.0 will result in the following error:
ImportError: Failed to import test module: purpleair
Traceback (most recent call last):
File "/Users/chris/.pyenv/versions/3.9.0/lib/python3.9/unittest/loader.py", line 436, in _find_test_path
module = self._get_module_from_name(name)
File "/Users/chris/.pyenv/versions/3.9.0/lib/python3.9/unittest/loader.py", line 377, in _get_module_from_name
__import__(name)
File "/Users/chris/Documents/Code/Python/purple_air_api/tests/test_sensor.py", line 3, in <module>
from purpleair import sensor
File "/Users/chris/Documents/Code/Python/purple_air_api/purpleair/sensor.py", line 15, in <module>
from .channel import Channel
File "/Users/chris/Documents/Code/Python/purple_air_api/purpleair/channel.py", line 9, in <module>
import pandas as pd
File "/Users/chris/Documents/Code/Python/purple_air_api/venv/lib/python3.9/site-packages/pandas/__init__.py", line 11, in <module>
__import__(dependency)
File "/Users/chris/Documents/Code/Python/purple_air_api/venv/lib/python3.9/site-packages/numpy/__init__.py", line 286, in <module>
raise RuntimeError(msg)
RuntimeError: Polyfit sanity test emitted a warning, most likely due to using a buggy Accelerate backend. If you compiled yourself, see site.cfg.example for information. Otherwise report this to the vendor that provided NumPy.
RankWarning: Polyfit may be poorly conditioned
This is because there are no wheels built for NumPy on Python 3.9.0 yet. Since this package depends on numpy (it is a dependency of pandas), it will not work until numpy works on Python 3.9.0.