paloaltonetworks / pan-cortex-data-lake-python

Python idiomatic SDK for Cortex™ Data Lake.

Home Page: https://cortex.pan.dev/docs/data_lake/develop/cdl_python_installation

License: ISC License


pan-cortex-data-lake-python's Introduction

Palo Alto Networks Cortex™ Data Lake SDK

Python idiomatic SDK for the Cortex™ Data Lake.

The Palo Alto Networks Cortex Data Lake Python SDK was created to assist developers with programmatically interacting with the Palo Alto Networks Cortex™ Data Lake API.

The primary goal is to provide full, low-level API coverage for the following Cortex™ Data Lake services:

Query Service

The secondary goal is to provide coverage, in the form of helpers, for common tasks/operations.

Log/event pagination
OAuth 2.0 and token refreshing

Resources:

Documentation: https://cortex.pan.dev
Free software: ISC license

Features

HTTP client wrapper for the popular Requests library with full access to its features.
Language bindings for Query Service.
Helper methods for performing common tasks, such as log/event pagination.
Support for OAuth 2.0 grant code authorization flow.
Library of example scripts illustrating how to leverage the SDK.
Support for API Explorer Developer Tokens for easier access to the API.

Status

The Palo Alto Networks Cortex™ Data Lake Python SDK is considered beta at this time.

Installation

From PyPI:

pip install pan-cortex-data-lake

From source:

pip install .

To run tests:

pip install .[test]

Obtaining and Using OAuth 2.0 Tokens

If you're an app developer, work with your Developer Relations representative to obtain your OAuth2 credentials. API Explorer may optionally be used to generate a Developer Token, which can also be used to authenticate with the API. For details on API Explorer developer tokens, please visit https://cortex.pan.dev/docs/data_lake/learn/developer_tokens.

Example

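from pan_cortex_data_lake import Credentials, QueryService


# Credentials are resolved from constructor arguments, envars or the credentials store
c = Credentials()
qs = QueryService(credentials=c)
query_params = {
    "query": "SELECT * FROM `1234567890.firewall.traffic` LIMIT 1",
}
# Create the query job, then fetch its results using the returned job ID
q = qs.create_query(query_params=query_params)
results = qs.get_job_results(job_id=q.json()['jobId'])
print(results.json())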

Contributors

pan-cortex-data-lake-python's People

fg-pan jtschichold mishas jeffreyleeon adewealthgit sserrata jepsenwan hvt jassik27 smit2896 jabielecki faqa steven-deboer aetheriaxai cdot65 byronjwatson globalprotect

pan-cortex-data-lake-python's Issues

HTTPClient default headers breaks HTTP persistence

Palo Alto Networks Cloud Python SDK version: v1.2.3
Python version: 2.7, 3.5+
Operating System: any

Description

The pattern used to add {'Accept': 'application/json'} to the default Session() header inadvertently overrides the {'Connection': 'keep-alive'} header, effectively disabling HTTP persistence.

Proposed solution

Apply a dict update pattern to preserve the default Session() headers.
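
A minimal sketch of the difference between the two patterns, using plain dicts as a stand-in for the Session() headers. The helper names are hypothetical and not part of the SDK:

```python
def session_defaults():
    """Stand-in for the headers a new requests Session() carries by default."""
    return {"Accept": "*/*", "Connection": "keep-alive"}


def assign_headers(_defaults):
    # Problematic pattern: a fresh dict replaces every default header.
    return {"Accept": "application/json"}


def update_headers(defaults):
    # Dict update pattern: copy the defaults, then overlay only what changes.
    headers = dict(defaults)
    headers.update({"Accept": "application/json"})
    return headers


replaced = assign_headers(session_defaults())
merged = update_headers(session_defaults())
```

With the update pattern, `merged` still carries `Connection: keep-alive`, while `replaced` has lost it.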

iter_job_results() overrides credentials in get_job_results()

Describe the bug

The QueryService iter_job_results() incorrectly overrides the request credentials value.

Expected behavior

Credentials should only be overridden at the request() level when explicitly passed.

Current behavior

Credentials are overridden by iter_job_results() even when they are not explicitly passed.

Possible solution

Refrain from passing credentials kwarg to underlying get_job_results() method unless they are explicitly passed to iter_job_results() method.
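
One way to sketch this, assuming simplified stand-ins for both methods (the signatures and return values here are illustrative, not the SDK's):

```python
def get_job_results(job_id, **kwargs):
    """Hypothetical stand-in: report which credentials the request would use."""
    return kwargs.get("credentials", "session-level credentials")


def iter_job_results(job_id, credentials=None, **kwargs):
    # Forward the credentials kwarg only when the caller supplied one.
    if credentials is not None:
        kwargs["credentials"] = credentials
    yield get_job_results(job_id, **kwargs)


default_used = next(iter_job_results("job-1"))
override_used = next(
    iter_job_results("job-1", credentials="request-level credentials")
)
```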

Steps to reproduce

Screenshots

Context

Your Environment

Version used: alpha10
Environment name and version (e.g. Chrome 59, node.js 5.4, python 3.7.3): python 3.7.7
Operating System and version (desktop or mobile): Mac OSX
Link to your project:

_apply_credentials doesn't respect auto_refresh parameter

Palo Alto Networks Cloud Python SDK version: v1.4.0
Python version: 2.7, 3.5+
Operating System: any

Description

The HTTPClient class _apply_credentials() method currently performs refresh() if access_token is None or expired without factoring in whether auto_refresh is enabled. This is problematic because the auto_refresh setting is effectively ignored, meaning there is no way to disable it.

Proposed solution

Pass scope-level auto_refresh parameter to _apply_credentials() so that it can be evaluated prior to performing refresh().
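
A rough sketch of the intended decision, written as a standalone function. The name `needs_refresh` and its parameters are assumptions for illustration; the real logic lives inside _apply_credentials():

```python
def needs_refresh(access_token, expired, auto_refresh):
    """Hypothetical check: refresh only when enabled and the token is missing or expired."""
    if not auto_refresh:
        # auto_refresh disabled: never refresh, whatever the token state.
        return False
    return access_token is None or expired
```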

repr for Credentials class

Palo Alto Networks Cloud Python SDK version: v1.3.0
Python version: 2.7, 3.5+
Operating System: any

Description

Credentials class lacks a proper __repr__ method.

Proposed solution

Add a suitable __repr__ method to the Credentials class that masks access_token, refresh_token and client_secret when called. Should also allow for printing any kwargs passed to implicit HTTPClient.
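
A possible shape for such a method, shown on an illustrative stand-in class rather than the SDK's Credentials. The mask string and attribute layout are assumptions:

```python
class Credentials:
    """Illustrative stand-in, not the SDK class."""

    _SENSITIVE = ("access_token", "refresh_token", "client_secret")

    def __init__(self, **kwargs):
        self.kwargs = kwargs

    def __repr__(self):
        # Mask on a new dict so the stored values are never modified.
        shown = {
            k: ("*" * 6 if k in self._SENSITIVE and v is not None else v)
            for k, v in self.kwargs.items()
        }
        args = ", ".join(
            "{}={!r}".format(k, v) for k, v in sorted(shown.items())
        )
        return "{}({})".format(type(self).__name__, args)


c = Credentials(client_id="my-app", client_secret="s3cr3t")
text = repr(c)
```

Non-sensitive kwargs are printed as-is, and the stored secret remains intact after repr() is called.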

Authorization header updated inefficiently

Palo Alto Networks Cloud Python SDK version: v1.2.3
Python version: 2.7, 3.5+
Operating System: any

Description

The HTTPClient class presents three opportunities to update the authorization header: when a Session() is first created, each time the request() method is called and immediately after an auto_refresh is performed. Ideally, the authorization header should only be updated once for each opportunity.

Example scenarios:

The authorization header is updated once, when a Session() is created and only updated again when an access_token expires, prompting an auto_refresh.
The authorization header is updated once, for an individual request() due to the developer intentionally passing a credentials argument to a service method, effectively choosing to temporarily override the Session() level credentials.

The current implementation causes the authorization header to be updated each time the request() method is called, regardless of whether credentials were passed to a service method. This is not the intended behavior.

Proposed solution

Refactor the _apply_credentials() method to update self.session.headers instead of returning a headers dict. This is the optimal approach.
Remove lines 253-256 which unnecessarily update the authorization header each time the request() method is called.

Other considerations

cache_token should only influence the writing/updating of the access_token and not determine if and when the authorization header should be updated

Update examples

Palo Alto Networks Cloud Python SDK version: v1.4.0
Python version: 2.7, 3.5+
Operating System: any

Description

Update all existing examples and add examples as needed.

Support for Developer Tokens

Palo Alto Networks Cloud Python SDK version: v1.4.0
Python version: 2.7, 3.5+
Operating System: any

Description

Until self-service API key/token generation arrives the SDK can be extended to support Developer Tokens generated by API Explorer. These Developer Tokens can be used to request a valid Application Framework access_token which is required for authentication/authorization. The use case is as follows:

User activates API Explorer instance in CSP
User authorizes API Explorer to access instance
User generates a Developer Token for the authorized instance
User instantiates a Credentials object with a Developer Token, using either the PAN_DEVELOPER_TOKEN envar or the developer_token constructor argument.
Upon recognizing the presence of a Developer Token, the Credentials object uses the Developer Token to authenticate with API Explorer in order to perform a token refresh().
API Explorer responds with a valid access_token which is cached by the Credentials object.

Proposal

Add support for PAN_DEVELOPER_TOKEN envar and developer_token constructor argument to the Credentials class.
Maintain full, backwards compatibility support for client_id, client_secret and refresh_token.

Token caching not working as designed

Palo Alto Networks Cloud Python SDK version: v1.2.2
Python version: 2.7, 3.5+
Operating System: Any

Description

Credentials get_credentials() method incorrectly attempts to resolve access_token before checking instance attribute access_token_ which results in applying the same expired/invalid access_token to HTTP header.
Credentials access_token() property method does not include attempting to resolve access_token in the event that the access_token_ instance attribute is None.
Credentials refresh() method compares passed access_token to instance attribute access_token_ instead of the property method which would otherwise attempt to resolve the access_token.

Proposed Fixes

Reverse the access_token lookup order in get_credentials() method.
Add _resolve_credential() to access_token() property method.
Compare passed access_token in refresh() method to property method.

Directory Sync API Count Endpoint Missing "domain"

Palo Alto Networks Cloud Python SDK version: v1.0.3
Python version: 2.7, 3.5, 3.6
Operating System: Any

Description

Directory Sync count endpoint is missing domain key-word argument/parameter.

Resolution

Add params key-word argument to support domain URL query argument.

Merge summit.py command-line utility

Palo Alto Networks Cloud Python SDK version: 1.0.2
Python version: 2.7.x, 3.5.x, 3.6.x
Operating System: Any

Description

Merge summit.py into master - a command-line pancloud wrapper.

Credentials get_authorization_url() returns invalid type for state

Palo Alto Networks Cloud Python SDK version: v1.3.0
Python version: 2.7, 3.5+
Operating System: any

Description

The Credentials() get_authorization_url() method returns a UUID object instead of a str.

Proposed solution

Return str(state) instead of state.
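
A minimal sketch of the change, assuming the method returns a (url, state) pair. The function below is a stand-in and the URL is a placeholder:

```python
import uuid


def get_authorization_url(state=None):
    """Hypothetical stand-in returning (url, state) with state always a str."""
    state = state or uuid.uuid4()
    # Placeholder URL, used only for illustration.
    return "https://example.com/authorize", str(state)


url, state = get_authorization_url()
```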

Other considerations

This could be a breaking change for any apps that use this method to derive the auth_base_url and state, i.e. apps that do not provide their own state parameter.

Raised exception in iter_job_results() prevents API errors from returning

Describe the bug

The following line in iter_job_results() method prevents API errors from being properly returned:

https://github.com/PaloAltoNetworks/pan-cortex-data-lake-python/blob/master/pan_cortex_data_lake/query.py#L239

Expected behavior

Like successful responses, failed/error responses from the API should be passed to the client.

Current behavior

In iter_job_results(), failed API responses are not returned to the client due to the raised exception.

Possible solution

Refrain from raising an exception for non HTTP status 200 responses.

Steps to reproduce

Screenshots

Context

Your Environment

Version used: alpha11
Environment name and version (e.g. Chrome 59, node.js 5.4, python 3.7.3): python 3.7.7
Operating System and version (desktop or mobile): Mac OSX
Link to your project:

Support custom credentials store path

Palo Alto Networks Cloud Python SDK version: v1.2.3
Python version: 2.7, 3.5+
Operating System: any

Description

With the current implementation, the $HOME or %HOMEPATH% environment variables can be used to influence/change the read/write path of the credentials.json file. Although this functions as expected, requiring developers to change their $HOME or %HOMEPATH% settings may not be practical in every situation, especially when these variables are relied upon by other environment runtime processes.

Feature request

Provide a way for developers to specify the read/write path of the credentials.json file and other applicable storage file types, e.g. database path.

Proposed solution

Add a path argument to the Credentials class that can be passed to a storage adapter during initialization and used to override the default $HOME or %HOMEPATH% read/write path. If no path is specified then we should default to $HOME or %HOMEPATH%.
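
The fallback could look roughly like the following. `resolve_store_path` is a hypothetical helper, and the default location is simplified to the home directory for illustration:

```python
import os


def resolve_store_path(path=None, filename="credentials.json"):
    """Hypothetical helper: prefer an explicit path, else fall back to the home directory."""
    base = path if path else os.path.expanduser("~")
    return os.path.join(base, filename)
```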

iter_job_results() yields for waiting states

Describe the bug

Today, the iter_job_results() method yields for all possible states, including "RUNNING" and "PENDING." Although useful for exploring or validating the API, yielding for waiting states is not necessary when the primary goal is to consume data.

Expected behavior

The iter_job_results() method should (probably) yield only when results/records are available.

Current behavior

See description.

Possible solution

Yield only for "DONE" or "FAILED" states.
Yield a tuple that includes (result, state)
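
Both ideas can be combined in a small generator. This is a sketch over already-parsed response bodies; the `records` field and the function name are assumptions, and a real implementation would poll the API instead of iterating a list:

```python
def iter_done_results(responses):
    """Yield (result, state) only for terminal states; skip waiting states."""
    for body in responses:
        state = body["state"]
        if state in ("RUNNING", "PENDING"):
            continue  # a real client would poll again here
        yield body, state
        if state == "FAILED":
            return


polled = [
    {"state": "PENDING"},
    {"state": "RUNNING"},
    {"state": "DONE", "records": [1, 2]},
]
yielded = list(iter_done_results(polled))
```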

Steps to reproduce

N/A

Screenshots

Context

Yielding for waiting states requires additional JSON parsing in order to determine that no results are available yet. This additional overhead can be eliminated if we avoid yielding results for waiting states.

Your Environment

Version used: alpha13
Environment name and version (e.g. Chrome 59, node.js 5.4, python 3.7.3): python 3.7.7
Operating System and version (desktop or mobile): Mac OSX
Link to your project:

Update Library of Example Scripts

Palo Alto Networks Cloud Python SDK version: 1.0.2
Python version: 2.7.x, 3.5.x, 3.6.x
Operating System: All

Description

Update existing example scripts according to recent API changes.
Add additional example scripts as needed.

Refresh incorrectly attempted when both static access_token and developer_token are present

Describe the bug

When both a static access_token and developer_token are present, a refresh() is incorrectly triggered when the access_token expires.

Expected behavior

Defining a static access_token should take precedence over a developer_token and suppress auto_refresh behavior.

Current behavior

When both a static access_token and developer_token are present, a refresh() is incorrectly triggered when the access_token expires.

Possible solution

The general approach would be to add a check for existence of a static access_token if a developer_token is present. If the access_token exists, the refresh() should be bypassed. The result should then be a 401 authentication error.

Steps to reproduce

Export both a PAN_DEVELOPER_TOKEN and PAN_ACCESS_TOKEN
Wait for the access_token to expire.
Attempt a refresh() and observe the refresh attempt.

Screenshots

Context

Your Environment

Version used: alpha8
Environment name and version (e.g. Chrome 59, node.js 5.4, python 3.7.3):
Operating System and version (desktop or mobile):
Link to your project:

Add prefix to envars to avoid naming collisions

Palo Alto Networks Cloud Python SDK version: v1.2.3
Python version: 2.7, 3.5+
Operating System: any

Description

Today, the Credentials class looks for the following environment variables:

ACCESS_TOKEN
CLIENT_ID
CLIENT_SECRET
REFRESH_TOKEN

The issue with these variables names is that they follow the common OAuth 2.0 nomenclature, which opens the possibility to collisions with other applications that implement similar environment variable names.

Proposed solution

Add a prefix to the environment variable names to better distinguish them and help avoid collisions with other applications.

New names:

PAN_ACCESS_TOKEN
PAN_CLIENT_ID
PAN_CLIENT_SECRET
PAN_REFRESH_TOKEN

The new names will only apply to how Credentials reads/resolves environment variables and will not change the way these items are referred to as constructor variables, instance variables, or anywhere else in the codebase unrelated to environment variables. The documentation and docstrings will also be updated as needed.
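
A sketch of a prefixed lookup, assuming a hypothetical `resolve_envar` helper. Callers keep using the unprefixed credential name while only the environment lookup changes:

```python
import os


def resolve_envar(name, environ=None, prefix="PAN_"):
    """Look up a credential by its unprefixed name using the prefixed envar."""
    environ = os.environ if environ is None else environ
    return environ.get(prefix + name.upper())
```

For example, `resolve_envar("access_token")` reads PAN_ACCESS_TOKEN and ignores a bare ACCESS_TOKEN set by another application.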

Developer token auth doesn't work anymore

Cortex Data Lake Python SDK version: Latest
Python version: 3.7
Operating System: Mac

Description

The developer token authorization no longer works. You get a 400 Bad Request.

{
"msg": "Refresh operation failed: Token refresh failed: {"error_description":"unknown, invalid, or expired refresh token","error":"invalid_grant"}"
}

What I Did

curl -H "Authorization: Bearer $PAN_DEVELOPER_TOKEN" -X POST https://app.apiexplorer.rocks/request_token

Credentials OAuth 2.0 helpers not working as expected

Palo Alto Networks Cloud Python SDK version: v1.1.0
Python version: 2.7, 3.5+
Operating System: Any

Description

Attempted to utilize the OAuth 2.0 helper methods included in the Credentials class and encountered some unexpected behavior and inconsistencies.

The get_authorization_url method currently accepts the following args:

      instance_id (str): App Instance ID. Defaults to ``None``.
      redirect_uri (str): Redirect URI. Defaults to ``None``.
      region (str): App Region. Defaults to ``None``.
      scope (str): Permissions. Defaults to ``None``.
      state (str): UUID to detect CSRF. Defaults to ``None``.

What's not immediately apparent is that client_id is also required, which means it should either
be passed in the Credentials() constructor, be present in the credentials store/file or be accepted
as a key-word argument.

The fetch_tokens method currently accepts the following args:
```
      code (str): Authorization code. Defaults to ``None``.
      redirect_uri (str): Redirect URI. Defaults to ``None``.
```
The problem is that for the authorization code grant type, client_id and client_secret must also
be included in the payload (HTTP basic authentication is not currently supported). Additionally, the
HTTP header Content-Type must be set to application/x-www-form-urlencoded. Due to these
omissions, the fetch_tokens method does not function.

Suggested Fixes

Add client_id key-word argument to get_authorization_url method with support for resolving if passed as Credentials() kwarg or if present in credentials store/file.
Add client_id and client_secret kwargs to fetch_tokens method and add headers argument to set HTTP header Content-Type to application/x-www-form-urlencoded.
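
The request that fetch_tokens needs to send can be sketched with the standard library alone (Python 3). `build_token_request` is a hypothetical helper; the field names follow the standard OAuth 2.0 authorization code grant:

```python
from urllib.parse import parse_qs, urlencode


def build_token_request(code, redirect_uri, client_id, client_secret):
    """Return (headers, body) for an authorization code grant token request."""
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    body = urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        # Sent in the payload because HTTP basic authentication is not supported.
        "client_id": client_id,
        "client_secret": client_secret,
    })
    return headers, body


headers, body = build_token_request(
    "abc123", "https://example.com/cb", "my-app", "s3cr3t"
)
fields = parse_qs(body)
```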

Assign default HTTPClient URL

Palo Alto Networks Cloud Python SDK version: v1.3.0
Python version: 2.7, 3.5+
Operating System: any

Description

With the GA release of the Application Framework APIs it is now possible to assign a default to the HTTPClient() url instance variable. Currently, the Application Framework supports URLs for two regions (US and Europe):

Given the options, a sane default would be https://api.us.paloaltonetworks.com.

Proposal

In HTTPClient class, update self.url to the following:

self.url = kwargs.pop('url', 'https://api.us.paloaltonetworks.com')

https://github.com/PaloAltoNetworks/pancloud/blob/master/pancloud/httpclient.py#L92

The behavior of the LoggingService, EventService and DirectorySyncService classes should be to accept the url value passed directly to the class constructor, the url value assigned to a passed-in HTTPClient/session object or the default url value.

Credential Resolver

Palo Alto Networks Cloud Python SDK version: v1.1.0+
Python version: 2.7, 3.5+
Operating System: Any

Description

Today, the Credentials class automatically looks for individual credentials, i.e. client_id, client_secret, or refresh_token, in particular places following a particular lookup order of precedence:

1. Credentials passed as Credentials constructor keyword arguments
2. Credentials stored as environment variables
3. Credentials stored in a credentials file/store, e.g. ~/.config/pancloud/credentials.json

Note: #1 takes precedence over #2, which takes precedence over #3.

With this design, it's possible to store and resolve credentials in an overlay/union fashion, similar to technologies like OverlayFS, where credentials may be merged with "upper" values taking precedence over "lower" values.

| Location | Client ID | Client Secret | Refresh Token |
| --- | --- | --- | --- |
| Constructor | ✔️ | | |
| ENVAR | | ✔️ | |
| Credentials Store | ✔️ | ✔️ | ✔️ |

Results/Winners:

client_id: Constructor
client_secret: ENVAR
refresh_token: Credentials Store
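
The overlay behavior in this example can be reproduced with `collections.ChainMap`, where earlier mappings win over later ones. This is only an illustration of the lookup order, not how the Credentials class is implemented, and the values are made up:

```python
from collections import ChainMap

constructor = {"client_id": "id-from-constructor"}
envars = {"client_secret": "secret-from-envar"}
store = {
    "client_id": "id-from-store",
    "client_secret": "secret-from-store",
    "refresh_token": "token-from-store",
}

# Earlier mappings take precedence over later ones.
resolved = ChainMap(constructor, envars, store)
```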

Issues with this approach

Envars could be abused leading to unexpected, potentially disruptive consequences. For example:
- A malicious actor could "poison" envars to disrupt functionality
- A developer could overlook the presence of envars leading to writing logs to the wrong Logging Service instance
This behavior might be the least intuitive approach to solving for how credentials should be resolved
Greater flexibility leaves more room for error

Potential solutions (not mutually exclusive)

1. Implement support for pinning to a particular lookup layer, e.g. provider='envar' or provider='store'
2. Add a constructor kwarg for disabling the envar layer, e.g. disable_envar, which can be False by default
   - Heavily document the default behavior and recommend disable_envar=True for production
3. Change behavior such that resolution stops at a layer/source after the first credential is detected.
   - If that source/layer happens to have an incomplete set of credentials, raise PartialCredentialsError
   - If that source/layer only contains an access_token, the following outcomes are expected:
     - If access_token is valid the request will be accepted
     - If access_token is invalid the request will result in an HTTP 401 which will prompt a refresh() (assuming auto_refresh is enabled). The refresh() will result in a PartialCredentialsError due to the other credentials not being available.
   - This solution eliminates/replaces the current merge/overlay behavior

Other considerations

Envar support exists mainly for convenience purposes (development, testing, debugging), and for supporting environments like Jupyter notebooks, where you can set an access_token using something like %env ACCESS_TOKEN=123
Potential solution #3 requires iterating through each possible source/layer to determine if credentials are present at each source/layer

Starrify authorization header in repr overrides HTTPClient header

Palo Alto Networks Cloud Python SDK version: 1.0.2
Python version: 2.7.x, 3.5.x, 3.6.x
Operating System: All

Description

Starrifying self.kwargs['headers']['authorization'] in __repr__ overrides access_token in HTTPClient header when __str__ or __repr__ are called.

Possible Fixes

Refrain from updating the value of self.kwargs['headers']['authorization'].
Switch to deepcopy() instead of a shallow copy().
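
The shallow-copy problem is easy to reproduce with a nested dict. The `kwargs` layout below mirrors the self.kwargs['headers']['authorization'] path from the description; the token value is made up:

```python
import copy

kwargs = {"headers": {"authorization": "Bearer real-token"}}

# Shallow copy: the nested headers dict is shared, so masking leaks back.
shallow = copy.copy(kwargs)
shallow["headers"]["authorization"] = "*" * 6
leaked = kwargs["headers"]["authorization"]

# Restore the original value, then repeat with a deep copy.
kwargs["headers"]["authorization"] = "Bearer real-token"
deep = copy.deepcopy(kwargs)
deep["headers"]["authorization"] = "*" * 6
preserved = kwargs["headers"]["authorization"]
```

After the shallow copy is masked, the original header holds the mask as well; with deepcopy() the original token is preserved.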

Considerations

deepcopy() might be less performant than copy() or even json.dumps().
Updating __repr__ so that it doesn't need to update value of self.kwargs['headers']['authorization'] might result in "uglier" pattern.
Solution needs to be compatible with all service and HTTPClient classes.

Revoke decision to override requests data parameter behavior - PLEASE SUBSCRIBE IF YOU USE PANCLOUD

Palo Alto Networks Cloud Python SDK version: v1.2.3
Python version: 2.7, 3.5+
Operating System: Any

Description

Early in pancloud's development, a decision was made to override the default behavior of the requests data parameter, so that it behaved more like the json parameter, meaning json.dumps() would be applied to dict and list payloads passed using data.

Although it seemed the right choice at the time, making data behave like json introduced two key issues:

It fundamentally alters the behavior of the data parameter, which could be confusing to those deeply familiar with requests library.
The json parameter still functions as expected (when passed as a kwarg), which means there are now two parameters that accomplish essentially the same thing.

Proposed Changes

Stop overriding data parameter behavior by removing lines 236-241 in httpclient.py module, including if/elif statement.
Update service methods to use json in place of data where appropriate.
Update all example scripts to use json in place of data where appropriate.

Additional Considerations

Update API Explorer views.py module to use json in place of data where appropriate.
Update summit.py to use json in place of data where appropriate.
Update slack bot appropriately.

Migrating from `data` to `json`

Please migrate from using data to json as soon as possible to avoid any issues or service disruption. Although json is not yet a defined method parameter it is fully supported as a kwarg, which means you should be able to seamlessly switch from data to json.

Subsequent refresh() attempts fail when using developer_token

The fix for issue #147, shipped in alpha9 (#150), introduced a new bug. The bug appears after the first refresh() is performed with a developer_token: the refresh updates the self.access_token_ attribute, so subsequent developer_token refresh attempts no longer match the following if statement:

https://github.com/PaloAltoNetworks/pan-cortex-data-lake-python/blob/master/pan_cortex_data_lake/credentials.py#L480

Expected behavior

Refreshing with a developer_token should support more than a single refresh().

Current behavior

The refresh() method fails after the first refresh is performed with a developer_token.

Possible solution

Switch to evaluating whether self._credentials_found_in_instance is False instead of self.access_token_.

Steps to reproduce

Export a developer_token envar.
Import Credentials and assign to a variable.
Perform <variable>.refresh() twice.

Screenshots

Context

Your Environment

Version used: alpha10
Environment name and version (e.g. Chrome 59, node.js 5.4, python 3.7.3): python 3.7.7
Operating System and version (desktop or mobile): Mac OSX
Link to your project:

Check access_token expiration when determining if a refresh is needed

Palo Alto Networks Cloud Python SDK version: v1.2.3
Python version: 2.7, 3.5+
Operating System: any

Description

Today, HTTPClient does not consider expiration as a criterion for auto_refresh. This can lead to unnecessary API requests. For example, if pancloud sends a request using an expired access_token the server will respond with an HTTP 401 error, which can result in the following:

An automatic token refresh if auto_refresh is True
An automatic retry of the failed request if auto_retry is True

In either case, an additional API request can be avoided if access_token expiration is assessed before a request is sent.

Proposed solutions

1. Use the JWT exp timestamp to determine if an access_token is expired.
2. Use the expires_in value (seconds) to calculate an expiration time (relative to local time) to determine if an access_token is expired.

#1 appears to be highly dependent on an accurate local time. For example, a local time that is ahead would result in prematurely flagging an access_token as expired.
#2 likely requires storing the calculated expiration timestamp in a shared location, i.e. credentials store.
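
The first option can be sketched with the standard library (Python 3). The payload is decoded without verifying the signature, which is enough for a local expiry check but not for trusting the token. The function names and the `leeway` parameter are assumptions:

```python
import base64
import json
import time


def jwt_exp(token):
    """Read the exp claim from a JWT payload without verifying the signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))["exp"]


def is_expired(token, now=None, leeway=0):
    """Hypothetical check to run before sending a request."""
    now = time.time() if now is None else now
    return now >= jwt_exp(token) - leeway


def make_token(exp):
    # Builds an unsigned token for demonstration only.
    body = base64.urlsafe_b64encode(json.dumps({"exp": exp}).encode()).decode()
    return "header." + body.rstrip("=") + ".signature"
```

A small `leeway` makes the check refresh slightly early instead of sending a request with a token that is about to expire.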

Proposed changes to default storage adapter

Proposal to change default storage adapter

Introduction

The original idea behind storage adapters was to provide a way to adapt the underlying credentials storage layer in order to:

Read/write from/to a custom credentials store.
Maintain the same internal interface between the Credentials class and the storage adapter, i.e. maintain compatibility.

credentials.json

Since the early pancloud days, the default storage adapter class has been TinyDBStore, which currently reads/writes from/to the ~/.config/pan_cortex_data_lake/credentials.json file. The reasoning behind this was simple: provide a low-friction way for users to persist OAuth2 credentials during their exploration and proof-of-concept phases. With support for credentials profiles, the experience was intended to resemble that of other SDK/libraries, e.g. boto3 et al.

Beyond the exploratory/PoC phases, many developers are simply not interested in using the credentials.json file. Furthermore, there isn't an intuitive way to disable it, which can lead to some issues. For example, some serverless runtimes, e.g. AWS Lambda, implement read-only filesystems. Although the Credentials class and default TinyDBStore provide a way to specify a dbfile path via storage_params or the PAN_CREDENTIALS_DBFILE envar, this approach is neither intuitive nor obvious, and it is not documented very well. It also doesn't "disable" the credentials.json file - it merely moves it to a different location (e.g. /tmp). If your production app/integration doesn't require the credentials.json file (and, arguably, it really shouldn't) then it seems counterproductive to not have an easy/obvious way to disable it.

Current Workaround(s)

Today, to disable the use of the credentials.json file, you could:

Write and implement a custom storage adapter
Subclass Credentials, pass another storage class using self.storage and override the inherited class methods (using self.storage for invoking your read/write calls).

Proposals

Ship a second storage adapter class with the CDL Python SDK called MemoryStore. As the name implies, MemoryStore would read/write credentials from/to memory. It just so happens that tinydb ships with a MemoryStorage class.
Make MemoryStore the new default storage adapter.
Use cortex.pan.dev to document this change and the full usage around Credentials and storage adapters, including examples and explanation of when developers should subclass vs write a new storage adapter.
Continue shipping only the TinyDBStore adapter but provide a way to activate "in-memory mode" using storage_params. For example:

from pan_cortex_data_lake import Credentials

c = Credentials(storage_params={"memory_storage": True})

The latter would be my preferred option as it introduces the fewest lines of code, by far.

Other ideas

Adopt the "credentials providers" approach used in the NodeJS and Java versions of the SDK. This would require the following:

Eliminate storage adapters altogether (see second bullet).
Simplify the Credentials class and API to become an interface only for accessing credentials. This means also moving the existing OAuth2 helper methods, e.g. refresh(), fetch_credentials(), etc., to a future, separate provider module/class.

Reasons to avoid this path:

Shipping an SDK with no provider, i.e. requiring the provider to be installed and imported separately, introduces additional friction to the exploratory and PoC phases. The python community tends to appreciate a more "batteries included" approach.
Provider packages will need to be maintained and versioned separately, complete with testing. Although this might help simplify the CDL Python SDK code base, it will introduce additional maintenance overhead - actually, the maintenance/overhead would just get punted elsewhere.
The concept of "providers" could lead to an endless family of provider packages, contributed both internally and by the community. These packages would have to be maintained/updated/supported and documented, which could be difficult to keep up with.
The current approach already offers two, arguably cleaner, ways to customize behavior: 1) subclassing and overriding methods and 2) storage adapters. The patterns and 3rd-party storage adapter modules can still be shared throughout the community but users would be responsible for adding and maintaining these patterns/modules in their code base.
Shipping two storage adapters with the CDL Python SDK (and the code necessary to support them) occupies a relatively small footprint in the library. IMO, the overhead they introduce is small - small enough to suggest that maintaining them as separate packages might be unnecessary/overkill (and arguably more work/overhead).
If we adopt the storage_params approach to activating in-memory mode, then we can maintain backward compatibility and gain the functionality with only a few lines of code.

Similar/related issues

#111
#87

TinyDBStore fetch_credential() returns empty str instead of None

Palo Alto Networks Cloud Python SDK version: v1.2.3
Python version: 2.7, 3.5+
Operating System: any

Description

TinyDBStore fetch_credential() returns empty str by default, instead of None, when TinyDB search fails to find a match. This leads to the _apply_credentials() method incorrectly determining that an access_token exists, causing it to forgo a refresh() operation that would otherwise prevent a status 401.

Proposed solution

Refactor fetch_credential() method to return None instead of empty str.
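
A small sketch of why the return value matters, assuming the downstream check treats only None as "no token". Both functions are hypothetical stand-ins, with a plain dict in place of the TinyDB store:

```python
def fetch_credential(store, credential):
    """Hypothetical lookup that returns None, not '', when nothing matches."""
    value = store.get(credential)
    return value if value else None


def token_missing(access_token):
    # Mirrors a check that treats only None as "no token".
    return access_token is None
```

An empty string passes the `is None` check as if a token existed, whereas the refactored lookup reports the token as missing.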

Method for updating individual credentials

Palo Alto Networks Cloud Python SDK version: v1.3.0
Python version: 2.7, 3.5+
Operating System: any

Description

The current implementation of pancloud does not support an obvious, clean way to update individual credentials stored in the credentials store, e.g. credentials.json. For example, if a refresh_token or client_secret needs to be updated a developer would be forced to accomplish this one of three ways:

Manually edit/update the credentials.json file or credentials store.
Create an instance of Credentials class ensuring arguments for ALL credentials are passed in the constructor, followed by executing the write_credentials() method. Note ALL credential arguments are required due to the strict enforcement of location, i.e. instance vars, envars, store.
Directly override instance variable for credential requiring update/edit, e.g. client_id_ = 'foo', followed by a write_credentials(). This bypasses the _credentials_found_in_instance() check performed during class instantiation. Note that the refresh() method uses this approach for updating/caching access_token.

Although these three methods may be viable workarounds, they are not very obvious/intuitive and leave much room for error. For example, option 1 could lead to corruption of the credentials.json file. Option 2 could lead to inadvertently nulling credentials that are left out of the constructor. Although option 3 is implemented by the refresh() method, it's not very intuitive.

Proposed solution

Create a setter for each property method, i.e. client_id, client_secret, refresh_token, access_token and jwt_exp, to provide a more intuitive path to updating instance variables. Afterwards, replace all instances in Credentials class where instance variables are overridden with setters.
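The proposal could look roughly like the sketch below. This is a trimmed-down, hypothetical Credentials class showing two property/setter pairs. The trailing-underscore instance variables follow the naming mentioned in the description, but the rest is an assumption rather than the library's actual code.

```python
class Credentials(object):
    """Hypothetical, trimmed-down Credentials with property setters."""

    def __init__(self, client_id=None, refresh_token=None):
        self.client_id_ = client_id
        self.refresh_token_ = refresh_token

    @property
    def client_id(self):
        return self.client_id_

    @client_id.setter
    def client_id(self, value):
        self.client_id_ = value

    @property
    def refresh_token(self):
        return self.refresh_token_

    @refresh_token.setter
    def refresh_token(self, value):
        self.refresh_token_ = value


credentials = Credentials(client_id="old-id", refresh_token="r1")
credentials.refresh_token = "r2"  # update one credential, leave the rest alone
```

A single credential can then be changed without passing every argument to the constructor and without touching the underscore-suffixed variables directly.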

Invalid comparison leads to "PENDING" state returning False

Describe the bug

In the iter_job_results() method, an invalid comparison statement leads to "PENDING" state returning False.

Expected behavior

Both "RUNNING" and "PENDING" states should evaluate to True.

Current behavior

Currently, a "PENDING" state returned by the Query Service API evaluates to False, potentially resulting in a CortexError exception being thrown.

Possible solution

As proposed by @mishas, refactoring the comparison to elif r.json()["state"] in ("RUNNING", "PENDING") will allow both "RUNNING" and "PENDING" to evaluate to True.
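The difference can be illustrated in isolation. The issue does not show the faulty statement, so the "buggy" form below is only an assumption about one comparison that would produce the described behavior. Both function names are hypothetical.

```python
def is_in_progress(state):
    # Membership test: True for either in-progress state.
    return state in ("RUNNING", "PENDING")


def is_in_progress_buggy(state):
    # Assumed faulty form: ("RUNNING" or "PENDING") evaluates to "RUNNING",
    # so "PENDING" can never match.
    return state == ("RUNNING" or "PENDING")
```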

Steps to reproduce

n/a

Screenshots

n/a

Context

n/a

Your Environment

Version used: cortex-data-lake-python alpha7
Environment name and version (e.g. Chrome 59, node.js 5.4, python 3.7.3): python 3.7.7
Operating System and version (desktop or mobile): desktop
Link to your project:

Incorrect clientType and clientVersion defined in create_query() method

Describe the bug

The create_query() method statically defines the wrong clientType and clientVersion values. Note that these values differ from the UserAgent values added to the HTTPClient headers.

Expected behavior

These values should represent both the actual name of the library/SDK as well as the current version.

Current behavior

See description.

Possible solution

Change the clientType value to "cortex-data-lake-python"
Change the clientVersion value to be the library __version__

Steps to reproduce

Screenshots

Context

Your Environment

Version used: alpha8
Environment name and version (e.g. Chrome 59, node.js 5.4, python 3.7.3):
Operating System and version (desktop or mobile):
Link to your project:

In v2 of the SDK, the auto_refresh parameter of HTTPClient is not respected.

The documentation for the auto_refresh parameter of HTTPClient says:

auto_refresh (bool): Perform token refresh following HTTP 401 response from server. Defaults to True.

However, there does not seem to be any code that catches a 401 and refreshes the token.

Furthermore, in the request method, on line 223, there's credentials = kwargs.pop("credentials", None), where it should probably be credentials = kwargs.pop("credentials", self._credentials).

Expected behavior

When there's a 401, a new token should be generated, and a retry should happen.
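A minimal sketch of the expected refresh-and-retry behavior, independent of the SDK. The request_with_refresh() helper, the FakeResponse class and the send/refresh callables are all hypothetical stand-ins for the HTTPClient request path and Credentials.refresh().

```python
class FakeResponse(object):
    """Stand-in for a requests.Response; only status_code is modeled."""

    def __init__(self, status_code):
        self.status_code = status_code


def request_with_refresh(send, refresh, auto_refresh=True):
    """Call send(); on a 401, call refresh() once and retry."""
    response = send()
    if response.status_code == 401 and auto_refresh:
        refresh()
        response = send()  # single retry with the new token
    return response


# Simulated server: rejects requests until the token has been refreshed.
state = {"token": "expired", "refreshes": 0}


def send():
    return FakeResponse(200 if state["token"] == "fresh" else 401)


def refresh():
    state["token"] = "fresh"
    state["refreshes"] += 1


response = request_with_refresh(send, refresh)
```

Retrying only once keeps a persistently failing refresh from looping forever.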

Current behavior

I'm starting to get requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://api.us.cdl.paloaltonetworks.com:443/query/v2/jobs after ~1 hour of use.

Your Environment

Version used: 2.0.0a9
Environment name and version (e.g. Chrome 59, node.js 5.4, python 3.7.3): python3.7
Operating System and version (desktop or mobile): Debian 10

Add logging to HTTPClient and Credentials

Palo Alto Networks Cloud Python SDK version: v1.3.0
Python version: 2.7, 3.5+
Operating System: any

Description

Although the underlying requests library is capable of low-level logging, the pancloud HTTPClient and Credentials classes do not currently offer any additional logging to assist with troubleshooting and debugging.

Proposed solution

Add sensible logging to HTTPClient and Credentials classes for operations unique to pancloud, such as:

Applying credentials to header
Setting default headers
Auto-refresh and auto-retry
Token refresh
Credential resolution
Fetch tokens
Initializing storage adapter
Removing profiles
Revoke access/refresh token
Write credentials
Checking for expired access token
Decoding exp field from access token
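One way to add this kind of logging with the standard library is sketched below. The logger name and the apply_credentials() helper are hypothetical. The pattern (a module-level logger with a NullHandler, and messages that never include the token itself) is a common convention for libraries, not something taken from pancloud.

```python
import logging

# Hypothetical logger name; a library normally uses its module path.
logger = logging.getLogger("pancloud.credentials")
# Stay silent unless the application configures logging itself.
logger.addHandler(logging.NullHandler())


def apply_credentials(headers, access_token):
    """Add a bearer token to the headers and log that it happened."""
    # Log the operation only; never log the token value.
    logger.debug("Applying credentials to header")
    headers["Authorization"] = "Bearer {}".format(access_token)
    return headers


headers = apply_credentials({}, "example-token")
```

An application that wants to see these messages would call logging.basicConfig(level=logging.DEBUG) before using the library.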

Enhancement: poll_all() and iter_poll() improved loop control

Palo Alto Networks Cloud Python SDK version: 1.0.3

Description

Currently the poll_all() and iter_poll() methods loop through all the pages of results until the last page is found. There are two suggestions for improvements to allow better control of this loop:

Suggestion 1:

Add a timeout= keyword argument with a sensible default. This would act as a timeout for the whole operation which allows a developer to restrict how long a poll job can run. This can be done currently with iter_poll() where the developer manages their own timeout, but we can make it easier by adding the argument.

Suggestion 2:

Currently, the loop uses else to assume the query status is RUNNING. This could result in an infinite loop if the API changes or fails. To better future-proof the library, check for RUNNING explicitly and use else to raise an exception such as Unexpected query status: {}.
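Both suggestions can be sketched together as a generator. Everything here is illustrative: fetch_page is a hypothetical callable returning one page as a dict, the "status" key and the "FINISHED" terminal value are assumptions, and RuntimeError stands in for whatever exception type the library would raise.

```python
import time


def iter_poll(fetch_page, timeout=60.0):
    """Yield pages until the job finishes, times out, or reports an unknown status."""
    deadline = time.time() + timeout
    while True:
        page = fetch_page()
        yield page
        status = page.get("status")
        if status == "FINISHED":  # assumed terminal status name
            return
        if status == "RUNNING":  # checked explicitly, not via a bare else
            if time.time() >= deadline:
                raise RuntimeError(
                    "Poll timed out after {} seconds".format(timeout)
                )
            continue
        raise RuntimeError("Unexpected query status: {}".format(status))


pages = iter([{"status": "RUNNING"}, {"status": "FINISHED"}])
results = list(iter_poll(lambda: next(pages)))
```

Because the timeout is checked between pages, a single slow request can still exceed it. Enforcing a hard limit would also require a per-request timeout.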

More flexible python requirements?

Is your feature request related to a problem?

Not exactly. I'm trying to fit this package into a context with a lot of different packages, where the precise requirements of this package make it very difficult to reconcile everything.

Describe the solution you'd like

Ideally, requirements would only be constrained to the level of minor versions, for instance, and without locking down upstream dependencies.

Describe alternatives you've considered

I am aware that in many contexts, the workaround (or rather, default use case) is simply to install this package in a virtualenv. For our particular setup, this does not work. Of course an alternative solution would be to simply fork this package, but I'd rather ask you guys before doing that.

Add support for Event Service API flush endpoint

Palo Alto Networks Cloud Python SDK version: v1.4.0
Python version: 2.7, 3.5+
Operating System: any

Description

A recent update to the Event Service API added a new flush endpoint, which can be used to flush a channel. Flushing a channel effectively discards all existing events in the bus at the time the endpoint is called.

Next steps

Add support for the flush API endpoint to the EventService class following the endpoint specifications. Ensure the flush endpoint is included in any and all applicable tests.

Refresh token caching

Palo Alto Networks Cloud Python SDK version: 1.2.3
Python version: 2.7, 3.5+
Operating System: Any

Description

The current behavior of the Credentials class is to persist the refresh_token whenever the write_credentials() method is called.

If and when rolling refresh tokens is implemented, a sensible approach would be to update and cache the refresh_token, along with the access_token, whenever the refresh() method is called (current behavior only caches the access_token).

Proposed Changes

Add lines to update refresh_token in refresh() method try:except block, prior to write_credentials() being called.

self.refresh_token_ = r.json().get('refresh_token', '')

https://github.com/PaloAltoNetworks/pancloud/blob/master/pancloud/credentials.py#L316

Support for regions

Palo Alto Networks Cloud Python SDK version: v1.5.0
Python version: 2.7, 3.5+
Operating System: any

Description

There are two scenarios where the concept of region is relevant:

The region extracted from the base64 params provided by the Cortex Hub redirect which is used by the get_authorization_url() method, e.g. americas or europe.
The region specified in the API gateway URL, e.g. api.us.paloaltonetworks.com or api.eu.paloaltonetworks.com (currently defaults to us).

The first is relevant to the identity provider while performing authorization whereas the second determines what regional datacenter to direct API requests to (which should correspond to where the Cortex data lake tenant, et al. reside).

Currently, pancloud is missing a way to set or define a default region that could be used to determine which regional datacenter to direct API requests to. With the current API, you're forced to supply the full API gateway URL to direct API requests to a region other than api.us.paloaltonetworks.com, which is the default.

Proposals

Add a default_region kwarg to the HTTPClient class. The value provided, e.g. us or eu, could be used to construct the default url used in all API requests made with that HTTPClient() object. Note that default_region would not be applicable to the Credentials class.
Add support for a PAN_DEFAULT_REGION environment variable. The behavior would be similar to proposal 1 except that the default_region constructor argument would take precedence over the envar.
Add support for a default_region or region to the credentials.json file or credentials store. This one feels a bit out of place, since the region would not necessarily be applicable to Credentials. Again, the region used by get_authorization_url() should normally be extracted from the base64 params passed by the Cortex Hub. It's worth noting that AWS boto3 credentials allow for specifying region. Note that the PAN_DEFAULT_REGION envar would take precedence over this value.
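The precedence described across the three proposals (constructor argument, then PAN_DEFAULT_REGION, then the credentials store, then the us default) could be resolved along these lines. The resolve_region() and gateway_url() helpers are hypothetical, and the URL pattern is taken from the examples given in the description.

```python
import os


def resolve_region(default_region=None, stored_region=None, environ=None):
    """Pick a region: kwarg first, then envar, then store, then 'us'."""
    environ = os.environ if environ is None else environ
    return (
        default_region
        or environ.get("PAN_DEFAULT_REGION")
        or stored_region
        or "us"
    )


def gateway_url(region):
    # URL pattern follows the examples in the issue description.
    return "https://api.{}.paloaltonetworks.com".format(region)


url = gateway_url(resolve_region(default_region="eu"))
```

Passing environ explicitly keeps the helper easy to test without modifying the process environment.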

Other considerations

Another interesting approach would be to leverage a custom JWT claim to determine the regional URL, since the IdP would ostensibly have prior knowledge of the region used during authorization. The challenge with such an approach would be sharing the region across multiple instances of HTTPClient() since the value would, ostensibly, be extracted from the access_token in the fetch_tokens() or refresh() response (an operation performed within the scope of a single Credentials() object).

Another thing to note is that it is quite easy for an app to implement its own "region selector," since an application would also have knowledge of what region was used during authorization. This is the current recommended way to handle regional selection of url in the absence of default region support.

Unable to import version from other package modules

Palo Alto Networks Cloud Python SDK version: v1.2.3
Python version: 2.7, 3.5+
Operating System: any

Description

Package modules, such as httpclient.py, are unable to import __version__ from __init__.py due to package import statements being placed before the __version__ variable is declared.

For example, we want to import the __version__ variable in httpclient.py but the python runtime attempts to import HTTPClient (which references __version__) before the __version__ variable has been declared, which throws an exception.

Proposed solution

Move declaration of __version__ variable above package import statements.
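The ordering problem can be reproduced with two throwaway packages written to a temporary directory. The package names, the USER_AGENT variable and the version string are made up for the demonstration. Only the ordering of the two lines in __init__.py mirrors the issue.

```python
import importlib
import os
import sys
import tempfile

# httpclient.py needs the version at import time (e.g. for a User-Agent).
HTTPCLIENT_SRC = "from . import __version__\nUSER_AGENT = 'demo/' + __version__\n"
# Import placed before the declaration: httpclient runs too early.
BAD_INIT = "from .httpclient import USER_AGENT\n__version__ = '1.2.3'\n"
# Declaration moved above the import, as proposed.
GOOD_INIT = "__version__ = '1.2.3'\nfrom .httpclient import USER_AGENT\n"


def build_package(root, name, init_src):
    pkg_dir = os.path.join(root, name)
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as fh:
        fh.write(init_src)
    with open(os.path.join(pkg_dir, "httpclient.py"), "w") as fh:
        fh.write(HTTPCLIENT_SRC)


def import_user_agent(name):
    """Return the package's USER_AGENT, or None if the import fails."""
    try:
        return importlib.import_module(name).USER_AGENT
    except ImportError:
        return None


root = tempfile.mkdtemp()
sys.path.insert(0, root)
build_package(root, "demo_bad_order", BAD_INIT)
build_package(root, "demo_good_order", GOOD_INIT)
importlib.invalidate_caches()

bad_result = import_user_agent("demo_bad_order")
good_result = import_user_agent("demo_good_order")
```

Only the package that declares __version__ before its imports loads successfully.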

Integrated OAuth2 Support

Palo Alto Networks Cloud Python SDK version: 1.0.2
Python version: 2.7.x, 3.5.x, 3.6.x
Operating System: Any

Description

Add OAuth2 support for performing token refresh and revocation
Add support to HTTPClient for auto-refresh
Add support for storing credentials in ~/.config/pancloud and as environment variables

Implementation Plan

Add credentials module that supports construction of Credentials object
Add property methods for retrieving access_token, client_id, client_secret, and refresh_token.
Add the following methods: _fetch_credential, get_credentials, remove_profile, refresh, write_credentials.
Use TinyDB to upsert and store credentials in JSON format in ~/.config/pancloud directory.

[Community Health Assessment] Changes needed

Health Check	Pass	Score
Contains a meaningful README.md file	✅	20 / 20
SUPPORT.md file exists	✅	20 / 20
Repo has a description	✅	15 / 15
Has a recognized open source license	✅	15 / 15
Has a descriptive repo name	✅	15 / 15
Required topics attached to repo	✅	15 / 15
CONTRIBUTING.md file with contribution guidelines	✅	5 / 5
Has custom issue and pull request templates	❌	0 / 5

Current score: 105
Target threshold: 100
Total possible: 110

Add MemoryStorage support to TinyDBStore adapter

Palo Alto Networks Cloud Python SDK version: v1.3.0
Python version: 2.7, 3.5+
Operating System: any

Description

TinyDB comes with two storage types: JSON and in-memory. The current default used by pancloud is JSON. There are some use-cases that could benefit from in-memory mode, such as the use-case where only access_token needs to be stored/cached in the pancloud credentials store. However, the current TinyDBStore class implementation does not provide a way to pass the storage param, i.e. storage=MemoryStorage.

Proposed solution

Add support for storage param to TinyDBStore using storage_params to pass the argument.

Failed refresh() or fetch_tokens() results in null credentials

Palo Alto Networks Cloud Python SDK version: v1.2.3
Python version: 2.7, 3.5+
Operating System: any

Description

A refresh() or fetch_tokens() attempt that returns an error (instead of tokens) effectively nulls out the access_token and/or refresh_token.

Proposed solution

Refactor code so that the updates occur only following a successful response from token endpoint.
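A sketch of the guarded update, using a plain dict for the cached credentials. The apply_token_response() helper is hypothetical, and treating HTTP 200 plus the presence of access_token as "success" is an assumption about the token endpoint.

```python
def apply_token_response(credentials, status_code, payload):
    """Update cached tokens only after a successful token-endpoint response."""
    if status_code != 200 or "access_token" not in payload:
        return False  # leave the existing credentials untouched
    credentials["access_token"] = payload["access_token"]
    # Overwrite refresh_token only when the response actually carries one.
    if payload.get("refresh_token"):
        credentials["refresh_token"] = payload["refresh_token"]
    return True


credentials = {"access_token": "a1", "refresh_token": "r1"}
failed = apply_token_response(credentials, 400, {"error": "invalid_grant"})
```

After the failed call, both cached tokens still hold their previous values.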

Incorrect docstring return type in remove_profile() method

Palo Alto Networks Cloud Python SDK version: v1.3.0
Python version: 2.7, 3.5+
Operating System: any

Description

The docstrings of the Credentials() and TinyDBStore() remove_profile() methods specify a return type of int, which is incorrect. The tinydb remove() method returns a list of all affected IDs.

Proposed solution

Update remove_profile() docstring to specify a return type of list and edit description accordingly.

Clean up unnecessary use of json() method

Describe the bug

The requests json() method is used (rather irresponsibly) throughout the library.

Expected behavior

The json() method should only be called when necessary.

Current behavior

See description.

Possible solution

Reduce the number of times json() is called. One simple approach is to assign the output to a variable that can be reused within its scope.
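The approach can be shown with a small stand-in for a response object that counts how often its body is decoded. CountingResponse and summarize() are hypothetical, and the payload is made up.

```python
import json


class CountingResponse(object):
    """Stand-in for requests.Response that counts json() calls."""

    def __init__(self, text):
        self.text = text
        self.calls = 0

    def json(self):
        self.calls += 1
        return json.loads(self.text)  # each call re-parses the body


def summarize(response):
    body = response.json()  # decode once, reuse within this scope
    return body["state"], len(body.get("rows", []))


response = CountingResponse('{"state": "RUNNING", "rows": [1, 2, 3]}')
summary = summarize(response)
```

Reading both fields through the local variable means the body is parsed a single time.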

Steps to reproduce

n/a

Screenshots

Context

Your Environment

Version used: alpha8
Environment name and version (e.g. Chrome 59, node.js 5.4, python 3.7.3):
Operating System and version (desktop or mobile):
Link to your project:

Initial Update

The bot created this issue to inform you that pyup.io has been set up on this repo.
Once you have closed it, the bot will open pull requests for updates as soon as they are available.

Refactor Credentials to use HTTPClient instead of requests

Palo Alto Networks Cloud Python SDK version: 1.3.0
Python version: 2.7, 3.5+
Operating System: any

Description

Currently, the Credentials class implements its own, separate requests.Session for making HTTP requests. Although this is functionally ok, it's counter to the goal of maintaining HTTP persistence and other performance tuning options, in a unified manner.

Proposed solution

Refactor Credentials class and applicable methods to use HTTPClient in place of requests.Session. This solution should also support passing a shared HTTPClient object as a session parameter in addition to the override feature currently supported by the other service modules.

Update docs

Palo Alto Networks Cloud Python SDK version: v1.4.0
Python version: 2.7, 3.5+
Operating System: any

Description

Update all API documentation, README and guides.

Generate new state each time get_authorization_url() is called

Palo Alto Networks Cloud Python SDK version: v1.3.0
Python version: 2.7, 3.5+
Operating System: any

Description

When instantiating a Credentials class the current behavior is to generate and store a state that can be reused throughout the lifetime of the Credentials() object. The limitation of this approach is that the state doesn't change, even after running the get_authorization_url() method, which increases the risk of the state getting misused by a bad actor looking to gain access to protected API resources.

Proposed solution

Refactor the get_authorization_url() method to roll the state UUID each time it is called and update the self.state instance variable with the current value. This will ensure that a fresh/unique state is used each time authorization is attempted and that the current value is always accessible as an instance attribute.
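A minimal sketch of rolling the state on every call. The class below is a hypothetical stand-in for Credentials, and the authorization URL and query parameter set are placeholders rather than the real identity provider endpoint.

```python
import uuid
from urllib.parse import urlencode


class Credentials(object):
    """Hypothetical stand-in showing a state value rolled per call."""

    AUTH_BASE_URL = "https://idp.example.com/authorize"  # placeholder URL

    def __init__(self, client_id):
        self.client_id = client_id
        self.state = None

    def get_authorization_url(self, redirect_uri):
        # Generate a fresh state on every call and keep it on the instance
        # so the callback handler can compare it with the returned value.
        self.state = uuid.uuid4().hex
        query = urlencode({
            "client_id": self.client_id,
            "redirect_uri": redirect_uri,
            "response_type": "code",
            "state": self.state,
        })
        return "{}?{}".format(self.AUTH_BASE_URL, query)


credentials = Credentials("my-client-id")
first_url = credentials.get_authorization_url("https://app.example.com/cb")
first_state = credentials.state
second_url = credentials.get_authorization_url("https://app.example.com/cb")
```

Each URL carries its own state, and self.state always reflects the most recent one.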