sroka's Introduction

sroka package

Package providing simple Python access to data in:

  • Google Analytics
  • Google Ad Manager (GAM, formerly DoubleClick for Publishers, DFP)
  • Google Sheets
  • Google BigQuery
  • MOAT
  • Qubole
  • Rubicon
  • AWS Athena
  • AWS S3
  • MySQL
  • neo4j

The sroka library has been tested with Python >=3.8, <=3.11.

Developers

Install requirements and enable custom githooks:

pip install -r requirements.txt
git config --local core.hooksPath .githooks/

Check style with flake8:

flake8 .

Please target pull requests against the dev branch.

Installation

Latest PyPI release

pip install sroka

GitHub version (beta version)

pip install git+ssh://[email protected]/Wikia/sroka

Configuration

In your home folder, create a hidden ~/.sroka_config folder where you will store:

  • config.ini file based on config.sample.ini with information to access Qubole, MOAT, Athena, S3 and Rubicon
  • client_secrets.json for GA access
  • ad_manager.json for GAM access
  • credentials.json for Google sheets access
  • bigquery_credentials.json for BigQuery access
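A quick way to confirm the folder is set up is to check for the files listed above. This is a minimal sketch only: the helper name `missing_config_files` is hypothetical and not part of sroka, and it assumes you want every file on the list, although in practice you only need the files for the services you use.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = [
    "config.ini",
    "client_secrets.json",
    "ad_manager.json",
    "credentials.json",
    "bigquery_credentials.json",
]


def missing_config_files(config_dir=None):
    """Return the expected file names that are absent from config_dir.

    Hypothetical helper, not part of sroka. Defaults to ~/.sroka_config.
    """
    config_dir = Path(config_dir) if config_dir else Path.home() / ".sroka_config"
    return [name for name in EXPECTED_FILES if not (config_dir / name).is_file()]
```

Calling `missing_config_files()` with no arguments inspects ~/.sroka_config and returns the names still to be added.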

Alternatively, you can set the location of your files during analysis:

from sroka.config.config import setup_env_variables
from sroka.config.config import setup_client_secret
from sroka.config.config import setup_admanager_config
from sroka.config.config import setup_bigquery_config
from sroka.config.config import setup_google_sheets_credentials
setup_env_variables('/file_path/config.ini')
setup_client_secret('/file_path/client_secrets.json')
setup_admanager_config('/file_path/ad_manager.json')
setup_bigquery_config('/file_path/bigquery_credentials.json')
setup_google_sheets_credentials('/file_path/credentials.json')

Getting GA, GAM, BigQuery and Google Sheets JSON files with secrets

Google Analytics

  1. Use this wizard to create or select a project in the Google Developers Console and automatically turn on the API. Click Continue, then Go to credentials.
  2. On the Add credentials to your project page, click the Cancel button.
  3. At the top of the page, select the OAuth consent screen tab. Select an Email address, enter a Product name if not already set, and click the Save button.
  4. Select the Credentials tab, click the Create credentials button and select OAuth client ID.
  5. Select the application type Other, enter the chosen name, and click the Create button.
  6. Click OK to dismiss the resulting dialog.
  7. Click the file_download (Download JSON) button to the right of the client ID.

GAM

  1. Follow these instructions
    • While adding a service account, note that its role needs the necessary viewing and reporting permissions.

You should end up with a .json (!) file containing the credentials.

  1. Make sure the Name in "OAuth 2.0 client IDs" matches the service account in "Service account keys": here
  2. Create the GAM account as a service account, not as a new user: https://support.google.com/admanager/answer/6078734?hl=en
  3. Once you have a service account, it can be used to access data in different networks. Simply add it as a new service account through the GAM UI of the second network.
  4. Additional information can be specified in config.ini file:
  • network code - a default value that can be overwritten in a function call
  • application name - a custom name of your network; if not specified, a generic value will be passed.

Google drive sheets credentials

To authorize access to Google Sheets, you need to generate credentials in the Google Console.

You should end up with a credentials.json file, which should be saved in the ~/.sroka_config folder.

Google BigQuery credentials

Go to link and follow the instructions in the Setting up authentication section. You should end up with a bigquery_credentials.json file, which should be saved in the ~/.sroka_config folder.

Getting credentials & access tokens

Qubole

  1. Find your Qubole API Token (go to user -> My Profile -> my_account -> API Token -> show)
  2. Copy your Qubole API Token to config.ini file

Athena and s3 credentials

  1. You should have your aws_access_key_id and aws_secret_access_key from the registration process in the AWS console.
  2. s3bucket_name can be found in the AWS console: in the Athena view, click Settings and look for Query result location. The name of that location without s3:// and / is what you need.
  3. To use Athena you also need to set the region (AWS regional endpoint), e.g. 'us-east-1'.
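Step 2 can be sketched as a small string operation. This is an illustration only: the function name `bucket_name_from_location` and the example location are hypothetical, and the code simply drops the s3:// prefix and any leading or trailing slashes, as described above.

```python
def bucket_name_from_location(query_result_location):
    """Strip the s3:// prefix and surrounding slashes from an Athena
    Query result location. Hypothetical helper, not part of sroka."""
    name = query_result_location.strip()
    if name.startswith("s3://"):
        name = name[len("s3://"):]
    return name.strip("/")


# Hypothetical location, as it might appear in the Athena settings.
print(bucket_name_from_location("s3://example-athena-results/"))
# prints: example-athena-results
```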

Rubicon credentials

  1. You should have your id, username and password from Rubicon
  2. Copy values to config.ini file in relevant fields

MySQL connection information

  1. To connect to a remote MySQL server, you need to provide the host and port values in the configuration. If the server is accessible through a Unix socket, provide the path to that socket in the unix_socket configuration field instead.
  2. If the MySQL server is protected by user credentials, you need to provide the user and password values in the configuration.
  3. You can optionally specify the database to which you want to connect in the database configuration field.
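The way these fields relate to each other can be sketched as follows. This is not how sroka reads config.ini; the helper name `mysql_connection_args` is hypothetical, the settings are passed as a plain dict, and falling back to 3306 (the standard MySQL port) when no port is given is an assumption of this sketch.

```python
def mysql_connection_args(settings):
    """Build connection keyword arguments from the configuration fields
    described above. Hypothetical helper, not part of sroka."""
    args = {}
    if settings.get("unix_socket"):
        # A socket path replaces host and port.
        args["unix_socket"] = settings["unix_socket"]
    else:
        args["host"] = settings["host"]
        # Assumption: use the standard MySQL port when none is configured.
        args["port"] = int(settings.get("port", 3306))
    # Credentials and database are optional.
    for key in ("user", "password", "database"):
        if settings.get(key):
            args[key] = settings[key]
    return args
```

The resulting dict uses the same argument names that `mysql.connector.connect` accepts, so it could be passed on as keyword arguments.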

Common issues

macOS

If you see an error like ValueError: unknown locale: UTF-8

add lines like these to ~/.bash_profile:

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8

Installing sroka

  1. If the PyYAML package does not build correctly, the likely cause is that newer versions of pip won't uninstall the existing copy because it was installed by distutils. Install the PyYAML package first with the --ignore-installed flag.

  2. If numpy breaks during sroka installation, the likely cause is that multiple versions are installed. Uninstall all of them with pip uninstall, then reinstall the latest one.

Google APIs cached files

If you encounter a RefreshError similar to google.auth.exceptions.RefreshError: ('invalid_grant: Bad Request', '{\n "error": "invalid_grant",\n "error_description": "Bad Request"\n}'), try removing all files from the ~/.cache directory.

Credits

Everyone who contributed to sroka development before it went open source (including CR and QA):

sroka's Issues

developing intake-sroka

This is a very convenient interface to several data provider APIs that have been requested in terms of the Intake and Dask projects.

I have just created intake-sroka, so that specific queries to the APIs can be saved as data sources, and stored in Intake's cataloging system. You are very welcome to comment and participate, to bring such data to wider attention!

I wonder, have you thought about how to access data in a parallel or distributed way? Many query outputs might be partitionable, and Dask dataframe makes turning a set of data-frame partitions into a logical dataframe for parallel, out-of-core and/or distributed processing easy. We already do this, for example when reading from parquet or SQL servers.

Implement MySQL connector

Part of #20

Implementing a connector for MySQL databases seems like a great idea, since it is one of the most widely used relational databases.

Implementation Proposition

Using the mysql.connector package is a safe bet, since it is the official package supported by MySQL.

Exception shows up without proper string formatting and wrong permissions (?)

When the exception Configuration file {config} is not protected, make sure you're the only one allowed to read it ... is raised by line 43 in config/config.py, one instance of config is not replaced.
This is because the strings are concatenated after formatting, rather than before. Adding parentheses should help, while preserving style guidelines.
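The precedence problem can be reproduced in isolation. The path and the message wording below are made up for illustration and are not copied from config/config.py; only the operator precedence is the point.

```python
config = "/home/user/.sroka_config/config.ini"  # hypothetical path

# Bug: the method call binds tighter than +, so .format() is applied
# only to the second literal and the first {config} is left as-is.
broken = (
    "Configuration file {config} is not protected. "
    + "Run chmod on {config}.".format(config=config)
)

# Fix: parenthesize the concatenation so the strings are joined first
# and .format() then fills both placeholders.
fixed = (
    "Configuration file {config} is not protected. "
    + "Run chmod on {config}."
).format(config=config)

print(broken)
print(fixed)
```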

Also: Shouldn't it be chmod 700? Using chmod 600 blocks Python from entering this directory and because of that I get PermissionError. This is supported by a Stack Overflow answer.

Disclaimer: I changed the config directory to a local one in config/config.py, so this might be less of an issue when using the standard one.
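The difference between the two modes comes down to the execute (search) bit: a directory needs it to be entered, while a plain file only needs read and write. A minimal sketch of applying both, assuming a POSIX system; the helper name `protect_config` is hypothetical and this does not claim to match the exact check sroka performs.

```python
import os


def protect_config(config_dir, file_names):
    """Restrict a config directory and its files to the owner only.

    Hypothetical helper, not part of sroka.
    """
    # 700: owner can read, write and enter the directory.
    os.chmod(config_dir, 0o700)
    for name in file_names:
        # 600: owner can read and write the file; no execute bit needed.
        os.chmod(os.path.join(config_dir, name), 0o600)
```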

Feature: Support for Adobe Analytics

Whilst I appreciate not many could use Adobe Analytics due to the enterprise level and the high costs, the platform is still there and it's a pain to work with their graph interface or data exporter that doesn't support anything other than Windows.

Having a Python library could solve the issue for the remaining users.

RDBMS

I would like to know whether a future release will include RDBMSs such as Oracle, MSSQL, MySQL and PostgreSQL.
Athena and S3 support is definitely a big win in the current release.

Thank you for sharing with the community.
