amazon-redshift-udfs's People

Contributors

bashsz, bgmello, dependabot[bot], eeesk, ericfe, hyandell, ianmeyers, inohiro, joeharris76, mmehrten, rajiv914, rjvgupta, runningjon, saeed2402, sdia, zach-data


amazon-redshift-udfs's Issues

Redshift can't open shared object file

I'm trying to install the h3 library in Redshift using Pip and the utility install script installPipModuleAsRedshiftLibrary.sh.

I am building h3 on Ubuntu 16.04. I am able to successfully upload the h3.zip file to my S3 bucket.
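
For context, after the zip lands in S3 the library gets registered with a CREATE LIBRARY statement along these lines (the bucket path and IAM role below are placeholders, not my actual values):

```
CREATE OR REPLACE LIBRARY h3
LANGUAGE plpythonu
FROM 's3://my-bucket/h3/h3.zip'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
REGION 'us-east-1';
```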

However, when I try to run the UDF, I get this error:

```
OSError: /rdsdbdata/user_lib/0/0/288461.zip/h3/out/libh3.so.1: cannot open shared object file: Not a directory
```

Full stack trace from svl_udf_log:

```
File "h3.py", line 39, in <module>
File "__init__.py", line 443, in LoadLibrary
    return self._dlltype(name)
File "__init__.py", line 365, in __init__
    self._handle = _dlopen(self._name, mode)
```
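
For anyone reproducing this, the trace above came from a plain query against the svl_udf_log system view, something like:

```
-- Most recent Python UDF log entries first
SELECT *
FROM svl_udf_log
ORDER BY created DESC
LIMIT 20;
```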

The UDF I'm trying to use:

```
CREATE OR REPLACE function get_hexbin(coord_lat double precision, coord_lon double precision, res INTEGER)
  RETURNS VARCHAR IMMUTABLE
AS $$
  from h3 import h3
  return 'hello world'
$$ LANGUAGE plpythonu;
```

While installing h3 locally, I see this message about skipping the wheel, but the script uploads a zip file to S3 anyway:

```
Collecting h3
Saved /home/bhavika/Desktop/.h3/h3-3.1.0-cp36-cp36m-linux_x86_64.whl
Skipping h3, due to already being wheel.
```

Broken link to index.js for lambda-udfs/f_upper_nodejs

Hello,

It seems there is a broken link to the Node.js upper-case example. I was trying to figure out how to write a UDF in Go and wanted to find a Node.js reference, as I am more familiar with it. There seem to be a couple of things missing or incomplete:

  1. The comments in this directory refer to Python. See line 3 of function.sql:

     "This sample function demonstrates how to create/use lambda UDFs in python"

  2. The index.js file is missing.
  3. I believe the docs say that a package.json is needed, but I don't see one.

So, could someone add the index.js there? I would be happy to send a PR for this as well.
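
For reference, the SQL side of a Lambda UDF such as f_upper_nodejs is registered roughly like this (the Lambda function name and IAM role below are illustrative assumptions, not taken from the repo):

```
CREATE OR REPLACE EXTERNAL FUNCTION f_upper_nodejs(VARCHAR)
RETURNS VARCHAR
STABLE
LAMBDA 'f-upper-nodejs'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftLambdaRole';
```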

Unify this repo with Redshift Utils

Hi,

I've previously been working on some extensions to Redshift Utils for UDFs, and it would be great to unify these two repositories. The first item is an enciphering library that would complement your encryption UDFs quite nicely, and the second is a utility that installs any pip module into a cluster as a library. Is that something you'd be interested in getting set up?

Thx,

Ian

Redshift function to import S3 file

Can an AWS Redshift resource be used to access S3 objects?

I'd like to write a function that will import any given drop file from S3 directly into the cluster the UDF is running on.

Something like:

```
CREATE FUNCTION import_s3_object(full_path_to_s3 VARCHAR)
RETURNS varchar
IMMUTABLE AS $$
  # COPY some_table FROM full_path_to_s3
$$ LANGUAGE plpythonu;
```

In doing so, I would also like the Redshift instance not to have to pass explicit credentials, but instead to use an IAM role whose attached policies grant access to the file.

To be explicit, I'd like my function to altogether avoid having to do:

```
COPY public.some_table FROM 's3://some_bucket/drop_file.gz'
CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...;token=...'
delimiter '\t' gzip NULL AS '\000' TRUNCATECOLUMNS;
```

I'd like the function to run the query like this instead:

```
COPY public.some_table FROM 's3://some_bucket/drop_file.gz'
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
delimiter '\t' gzip NULL AS '\000' TRUNCATECOLUMNS;
```

So, having spelled out my question, my requirement is a function like:

```
CREATE FUNCTION import_s3_object(S3_FULL_PATH VARCHAR)
RETURNS varchar
IMMUTABLE AS $$
  # EXECUTE: COPY public.some_table FROM 'S3_FULL_PATH'
  #          iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
  #          delimiter '\t' gzip NULL AS '\000' TRUNCATECOLUMNS;
$$ LANGUAGE plpythonu;
```
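
As a point of comparison (not the reporter's code): a Python UDF cannot issue SQL statements such as COPY, but the same pattern can be sketched as a Redshift stored procedure, reusing the table, role ARN, and COPY options from the question above:

```
CREATE OR REPLACE PROCEDURE import_s3_object(s3_full_path VARCHAR)
AS $$
BEGIN
  -- Build the COPY dynamically; role ARN and options mirror the question above
  EXECUTE 'COPY public.some_table FROM ' || quote_literal(s3_full_path)
       || ' iam_role ''arn:aws:iam::0123456789012:role/MyRedshiftRole'''
       || ' delimiter ''\t'' gzip NULL AS ''\000'' TRUNCATECOLUMNS';
END;
$$ LANGUAGE plpgsql;

-- Example call:
CALL import_s3_object('s3://some_bucket/drop_file.gz');
```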

Script fails to find wheel files or zip results.

When the script is executed on an EC2 instance, it fails to define "wheelFile" and execute the find operation:

```
wheelFile=`find . -name *.whl`
```

Enclosing the wildcard in quotes resolves the issue in this environment:

```
wheelFile=`find . -name "*.whl"`
```

Additionally, the script does not zip the files before transporting them with an S3 copy command.

I am proposing a pull request to resolve these.

ua-parser not working: No module named _regexes. Please look at svl_udf_log for more information

I have created ua-parser.zip, following the instructions provided here.

When I extract the zip to check, I see that it contains "regexes.py" as well as "user_agent_parser.py".

After creating the library in Redshift and trying to do simple user-agent parsing, it fails with the following error:
"ImportError: No module named _regexes. Please look at svl_udf_log for more information".

I have tested the same code from the function in a terminal using Python, and it works.

So there must be some problem with how Redshift handles this.
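
For context, the kind of UDF being attempted looks roughly like this (the function body here is an assumption, not the reporter's exact code; it uses ua-parser's documented user_agent_parser.Parse API):

```
CREATE OR REPLACE FUNCTION f_ua_family(ua VARCHAR(MAX))
RETURNS VARCHAR IMMUTABLE
AS $$
  # Parse the user-agent string and return the browser family
  from ua_parser import user_agent_parser
  return user_agent_parser.Parse(ua)['user_agent']['family']
$$ LANGUAGE plpythonu;
```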

Request: example multi-file UDF

This and other sample sets have been really nice for understanding a one-file UDF. However, some projects are more complex. Please add a simple example of a multi-file UDF.

This request is inspired by a situation where it would be great to use the UAP User-Agent Parser project inside a Redshift UDF. This UDF would require code, a configuration file, and the UAP Python library. Walking through all of the options to make this work is confusing, as I'm not a Python run-time ninja.

import fails for murmur2 package

Hello,

I am trying to import the murmur2 package as a library in a Redshift database. I did the following steps:

  1. Run the module packer
    $ ./installPipModuleAsRedshiftLibrary.sh -m murmur2 -s s3://path/to/murmur2/lib

  2. Create the library on Redshift

```
CREATE OR REPLACE LIBRARY murmur2
LANGUAGE plpythonu
from 's3://path/to/murmur2/lib/murmur2.zip'
WITH CREDENTIALS AS 'aws_access_key_id=AAAAAAAAAAAAAAAAAAAA;aws_secret_access_key=SSSSSSSSSSSSSSSSS'
region 'us-east-1';
```

  3. Create function and query

```
create OR REPLACE function f_py_kafka_partitioner (s varchar, ps int)
  returns int
stable
as $$
  import murmur2
  m2 = murmur2.murmur64a(s, len(s), 0x9747b28c)
  return m2 % ps
$$ language plpythonu;

SELECT f_py_kafka_partitioner('jiimit', 100);
```

This gives the following error:

```
[Amazon](500310) Invalid operation: ImportError: No module named murmur2. Please look at svl_udf_log for more information
Details:
 -----------------------------------------------
  error:  ImportError: No module named murmur2. Please look at svl_udf_log for more information
  code:      10000
  context:   UDF
  query:     0
  location:  udf_client.cpp:366
  process:   padbmaster [pid=31381]
  -----------------------------------------------;
```

Questions from AWS customer

Hello Zach,

I have a customer who tried to use your script to install a pip module and has some questions about the GitHub script [https://github.com/aws-samples/amazon-redshift-udfs/blob/master/bin/PipLibraryInstaller/installPipModuleAsRedshiftLibrary.sh] that you developed.

Customer questions:

Q: I do have a lingering question, however, about why the installPipModuleAsRedshiftLibrary.sh script didn't work. In the future, it might be nice to use this, but it didn't seem like Redshift liked me uploading the wheel files it generated... Can you let me know if the Redshift engineers have a fix or workaround for this? Or let me know if I've used the script incorrectly?

Let me know.

thanks,
Varsha
