aws-samples / amazon-redshift-udfs
A collection of example UDFs for Amazon Redshift.
License: Other
I'm trying to install the h3 library in Redshift using pip and the utility install script `installPipModuleAsRedshiftLibrary.sh`. I am building h3 on Ubuntu 16.04, and I am able to successfully upload the h3.zip file to my S3 bucket.
However, when I try to run the UDF, I get this error:

```
OSError: /rdsdbdata/user_lib/0/0/288461.zip/h3/out/libh3.so.1: cannot open shared object file: Not a directory
```

Full stack trace from svl_udf_log:

```
  File "h3.py", line 39, in <module>
  File "__init__.py", line 443, in LoadLibrary
    return self._dlltype(name)
  File "__init__.py", line 365, in __init__
    self._handle = _dlopen(self._name, mode)
```
The UDF I'm trying to use:

```sql
CREATE OR REPLACE FUNCTION get_hexbin(coord_lat double precision, coord_lon double precision, res INTEGER)
RETURNS VARCHAR IMMUTABLE
AS $$
from h3 import h3
return 'hello world'
$$ LANGUAGE plpythonu;
```
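A likely cause: the stack trace ends in `_dlopen`, and the dynamic loader cannot open a shared object that lives inside a zip archive. To the operating system, `288461.zip/h3/out/libh3.so.1` is a path through a regular file, hence "Not a directory". A minimal sketch of that failure mode, using a throwaway zip (file names are illustrative):

```python
import os
import tempfile
import zipfile

# Build a zip that contains a nested "shared object", mimicking how
# Redshift stores an uploaded library under user_lib.
tmp = tempfile.mkdtemp()
zpath = os.path.join(tmp, "h3.zip")
with zipfile.ZipFile(zpath, "w") as z:
    z.writestr("h3/out/libh3.so.1", b"\x7fELF")  # placeholder bytes

# A path that treats the zip as a directory does not exist on the
# filesystem, so dlopen()/ctypes can never load a library from it.
member_path = os.path.join(zpath, "h3", "out", "libh3.so.1")
print(os.path.exists(member_path))  # False: h3.zip is a file, not a directory
```

Pure-Python modules can be imported from a zip, but native `.so` files must live on a real filesystem path before `ctypes` can load them.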
While installing h3 locally, I see this message about skipping the wheel, but the script uploads a zip file to S3 anyway:

```
Collecting h3
Saved /home/bhavika/Desktop/.h3/h3-3.1.0-cp36-cp36m-linux_x86_64.whl
Skipping h3, due to already being wheel.
```
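Since "Skipping h3, due to already being wheel" means the script uploads the wheel contents as-is, it can help to list exactly what is inside before it reaches S3. A `.whl` is itself a zip archive, so the same tools apply; a sketch with a synthetic wheel (the file name and layout are illustrative):

```python
import os
import tempfile
import zipfile

# Create a synthetic wheel mimicking the one pip saved; a real check
# would open the actual file under ~/Desktop/.h3/.
tmp = tempfile.mkdtemp()
whl = os.path.join(tmp, "h3-3.1.0-cp36-cp36m-linux_x86_64.whl")
with zipfile.ZipFile(whl, "w") as z:
    z.writestr("h3/__init__.py", "")
    z.writestr("h3/out/libh3.so.1", b"\x7fELF")  # native library inside the archive

# Listing the archive shows whether native .so files are buried inside,
# which is what triggers the dlopen failure once Redshift loads the zip.
with zipfile.ZipFile(whl) as z:
    names = z.namelist()
print(sorted(names))
```

If the listing shows `.so` files, the package needs native code at runtime and a plain zip upload will not be enough for Redshift.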
Python Lambda UDFs with dependent libraries should have those libraries listed in a requirements.txt file so that they are packaged upon deployment.
Simulate TD behavior
https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/GqMytEclfnw7bi9yJoorMQ
When the script is executed on an EC2 instance, it fails to define `wheelFile` and execute the find operation:

```
wheelFile=`find . -name *.whl`
```

Enclosing the wildcard in quotes resolves the issue in this environment:

```
wheelFile=`find . -name "*.whl"`
```

Additionally, the script does not zip the files before transporting them with an S3 copy command. I am proposing a pull request to resolve these issues.
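The quoting fix matters because the shell expands an unquoted `*.whl` against the current directory before `find` ever runs, so `find` receives a concrete filename (or several, which breaks `-name` entirely) instead of the pattern. A sketch simulating both behaviors with `fnmatch` (file names are illustrative):

```python
import fnmatch

# Files find would visit across the whole directory tree.
tree = ["./h3-3.1.0-cp36-cp36m-linux_x86_64.whl", "./sub/pkg-1.0-py3-none-any.whl"]
# Wheels sitting in the current directory, which the shell expands an
# unquoted *.whl into before invoking find.
cwd_wheels = ["h3-3.1.0-cp36-cp36m-linux_x86_64.whl"]

# Unquoted: find effectively searches for the literal expanded name(s).
unquoted_matches = [p for p in tree for name in cwd_wheels
                    if p.split("/")[-1] == name]

# Quoted: find receives the pattern itself and matches every wheel.
quoted_matches = [p for p in tree if fnmatch.fnmatch(p.split("/")[-1], "*.whl")]

print(len(unquoted_matches), len(quoted_matches))  # 1 2
```

With two or more wheels in the current directory, the expansion yields multiple arguments after `-name` and `find` errors out, which matches the reported failure.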
Please add an example of advanced string comparison and matching for the UDF Python function. More details here.
Simulate TD behavior
https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/Uo19CNoctPqFjsuuTam9ag
Would like an example and an easy way to deploy a Java UDF.
Hello,
It seems you have a broken link to the Node.js upper functionality. I was trying to figure out how to write a UDF in Go and wanted to use the Node.js example as a reference, since I am more familiar with it. There seem to be a couple of things missing or incomplete. The sample is described as:
"This sample function demonstrates how to create/use lambda UDFs in python"
So can someone add the index.js there? I would be happy to send a PR for this too.
Would like a sample and easy way to deploy a C++ UDF
Simulate TD behaviour:
https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/jGrGYsKV5XBVbrvEyG9d3Q
Simulate TD behaviour
https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/eUs36TFjg6hhnCqHku4~6Q
Hello Zach,
I have a customer who tried to use your script to install a pip module and has some questions about the GitHub script [https://github.com/aws-samples/amazon-redshift-udfs/blob/master/bin/PipLibraryInstaller/installPipModuleAsRedshiftLibrary.sh] developed by you.
Customer question:
Q] I do have a lingering question, however, around why the installPipModuleAsRedshiftLibrary.sh script didn't work. In the future, it might be nice to use this, but it didn't seem like Redshift liked me uploading the wheel files it generated. Can you let me know if the Redshift engineers have a fix or workaround for this? Or let me know if I've used the script incorrectly?
Let me know.
Thanks,
Varsha
Simulate TD behavior
https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/~GoBlolWbv6b95S94u1I4g
Simulate rank behavior in TD
https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/8Ex9CS5XErnUTmh7zcrOPg
Some code in the repository explicitly references the "aws" partition; this limits functionality in non-commercial partitions such as GovCloud.
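One way to avoid hard-coding the partition is to build ARNs from a partition parameter. A sketch (the function name, region, and account below are illustrative, not taken from the repo):

```python
def build_arn(service: str, region: str, account: str, resource: str,
              partition: str = "aws") -> str:
    """Assemble an ARN without hard-coding the commercial "aws" partition."""
    return f"arn:{partition}:{service}:{region}:{account}:{resource}"

# The same Lambda function in the commercial and GovCloud partitions:
commercial = build_arn("lambda", "us-east-1", "123456789012",
                       "function:f_upper", partition="aws")
govcloud = build_arn("lambda", "us-gov-west-1", "123456789012",
                     "function:f_upper", partition="aws-us-gov")
print(commercial)
print(govcloud)
```

Scripts could derive the partition from the deployment environment instead of assuming `aws`.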
Simulate TD behaviour
https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/~C2wem_pHhqmuy2J6oY4Yg
Hi,
I've previously been working on some extensions to Redshift Utils for UDFs, and it would be great to unify these two repositories. The first item is an enciphering library that would complement your encryption UDFs quite nicely, and the second is a utility that will install any pip module into a cluster as a library. Is that something you'd be interested in getting set up?
Thx,
Ian
I have created ua-parser.zip, following the instructions provided here.
When I extract it to check, I see that it contains "regexes.py" as well as "user_agent_parser.py".
After creating the library in Redshift and trying to do simple user-agent parsing, it fails with the following error:
"ImportError: No module named _regexes. Please look at svl_udf_log for more information".
The same code from the function works when I test it in a terminal using Python, so there must be some problem with how Redshift handles this.
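Redshift loads library zips through Python's zip import machinery, so a quick local check is whether the archive can be imported the same way; a module that is missing from the zip fails exactly like the UDF log shows. A sketch with a synthetic archive (module names mirror the report but the contents are illustrative):

```python
import os
import sys
import tempfile
import zipfile

# Build a small library zip and import from it the way a zip importer
# would: the module must exist at the expected path inside the archive.
tmp = tempfile.mkdtemp()
lib = os.path.join(tmp, "ua-parser.zip")
with zipfile.ZipFile(lib, "w") as z:
    z.writestr("user_agent_parser.py", "GREETING = 'hi'\n")
    # Note: no _regexes.py is written, mirroring the reported gap.

sys.path.insert(0, lib)
import user_agent_parser  # resolved from inside the zip
print(user_agent_parser.GREETING)

# An import the archive does not contain fails just like in svl_udf_log.
try:
    import _regexes
except ImportError as exc:
    print("ImportError:", exc)
```

If the local import from the zip fails the same way, the problem is the archive layout (or a build step that should have generated `_regexes.py`), not Redshift itself.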
Hello,
I am trying to import the murmur2 package as a library in a Redshift database. I did the following steps.

Run the module packer:

```
$ ./installPipModuleAsRedshiftLibrary.sh -m murmur2 -s s3://path/to/murmur2/lib
```

Create the library on Redshift:

```sql
CREATE OR REPLACE LIBRARY murmur2
LANGUAGE plpythonu
FROM 's3://path/to/murmur2/lib/murmur2.zip'
WITH CREDENTIALS AS 'aws_access_key_id=AAAAAAAAAAAAAAAAAAAA;aws_secret_access_key=SSSSSSSSSSSSSSSSS'
REGION 'us-east-1';
```

Create the function:

```sql
CREATE OR REPLACE FUNCTION f_py_kafka_partitioner (s varchar, ps int)
RETURNS int
STABLE
AS $$
import murmur2
m2 = murmur2.murmur64a(s, len(s), 0x9747b28c)
return m2 % ps
$$ LANGUAGE plpythonu;
```

Run the query:

```sql
SELECT f_py_kafka_partitioner('jiimit', 100);
```
This gives the following error:

```
[Amazon](500310) Invalid operation: ImportError: No module named murmur2. Please look at svl_udf_log for more information
Details:
-----------------------------------------------
error: ImportError: No module named murmur2. Please look at svl_udf_log for more information
code: 10000
context: UDF
query: 0
location: udf_client.cpp:366
process: padbmaster [pid=31381]
-----------------------------------------------;
```
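For reference, the partitioner logic itself is separate from the packaging problem. A sketch of the same key-to-partition mapping with a stand-in 64-bit hash (FNV-1a here is only illustrative; the real UDF depends on `murmur2.murmur64a`, and a Kafka-compatible partitioner would need that exact hash):

```python
def fnv1a_64(data: bytes) -> int:
    """Stand-in 64-bit hash (FNV-1a); the real UDF calls murmur2.murmur64a."""
    h = 0xCBF29CE484222325
    for b in data:
        h ^= b
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h

def kafka_partitioner(key: str, num_partitions: int) -> int:
    """Map a key to a partition index, mirroring f_py_kafka_partitioner."""
    return fnv1a_64(key.encode("utf-8")) % num_partitions

p = kafka_partitioner("jiimit", 100)
print(0 <= p < 100)  # True: always a valid partition index
```

The same key always lands on the same partition, which is the property the UDF is after; only the hash function differs here.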
This and other sample sets have been really nice for understanding a one-file UDF. However, some projects are more complex. Please add a simple example of a multi-file UDF.
This request is inspired by a situation where it would be great to use the UAP User-Agent Parser project inside a Redshift UDF. This UDF would require code, a configuration file, and the UAP Python library. Walking through all of the options to make this work is confusing, as I'm not a Python run-time ninja.
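A minimal sketch of one packaging option, assuming the standard zip-library route: put the code, the configuration file, and the dependency inside one package in the zip, and read the data file with `pkgutil.get_data` so it works from inside the archive (all names below are illustrative):

```python
import os
import sys
import tempfile
import zipfile

# Build a multi-file library: a package with code plus a bundled data file.
tmp = tempfile.mkdtemp()
lib = os.path.join(tmp, "uaparse_lib.zip")
with zipfile.ZipFile(lib, "w") as z:
    z.writestr("uaparse/__init__.py", "")
    z.writestr("uaparse/regexes.yaml", "browsers: []\n")  # config file
    z.writestr(
        "uaparse/parser.py",
        "import pkgutil\n"
        "def load_config():\n"
        "    # get_data works even when the package lives inside a zip\n"
        "    return pkgutil.get_data('uaparse', 'regexes.yaml').decode()\n",
    )

# Import the package straight from the zip, as Redshift's loader would.
sys.path.insert(0, lib)
from uaparse import parser
print(parser.load_config())
```

The key point is to avoid `open()` with relative paths inside the UDF; `pkgutil.get_data` resolves the resource through the zip importer.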
Can an AWS Redshift UDF be used to access S3 objects? I'd like to write a function that will import any given drop file from S3 directly into the cluster the UDF is running on. Something like:

```sql
CREATE FUNCTION import_s3_object(full_path_to_s3 VARCHAR)
RETURNS varchar
IMMUTABLE AS $$
# COPY some_table FROM full_path_to_s3
$$ LANGUAGE plpythonu;
```

In doing so, I would also like the Redshift instance to not have to pass credentials, but instead identify the role to be used that has the attached policies to access the file from Redshift. To be explicit, I'd like my function to altogether avoid:

```sql
COPY public.some_table FROM 's3://some_bucket/drop_file.gz'
CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...;token=...' delimiter '\t' gzip NULL AS '\000' TRUNCATECOLUMNS;
```

I'd like the function to run the query like this:

```sql
COPY public.some_table FROM 's3://some_bucket/drop_file.gz'
IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
NULL AS '\000' TRUNCATECOLUMNS;
```

So, having verbalized my question, my requirement is a function to:

```sql
CREATE FUNCTION import_s3_object(S3_FULL_PATH VARCHAR)
RETURNS varchar
IMMUTABLE AS $$
# EXECUTE: COPY public.some_table FROM 'S3_FULL_PATH'
#          IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
#          NULL AS '\000' TRUNCATECOLUMNS;
$$ LANGUAGE plpythonu;
```
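Whatever mechanism eventually executes it, the statement itself is easy to assemble. A sketch of building the role-based COPY text (the table name and role ARN are placeholders; note that Redshift scalar Python UDFs cannot themselves execute SQL, so something like a stored procedure or external driver would have to run the result):

```python
def build_copy_statement(table: str, s3_path: str, iam_role_arn: str) -> str:
    """Assemble a role-based COPY statement with no embedded access keys."""
    return (
        f"COPY {table} FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "NULL AS '\\000' TRUNCATECOLUMNS;"
    )

sql = build_copy_statement(
    "public.some_table",
    "s3://some_bucket/drop_file.gz",
    "arn:aws:iam::0123456789012:role/MyRedshiftRole",
)
print(sql)
```

Keeping the role ARN as a parameter means no static credentials ever appear in the statement text.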