aws-samples / amazon-redshift-udfs
A collection of example UDFs for Amazon Redshift.
License: Other
I'm trying to install the h3 library in Redshift using pip and the utility install script `installPipModuleAsRedshiftLibrary.sh`. I am building h3 on Ubuntu 16.04, and I am able to successfully upload the h3.zip file to my S3 bucket.
However, when I try to run the UDF, I get this error:

```
OSError: /rdsdbdata/user_lib/0/0/288461.zip/h3/out/libh3.so.1: cannot open shared object file: Not a directory
```

Full stack trace from svl_udf_log:

```
  File "h3.py", line 39, in <module>
  File "__init__.py", line 443, in LoadLibrary
    return self._dlltype(name)
  File "__init__.py", line 365, in __init__
    self._handle = _dlopen(self._name, mode)
```
The UDF I'm trying to use:

```sql
CREATE OR REPLACE FUNCTION get_hexbin(coord_lat double precision, coord_lon double precision, res INTEGER)
RETURNS VARCHAR IMMUTABLE
AS $$
from h3 import h3
return 'hello world'
$$ LANGUAGE plpythonu;
```
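A likely cause: the stack trace ends in `_dlopen`, and the dynamic loader cannot open a shared object that lives inside a zip archive. To the operating system, `288461.zip/h3/out/libh3.so.1` is a path through a regular file, hence "Not a directory". A minimal sketch of that failure mode, using a throwaway zip (file names are illustrative):

```python
import os
import tempfile
import zipfile

# Build a zip that contains a nested "shared object", mimicking how
# Redshift stores an uploaded library under user_lib.
tmp = tempfile.mkdtemp()
zpath = os.path.join(tmp, "h3.zip")
with zipfile.ZipFile(zpath, "w") as z:
    z.writestr("h3/out/libh3.so.1", b"\x7fELF")  # placeholder bytes

# A path that treats the zip as a directory does not exist on the
# filesystem, so dlopen()/ctypes can never load a library from it.
member_path = os.path.join(zpath, "h3", "out", "libh3.so.1")
print(os.path.exists(member_path))  # False: h3.zip is a file, not a directory
```

Pure-Python modules can be imported from a zip, but native `.so` files must live on a real filesystem path before `ctypes` can load them.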
While installing h3 locally, I see this message about skipping the wheel, but the script uploads a zip file to S3 anyway:

```
Collecting h3
Saved /home/bhavika/Desktop/.h3/h3-3.1.0-cp36-cp36m-linux_x86_64.whl
Skipping h3, due to already being wheel.
```
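Since "Skipping h3, due to already being wheel" means the script uploads the wheel contents as-is, it can help to list exactly what is inside before it reaches S3. A `.whl` is itself a zip archive, so the same tools apply; a sketch with a synthetic wheel (the file name and layout are illustrative):

```python
import os
import tempfile
import zipfile

# Create a synthetic wheel mimicking the one pip saved; a real check
# would open the actual file under ~/Desktop/.h3/.
tmp = tempfile.mkdtemp()
whl = os.path.join(tmp, "h3-3.1.0-cp36-cp36m-linux_x86_64.whl")
with zipfile.ZipFile(whl, "w") as z:
    z.writestr("h3/__init__.py", "")
    z.writestr("h3/out/libh3.so.1", b"\x7fELF")  # native library inside the archive

# Listing the archive shows whether native .so files are buried inside,
# which is what triggers the dlopen failure once Redshift loads the zip.
with zipfile.ZipFile(whl) as z:
    names = z.namelist()
print(sorted(names))
```

If the listing shows `.so` files, the package needs native code at runtime and a plain zip upload will not be enough for Redshift.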
Python Lambda UDFs with dependent libraries should have those libraries listed in a requirements.txt file so that they are packaged upon deployment.
Simulate TD behavior
https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/GqMytEclfnw7bi9yJoorMQ
When the script is executed on an EC2 instance, it fails to define `wheelFile` and execute the find operation:

```
wheelFile=`find . -name *.whl`
```

Enclosing the wildcard in quotes resolves the issue in this environment:

```
wheelFile=`find . -name "*.whl"`
```

Additionally, the script does not zip the files before transporting them with an S3 copy command. I am proposing a pull request to resolve these issues.
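The quoting fix matters because the shell expands an unquoted `*.whl` against the current directory before `find` ever runs, so `find` receives a concrete filename (or several, which breaks `-name` entirely) instead of the pattern. A sketch simulating both behaviors with `fnmatch` (file names are illustrative):

```python
import fnmatch

# Files find would visit across the whole directory tree.
tree = ["./h3-3.1.0-cp36-cp36m-linux_x86_64.whl", "./sub/pkg-1.0-py3-none-any.whl"]
# Wheels sitting in the current directory, which the shell expands an
# unquoted *.whl into before invoking find.
cwd_wheels = ["h3-3.1.0-cp36-cp36m-linux_x86_64.whl"]

# Unquoted: find effectively searches for the literal expanded name(s).
unquoted_matches = [p for p in tree for name in cwd_wheels
                    if p.split("/")[-1] == name]

# Quoted: find receives the pattern itself and matches every wheel.
quoted_matches = [p for p in tree if fnmatch.fnmatch(p.split("/")[-1], "*.whl")]

print(len(unquoted_matches), len(quoted_matches))  # 1 2
```

With two or more wheels in the current directory, the expansion yields multiple arguments after `-name` and `find` errors out, which matches the reported failure.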
Please add an example of advanced string comparison and matching for the UDF Python function. More details here.
Simulate TD behavior
https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/Uo19CNoctPqFjsuuTam9ag
Would like an example and an easy way to deploy a Java UDF.
Hello,
It seems you have a broken link to the Node.js upper functionality. I was trying to figure out how to write a UDF in Go and wanted to use the Node.js example as a reference, since I am more familiar with it. There seem to be a couple of things missing or incomplete. The sample is described as:
"This sample function demonstrates how to create/use lambda UDFs in python"
So can someone add the index.js there? I would be happy to send a PR for this too.
Would like a sample and easy way to deploy a C++ UDF
Simulate TD behaviour:
https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/jGrGYsKV5XBVbrvEyG9d3Q
Simulate TD behaviour
https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/eUs36TFjg6hhnCqHku4~6Q
Hello Zach,
I have a customer who tried to use your script to install a pip module and has some questions about the GitHub script [https://github.com/aws-samples/amazon-redshift-udfs/blob/master/bin/PipLibraryInstaller/installPipModuleAsRedshiftLibrary.sh] developed by you.
Customer question:
Q] I do have a lingering question, however, around why the installPipModuleAsRedshiftLibrary.sh script didn't work. In the future, it might be nice to use this, but it didn't seem like Redshift liked me uploading the wheel files it generated. Can you let me know if the Redshift engineers have a fix or workaround for this? Or let me know if I've used the script incorrectly?
Let me know.
Thanks,
Varsha
Simulate TD behavior
https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/~GoBlolWbv6b95S94u1I4g
Simulate rank behavior in TD
https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/8Ex9CS5XErnUTmh7zcrOPg
Some code in the repository explicitly references the "aws" partition; this limits functionality in non-commercial partitions such as GovCloud.
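One way to avoid hard-coding the partition is to build ARNs from a partition parameter. A sketch (the function name, region, and account below are illustrative, not taken from the repo):

```python
def build_arn(service: str, region: str, account: str, resource: str,
              partition: str = "aws") -> str:
    """Assemble an ARN without hard-coding the commercial "aws" partition."""
    return f"arn:{partition}:{service}:{region}:{account}:{resource}"

# The same Lambda function in the commercial and GovCloud partitions:
commercial = build_arn("lambda", "us-east-1", "123456789012",
                       "function:f_upper", partition="aws")
govcloud = build_arn("lambda", "us-gov-west-1", "123456789012",
                     "function:f_upper", partition="aws-us-gov")
print(commercial)
print(govcloud)
```

Scripts could derive the partition from the deployment environment instead of assuming `aws`.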
Simulate TD behaviour
https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/~C2wem_pHhqmuy2J6oY4Yg
Hi,
I've previously been working on some extensions to Redshift Utils for UDFs, and it would be great to unify these two repositories. The first item is an enciphering library that would complement your encryption UDFs quite nicely, and the second is a utility that will install any pip module into a cluster as a library. Is that something you'd be interested in getting set up?
Thx,
Ian
I have created ua-parser.zip, following the instructions provided here.
When I extract it to check, I see that it contains "regexes.py" as well as "user_agent_parser.py".
After creating the library in Redshift and trying to do simple user-agent parsing, it fails with the following error:
"ImportError: No module named _regexes. Please look at svl_udf_log for more information".
The same code from the function works when I test it in a terminal using Python, so there must be some problem with how Redshift handles this.
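Redshift loads library zips through Python's zip import machinery, so a quick local check is whether the archive can be imported the same way; a module that is missing from the zip fails exactly like the UDF log shows. A sketch with a synthetic archive (module names mirror the report but the contents are illustrative):

```python
import os
import sys
import tempfile
import zipfile

# Build a small library zip and import from it the way a zip importer
# would: the module must exist at the expected path inside the archive.
tmp = tempfile.mkdtemp()
lib = os.path.join(tmp, "ua-parser.zip")
with zipfile.ZipFile(lib, "w") as z:
    z.writestr("user_agent_parser.py", "GREETING = 'hi'\n")
    # Note: no _regexes.py is written, mirroring the reported gap.

sys.path.insert(0, lib)
import user_agent_parser  # resolved from inside the zip
print(user_agent_parser.GREETING)

# An import the archive does not contain fails just like in svl_udf_log.
try:
    import _regexes
except ImportError as exc:
    print("ImportError:", exc)
```

If the local import from the zip fails the same way, the problem is the archive layout (or a build step that should have generated `_regexes.py`), not Redshift itself.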
Hello,
I am trying to import the murmur2 package as a library in a Redshift database. I did the following steps.

Run the module packer:

```
$ ./installPipModuleAsRedshiftLibrary.sh -m murmur2 -s s3://path/to/murmur2/lib
```

Create the library on Redshift:

```sql
CREATE OR REPLACE LIBRARY murmur2
LANGUAGE plpythonu
FROM 's3://path/to/murmur2/lib/murmur2.zip'
WITH CREDENTIALS AS 'aws_access_key_id=AAAAAAAAAAAAAAAAAAAA;aws_secret_access_key=SSSSSSSSSSSSSSSSS'
REGION 'us-east-1';
```

Create the function:

```sql
CREATE OR REPLACE FUNCTION f_py_kafka_partitioner (s varchar, ps int)
RETURNS int
STABLE
AS $$
import murmur2
m2 = murmur2.murmur64a(s, len(s), 0x9747b28c)
return m2 % ps
$$ LANGUAGE plpythonu;
```

Run the query:

```sql
SELECT f_py_kafka_partitioner('jiimit', 100);
```
This gives the following error:

```
[Amazon](500310) Invalid operation: ImportError: No module named murmur2. Please look at svl_udf_log for more information
Details:
-----------------------------------------------
error: ImportError: No module named murmur2. Please look at svl_udf_log for more information
code: 10000
context: UDF
query: 0
location: udf_client.cpp:366
process: padbmaster [pid=31381]
-----------------------------------------------;
```
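For reference, the partitioner logic itself is separate from the packaging problem. A sketch of the same key-to-partition mapping with a stand-in 64-bit hash (FNV-1a here is only illustrative; the real UDF depends on `murmur2.murmur64a`, and a Kafka-compatible partitioner would need that exact hash):

```python
def fnv1a_64(data: bytes) -> int:
    """Stand-in 64-bit hash (FNV-1a); the real UDF calls murmur2.murmur64a."""
    h = 0xCBF29CE484222325
    for b in data:
        h ^= b
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h

def kafka_partitioner(key: str, num_partitions: int) -> int:
    """Map a key to a partition index, mirroring f_py_kafka_partitioner."""
    return fnv1a_64(key.encode("utf-8")) % num_partitions

p = kafka_partitioner("jiimit", 100)
print(0 <= p < 100)  # True: always a valid partition index
```

The same key always lands on the same partition, which is the property the UDF is after; only the hash function differs here.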
This and other sample sets have been really nice for understanding a one-file UDF. However, some projects are more complex. Please add a simple example of a multi-file UDF.
This request is inspired by a situation where it would be great to use the UAP User-Agent Parser project inside a Redshift UDF. This UDF would require code, a configuration file, and the UAP Python library. Walking through all of the options to make this work is confusing, as I'm not a Python run-time ninja.
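A minimal sketch of one packaging option, assuming the standard zip-library route: put the code, the configuration file, and the dependency inside one package in the zip, and read the data file with `pkgutil.get_data` so it works from inside the archive (all names below are illustrative):

```python
import os
import sys
import tempfile
import zipfile

# Build a multi-file library: a package with code plus a bundled data file.
tmp = tempfile.mkdtemp()
lib = os.path.join(tmp, "uaparse_lib.zip")
with zipfile.ZipFile(lib, "w") as z:
    z.writestr("uaparse/__init__.py", "")
    z.writestr("uaparse/regexes.yaml", "browsers: []\n")  # config file
    z.writestr(
        "uaparse/parser.py",
        "import pkgutil\n"
        "def load_config():\n"
        "    # get_data works even when the package lives inside a zip\n"
        "    return pkgutil.get_data('uaparse', 'regexes.yaml').decode()\n",
    )

# Import the package straight from the zip, as Redshift's loader would.
sys.path.insert(0, lib)
from uaparse import parser
print(parser.load_config())
```

The key point is to avoid `open()` with relative paths inside the UDF; `pkgutil.get_data` resolves the resource through the zip importer.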
Can an AWS Redshift UDF be used to access S3 objects? I'd like to write a function that will import any given drop file from S3 directly into the cluster the UDF is running on. Something like:

```sql
CREATE FUNCTION import_s3_object(full_path_to_s3 VARCHAR)
RETURNS varchar
IMMUTABLE AS $$
# COPY some_table FROM full_path_to_s3
$$ LANGUAGE plpythonu;
```

In doing so, I would also like the Redshift instance to not have to pass credentials, but instead identify the role to be used that has the attached policies to access the file from Redshift. To be explicit, I'd like my function to altogether avoid:

```sql
COPY public.some_table FROM 's3://some_bucket/drop_file.gz'
CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...;token=...' delimiter '\t' gzip NULL AS '\000' TRUNCATECOLUMNS;
```

I'd like the function to run the query like this:

```sql
COPY public.some_table FROM 's3://some_bucket/drop_file.gz'
IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
NULL AS '\000' TRUNCATECOLUMNS;
```

So, having verbalized my question, my requirement is a function to:

```sql
CREATE FUNCTION import_s3_object(S3_FULL_PATH VARCHAR)
RETURNS varchar
IMMUTABLE AS $$
# EXECUTE: COPY public.some_table FROM 'S3_FULL_PATH'
#          IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
#          NULL AS '\000' TRUNCATECOLUMNS;
$$ LANGUAGE plpythonu;
```
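Whatever mechanism eventually executes it, the statement itself is easy to assemble. A sketch of building the role-based COPY text (the table name and role ARN are placeholders; note that Redshift scalar Python UDFs cannot themselves execute SQL, so something like a stored procedure or external driver would have to run the result):

```python
def build_copy_statement(table: str, s3_path: str, iam_role_arn: str) -> str:
    """Assemble a role-based COPY statement with no embedded access keys."""
    return (
        f"COPY {table} FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "NULL AS '\\000' TRUNCATECOLUMNS;"
    )

sql = build_copy_statement(
    "public.some_table",
    "s3://some_bucket/drop_file.gz",
    "arn:aws:iam::0123456789012:role/MyRedshiftRole",
)
print(sql)
```

Keeping the role ARN as a parameter means no static credentials ever appear in the statement text.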