expediagroup / apiary-metastore-docker
Docker image for Apiary Data Lake metastore
Home Page: https://github.com/ExpediaGroup/apiary
License: Apache License 2.0
Describe the bug
The hms-readwrite container restart succeeds even if calls to AWS fail, such as the call to get the AWS account_id. This causes issues such as the aws_account_id being dropped from the S3 bucket name apiary--<aws_account_id>-<aws_region>-<schema_name>.
The output below shows a blank where the aws_account_id should be:
desc database sandbox;
sandbox s3://apiary-<prefix>--us-east-1-sandbox/ root USER
To Reproduce
Restart the hms-readwrite container while AWS calls are failing.
Expected behavior
hms-readwrite container startup should fail if any aws call fails.
Logs
@rpoluri has more details
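The expected behavior described in this report could be approached roughly as follows. This is a minimal sketch, not the actual startup script: `require_value` is a hypothetical helper, and the `aws sts get-caller-identity` lookup in the comment is an assumption about how the account ID might be fetched.

```shell
#!/usr/bin/env bash
# Sketch only: require_value is a hypothetical helper, not part of the real startup script.
# errexit/nounset/pipefail make the script stop on the first failing command.
set -euo pipefail

# Print the value if it is non-empty; otherwise report the problem and fail.
require_value() {
  local name="$1" value="${2:-}"
  if [ -z "$value" ]; then
    echo "ERROR: could not determine ${name}; aborting startup" >&2
    return 1
  fi
  printf '%s\n' "$value"
}

# Hypothetical usage (assumed lookup command, shown as a comment only):
#   AWS_ACCOUNT_ID="$(require_value aws_account_id \
#     "$(aws sts get-caller-identity --query Account --output text || true)")"
```

With `set -e` in effect, a failing `require_value` inside a plain variable assignment stops the script, so the container would exit instead of continuing with a blank account ID.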
As Sysops,
I want to configure the metastore to connect to RDS using IAM credentials,
so that we can remove dependency on Vault
so that we can source jar files from maven.org
Is your feature request related to a problem? Please describe.
There is a bug where Iceberg clients abort the session and leave orphan locks in the Hive DB.
Related to apache/iceberg#2301
Describe the solution you'd like
Enable Hive Housekeeper service that could cleanup these locks.
Additional context
example error message:
org.apache.iceberg.hive.HiveTableOperations$WaitingForLockException: Waiting for lock.
org.apache.spark.SparkException: Writing job aborted
Caused by: org.apache.iceberg.exceptions.CommitFailedException: Timed out after 181560 ms waiting for lock
https://github.com/ExpediaGroup/apiary-extensions 3.0.0 has been released. This is a non-backwards compatible change so will need some changes made here in order to use this (and future) versions.
Apiary metastore fails with the following message after switching from EC2 instances to Fargate.
message:com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain
In Hive 2.3.2, alter partition operations sometimes fail with
"java.lang.NumberFormatException: null"
https://issues.apache.org/jira/browse/HIVE-18767
As a SysOps
I want to enable Hive Metastore metrics
So that process health can be monitored
Acceptance Criteria:
When dbpass contains a special character like ), the startup.sh script fails with the following error:
bash: -c: line 0: `/usr/lib/hive/bin/hive --service metastore --hiveconf hive.root.logger=INFO,console --hiveconf javax.jdo.option.ConnectionURL=jdbc:mysql://apiary-cluster.cluster-XXXXXXX.us-west-2.rds.amazonaws.com:3306/apiarydb --hiveconf javax.jdo.option.ConnectionUserName=hive_user --hiveconf javax.jdo.option.ConnectionPassword=XXX;YYY)ZZZ='
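The error above is consistent with the password being placed unescaped into a command string that `bash -c` then re-parses. A minimal sketch of one possible mitigation follows, assuming the command really is built as a string: `quote_for_shell` is a hypothetical helper and the sample password is the masked one from the error message, not a real value.

```shell
#!/usr/bin/env bash
# Sketch only: quote_for_shell is a hypothetical helper, not part of startup.sh.

# Escape a value with bash's printf %q so it survives re-parsing by `bash -c`.
quote_for_shell() {
  printf '%q' "$1"
}

# Masked sample password from the issue; contains ';' and ')'.
dbpass='XXX;YYY)ZZZ'

# Build a command string with the escaped password embedded in it.
cmd="printf '%s\n' $(quote_for_shell "$dbpass")"

# bash -c parses the string again; the escaped password arrives as one argument.
bash -c "$cmd"
```

An alternative that avoids re-parsing altogether is to pass the arguments as a bash array and execute the program directly instead of going through `bash -c`.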
Apply fix for https://issues.apache.org/jira/browse/HIVE-16939
As sysops
I want to update the vault binary
so that we do not fall too far behind the current release.
As a SysOps
I want to add an option to enable the Glue Sync Metastore Listener
So that I can use Athena to query apiary datalake.
When using https://github.com/HotelsDotCom/circus-train to replicate tables from one Apiary datalake to another, we hit the following Hive issue: https://issues.apache.org/jira/browse/HIVE-18767
This has been fixed in Hive 2.3.4+. Apiary Metastore docker image should be updated to use the AWS EMR repo 5.24.1, which is the latest EMR repo that installs Hive 2.3.4
As sysops,
I want to deploy apiary-metastore-listener jar file
so that we can remove corresponding java src file from this docker repo.
As a SysOps
I want to publish metastore events to SNS
So that those events can be processed by external application to automatically trigger replication or ETL.
As sysops,
I want to enable hive metastore ranger authorization
so that we can apply fine grained access control