Comments (7)
I agree with you @jessedobbelaere, the adapter shouldn't have any issue using v3, as the engine version is tied to the workgroup.
Let's use this thread to work out whether there is any work to do. After some testing I noticed the following:
- When working with Iceberg tables, v3 is 20-30% faster
- When working with parquet, v3 is 20-30% slower
- Queries need to be rewritten to take the breaking changes in data types into consideration.
The biggest breaking changes should be at the model level, not in the adapter's internals.
Also, we have this repo https://github.com/dbt-athena/dbt-athena-tester to use as a reference for running the same set of models when developing. @Jrmyy and @jessedobbelaere feel free to have a look and add relevant models if necessary to test v2 vs v3.
from dbt-athena.
We finally decided to support only the v3 engine for Iceberg tables (i.e. if you use parquet tables, you can still use the v2 engine). (#64)
What drove us to this decision:
- The v3 engine is based on Trino, and all new features on the Athena side are going to be pushed to the v3 engine.
- The v3 engine is more performant on Iceberg tables
The consequences are that, for Iceberg, you will need to:
- Have a workgroup configured with the v3 engine
- Use unique table locations (`uuid`, `table_unique` or `schema_table_unique`, according to the documentation).
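As a rough illustration of the two requirements above, here is a minimal Python sketch of a pre-flight check. The function name, its parameters and the idea of passing the engine version as an integer are assumptions made for this example; they are not part of the adapter. Only the three naming values come from the comment above.

```python
from typing import List

# Naming schemes that the comment above lists as producing unique table locations.
UNIQUE_LOCATION_SCHEMES = {"uuid", "table_unique", "schema_table_unique"}


def iceberg_requirement_problems(
    table_type: str, engine_version: int, location_scheme: str
) -> List[str]:
    """Hypothetical helper: list the unmet Iceberg requirements.

    An empty list means nothing in this sketch's rules is violated.
    """
    # Non-Iceberg tables are not restricted by these rules.
    if table_type.lower() != "iceberg":
        return []

    problems = []
    if engine_version != 3:
        problems.append("Iceberg tables need a workgroup configured with the v3 engine")
    if location_scheme not in UNIQUE_LOCATION_SCHEMES:
        problems.append(
            "Iceberg tables need a unique table location scheme: "
            + ", ".join(sorted(UNIQUE_LOCATION_SCHEMES))
        )
    return problems
```

For example, `iceberg_requirement_problems("iceberg", 2, "table")` reports both problems, while any non-Iceberg table passes regardless of engine version.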
from dbt-athena.
I also assume that the adapter itself is not tightly coupled to Athena engine v2 specifics. The `work_group` param indeed allows you to switch between a v2 and a v3 workgroup.
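Since the engine version follows the workgroup, one way to see which engine a given workgroup runs is to inspect the workgroup description. The sketch below is only an assumption-laden example: it expects a dictionary shaped like the response of boto3's Athena `get_work_group` call, with a label such as "Athena engine version 3" under `EffectiveEngineVersion`. The helper name is hypothetical, and the response shape should be checked against the AWS documentation.

```python
import re
from typing import Optional


def effective_engine_version(work_group_response: dict) -> Optional[int]:
    """Hypothetical helper: extract the engine version number from a workgroup description.

    Assumes the nested layout WorkGroup -> Configuration -> EngineVersion
    -> EffectiveEngineVersion; returns None if the label is missing or has
    no trailing number.
    """
    engine = (
        work_group_response.get("WorkGroup", {})
        .get("Configuration", {})
        .get("EngineVersion", {})
    )
    label = engine.get("EffectiveEngineVersion", "")
    # Take the trailing number of a label such as "Athena engine version 3".
    match = re.search(r"(\d+)\s*$", label)
    return int(match.group(1)) if match else None
```

The function takes the already-fetched response rather than calling AWS itself, so it can be tried without credentials.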
Personally, I don't have experience running on Athena engine v3 yet, as I ran into some Athena errors such as `HIVE_METASTORE_ERROR: Database cannot be a link for this operation when called on a table.` when running a create table on Lake Formation governed tables, or a random `java.lang.NullPointerException` in Athena. I also saw dbt-athena users having errors or performance issues in the #dbt-athena Slack thread. But I'll log AWS support tickets, take it for a spin again in a month, and evaluate 👌
from dbt-athena.
@Jrmyy I managed to use v3 with the adapter; from time to time I need to apply explicit casts to timestamps.
Here are a few cases:
- when a timestamp field is overflowing, I need to run `cast(my_timestamp as timestamp(3))`
- I cannot use `current_timestamp` anymore, but rather `cast(REPLACE(cast(current_timestamp as varchar), ' UTC', '') as timestamp(3)) as now`; maybe we create a macro?
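To make the macro idea concrete, here is a minimal Python sketch of helpers that render the two expressions above as SQL strings. The function names and the `precision` parameter are assumptions for illustration; a real dbt macro would be written in Jinja, and this only shows what such a macro could wrap.

```python
def cast_timestamp(column: str, precision: int = 3) -> str:
    """Render the explicit cast used when a timestamp field overflows."""
    return f"cast({column} as timestamp({precision}))"


def current_timestamp_without_zone(precision: int = 3) -> str:
    """Render the replacement for a bare current_timestamp.

    Mirrors the expression from the comment: cast to varchar, strip the
    ' UTC' suffix, then cast back to a timestamp of the given precision.
    """
    return (
        "cast(REPLACE(cast(current_timestamp as varchar), ' UTC', '') "
        f"as timestamp({precision}))"
    )
```

A model could then select `current_timestamp_without_zone() + " as now"` to reproduce the second case.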
I think that to tackle this issue, we could just add a section in the README on how to solve common cases, to make the switch smoother: a sort of enrichment of the Athena docs.
from dbt-athena.
Yes, I think we can tackle this using README.md, since now we will support both engine versions but with different features (CTAS & merge strategies for v3, temp parquet table for v2 + some data types stuff).
from dbt-athena.
Should we close this, since the documentation now makes it clearer what you can and can't do with different Athena adapter versions and different table types?
from dbt-athena.
Yes please.
from dbt-athena.