
Comments (7)

nicor88 commented on May 19, 2024

I agree with you @jessedobbelaere, the adapter shouldn't have any issue using v3, as the engine version is tied to the workgroup.

Let's refer to this to understand if there is work to do. After some testing I noticed the following:

  • When working with Iceberg tables, v3 is 20-30% faster
  • When working with parquet, v3 is 20-30% slower
  • Queries need to be rewritten to take into account the breaking changes in data types.

The biggest breaking changes should be at the model level, not in the adapter's internals.

Also, we have this repo https://github.com/dbt-athena/dbt-athena-tester to use as a reference for running the same set of models when developing. @Jrmyy and @jessedobbelaere feel free to have a look and add relevant models if necessary to test v2 vs v3.

from dbt-athena.

Jrmyy commented on May 19, 2024

We finally decided to support only the v3 engine for Iceberg tables (i.e. if you use parquet tables, you can still use the v2 engine). (#64)
What drove us to this decision:

  • V3 Engine is based on Trino, and all the new features on the Athena side are going to be pushed to the v3 engine.
  • V3 Engine is more performant on Iceberg tables

As a consequence, for Iceberg you will need:

  • To have a workgroup configured with V3 Engine
  • To use unique table locations (uuid, table_unique or schema_table_unique according to documentation).
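To check the first requirement before running Iceberg models, here is a minimal Python sketch. It assumes the response shape of the Athena GetWorkGroup API as returned by boto3's `get_work_group`, and that the effective version is reported as a string like "Athena engine version 3". The helper name `effective_engine_major` and the sample workgroup name are made up for illustration; this is not part of the adapter.

```python
from typing import Optional


def effective_engine_major(get_work_group_response: dict) -> Optional[int]:
    """Return the major engine version (e.g. 2 or 3), or None if it is missing.

    The input is assumed to look like the dict returned by
    boto3.client("athena").get_work_group(WorkGroup="...").
    """
    engine = (
        get_work_group_response.get("WorkGroup", {})
        .get("Configuration", {})
        .get("EngineVersion", {})
    )
    # Assumed format: "Athena engine version 3"
    effective = engine.get("EffectiveEngineVersion", "")
    last_token = effective.rsplit(" ", 1)[-1]
    return int(last_token) if last_token.isdigit() else None


# Hand-written sample response (hypothetical workgroup), so no AWS call is needed.
sample_response = {
    "WorkGroup": {
        "Name": "my_v3_workgroup",
        "Configuration": {
            "EngineVersion": {
                "SelectedEngineVersion": "AUTO",
                "EffectiveEngineVersion": "Athena engine version 3",
            }
        },
    }
}

if effective_engine_major(sample_response) != 3:
    raise SystemExit("Iceberg models need a workgroup on engine v3")
```

In a real project the dict would come from an actual `get_work_group` call for the workgroup named in your profile's work_group param.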


jessedobbelaere commented on May 19, 2024

I also assume that the adapter itself is not tightly coupled to Athena engine v2 specifics. The work_group param allows you to switch between a v2 or v3 workgroup indeed.

Personally, I haven't really run on Athena engine v3 yet, because I hit some Athena errors such as "HIVE_METASTORE_ERROR: Database cannot be a link for this operation when called on a table." when running a create table on Lake Formation governed tables, or a random java.lang.NullPointerException in Athena. I also saw dbt-athena users reporting errors or performance issues in the #dbt-athena slack thread. But I'll log AWS support tickets, take it for a spin again in a month, and evaluate 👌


nicor88 commented on May 19, 2024

@Jrmyy I managed to use v3 with the adapter, but from time to time I need to apply explicit casts to timestamps.
Here are a few cases:

  • when a timestamp field is overflowing, I need to run cast(my_timestamp as timestamp(3))
  • I cannot use current_timestamp anymore; instead I use cast(REPLACE(cast(current_timestamp as varchar), ' UTC', '') as timestamp(3)) as now. Maybe we should create a macro?

I think that to tackle this issue, we could just add a section in the readme on how to solve common cases, to make the move to v3 smoother, as a sort of enrichment of the Athena docs.
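As a rough idea of what such a helper could look like, here is a small Python sketch that only builds the two SQL expressions mentioned above as strings. The function names `cast_to_timestamp_ms` and `now_without_time_zone` are made up for illustration and are not part of dbt-athena; a real solution would more likely be a dbt macro.

```python
def cast_to_timestamp_ms(expression: str) -> str:
    """Wrap a SQL expression in an explicit cast to millisecond precision."""
    return f"cast({expression} as timestamp(3))"


def now_without_time_zone() -> str:
    """Build the current_timestamp workaround: drop the ' UTC' suffix, then cast."""
    as_text = "cast(current_timestamp as varchar)"
    stripped = f"REPLACE({as_text}, ' UTC', '')"
    return cast_to_timestamp_ms(stripped)


# Print both expressions so they can be pasted into a model.
print(cast_to_timestamp_ms("my_timestamp"))
print(f"{now_without_time_zone()} as now")
```

Keeping the cast in one place means a model only has to change in a single spot if the engine's timestamp behaviour changes again.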


Jrmyy commented on May 19, 2024

Yes, I think we can tackle this in README.md, since we will now support both engine versions but with different features (CTAS & merge strategies for v3, temp parquet table for v2, plus some data type differences).


Jrmyy commented on May 19, 2024

Should we close this, since the documentation now makes it clearer what you can and can't do with different Athena adapter versions and different table types?


nicor88 commented on May 19, 2024

Yes please.

