Comments (7)

sylvanayelda commented on June 9, 2024

@ion-elgreco I was using version 0.15.1. But with 0.14.0, I see the same issue (although the __delta_rs_path is no longer listed in either the LogicalPlan or ExecutionPlan schemas).

from delta-rs.

Blajda commented on June 9, 2024

Does the error in this case happen consistently? It would be nice to have the exact stack trace / error.
Since this data is partitioned on two columns, another possibility is that the distinct partition value scan has an issue.

ion-elgreco commented on June 9, 2024

@Blajda yeah happens consistently, I'll try to see if I find some time to reproduce it with an MRE

I can try giving you the full stack trace, but I have to rename a bunch of stuff since it's confidential. I'll get back to you on that tomorrow :)

sylvanayelda commented on June 9, 2024

I am seeing the exact same behavior. My table is partitioned on two columns. When I change it to be partitioned by a single column, the error no longer occurs.

ion-elgreco commented on June 9, 2024

@sylvanayelda on which version do you see the issue? Could you try it on 0.14.0 as well?

Blajda commented on June 9, 2024

@sylvanayelda Are you merging using the polars interface?
Please also provide the schema of the table you are using.

sylvanayelda commented on June 9, 2024

@Blajda I'm not using polars. Here is a sample of my code:

from deltalake import DeltaTable, write_deltalake
import pyarrow as pa

# schema is a pyarrow schema
source_table = pa.Table.from_pylist(records, schema=schema)
target_table = DeltaTable(table_path, storage_options=storage_opts)
(
    target_table.merge(
        source=source_table,
        source_alias=SOURCE_ALIAS,
        target_alias=TARGET_ALIAS,
        predicate=get_predicate(),
        large_dtypes=False,
    )
    .when_not_matched_insert(updates=get_inserts(schema.names))
    .execute()
)

The table has the following schema:

Schema(
    [
        Field(partition_col1, PrimitiveType("string"), nullable=True),
        Field(col2, PrimitiveType("string"), nullable=True),
        Field(col3, PrimitiveType("string"), nullable=True),
        Field(col4, PrimitiveType("long"), nullable=True),
        Field(col5, PrimitiveType("long"), nullable=True),
        Field(col6, PrimitiveType("string"), nullable=True),
        Field(col7, PrimitiveType("string"), nullable=True),
        Field(col8, PrimitiveType("long"), nullable=True),
        Field(col9, PrimitiveType("long"), nullable=True),
        Field(col10, PrimitiveType("long"), nullable=True),
        Field(col11, PrimitiveType("long"), nullable=True),
        Field(
            col12,
            ArrayType(PrimitiveType("long"), contains_null=True),
            nullable=True,
        ),
        Field(
            col13,
            ArrayType(PrimitiveType("long"), contains_null=True),
            nullable=True,
        ),
        Field(partition_date, PrimitiveType("string"), nullable=True),
    ]
)

I should point out that we are also storing the data in ADLS2. Could that be causing any issues here?
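
The get_predicate and get_inserts helpers called in the snippet above are not shown in this thread. For anyone trying to reproduce the issue, here is a minimal sketch of what such helpers might look like. Everything in it is an assumption: the function names build_predicate and build_inserts, the alias values "source" and "target", and the choice of key columns are all hypothetical and not taken from the original code. It only builds the SQL predicate string and the column-to-expression mapping that merge() and when_not_matched_insert() accept.

```python
from typing import Dict, Sequence

# Assumed alias values; the thread does not show what SOURCE_ALIAS and
# TARGET_ALIAS are actually set to.
SOURCE_ALIAS = "source"
TARGET_ALIAS = "target"


def build_predicate(
    key_columns: Sequence[str],
    source_alias: str = SOURCE_ALIAS,
    target_alias: str = TARGET_ALIAS,
) -> str:
    """Build an equality predicate over the given key columns (hypothetical helper)."""
    if not key_columns:
        raise ValueError("at least one key column is required")
    # One equality clause per key column, joined with AND.
    return " AND ".join(
        f"{target_alias}.{col} = {source_alias}.{col}" for col in key_columns
    )


def build_inserts(
    column_names: Sequence[str],
    source_alias: str = SOURCE_ALIAS,
) -> Dict[str, str]:
    """Map each target column to the same-named source column (hypothetical helper)."""
    return {col: f"{source_alias}.{col}" for col in column_names}


if __name__ == "__main__":
    # Hypothetical key: both partition columns plus one data column.
    print(build_predicate(["partition_col1", "partition_date", "col2"]))
    print(build_inserts(["partition_col1", "col2"]))
```

Listing both partition columns explicitly in the predicate makes it easy to test the two-partition-column case against a single-column variant by dropping one name from the list.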