Comments (8)

mjakubowski84 avatar mjakubowski84 commented on July 20, 2024

Hi!
Parquet itself does not support fields of type Any. You need to specify a fixed type. So I suggest you change the model of DataMap. For example, you can have two maps: stringIds: Map[String, String] and decimalIds: Map[String, BigDecimal].
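A minimal sketch of that split in plain Scala, assuming the legacy data arrives as a `Map[String, Any]`. The wrapper object `DataMapSketch`, the shape of `DataMap`, and the helper `fromUntyped` are hypothetical names used only for illustration; they are not part of parquet4s.

```scala
object DataMapSketch {

  // Hypothetical replacement model: one map per concrete value type.
  final case class DataMap(
    stringIds: Map[String, String],
    decimalIds: Map[String, BigDecimal]
  )

  object DataMap {
    // Splits an untyped map by the runtime type of each value.
    // Returns Left with the offending keys if any value is neither
    // a String nor a BigDecimal.
    def fromUntyped(ids: Map[String, Any]): Either[String, DataMap] = {
      val strings  = ids.collect { case (k, v: String) => k -> v }
      val decimals = ids.collect { case (k, v: BigDecimal) => k -> v }
      val unsupported = ids.keySet -- strings.keySet -- decimals.keySet
      if (unsupported.isEmpty) Right(DataMap(strings, decimals))
      else Left(s"Unsupported value types for keys: ${unsupported.toList.sorted.mkString(", ")}")
    }
  }
}
```

Each of the two maps then has a single fixed value type, which is what a Parquet schema requires.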

from parquet4s.

reggieperry avatar reggieperry commented on July 20, 2024

Unfortunately, there’s way too much legacy code that depends on this. Can I dynamically generate the TypedSchemaDef via Ref[A] somehow? How is it that the SchemaDef I wrote actually works? I didn’t reason it out so much as I tried different things.

mjakubowski84 avatar mjakubowski84 commented on July 20, 2024

The schema is for the whole Parquet file - not for a single row. So, if you keep writing decimals to one file, and then all strings to another (with another schema) - then it will work.

However, you can expect later problems with reading files with conflicting schemas.

reggieperry avatar reggieperry commented on July 20, 2024

The thing is that I wrote the encoder to always write strings, but it seems that the type of the input data is checked against the output schema, rather than the encoder output being validated against it. So if I change that map to use stringSchema instead of decimalSchema, it fails to compile.

normana400 avatar normana400 commented on July 20, 2024

If the values of the Map[String, Any] come from a finite set of possibilities (i.e. each value is either a string or a long), then I think the structure could feasibly be described as an Either.

Is there support for an Either structure (i.e. a map described as Map[String, Either[String, Long]])?
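To make the idea concrete, here is a rough sketch in plain Scala of tagging untyped values as `Either[String, Long]` and then splitting the result into two homogeneous maps. The object `EitherIdsSketch` and the helpers `tag` and `split` are hypothetical names; nothing here uses or implies a parquet4s API.

```scala
object EitherIdsSketch {

  // Keeps only String and Long values, tagging each one explicitly.
  // Values of any other type are silently dropped in this sketch.
  def tag(raw: Map[String, Any]): Map[String, Either[String, Long]] =
    raw.collect {
      case (k, s: String) => k -> Left[String, Long](s)
      case (k, l: Long)   => k -> Right[String, Long](l)
    }

  // Splits the tagged map into one map per concrete value type.
  def split(ids: Map[String, Either[String, Long]]): (Map[String, String], Map[String, Long]) = {
    val strings = ids.collect { case (k, Left(s)) => k -> s }
    val longs   = ids.collect { case (k, Right(n)) => k -> n }
    (strings, longs)
  }
}
```

For example, `EitherIdsSketch.split(EitherIdsSketch.tag(raw))` yields one `Map[String, String]` and one `Map[String, Long]` for a given `raw: Map[String, Any]`.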

mjakubowski84 avatar mjakubowski84 commented on July 20, 2024

Of course, there is :)
As I said before - do not insist on saving heterogeneous values of a map to a single collection. Partition your map into two: one for strings and the second for decimals. E.g. you can encode Ref directly as a RowParquetRecord if creating an intermediary case class is such a problem:

implicit def myEncoder[T]: OptionalValueEncoder[Ref[T]] =
  new OptionalValueEncoder[Ref[T]] {
    override def encodeNonNull(ref: Ref[T], configuration: ValueCodecConfiguration): Value =
      RowParquetRecord(
        "type" -> [type as string],
        "stringIds" -> MapParquetRecord([stringIds entries]),
        "decimalIds" -> MapParquetRecord([decimalIds entries])
      )
  }

And define a corresponding groupSchema.

mjakubowski84 avatar mjakubowski84 commented on July 20, 2024

There's another low-level option: you can implement a custom version of MapParquetRecord that writes several types of map entries, rather than strictly one type as it does now: https://github.com/mjakubowski84/parquet4s/blob/master/core/src/main/scala/com/github/mjakubowski84/parquet4s/ParquetRecord.scala#L814

However, I do not recommend it because it would be a non-standard approach to a map and reading such a map would be a challenge using any existing application/framework.

normana400 avatar normana400 commented on July 20, 2024

My map seems to write okay; however, when I attempt to read it in parquet tools, I get an ArrowInvalid: Map keys must be provided. Is there something I need to do explicitly to add the annotation here?

implicit def refSchema[A <: MyObject[_]](implicit stringSchema: TypedSchemaDef[String]): TypedSchemaDef[Ref[A]] =
  SchemaDef
    .group(
      stringSchema("type"),
      SchemaDef.map(stringSchema, stringSchema)("ids")
    )
    .typed[Ref[A]]
