avro4s's Issues

Unused import on AvroOutputStream usage

Copy pasted from my SO question:

I have a small test class like this:

package test.avro

object Test extends App {
  import java.io.ByteArrayOutputStream
  import com.sksamuel.avro4s.AvroOutputStream

  case class Composer(name: String, birthplace: String, compositions: Seq[String])
  val ennio = Composer("ennio morricone", "rome", Seq("legend of 1900", "ecstasy of gold"))

  val baos = new ByteArrayOutputStream()
  val output = AvroOutputStream.json[Composer](baos)
  output.write(ennio)
  output.close()
  print(baos.toString("UTF-8"))
}

With relevant settings:

scalaVersion := "2.11.8"
libraryDependencies += "com.sksamuel.avro4s" %% "avro4s-core" % "1.6.1"

When I try to compile it I receive following error message:

[error] [path on my drive...]/src/main/scala/test/avro/Test.scala:1: Unused import
[error] package pl.combosolutions.jobstream.common
[error] ^
[error] one error found
[error] (compile:compileIncremental) Compilation failed

I saw that a similar error was reported on the avro4s issue tracker, but with an implicit error rather than an unused import. However, that was in version 1.5.0 - I am using version 1.6.1 (and tried several versions in between to check that it's not some random regression). Changing the avro4s import to import com.sksamuel.avro4s._ didn't help either.

On the other hand, the error message is similar to this one. I use Scala 2.11.8, but just in case I checked whether changing to 2.11.7 would help (it didn't).

What else can I try to figure out the source of this weird behaviour? Is it something I missed, or a bug? If so, where should I file it? I suspect it is something with the ToRecord macros, but I cannot tell for sure.

Removing "-Ywarn-unused-import" makes things work again - should I assume it is a bug in the library?

Sealed trait doesn't get encoded as an Avro enum

The Type Mappings table in the README suggests that I can use a sealed trait hierarchy to represent an enum, but it doesn't seem to work. Am I doing something wrong?

MyEnum.scala:

package foo

import com.sksamuel.avro4s._

sealed trait MyEnum
case object Wow extends MyEnum
case object Yeah extends MyEnum

case class Container(myEnum: MyEnum)

REPL:

import com.sksamuel.avro4s._

scala> AvroSchema[foo.MyEnum]
res1: org.apache.avro.Schema = {"type":"record","name":"MyEnum","namespace":"foo","fields":[]}

scala> AvroSchema[foo.Container]
res2: org.apache.avro.Schema = {"type":"record","name":"Container","namespace":"foo","fields":[{"name":"myEnum","type":{"type":"record","name":"MyEnum","fields":[]}}]}

I would expect the sealed trait to be encoded in the schema as:

{
  "type": "enum",
  "symbols": [ "Wow", "Yeah" ]
}

I thought this might be a symptom of SI-7046, but I can see that the shapeless Coproduct is derived just fine:

scala> Generic[foo.MyEnum]
res4: shapeless.Generic[foo.MyEnum]{type Repr = shapeless.:+:[foo.Wow.type,shapeless.:+:[foo.Yeah.type,shapeless.CNil]]} = anon$macro$3$1@57f23793

Add capability for scala ~> avro schema

Let's say I have a Scala case class:

package mypackge
case class MyClass(
  foo: Seq[Boolean]
)

And I want to quickly generate the appropriate Avro schema for the class above, at run time, so that an external UI can be built around it to allow easy (Scala case class) ~> (Avro schema) conversions.
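
For reference, a minimal sketch of the kind of thing meant here, using the AvroSchema macro avro4s already exposes (hedged: the macro derives the schema at compile time for a known class, so this only covers classes known when the code is built, not arbitrary classes discovered at run time):

import com.sksamuel.avro4s.AvroSchema
import org.apache.avro.Schema

case class MyClass(foo: Seq[Boolean])

val schema: Schema = AvroSchema[MyClass]     // derived by the macro at compile time
val json: String   = schema.toString(true)   // pretty-printed Avro schema JSON, ready for a UI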

Using Generic Type Serialisation

Hi,

I'm trying to write a generic serialisation method for avro4s:

  def toBinary[T](event: T)(implicit schemaFor: SchemaFor[T],  toRecord: ToRecord[T]) = {
    val baos = new ByteArrayOutputStream()
    val output = AvroOutputStream.binary[T](baos)
    output.write(event)
    output.close()
    baos.toByteArray
  }

But I get the following error:

could not find implicit value for evidence parameter of type com.sksamuel.avro4s.ToRecord[T]

I also tried this:

def toBinary[T](event: T)(implicit schema: AvroSchema[T],  toRecord: ToRecord[T]) = {
    val baos = new ByteArrayOutputStream()
    val output = AvroOutputStream.binary[T](baos)
    output.write(event)
    output.close()
    baos.toByteArray
  }

and I get the following error:

org.apache.avro.reflect.AvroSchema does not take type parameters
[error]   def toBinary[T](event: T)(implicit schema: AvroSchema[T],  toRecord: ToRecord[T]) = {

Is there any way I can generalise the serialisation?
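
A hedged sketch of the usual shape for this, assuming the avro4s 1.x API used above: keep the SchemaFor/ToRecord context bounds on the generic method and call it with a concrete type, so the macros can materialise the implicits at the call site.

import java.io.ByteArrayOutputStream
import com.sksamuel.avro4s.{AvroOutputStream, SchemaFor, ToRecord}

// generic method: the implicits are requested here but must be supplied by the caller
def toBinary[T: SchemaFor: ToRecord](event: T): Array[Byte] = {
  val baos = new ByteArrayOutputStream()
  val output = AvroOutputStream.binary[T](baos)
  output.write(event)
  output.close()
  baos.toByteArray
}

// call site with a concrete type: SchemaFor[Ping] and ToRecord[Ping] are derived here
case class Ping(id: Long)
val bytes: Array[Byte] = toBinary(Ping(1L))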

Order by context level avro schemas

Currently, when transforming JSON into Avro, e.g. using https://www.landoop.com/tools/avro-scala-generator, the order is type -> name -> namespace:

{
  "type" : "record",
  "name" : "MyClass",
  "namespace" : "com.test.avro",

It would be more formal, and give a top-down understanding, if avro4s ordered Avro schemas in contextual order, i.e.

{
  "namespace" : "com.test.avro",
  "name" : "MyClass",
  "type" : "record",

Using Generic type doesn't populate the Record correctly

I am trying to use a generic type parameter on a function that parses my Avro record using a schema. However, when I use the generic it fails to set the data on the case class; when I use the explicit class on the function it works as expected.

I've attached a working example of what I'm talking about.
MyApp.txt

Could you explain why this doesn't work, whether it should work, and whether it is possible to fix or work around it?

Support generating Avro partitioning files (for Kite)

Scenario: we need to define an Avro schema, and an Avro partitioning strategy as well.

i.e.

app_logging.avsc

{
  "type": "record",
  "name": "app_logging",
  "namespace": "my.namespace.com",
  "doc": "An application log",
  "fields": [
      {
          "name": "level",
          "type": "string",
          "doc" : "Level"
      }, {
          "name": "application_name",
          "type": "string",
          "doc" : "Name of the application"
      }
    ]
}

Partitioning strategy:
app_logging_partition.json

[
  {"type": "identity", "source": "application_name", "name": "component_name"}
]

Could not find implicit value for parameter builder: Shapeless.Lazy

Hello,

I wrote the following code

val schema = AvroSchema[SalesRecord]
val output = AvroOutputStream[SalesRecord](new File(outputLocation))
output.write(salesList)
output.flush
output.close

But this throws a compile time error

could not find implicit value for parameter builder: shapeless.Lazy[com.sksamuel.avro4s.AvroSchema[...]]

not enough arguments for method apply: ....
Unspecified value parameter builder.

Support default values in Avro schema generated from a case class that has default parameter values

Scala supports default parameter values (since 2.8). However, those values get lost in translation when generating a case class's Avro schema. Here is an illustrative example:

scala> import com.sksamuel.avro4s.AvroSchema
import com.sksamuel.avro4s.AvroSchema

scala> case class Ingredient(name: String = "flour")
defined class Ingredient

scala> val schema = AvroSchema[Ingredient]
schema: org.apache.avro.Schema = {"type":"record","name":"Ingredient","namespace":"$iw","fields":[{"name":"name","type":"string"}]}

Note that the default value "flour" for parameter name doesn't appear in the Avro schema generated for case class Ingredient. It would be nice if the generated schema were, instead,

{"type":"record","name":"Ingredient","namespace":"$iw","fields":[{"name":"name","type":"string","default":"flour"}]}

Since avro4s requires a much later Scala version (currently 2.11.8), I don't think there would be anything blocking such an enhancement.
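
A rough sketch of the mechanism involved (hedged: it reuses the pre-1.9 Avro Java Schema.Field constructor that the avro4s macro is quoted as calling elsewhere on this page, with a Jackson JsonNode carrying the default):

import org.apache.avro.Schema
import org.codehaus.jackson.node.TextNode

// the "flour" default from `name: String = "flour"` would need to be threaded
// through to the generated field as a JsonNode default
val stringSchema = Schema.create(Schema.Type.STRING)
val nameField    = new Schema.Field("name", stringSchema, null, TextNode.valueOf("flour"))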

Akka persistence - Added parameter

Hi,
I chose this library to serialize persistent actors in Akka.
I use it like here: http://giampaolotrapasso.com/akka-persistence-using-avro-serialization/ but a little smarter. Basically I have a problem with schema evolution:
If I add a new parameter to a case class, the persisted actor is not restored and it throws this:

java.util.NoSuchElementException: head of empty stream

My code:
Serializer class:

class CardEventSerializer extends SerializerWithStringManifest {

  override def identifier: Int = 1000100

  override def manifest(o: AnyRef): String = o.getClass.getName

  final val TestManifest = classOf[Test].getName

  override def toBinary(o: AnyRef): Array[Byte] = {

    o match{
      case t: Test => toBinaryByClass(t, o)
      case _ => throw new IllegalArgumentException(s"Unknown event to serialize. $o")
    }
  }

  def toBinaryByClass[T: SchemaFor : ToRecord](c: T, o:AnyRef): Array[Byte] = {

    val os = new ByteArrayOutputStream
    val output = AvroOutputStream.binary[T](os)

    output.write(o.asInstanceOf[T])
    output.flush()
    output.close()
    os.toByteArray
  }

  override def fromBinary(bytes: Array[Byte], manifest: String): AnyRef = {
    manifest match {
      case TestManifest => fromBinaryByClass[Test](bytes)
      case _ => throw new IllegalArgumentException(s"Unable to handle manifest $manifest")
    }
  }

  def fromBinaryByClass[T: SchemaFor : FromRecord](bytes: Array[Byte]): T = {

    val is = new ByteArrayInputStream(bytes)
    val input = AvroInputStream.binary[T](is)
    // read the record before closing the stream
    val result = input.iterator.toSeq.head
    input.close()
    result
  }
}

My case class:

sealed trait TestEvents {
}
case class Test(p1: Long, p2: String, p3: Long) extends TestEvents

Now I persist the actor...
And try to extend the case class like this:

case class Test(p1: Long, p2: String, p3: Long, p4: Int = 0) extends TestEvents

And it crashes...

The schema is generated fine. I tried printing it, and the default value is set in the schema...
If I delete the added parameter, everything works fine. I tried using both binary and json.

So, my question is: is there some option to handle schema evolution?

Many thanks

Add scalajs support

I'm wondering if you could add support for Scala.js. The restriction is that you can't do reflection, so it has to be macros, and I think you'll need to use a library like https://github.com/mtth/avsc on the JS side. It might be a good fit with autowire.

The current solution I'm looking at is upickle (arity < 22) + circe (for arity > 22, but it makes the build extremely slow with shapeless and all), and boopickle for binary (not sure how compatible it is with all browsers, and it's hard to debug since it doesn't have a human-readable codec). ScalaPB is also an alternative, but you need to write IDL boilerplate.

AvroFixed Annotation Usage

I'm trying to create a schema that has some Fixed types, but I can't figure out how to do it.

I saw that there is an AvroFixed annotation, and the README mentions that a String can map to either a string or a fixed Avro type.

This is what I tried:

@ case class FixedTest( @AvroFixed(4) value: String)

defined class FixedTest

@ val fixedSchema = AvroSchema[FixedTest]

fixedSchema: org.apache.avro.Schema = 
{"type":"record","name":"FixedTest","namespace":"$sess.cmd12","fields":[{"name":"value","type":"string"}]}

However, the output doesn't have the fixed type. I looked through the code a bit, but I don't see anything similar to the other annotations, where they check whether the annotation exists.

I took a look and tried to take a stab at implementing it, but I'm not sure how the fixed type should exist: should it have an implicit ToSchema somewhere (I think that would mean it needs its own type, and at that point no annotation would be needed?), or should it just be overridden at the end, similar to how the other annotations are handled?

    val field = new Schema.Field(name, schema, doc(annos), defaultNode)
    // check if the AvroFixed annotation exists, and if it does, create a new schema or change this one here??
    aliases(annos).foreach(field.addAlias)
    addProps(annos, field.addProp)
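
For what it's worth, a hedged sketch of one possible shape (illustrative only, not the current avro4s behaviour): when the annotation is present, the derived field schema could be swapped for a fixed schema built with Avro's Java API.

import org.apache.avro.Schema

// hypothetical handling: if @AvroFixed(size) is found on the field, build a fixed
// schema of that size instead of the derived string schema
val valueSchema: Schema = Schema.createFixed("value", null, "com.example", 4)
// -> {"type":"fixed","name":"value","namespace":"com.example","size":4}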

If this is already done and I'm just going about this wrong let me know.
Any advice would be much appreciated!
Kalvin

Deserialising ADT

Hi,
how do I map this to Avro:

sealed trait A
case class B(id: String) extends A
case class C(id: String) extends A
case class D(a: A)



def toBinary[T: SchemaFor : ToRecord](event: T): Array[Byte] = {
  val baos = new ByteArrayOutputStream()
  val output = AvroOutputStream.binary[T](baos)
  output.write(event)
  output.close()
  baos.toByteArray
}

def fromBinary[T: SchemaFor : FromRecord](bytes: Array[Byte]) = {
  val in = new ByteArrayInputStream(bytes)
  val input = AvroInputStream.binary[T](in)
  // read the records before closing the stream
  val result = input.iterator.toSeq
  input.close()
  result
}

val d = D(B("1"))

val b = toBinary(d)

fromBinary(b)

When deserializing there is an error:

could not find implicit value for evidence parameter of type com.sksamuel.avro4s.FromRecord[

How can we deserialize ADTs? Do we need a packing object?

But the packing object packs instances, not types:

implicit val BasePack = Pack[Base](A -> 0, B -> 1)

RecordFormat and polymorphic types

Hello!

I have a case class and RecordFormat.to usage like so:

// with type mappings for Instant in scope
case class Trade(tid: Long, price: BigDecimal, volume: BigDecimal, timestamp: Instant, tpe: String)

val trades = List(Trade(....))
val listFormat = RecordFormat[List[Trade]]

val listRecord = listFormat.to(trades)

which gives a compilation error:

could not find implicit value for parameter toRecord: com.sksamuel.avro4s.ToRecord[List[co.coinsmith.kafka.cryptocoin.Trade]]
[error]   val listFormat = RecordFormat[List[Trade]]

Is this code snippet supposed to work? I believe so.

Deserialization seems to fail

object AvroExample extends App {
  val pepperoni = Pizza("pepperoni", Seq(Ingredient("pepperoni", 12, 4.4), Ingredient("onions", 1, 0.4)), false, false, 98)
  val hawaiian = Pizza("hawaiian", Seq(Ingredient("ham", 1.5, 5.6), Ingredient("pineapple", 5.2, 0.2)), false, false, 91)

  val os = AvroOutputStream[Pizza](new File("pizzas.avro"))
  os.write(pepperoni)
  os.close()

  val is = AvroInputStream[Pizza](new File("pizzas.avro"))
  println(is.iterator.toList)
  is.close()
}

I followed the example and got an empty List; it seems deserialization failed.

JSON serialize fails - could not find implicit evidence

Am I doing something daft? From the example:

object AvroTest extends App {
  import java.io.ByteArrayOutputStream
  import com.sksamuel.avro4s.AvroOutputStream

  case class Composer(name: String, birthplace: String, compositions: Seq[String])
  val ennio = Composer("ennio morricone", "rome", Seq("legend of 1900", "ecstasy of gold"))

  val baos = new ByteArrayOutputStream()
  val output = AvroOutputStream.json[Composer](baos)
  output.write(ennio)
  output.close()
  println(baos.toString("UTF-8"))
}

gives

Error:(14, 47) could not find implicit value for evidence parameter of type com.sksamuel.avro4s.ToRecord[AvroTest.Composer]
  val output = AvroOutputStream.json[Composer](baos)
                                              ^

Error:(14, 47) not enough arguments for method json: (implicit evidence$5: com.sksamuel.avro4s.SchemaFor[AvroTest.Composer], implicit evidence$6: com.sksamuel.avro4s.ToRecord[AvroTest.Composer])com.sksamuel.avro4s.AvroJsonOutputStream[AvroTest.Composer].
Unspecified value parameter evidence$6.
  val output = AvroOutputStream.json[Composer](baos)
                                              ^

using sbt:

 sbt compile
[info] Loading project definition from /Users/nick/workspace/scala/avro4s-example/project
[info] Set current project to avro4s-example (in build file:/Users/nick/workspace/scala/avro4s-example/)
[info] Updating {file:/Users/nick/workspace/scala/avro4s-example/}avro4s-example...
[info] Resolving jline#jline;2.12.1 ...
[info] Done updating.
[info] Compiling 1 Scala source to /Users/nick/workspace/scala/avro4s-example/target/scala-2.11/classes...
[error] /Users/nick/workspace/scala/avro4s-example/src/main/scala-2.11/AvroTest.scala:12: could not find implicit value for evidence parameter of type com.sksamuel.avro4s.ToRecord[AvroTest.Composer]
[error]   val output = AvroOutputStream.json[Composer](baos)
[error]                                               ^
[error] one error found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 5 s, completed 05 Jul 2016 4:28:07 PM

Efficiently support value classes

This example:

class Wrapper(val underlying: Int) extends AnyVal
case class Test(i: Int, w: Wrapper)

AvroSchema[Test].toString(true)

generates a complex record representation:

{
  "type" : "record",
  "name" : "Test",
  "fields" : [ {
    "name" : "i",
    "type" : "int"
  }, {
    "name" : "w",
    "type" : {
      "type" : "record",
      "name" : "Wrapper",
      "fields" : [ {
        "name" : "underlying",
        "type" : "int"
      } ]
    }
  } ]
}

It would be best if the schema simply represented w by the underlying int type.

Value classes are very useful, providing type safety and domain operations without any run-time cost.

ADT

Given (defined in a different module, nesting the case class in a companion object UserEvent also doesn't work)

sealed trait UserEvent
case class Registered(id: UUID, name: String, email: String, password: String) extends UserEvent
case class EmailUpdated(id: UUID, email: String) extends UserEvent
case class PasswordUpdated(id: UUID, newPassword: String) extends UserEvent
case class LoggedIn(id: UUID, newPassword: String) extends UserEvent

Running

  val schema = AvroSchema[UserEvent]

  println(schema.toString(true))

Gives

{
  "type" : "record",
  "name" : "UserEvent",
  "namespace" : "statements.example",
  "fields" : [ {
    "name" : "newPassword",
    "type" : [ "null", "string" ]
  }, {
    "name" : "name",
    "type" : [ "null", "string" ]
  }, {
    "name" : "email",
    "type" : [ "null", "string" ]
  }, {
    "name" : "id",
    "type" : [ "string" ]
  }, {
    "name" : "password",
    "type" : [ "null", "string" ]
  } ]
}

I would expect an enum here. It should also include a discriminator field (like type or something).

As a side note, can't we use the shapeless type class derivation trick to create product and coproduct instances for SchemaFor[T]?

WDYT?

Remove generated dirs on sbt clean?

When compiling a project that has avro4s as a dependency, classes are generated in the folders
avro4s-core and avro4s-generator.

Can they be automatically removed on sbt clean?
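
In the meantime, a hedged sbt sketch (the directory names are taken from the report above; adjust to wherever the files actually appear): the generated folders can be added to cleanFiles so that sbt clean deletes them.

// build.sbt - assumed workaround, not an avro4s feature
cleanFiles ++= Seq(
  baseDirectory.value / "avro4s-core",
  baseDirectory.value / "avro4s-generator"
)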

Value class does not deserialize json

Thank you for implementing #50.

Unfortunately there appears to be a defect in AvroInputStream, for both binary and json:

package com.sksamuel.avro4s.examples


import com.sksamuel.avro4s.AvroSchema
import org.scalatest.{Matchers, WordSpec}

case class Wrapper(val underlying: Int) extends AnyVal
case class Test(i:Int,w:Wrapper)

class ValueClassExamples extends WordSpec with Matchers {
  import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
  import com.sksamuel.avro4s.{AvroInputStream, AvroOutputStream}

  "AvroStream json serialization" should {

    "round trip the value class " in {
      val baos = new ByteArrayOutputStream()
      val output = AvroOutputStream.json[Test](baos)
      output.write(Test(2,Wrapper(10)))
      output.close()

      val json = baos.toString("UTF-8")

      json shouldBe ("""{"i":2,"w":10}""") // as expected

      val in = new ByteArrayInputStream(json.getBytes("UTF-8"))
      val input = AvroInputStream.json[Test](in) //  <=== fails
      val result = input.iterator.toSeq
      result shouldBe Vector("")
    }
  }

}

The stack trace is:

10 (of class java.lang.Integer)
scala.MatchError: 10 (of class java.lang.Integer)
    at com.sksamuel.avro4s.LowPriorityFromValue$$anon$1.apply(FromRecord.scala:22)

Avro4s should support Scala Enumerations

Hello,

I wrote the following code

val is = AvroInputStream[MyType](body)

"body" is value that i got from a callback, and the type is array of byte (body: Array[Byte])

But this throws a compile-time error:

could not find implicit value for parameter builder:
Error:(48, 60) could not find implicit value for evidence parameter of type com.sksamuel.avro4s.FromRecord[MyType]
implicit val is = AvroInputStream[MyType](body)
^
not enough arguments for method apply: (implicit evidence$1: com.sksamuel.avro4s.SchemaFor[MyType], implicit evidence$2: com.sksamuel.avro4s.FromRecord[MyType])com.sksamuel.avro4s.AvroInputStream[MyType] in object AvroInputStream.
Unspecified value parameter evidence$2.
implicit val is = AvroInputStream[MyType](body)

enums don't work in unions or arrays

Hi there.

This may be related to #60 but I'm not sure; thought I'd report it anyway. It may also be two separate bugs; let me know if you'd like me to file it as two separate issues.

I can't seem to get an enum to work inside either a union or an array.

For the array case, I can create a wrapper record with one field that holds the array; I haven't tried the wrapper with the union.

I start with the avsc file and generate the Scala code. The Scala case classes look fine, but blow up when trying to serialize using the data codec.

Can't create schema for generic type

Hi,

I'm a fan of your library; having looked around at the others, yours is the cleanest and most well thought through. However, I've found a limitation that may be a bug - not sure.

I'm trying to nest case classes like this:

case class Message[T](payload: T, identity: String = UUID.randomUUID().toString)
case class MyRecord(key: String, str1: String, str2: String, int1: Int) 

AvroSchema[Message[MyRecord]]

However, I get the compiler error:

Could not find implicit SchemaFor[Message[MyRecord]]
AvroSchema[Message[MyRecord]]

I need to avoid having a SchemaFor defined for every T.

Perhaps there's something I'm doing wrong? I've tried several ways to specify a SchemaFor, but I'm not having a lot of luck.

I'm really keen to use your library on an upcoming project - please help!

Regards,
Ryan.

BigDecimal scale and precision support

Hello again,

I'm looking for support for setting the scale and precision of an Avro BigDecimal. This is possible via implicits and a new type that contains the scale and precision.

I will have a contribution for it soon.
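
A minimal sketch of the idea (hedged: the names here are hypothetical, not the avro4s API, and it assumes Avro 1.8+ where LogicalTypes is available): scale and precision are carried by an implicit value that a ToSchema[BigDecimal] could read when building its schema.

import org.apache.avro.{LogicalTypes, Schema}

// hypothetical carrier for the desired scale/precision
case class ScalePrecision(scale: Int, precision: Int)

implicit val defaultScalePrecision: ScalePrecision = ScalePrecision(2, 8)

// a ToSchema[BigDecimal] could then build its schema from whatever is in scope
def bigDecimalSchema(implicit sp: ScalePrecision): Schema = {
  val bytes = Schema.create(Schema.Type.BYTES)
  LogicalTypes.decimal(sp.precision, sp.scale).addToSchema(bytes)
}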

Scala Enumerations inside collection

@A1kmm observed that an invalid schema is generated for collections that contain Scala Enumerations, resulting in an NPE on deserialization, e.g.

object MyEnum extends Enumeration {
  val AnEnum = Value
}
val s = Set[MyEnum.Value]()

The NPE happens here:
https://github.com/sksamuel/avro4s/blob/master/avro4s-macros/src/main/scala/com/sksamuel/avro4s/FromRecord.scala#L77

Called from here (note there is no second parameter to apply, so the default of null is used): https://github.com/sksamuel/avro4s/blob/master/avro4s-macros/src/main/scala/com/sksamuel/avro4s/FromRecord.scala#L95

Modernize the build

The SBT docs recommend using .sbt files in favor of the "old" project/*.scala method. Also, I think the build could be cleaned up by creating AutoPlugins for the global settings and for publishing. I'll issue a PR with these changes.
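
A hedged sketch of the AutoPlugin part (names and settings are placeholders): shared settings move into a plugin under project/, and every module picks them up automatically.

// project/BuildSettingsPlugin.scala - illustrative only
import sbt._
import sbt.Keys._

object BuildSettingsPlugin extends AutoPlugin {
  // apply to every project without an explicit enablePlugins call
  override def trigger = allRequirements

  override def projectSettings = Seq(
    organization := "com.sksamuel.avro4s",
    scalaVersion := "2.11.8",
    scalacOptions ++= Seq("-deprecation", "-feature")
  )
}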

value schema in trait ToSchema of type org.apache.avro.Schema is not defined

Hello!

I get this build error when defining a custom type mapping with 1.5.1.

Error:(30, 19) object creation impossible, since value schema in trait ToSchema of type org.apache.avro.Schema is not defined
  implicit object InstantToSchema extends ToSchema[Instant] {

Here is the mapping code:

  implicit object InstantToSchema extends ToSchema[Instant] {
    override def apply(): Schema = Schema.create(Schema.Type.STRING)
  }

  implicit object InstantToValue extends ToValue[Instant] {
    override def apply(value: Instant): String = value.toString
  }

  implicit object InstantFromValue extends FromValue[Instant] {
    override def apply(value: Any, field: Field): Instant = Instant.parse(value.toString)
  }

This does not happen in 1.4.3 with the same code.
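
For what it's worth, a hedged reading of the 1.5.x error: the trait now appears to expose a schema member rather than apply(), so overriding that (as the value-class issue further down this page does) should satisfy it.

import java.time.Instant
import org.apache.avro.Schema
import com.sksamuel.avro4s.ToSchema

implicit object InstantToSchema extends ToSchema[Instant] {
  // override the schema value instead of apply()
  override val schema: Schema = Schema.create(Schema.Type.STRING)
}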

Cheers!

Getting warning of local val never used, using AvroInputStream.data

When using AvroInputStream.data[SomeType](bytes) I'm getting a warning:

local val in value x$1 is never used

I use -Xfatal-warnings as a scalac option, which means this turns into an error.
Am I doing something wrong here, or do you ignore warnings?

The below code triggers the warning:

package my.pckg

import com.sksamuel.avro4s._
case class Bla(name: String)
object Avro {
  def fromAvro(bytes: Array[Byte]) = {
    AvroInputStream.data[Bla](bytes)
  }
}

Incompatible migration with Option

For the case class:
Person(name: String)
I have the following schema:

{
  "type": "record",
  "name": "Person",
  "fields": [
    {
      "name": "name",
      "type": "string"
    }
  ]
}

I'd like to add an optional field "nickname":
Person(name: String, nickname: Option[String] = None)
That must result in the schema:

{
  "type": "record",
  "name": "Person",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "nickname",
      "type": [
        "null",
        "string"
      ],
      "default": null
    }
  ]
}

But unfortunately, the generated schema is missing the default of null:

{
  "type": "record",
  "name": "Person",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "nickname",
      "type": [
        "null",
        "string"
      ]
    }
  ]
}

And so it's not compatible.
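
A hedged sketch of the missing piece, using the pre-1.9 Avro Java API that the generated code is quoted as using elsewhere on this page: the union field needs an explicit JSON null default attached when it is created.

import org.apache.avro.Schema
import org.codehaus.jackson.node.NullNode
import scala.collection.JavaConverters._

val nicknameType = Schema.createUnion(
  List(Schema.create(Schema.Type.NULL), Schema.create(Schema.Type.STRING)).asJava)

// NullNode is the JsonNode form of `"default": null`
val nicknameField = new Schema.Field("nickname", nicknameType, null, NullNode.getInstance())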

Primitive type schemas

Hi @sksamuel

When I used to do println(AvroSchema[Double]), it would print "double". Now it seems to produce {"type":"record","name":"Double","namespace":"scala","fields":[]}.

Not sure if I'm doing something wrong here?

How to use Avro4s to create generic KafkaDecoder for Spark streaming

I would like to use your great work, but I'm stuck, with my limited knowledge of Scala, on the following problem. I know you possibly don't know Kafka and Spark, but this is more about reflection and context bounds.

RecordFormat[T] requires fromRecord and toRecord implicit values. Spark's KafkaDirectStream takes a Kafka Decoder type and creates an instance of it via reflection, looking for a single-argument constructor. When fromRecord and toRecord are provided in the decoder implementation via context bounds, or as implicit values in a separate parameter list, the compiler adds them to the constructor as additional arguments, breaking the required contract.

Does anybody know how to provide these implicits in KafkaAvroGenericDecoder shown below so that RecordFormat is satisfied without breaking Spark code?

The goal is to let avro4s generate the decoding from an Avro GenericRecord to a case class, using the Confluent Schema Registry.

The Spark code to satisfy (VD is the decoder type, V is the resulting case class, and Decoder is an interface defined by Kafka):

val valueDecoder = classTag[VD].runtimeClass.getConstructor(classOf[VerifiableProperties])
  .newInstance(props)
  .asInstanceOf[Decoder[V]]

The code I currently have, with context bounds removed:

class KafkaAvroGenericDecoder[T]
  extends AbstractKafkaAvroDeserializer
  with Decoder[T] {

  def this(schemaRegistry: SchemaRegistryClient) = {
    this()
    this.schemaRegistry = schemaRegistry
  }

  def this(schemaRegistry: SchemaRegistryClient, props: VerifiableProperties) = {
    this(schemaRegistry)
    configure(deserializerConfig(props))
  }

  /** Constructor used by Kafka consumer */
  def this(props: VerifiableProperties) = {
    this()
    configure(new KafkaAvroDeserializerConfig(props.props))
  }

  override def fromBytes(bytes: Array[Byte]): T = {
    val record = deserialize(bytes).asInstanceOf[GenericRecord]
    val format = RecordFormat[T]
    format.from(record)
  }
}

object Kafka {
  def createStream[V : ClassTag](topics: Set[String])
    (implicit ssc: StreamingContext): InputDStream[(Array[Byte], V)] = {

    KafkaUtils.createDirectStream[Array[Byte], V, DefaultDecoder, KafkaAvroGenericDecoder[V]](
      ssc, session.kafkaConf, topics)

    // Or this one for test. Change return type to InputDStream[Array[Byte]]
    // createTestStream[V, KafkaAvroGenericDecoder[V]](ssc):
  }

  // just to print out list of constructors for decoder and try to create its instance
  private def createTestStream[V, VD : ClassTag](ssc: StreamingContext): InputDStream[Array[Byte]] = {
     val props = new Properties()
     props.put("bootstrap.servers", "localhost:9092")
     props.put("group.id", "test")
     props.put("schema.registry.url", "http://localhost:8989")
     props.put("partition.assignment.strategy", "range")

     classTag[VD].runtimeClass.getConstructors.foreach { x =>
       println("CONSTRUCTOR INFO: " + x)
     }

     val valueDecoder = classTag[VD].runtimeClass.getConstructor(classOf[VerifiableProperties])
       .newInstance(new VerifiableProperties(props))
       .asInstanceOf[Decoder[V]]

     val seq = Seq("k1".getBytes, "k2".getBytes, "k3".getBytes)
     val rdd = ssc.sparkContext.makeRDD(seq)

     new ConstantInputDStream(ssc, rdd)
    }
  }

// user code
object Streamz {
  def run()(implicit ssc: StreamingContext): Unit = {
    val topics = Set("test")
    val stream = Kafka.createStream[MyFancyEvent](topics)
    val result = stream.count()
    result.print()
  }
}

With context bounds:

class KafkaAvroGenericDecoder[T : FromRecord : ToRecord]
  extends AbstractKafkaAvroDeserializer
  with Decoder[T] { ... }

object Kafka {
  def createStream[V : FromRecord : ToRecord : ClassTag](topics: Set[String]) { ... }
}

Many thanks for any advice,
Vladimir

Compatibility issue Serializing/Deserializing messages generated with avro4s and other languages

I have two separate applications that interact through a queue in the middle, and they communicate by sending messages to each other using Avro (serialize and deserialize).
The queue in the middle is managed by a message broker (RabbitMQ).

The scenario is really simple:

  • one application creates a message (based on a schema), serializes it and puts it in the queue.
  • on the other side, another program picks up this message, deserializes the Avro message and reads the content.

This is my schema (really simple):

{
    "type": "record",
    "name": "Surface",
    "fields": [
    {
        "name": "surfaceId",
        "type": "string"
    }]
}

case class Surface(surfaceId: String)

I serialize the message using the avro4s library and send it to the message broker.
I use this code to serialize the message:

First scenario: two applications made in Scala

    val surface = Surface(surfaceId = "5423g343423")
    val stream = new ByteArrayOutputStream()
    val os = AvroOutputStream[Surface](stream, true)
    os.write(surface)
    os.flush()

On the other side, another application grabs the message from the queue and deserializes it with the avro4s library. Body is an Array[Byte]:

        val is = AvroInputStream[Surface](body)
        val result = is.iterator.toSet
        println(result.mkString("\n"))
        is.close()

This configuration works perfectly.

Second scenario: one application made in Scala serializes the message, and on the other side another application made in Java deserializes the message and reads the content.
I tried to deserialize the message using the official Apache Avro Java module (https://avro.apache.org/docs/1.7.7/gettingstartedjava.html); I got an error and was not able to deserialize the content of the message.

Third scenario: one application made in Scala serializes the message, and on the other side another application made in Python deserializes the message and reads the content.
I got an error when I tried to deserialize the message (same as with Java).

Fourth scenario: one application made in Python serializes the message, and on the other side another application made in Scala deserializes the message and reads the content.
When I tried to deserialize the message using the avro4s module I got an error.

Fifth scenario: one application made in Python serializes the message, and on the other side another application made in Java deserializes the message and reads the content.
It works perfectly!

Sixth scenario: one application made in Python serializes the message, and on the other side another application made in JavaScript deserializes the message and reads the content.
It works perfectly!

I suspect there is a compatibility issue in serializing and deserializing messages between avro4s and other libraries.

Or am I doing something wrong...

Can't use custom type mappings for value classes

It seems like custom ToValue and FromValue instances for value classes are not being picked up by the ToRecord macro.

Example:

import java.io.ByteArrayOutputStream

import com.sksamuel.avro4s._
import org.apache.avro.Schema
import org.apache.avro.Schema.Field

object Main extends App {
  final case class Item(valueString: String) extends AnyVal {
    def value: Int = valueString.toInt
  }

  object Item {
    implicit object ItemToSchema extends ToSchema[Item] {
      override val schema: Schema = ToSchema.IntToSchema()
    }

    implicit object ItemToValue extends ToValue[Item] {
      override def apply(item: Item): Int = item.value
    }

    implicit object ItemFromValue extends FromValue[Item] {
      override def apply(value: Any, field: Field): Item =
        Item.apply(FromValue.IntFromValue(value, field).toString)
    }
  }

  final case class Group(item: Item)

  val group = Group(Item("123"))

  println(ToRecord[Group].apply(group))

  val baos = new ByteArrayOutputStream()
  val output = AvroOutputStream.json[Group](baos)
  output.write(group)
  output.close()

  println(baos.toString("UTF-8"))
}

Gives:

{"item": "123"}
[error] (run-main-2c) java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Number
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Number
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:117)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:153)
	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:143)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:105)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:60)
	at com.sksamuel.avro4s.AvroJsonOutputStream.write(AvroOutputStream.scala:83)
...

(The ToSchema is picked up, but the ToValue and FromValue are not.)

Making Item a normal case class yields the expected results:

{"item": 123}
{"item":123}

Example deserialization code from readme.md doesn't compile as is

Hi,

The example deserialization code on the main page :

import java.io.File
import com.sksamuel.avro4s.AvroInputStream

val is = AvroInputStream.data[Pizza](new File("pizzas.avro"))
val pizzas = is.iterator.toSet
is.close()
println(pizzas.mkString("\n"))

doesn't compile for me (using Scala 2.11.8 & avro4s 1.6.2):

Error:(13, 45) could not find implicit value for evidence parameter of type com.sksamuel.avro4s.FromRecord[scalapt.core.Pizza]
        val is = AvroInputStream.data[Pizza](new File("pizzas.avro"))
Error:(13, 45) not enough arguments for method data: (implicit evidence$23: com.sksamuel.avro4s.SchemaFor[scalapt.core.Pizza], implicit evidence$24: com.sksamuel.avro4s.FromRecord[scalapt.core.Pizza])com.sksamuel.avro4s.AvroDataInputStream[scalapt.core.Pizza].
Unspecified value parameter evidence$24.
        val is = AvroInputStream.data[Pizza](new File("pizzas.avro"))

Am I missing an implicit import?

Read via kafka-avro-console-consumer

I write to a topic using avro4s with the GenericRecord Java API. Then, when I try to read the topic via kafka-avro-console-consumer (which is part of the Confluent Platform), I get the following error:

Processed a total of 1 messages
[2016-11-08 09:42:47,646] ERROR Unknown error when running consumer:  (kafka.tools.ConsoleConsumer$:103)
org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!

Using a producer with the plain Java API does not produce the error above. Is it a bug, or have I just misunderstood the usage of the avro4s library?

The problem of new schema creation for each record

Hi Samuel,

I have a problem when using ToRecord to convert case class objects to GenericRecord for Kafka: since a new schema is generated for each object (see https://github.com/sksamuel/avro4s/blob/master/avro4s-macros/src/main/scala/com/sksamuel/avro4s/ToRecord.scala#L170), Kafka throws an exception like the one described at https://groups.google.com/forum/#!msg/confluent-platform/gkmtn2FO4Ug/IIsp8tZHT0QJ

Could we have a way to pass a pre-defined schema for a case class to ToRecord, instead of generating it each time?

fails with sealed abstract class

This test fails when I have a sealed base class:

package com.sksamuel.avro4s

import java.io.File
import org.scalatest.{Matchers, WordSpec}

class EdgeCasesTest extends WordSpec with Matchers {

  val instance = Enterprise(EntityId("Dead"), Person("Julius Caesar", 2112))
  val go = GoPiece(GoBase.StringInstance("hello"), GoBase.EmptyInstance,
    GoBase.IntInstance(42), GoBase.EmptyInstance)

  val sealedAC = Wrapper(SealedAbstractClass1("a"), SealedAbstractClass3)

  "Avro4s" should {
    "support extends AnyVal" in {
      val tempFile = File.createTempFile("enterprise", ".avro")

      val writer = AvroOutputStream[Enterprise](tempFile)
      writer.write(instance)
      writer.close()

      val reader = AvroInputStream[Enterprise](tempFile)
      val enterprise = reader.iterator.next()
      enterprise should === (instance)
    }

    "support sealed case object instances 1" in {
      val tempFile = File.createTempFile("go", ".avro")

      val writer = AvroOutputStream[GoPiece](tempFile)
      writer.write(go)
      writer.close()

      val reader = AvroInputStream[GoPiece](tempFile)
      val enterprise = reader.iterator.next()
      enterprise should === (go)
    }

    "support sealed case object instances 2" in {
      val tempFile = File.createTempFile("sac", ".avro")

      val writer = AvroOutputStream[Wrapper](tempFile)
      writer.write(sealedAC)
      writer.close()

      val reader = AvroInputStream[Wrapper](tempFile)
      val enterprise = reader.iterator.next()
      enterprise should === (sealedAC)
    }
  }
}

case class Enterprise(id: EntityId, user: Person)
case class Person(name: String, age: Int)
case class EntityId(v: String) extends AnyVal

sealed abstract class SealedAbstractClass {}
case class SealedAbstractClass1(v: String) extends SealedAbstractClass
case class SealedAbstractClass2(v: Int) extends SealedAbstractClass
case object SealedAbstractClass3 extends SealedAbstractClass

case class Wrapper(a: SealedAbstractClass, b: SealedAbstractClass)

sealed abstract class GoBase {}
object GoBase {
  implicit class StringInstance(v: String) extends GoBase
  implicit class IntInstance(v: Int) extends GoBase
  case object EmptyInstance extends GoBase
}
case class GoPiece(a: GoBase, b: GoBase, c: GoBase, d: GoBase)

not available for 2.10.x

The only file that needs to be changed is AvroSchema.scala - some of the reflection code is different.

      // in Build.scala
    libraryDependencies ++= {
      if (scalaVersion.value.contains("2.11")) {
        Seq(
          "org.scala-lang"        % "scala-reflect" % scalaVersion.value,
          "com.chuusai"           %% "shapeless" % ShapelessVersion,
          "org.apache.avro"       % "avro" % AvroVersion,
          "org.slf4j"             % "slf4j-api" % Slf4jVersion,
          "log4j"                 % "log4j" % Log4jVersion % "test",
          "org.slf4j"             % "log4j-over-slf4j" % Slf4jVersion % "test",
          "org.scalatest"         %% "scalatest" % ScalatestVersion % "test"
        )
      } else Nil
    },
    libraryDependencies ++= {
      if (scalaVersion.value.contains("2.10")) {
        Seq(
          "org.scala-lang"        %  "scala-reflect"    % scalaVersion.value,
          "com.chuusai"           %% "shapeless"        % "2.2.5",
          "org.apache.avro"       %  "avro"             % AvroVersion,
          "org.slf4j"             %  "slf4j-api"        % Slf4jVersion,
          compilerPlugin("org.scalamacros" % "paradise_2.10.6" % "2.1.0"),
          "log4j"                 %  "log4j"            % Log4jVersion % "test",
          "org.slf4j"             %  "log4j-over-slf4j" % Slf4jVersion % "test",
          "org.scalatest"         %% "scalatest"        % "2.2.6"      % "test"
        )
      } else Nil
    },
    ...

  lazy val core = Project("avro4s-core", file("avro4s-core"))
    .settings(rootSettings: _*)
    .settings(publish := {})
    .settings(name := "avro4s-core")
    .settings(unmanagedSourceDirectories in Compile ++= {
      if (scalaVersion.value startsWith "2.10.")
        Seq(baseDirectory.value / "src"/ "main" / "scala-2.10")
      else if (scalaVersion.value startsWith "2.11.")
        Seq(baseDirectory.value / "src"/ "main" / "scala-2.11")
      else Nil
    })

Avro schema embedded along with binary

Hi,
I was trying to use your library for publishing Kafka messages with Avro. The problem I ran into is that the consumer side cannot parse the message, because the schema is embedded with the binary content. It looks like an object container file. Is there a way around it?
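
A hedged sketch of one way around it, assuming the avro4s 1.x stream API used throughout this page: the data/default output stream writes an Avro object container file (schema embedded), while the binary output stream writes only the encoded records, which is usually what a Kafka payload wants.

import java.io.ByteArrayOutputStream
import com.sksamuel.avro4s.AvroOutputStream

case class Payment(id: String, amount: Double)   // illustrative message type

val baos = new ByteArrayOutputStream()
val out  = AvroOutputStream.binary[Payment](baos)   // no embedded schema or container framing
out.write(Payment("p-1", 12.50))
out.close()

val kafkaPayload: Array[Byte] = baos.toByteArray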

Thank you,
Arsen

Scala Type DateTime is not supported.

If I have org.joda.time.DateTime in my case class declaration, I see an error when invoking AvroInputStream. Please find the error below. It would be great if avro4s had a type mapping for DateTime.

My case class: a.b.c.d.e.pkg01.TransactionLine
Error:(90, 78) could not find implicit value for evidence parameter of type com.sksamuel.avro4s.FromRecord[a.b.c.d.e.pkg01.TransactionLine]
val is: AvroInputStream[TransactionLine] = AvroInputStream[TransactionLine](new File("/tmp/TransactionLine.avro"))
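
In the meantime, a hedged sketch of a custom mapping, mirroring the Instant mapping shown in another issue above (assuming the 1.x ToSchema/ToValue/FromValue API and joda-time on the classpath):

import com.sksamuel.avro4s.{FromValue, ToSchema, ToValue}
import org.apache.avro.Schema
import org.apache.avro.Schema.Field
import org.joda.time.DateTime

implicit object DateTimeToSchema extends ToSchema[DateTime] {
  override val schema: Schema = Schema.create(Schema.Type.STRING)
}

implicit object DateTimeToValue extends ToValue[DateTime] {
  override def apply(value: DateTime): String = value.toString   // ISO-8601 text
}

implicit object DateTimeFromValue extends FromValue[DateTime] {
  override def apply(value: Any, field: Field): DateTime = DateTime.parse(value.toString)
}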
