elodina / go-avro
Apache Avro for Golang
Home Page: http://elodina.github.io/go-avro/
License: Apache License 2.0
I was going to take issue with the enum caching code from 2f447d1, partly because it disambiguates on the schema name, which I don't think is a good idea: enums can have very common names like 'Type' that mean contextually different things depending on the enclosing schema. I have a potential solution to that coming down the line which still provides the speedup advantage.
However, more importantly, while reading it over I found a data race. The source of the issue is that the map reads are not protected by a mutex. That is not allowed if the map can also be written to (even if the writes are mutex-protected), and the end result is that it can cause run-time crashes in an application running multiple goroutines.
With these tests: crast@enum-data-race-proof
Running go test -race finds this race:
$ go test -race -run Race -v
=== RUN TestEnumCachingRace
==================
WARNING: DATA RACE
Read by goroutine 8:
runtime.mapaccess1_faststr()
/usr/local/Cellar/go/1.5.3/libexec/src/runtime/hashmap_fast.go:179 +0x0
github.com/elodina/go-avro.(*GenericDatumReader).mapEnum()
$GOPATH/src/github.com/elodina/go-avro/datum_reader.go:477 +0x1a9
github.com/elodina/go-avro.(*GenericDatumReader).readValue()
$GOPATH/src/github.com/elodina/go-avro/datum_reader.go:422 +0x14a
github.com/elodina/go-avro.(*GenericDatumReader).findAndSet()
$GOPATH/src/github.com/elodina/go-avro/datum_reader.go:382 +0xb5
github.com/elodina/go-avro.(*GenericDatumReader).mapRecord()
$GOPATH/src/github.com/elodina/go-avro/datum_reader.go:555 +0x266
github.com/elodina/go-avro.(*GenericDatumReader).readValue()
$GOPATH/src/github.com/elodina/go-avro/datum_reader.go:430 +0xd4
github.com/elodina/go-avro.(*GenericDatumReader).Read()
$GOPATH/src/github.com/elodina/go-avro/datum_reader.go:364 +0x1c6
github.com/elodina/go-avro.enumRaceTest.func1()
$GOPATH/src/github.com/elodina/go-avro/datum_reader_test.go:363 +0x3b7
github.com/elodina/go-avro.parallelF.func1()
$GOPATH/src/github.com/elodina/go-avro/datum_reader_test.go:375 +0x74
Previous write by goroutine 7:
runtime.mapassign1()
/usr/local/Cellar/go/1.5.3/libexec/src/runtime/hashmap.go:411 +0x0
github.com/elodina/go-avro.(*GenericDatumReader).mapEnum()
$GOPATH/src/github.com/elodina/go-avro/datum_reader.go:484 +0x438
github.com/elodina/go-avro.(*GenericDatumReader).readValue()
$GOPATH/src/github.com/elodina/go-avro/datum_reader.go:422 +0x14a
github.com/elodina/go-avro.(*GenericDatumReader).findAndSet()
$GOPATH/src/github.com/elodina/go-avro/datum_reader.go:382 +0xb5
github.com/elodina/go-avro.(*GenericDatumReader).mapRecord()
$GOPATH/src/github.com/elodina/go-avro/datum_reader.go:555 +0x266
github.com/elodina/go-avro.(*GenericDatumReader).readValue()
$GOPATH/src/github.com/elodina/go-avro/datum_reader.go:430 +0xd4
github.com/elodina/go-avro.(*GenericDatumReader).Read()
$GOPATH/src/github.com/elodina/go-avro/datum_reader.go:364 +0x1c6
github.com/elodina/go-avro.enumRaceTest.func1()
$GOPATH/src/github.com/elodina/go-avro/datum_reader_test.go:363 +0x3b7
github.com/elodina/go-avro.parallelF.func1()
$GOPATH/src/github.com/elodina/go-avro/datum_reader_test.go:375 +0x74
Goroutine 8 (running) created at:
github.com/elodina/go-avro.parallelF()
$GOPATH/src/github.com/elodina/go-avro/datum_reader_test.go:377 +0xb4
github.com/elodina/go-avro.enumRaceTest()
$GOPATH/src/github.com/elodina/go-avro/datum_reader_test.go:364 +0x18d
github.com/elodina/go-avro.TestEnumCachingRace()
$GOPATH/src/github.com/elodina/go-avro/datum_reader_test.go:346 +0xc2
testing.tRunner()
/usr/local/Cellar/go/1.5.3/libexec/src/testing/testing.go:456 +0xdc
Goroutine 7 (running) created at:
github.com/elodina/go-avro.parallelF()
$GOPATH/src/github.com/elodina/go-avro/datum_reader_test.go:377 +0xb4
github.com/elodina/go-avro.enumRaceTest()
$GOPATH/src/github.com/elodina/go-avro/datum_reader_test.go:364 +0x18d
github.com/elodina/go-avro.TestEnumCachingRace()
$GOPATH/src/github.com/elodina/go-avro/datum_reader_test.go:346 +0xc2
testing.tRunner()
/usr/local/Cellar/go/1.5.3/libexec/src/testing/testing.go:456 +0xdc
==================
--- PASS: TestEnumCachingRace (0.01s)
=== RUN TestEnumCachingRace2
--- PASS: TestEnumCachingRace2 (0.00s)
PASS
Found 1 data race(s)
exit status 66
FAIL github.com/elodina/go-avro 1.090s
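One standard remedy — a sketch only, not the project's actual fix — is to guard every read and write of the cache map with a sync.RWMutex, so concurrent readers can proceed while the occasional writer is serialized:

```go
package main

import (
	"fmt"
	"sync"
)

// enumCache guards a symbol-index map so concurrent readers and the
// occasional writer do not race, as the detector reported above.
type enumCache struct {
	mu    sync.RWMutex
	index map[string]int32
}

func (c *enumCache) get(symbol string) (int32, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.index[symbol]
	return v, ok
}

func (c *enumCache) put(symbol string, v int32) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.index == nil {
		c.index = make(map[string]int32)
	}
	c.index[symbol] = v
}

func main() {
	c := &enumCache{}
	var wg sync.WaitGroup
	// Many goroutines reading and writing concurrently; go test -race
	// stays quiet because every map access holds the lock.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(n int32) {
			defer wg.Done()
			c.put("A", n)
			c.get("A")
		}(int32(i))
	}
	wg.Wait()
	_, ok := c.get("A")
	fmt.Println(ok) // true
}
```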
https://github.com/elodina/go-avro/blob/master/schema.go#L1102
Here the default value of the field is converted into the specific type.
https://github.com/elodina/go-avro/blob/master/codegen.go#L485
Here the type of the default value is no longer float64.
Currently it is float64, due to Go's JSON unmarshalling. Both of the above need to be fixed.
I’m confused… I’ve mostly just copied-and-pasted your example code from specific_datum.go
and adjusted it to use my own values, but it’s not working. I’m sure I’m just missing something… help?
The panic:
reflect: call of reflect.Value.Elem on struct Value
/usr/local/Cellar/go/1.4.2/libexec/src/runtime/panic.go:387
Full Stack Trace
/usr/local/Cellar/go/1.4.2/libexec/src/runtime/panic.go:387 (0x15328)
gopanic: reflectcall(unsafe.Pointer(d.fn), deferArgs(d), uint32(d.siz), uint32(d.siz))
/usr/local/Cellar/go/1.4.2/libexec/src/reflect/value.go:703 (0x113db5)
Value.Elem: panic(&ValueError{"reflect.Value.Elem", v.kind()})
/Users/thavi/dev/go/src/github.com/stealthly/go-avro/datum_writer.go:104 (0xfd32c)
(*SpecificDatumWriter).findField: elem := where.Elem() //TODO maybe check first?
/Users/thavi/dev/go/src/github.com/stealthly/go-avro/datum_writer.go:93 (0xfd197)
(*SpecificDatumWriter).writeRecord: field, err := this.findField(v, schemaField.Name)
/Users/thavi/dev/go/src/github.com/stealthly/go-avro/datum_writer.go:55 (0xfcad8)
(*SpecificDatumWriter).write: return this.writeRecord(v, enc, s)
/Users/thavi/dev/go/src/github.com/stealthly/go-avro/datum_writer.go:34 (0xfc9ea)
(*SpecificDatumWriter).Write: return this.write(rv, enc, this.schema)
/Users/thavi/dev/go/src/github.com/timehop/streams/tests/integration/lastopens_integration_test.go:209 (0x6386a)
serialize: writer.Write(appopen, encoder)
My schema and target struct:
package events
// This is an Avro deserialization “target” or “template”.
type AppOpen struct {
Timestamp int64 // Unix timestamp (the number of seconds elapsed since January 1, 1970 UTC)
UserID int64
Platform string
}
type schema string
const AppOpenSchema schema = `{
"type": "record",
"name": "AppOpen",
"fields": [
{ "name": "Timestamp", "type": "long" },
{ "name": "UserID", "type": "long" },
{ "name": "Platform", "type": "string" }
]
}`
My serialize func:
func serialize(appopens []events.AppOpen, schema avro.Schema) [][]byte {
writer := avro.NewSpecificDatumWriter()
writer.SetSchema(schema)
results := make([][]byte, len(appopens))
for i, appopen := range appopens {
buffer := new(bytes.Buffer)
encoder := avro.NewBinaryEncoder(buffer)
fmt.Println("About to serialize:", appopen, "to", writer, "using", buffer, "and", encoder)
writer.Write(appopen, encoder)
results[i] = buffer.Bytes()
}
return results
}
the schema is parsed outside of this func but that seems to be working just fine.
What am I missing?
The project is looking very interesting, but it's missing an example of how to load multiple Avro schemas and then use them.
I can't see a test case showing how one would load several Avro schemas where one depends on another, and then serialize/deserialize a message.
Hi,
I was wondering if it is possible to encode in binary without embedding the schema within the message?
Many thanks,
Marc
It would be very useful to allow one to do something like:
genericrecord.Set("enumname", "string")
This is currently not possible in the GenericDatum type.
I have a basic avro schema that includes some nested records, enums, and some arrays & maps.
We are observing no issues when roundtripping this data purely in go. By this, I mean serializing a record to binary data using go and then deserializing the same data back into memory.
However, a simple java test program using the java 1.7.7 avro libraries cannot deserialize binary avro data written in go. The inverse is not working either, i.e. we are not able to deserialize (in go) data generated using the same schema (by java).
As far as I can grok from looking at the binary data generated by each runtime, Java appears to generate more densely packed 'array' and 'map' count values, which precede the actual data.
I have a suspicion that java is generating one byte 'count' values and this library is generating two byte values.
Has anybody tried this kind of java <> go interop for binary serialized data (specifically, maps/arrays)?
I can possibly supply some sample java & go code.
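For reference, the Avro spec encodes array/map block counts (like any long) as a zig-zag base-128 varint — the same scheme Go's binary.PutVarint implements — so small counts are a single byte, and an implementation emitting fixed-width or differently packed counts would produce exactly this kind of incompatibility. A stdlib sketch of the expected wire bytes:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// avroLong encodes a long the way the Avro spec requires:
// zig-zag, then base-128 varint — which is what binary.PutVarint does.
func avroLong(n int64) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	w := binary.PutVarint(buf, n)
	return buf[:w]
}

func main() {
	// Small block counts are a single byte: 3 -> 0x06, -1 -> 0x01.
	fmt.Printf("% x\n", avroLong(3))  // 06
	fmt.Printf("% x\n", avroLong(-1)) // 01
	// 64 zig-zags to 128, the first value needing a second byte.
	fmt.Printf("% x\n", avroLong(64)) // 80 01
}
```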
I tried to use the codegen to generate code for the following schema, obtained from:
http://wiki.pentaho.com/display/EAI/Avro+Input
{
"type": "map",
"values":{
"type": "record",
"name":"ATM",
"fields": [
{"name": "serial_no", "type": "string"},
{"name": "location", "type": "string"}
]
}
}
However, a panic occurs:
panic: interface conversion: interface is nil, not string
goroutine 1 [running]:
github.com/stealthly/go-avro.parseSchemaField(0x234780, 0x8205cde90, 0x8205a5128, 0x8205cde60, 0x22, 0x1, 0x0, 0x0)
/mypath/golang/src/github.com/stealthly/go-avro/schema.go:1028 +0x142
github.com/stealthly/go-avro.parseRecordSchema(0x8205cde30, 0x8205a5128, 0x8205cde60, 0x22, 0x0, 0x0, 0x0, 0x0)
/mypath/golang/src/github.com/stealthly/go-avro/schema.go:1013 +0x969
github.com/stealthly/go-avro.schemaByType(0x234780, 0x8205cde30, 0x8205a5128, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/mypath/golang/src/github.com/stealthly/go-avro/schema.go:962 +0x1361
github.com/stealthly/go-avro.ParseSchemaWithRegistry(0x3815c0, 0x246, 0x8205a5128, 0x0, 0x0, 0x0, 0x0)
/mypath/golang/src/github.com/stealthly/go-avro/schema.go:882 +0x182
github.com/stealthly/go-avro.ParseSchema(0x3815c0, 0x246, 0x0, 0x0, 0x0, 0x0)
/mypath/golang/src/github.com/stealthly/go-avro/schema.go:870 +0xd2
github.com/stealthly/go-avro.(*CodeGenerator).Generate(0x8205a5560, 0x0, 0x0, 0x0, 0x0)
/mypath/golang/src/github.com/stealthly/go-avro/codegen.go:88 +0xe7
main.main.func1(0x82057c3c0)
/mypath/golang/src/.../schema.go:88 +0xaef
github.com/codegangsta/cli.Command.Run(0x2feb18, 0x7, 0x0, 0x0, 0x0, 0x0, 0x0, 0x338360, 0x1d, 0x0, ...)
/mypath/golang/src/github.com/codegangsta/cli/command.go:127 +0x1052
github.com/codegangsta/cli.(*App).Run(0x820594200, 0x820552100, 0x4, 0x4, 0x0, 0x0)
/mypath/golang/src/github.com/codegangsta/cli/app.go:159 +0xc2f
main.main()
If a name property is added to the map object, then at least the panic is resolved...
We just recently open sourced our fork of the AVRO java code generator, which generates golang avro objects from avdl files. It currently generates bindings and associated roundtrip unit tests that use the AVRO C libraries for serialization/deserialization, as at the time it was written last summer we were not aware of any golang avro implementations.
Unfortunately, we no longer use golang in our environment, so we aren't planning on updating the code generator. However, with a little work, I suspect either this project or the other golang avro project could adapt it to use golang bindings instead of the C library.
Hi Team,
When I try to serialize an array of strings, I get an "index out of range" panic. Can you please post an example of writing an array of strings?
Here's my code (pretty much copy pasted from the example)...
reader, err := avro.NewDataFileReader(fileName, avro.NewSpecificDatumReader())
if err != nil {
	fmt.Println(err)
	return
}
for {
	obj := &PADirectJustListedItem{}
	ok, err := reader.Next(obj)
	if !ok {
		if err != nil {
			fmt.Println(err)
			return
		}
		break
	} else {
		fmt.Printf("%#v\n", obj)
	}
}
output:
go run main.go
&main.PADirectJustListedItem{snapshotdate:(*int64)(nil), propertyid:(*int32)(nil), accountid:(*int32)(nil), bedrooms:(*int)(nil), bathrooms:(*string)(nil), finishedsqft:(*int)(nil), lotsizesqft:(*int)(nil), city:(*string)(nil), state:(*string)(nil), postalcode:(*string)(nil), propetyaddress:(*string)(nil), image1id:(*int64)(nil), image2id:(*int64)(nil), image3id:(*int64)(nil), manualimageid:(*int64)(nil), sellingpricedollarcnt:(*int32)(nil), realestatebrokerid:(*int32)(nil), daysonzillow:(*int32)(nil), multiplelistingservicecode:(*string)(nil), postingid:(*int32)(nil), postingdateinitial:(*int64)(nil), auditdatecreated:(*int64)(nil)}
&main.PADirectJustListedItem{snapshotdate:(*int64)(nil), propertyid:(*int32)(nil), accountid:(*int32)(nil), bedrooms:(*int)(nil), bathrooms:(*string)(nil), finishedsqft:(*int)(nil), lotsizesqft:(*int)(nil), city:(*string)(nil), state:(*string)(nil), postalcode:(*string)(nil), propetyaddress:(*string)(nil), image1id:(*int64)(nil), image2id:(*int64)(nil), image3id:(*int64)(nil), manualimageid:(*int64)(nil), sellingpricedollarcnt:(*int32)(nil), realestatebrokerid:(*int32)(nil), daysonzillow:(*int32)(nil), multiplelistingservicecode:(*string)(nil), postingid:(*int32)(nil), postingdateinitial:(*int64)(nil), auditdatecreated:(*int64)(nil)}
Block read is unfinished
I have a couple files to test with. All of them I'm able to use the avro tools to convert them to json and it works fine:
java -jar avro-tools-1.8.2.jar tojson part-m-00000.avro > 00001.json
To track the development of SpecificDatumWriter complex types
Currently, for GenericDatum:
Both of the above need to be fixed.
It would help a lot if there is an example to encode and decode a complex of complex type using generic datum.
I am having a hard time figuring it out.
What could be the problem if I get this error:
End of file reached
At:
decoder := avro.NewBinaryDecoder(message)
decodedRecord := avro.NewGenericRecord(avroSchema)
err := avroReader.Read(decodedRecord, decoder)
The same messages are decoded correctly by another Avro decoder app (Java).
The schema contains only "string" type, but a lot of fields.
{
"type":"record",
"name":"schema",
"fields":[
{
"name":"prop1",
"type":"string"
},
{
"name":"prop2",
"type":"string"
},
...
]
}
We need some Godoc once the API is more or less stable.
Since you're talking about having master be an API break in #47, I would like to collect a few changes to make this library a bit more idiomatic with Go usage:
- NewBinaryDecoder should return a Decoder interface, not the concrete type.
- NewBinaryEncoder should likewise return an Encoder interface.
- Remove Tell() from the Encoder interface. From what I can tell, it's not used at all and constrains writing your own Encoder should you desire.
- Have NewBinaryEncoder subsequently take an io.Writer at construction. This is actually not a breaking change for user code because *bytes.Buffer satisfies io.Writer, but it allows people to pass in other writers — for example, a network socket to encode Avro directly to a network connection, or a file, or the like.
- Unexport FixedSchema, EnumSchema, IntSchema and so on. This reduces visual clutter and tab-completion confusion. It should also be noted that even though there are pieces which switch on type codes using schema.Type(), the datum writers do type asserts to get the concrete types of many of the schema types, so it's not like someone can simply implement the Schema interface with their own type (or even embed the type) and use it as a replacement as it stands.
Nearly all of these changes, while technically breaking, shouldn't break the vast majority of user code, because most users aren't manipulating schema types or embedding the BinaryEncoder type, they just want to encode and decode from / to avro.
I'm happy to submit PRs for any and all of these changes if you approve of them.
Hi,
why is 1 an invalid default value for a long type? For int it's the same. Thanks!
{
"name": "Packet",
"type": "record",
"fields": [{
"name": "duration",
"type": "long",
"default": 1
}]
}
E:\Repositorys\hemera-golang>go run codegen.go --schema schemas/packet.avsc --out foo.go
Invalid default value for duration field of type long
exit status 1
I'm not sure if this is intentional or not, but the GenericDatumWriter currently writes the Fixed type as a Bytes type. That is to say it prepends the bytes with the length of the byte array. This does not follow avro's serialization for Fixed types (which does not include the length of the array). I noticed that in SpecificDatumWriter, the Fixed type is encoded as a Fixed type, hence why I'm not sure if this was intentional or not.
func (writer *GenericDatumWriter) writeFixed(v interface{}, enc Encoder, s Schema) error {
	return writer.writeBytes(v, enc)
}
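The wire-format difference is straightforward to show with the stdlib: Avro bytes values carry a zig-zag varint length prefix, while fixed values are the raw payload only. A sketch with hypothetical helpers, not go-avro code:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeBytes frames data the way Avro "bytes" does: a zig-zag varint
// length, then the payload.
func encodeBytes(data []byte) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutVarint(buf, int64(len(data)))
	return append(buf[:n], data...)
}

// encodeFixed writes a "fixed" value per the spec: the raw payload,
// with no length prefix at all.
func encodeFixed(data []byte) []byte {
	return append([]byte(nil), data...)
}

func main() {
	payload := make([]byte, 16) // e.g. a fixed(16) field such as an MD5 hash
	fmt.Println(len(encodeBytes(payload))) // 17: one length byte + 16 payload bytes
	fmt.Println(len(encodeFixed(payload))) // 16
}
```

Decoders reading a fixed field consume exactly the schema-declared size, so the extra length byte written by the buggy path desynchronizes the stream.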
Hi there,
I'm trying to generate code with Codegen using two schemas.
In device.avsc, I have a record of type Device.
In status.avsc, I have a record of type Status that has a field of type Device.
I tried to first generate device.avsc and then status.avsc, and to use both schemas to generate one Go file, but Codegen says that the type Device is undefined.
Is there a way to make that happen? If not, do you have ideas for modifying Codegen to make it possible? I'm willing to help.
Thanks and regards,
Albin.
I have a schema file containing an array of record schemas like so:
[
{
"name": "ReusedRecord",
"type": "record",
"fields": [
{
"name": "aString",
"type": "string"
}
]
},
{
"name": "MainRecord",
"type": "record",
"fields": [
{
"name": "reusedRecord1",
"type": "ReusedRecord"
},
{
"name": "reusedRecord1",
"type": "ReusedRecord"
}
]
}
]
The code generation works as expected in java through avro-tools but using go-avro it fails with this error: https://github.com/elodina/go-avro/blob/master/codegen.go#L93-L96 since the root schema is a UnionSchema and not a RecordSchema.
Does someone know how to circumvent this issue? Or should I repeat the schema x times?
It would be great to have a basic README.md to explain the status of this project. A simple usage example plus a note about what is and is not supported would go a long way. Alternatively, if this library is just for internal use, even a note saying that would be useful.
I tried this, which is based on an example I grokked on Pentaho's website:
...
schemas = []string {
`{
"type": "record",
"namespace": "com.philips.lighting.dna.ingestion",
"name": "LongList",
"fields": [
{
"type": "map",
"name": "inner_name",
"values": {
"type": "record",
"name": "ATM",
"fields": [
{
"name": "serial_no",
"type": "string"
},
{
"name": "location",
"type": "string"
}
]
}
}
]
}`
}
gen := avro.NewCodeGenerator(schemas)
code, err := gen.Generate()
And the following error is generated:
2015/09/21 11:58:05 Unknown type name: map
make: *** [codegen] Error 1
It looks like the code that parses the "type" sees a string value associated with the "type: map" line, and then only can interpret basic types, excluding maps, arrays, enums etc.
Perhaps I misunderstood, or this example Avro schema is invalid? (Source for the schema: http://wiki.pentaho.com/display/EAI/Avro+Input)
The Avro specification contains a description of generating schema fingerprints. Is this functionality supported at all in the go-avro package?
Hi, go-avro is a nice tool for processing Avro in Go, but I find that codegen cannot create Go structs from a union schema. Is this a bug?
https://github.com/elodina/go-avro/blob/master/datum_reader.go
At line 356, why is the error omitted?
Currently BinaryEncoder uses long (int64) and BinaryDecoder uses int (int32). The Avro spec asks us to use long.
The generic datum's writeEnum method does not return an error if the passed string is not in the list of symbols. It simply does not write the index, causing an error when decoding the data.
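A sketch of the membership check such a writeEnum could perform — hypothetical code, not go-avro's implementation:

```go
package main

import "fmt"

// symbolIndex is the kind of check a writeEnum implementation could do:
// return an error instead of silently writing nothing when the value is
// not one of the schema's symbols.
func symbolIndex(symbols []string, v string) (int32, error) {
	for i, s := range symbols {
		if s == v {
			return int32(i), nil
		}
	}
	return 0, fmt.Errorf("enum: %q is not a symbol of %v", v, symbols)
}

func main() {
	symbols := []string{"RED", "GREEN", "BLUE"}

	i, err := symbolIndex(symbols, "GREEN")
	fmt.Println(i, err) // 1 <nil>

	_, err = symbolIndex(symbols, "MAGENTA")
	fmt.Println(err != nil) // true
}
```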