
Comments (4)

ptupitsyn commented on August 17, 2024

> invalid BinarySchema

It is not invalid. If a schema has been used even once, it must be kept so that the object written with it can be deserialized later.

You can try to minimize the number of unique schemas, for example by ensuring a consistent field order. Currently, the loop for (Map.Entry<String, Object> item : param.entrySet()) produces items in no guaranteed order when the underlying Map implementation is unordered (e.g. HashMap).
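A minimal sketch of the consistent-ordering suggestion, assuming the fields arrive in a Map<String, Object> as in the loop quoted above. Copying the map into a java.util.TreeMap before iterating makes the write order alphabetical, so two records with the same field set are always written in the same order. The class and method names are hypothetical, and Ignite is left out so the sketch runs standalone; the loop body marks where real code would hand each field to the binary object builder.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FieldOrderDemo {
    // Returns field names in the order a builder would receive them
    // when iterating over the given map.
    static List<String> writeOrder(Map<String, Object> param) {
        List<String> order = new ArrayList<>();
        for (Map.Entry<String, Object> item : param.entrySet()) {
            // Real code would call something like builder.setField(item.getKey(), item.getValue()) here.
            order.add(item.getKey());
        }
        return order;
    }

    // A TreeMap iterates its keys in sorted order, so the same field set
    // always yields the same write order regardless of the input Map type.
    static List<String> sortedWriteOrder(Map<String, Object> param) {
        return writeOrder(new TreeMap<>(param));
    }

    public static void main(String[] args) {
        Map<String, Object> record = new HashMap<>();
        record.put("temperature", 21.5);
        record.put("humidity", 40);
        record.put("voltage", 3.3);
        System.out.println(sortedWriteOrder(record)); // [humidity, temperature, voltage]
    }
}
```

Sorting costs O(n log n) per write; a fixed, pre-sorted list of known field names would avoid that if the field catalog is stable.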


asdfgh19 commented on August 17, 2024

@ptupitsyn Thanks for your reply and suggestion! While analyzing BinaryObjectBuilderImpl#serializeTo, I noticed a pattern.

Writing fields out of order, or updating some fields of an existing record, does not create a new BinarySchema. But whenever we add a new field to a record, a new BinarySchema is created.

Consider this scenario. First, we create a new record with field A, and its schemaId is 1.
Next, we update the record and add field B, and the schemaId becomes 2. If we keep adding one field per update until the record reaches its final 1000 fields, the schemaId ends up at 1000, and 1000 schemas have been registered.
If no other records reference the 999 earlier BinarySchemas, no deserialization will ever need them.

I worked around this as follows: when writing a record for the first time, I also write null to every field that does not exist yet, so that all records share a single BinarySchema. The downside is wasted memory, because each null value occupies 2 bytes after serialization, and most records probably need only about 200 of the 1000 fields.
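A minimal sketch of the null-padding workaround and its cost, assuming a hypothetical catalog of field names f1..fN and taking the 2-bytes-per-null figure from the comment above rather than measuring it. Every record is expanded to the full field list in a fixed order, so records that sent different subsets end up with the same field set.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class NullPaddingDemo {
    // Per-null cost quoted in the comment above; not measured by this sketch.
    static final int NULL_COST_BYTES = 2;

    // Hypothetical field catalog: f1..fN.
    static List<String> allFields(int n) {
        List<String> fields = new ArrayList<>();
        for (int i = 1; i <= n; i++) fields.add("f" + i);
        return fields;
    }

    // Every known field is present, in catalog order; fields the device did not send are null.
    static Map<String, Object> pad(List<String> allFields, Map<String, Object> values) {
        Map<String, Object> padded = new LinkedHashMap<>();
        for (String f : allFields) padded.put(f, values.get(f));
        return padded;
    }

    // Extra bytes spent on nulls for one record.
    static int paddingOverheadBytes(int totalFields, int writtenFields) {
        return (totalFields - writtenFields) * NULL_COST_BYTES;
    }

    public static void main(String[] args) {
        List<String> all = allFields(1000);
        // Two records with different subsets of fields end up with the same field set.
        Set<Set<String>> fieldSets = new HashSet<>();
        fieldSets.add(new HashSet<>(pad(all, Map.of("f1", 1)).keySet()));
        fieldSets.add(new HashSet<>(pad(all, Map.of("f2", 2, "f3", 3)).keySet()));
        System.out.println(fieldSets.size());                // 1
        System.out.println(paddingOverheadBytes(1000, 200)); // 1600
    }
}
```

With those figures, a record that writes 200 of 1000 fields spends 800 × 2 = 1600 bytes on nulls in exchange for sharing one schema.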


ptupitsyn commented on August 17, 2024

Thanks for posting the solution. Yes, there is a trade-off: create more schemas, or waste some space on nulls.

Could you also describe your use case a bit? Handling 1000 fields dynamically is somewhat unusual.


asdfgh19 commented on August 17, 2024

@ptupitsyn I agree with you. There is a trade-off here.

We store the latest property and telemetry values of devices in an IoT scenario. Each record represents one device, and records typically have dozens to hundreds of fields.
Some telemetry is uploaded every minute, some every 5 minutes.
Some devices upload only a few dozen telemetry fields, while others upload hundreds.

Even for a cache with only three fields (1, 2, 3), up to 7 BinarySchemas may be created at first, one per field combination: 1, 2, 3, 12, 13, 23, 123. After the records are updated, only one or two of them may still be in use.
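The count of 7 is combinatorial: with n optional fields, every non-empty subset is a possible field set, so up to 2^n - 1 schemas can appear (2^3 - 1 = 7). A small sketch that enumerates them with a bitmask, assuming as the comment does that field order does not produce additional schemas; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SubsetSchemaDemo {
    // Each bit of mask selects one field, so masks 1..2^n-1 cover every non-empty subset.
    static List<String> possibleFieldSets(String[] fields) {
        List<String> sets = new ArrayList<>();
        for (int mask = 1; mask < (1 << fields.length); mask++) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < fields.length; i++) {
                if ((mask & (1 << i)) != 0) sb.append(fields[i]);
            }
            sets.add(sb.toString());
        }
        return sets;
    }

    public static void main(String[] args) {
        System.out.println(possibleFieldSets(new String[] {"1", "2", "3"}));
        // [1, 2, 12, 3, 13, 23, 123]
    }
}
```

The growth is exponential: ten optional fields already allow 1023 field sets.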

I have two simple ideas.
The first is to delete any BinarySchema that is no longer referenced by any object, tracking references as records are written or updated.

  1. Add a reference count to BinarySchema.
  2. When we write a new record, we increment the reference count of the BinarySchema used by that record.
  3. When we update an existing record and the update creates a new BinarySchema, we increment the new BinarySchema's reference count, look up the old BinarySchema, and decrement its reference count by one.
  4. When we delete a record, we decrement the reference count of its BinarySchema by one.
  5. If a BinarySchema's reference count drops to 0, we can remove that BinarySchema from memory.
  6. The reference count does not need to be persisted in the binary_metadata file, because it can be rebuilt from the cache contents on restart. That way, the binary_metadata file does not have to be rewritten every time a reference count changes.
  7. On each restart, any BinarySchema that is not referenced by any object can be deleted from the binary_metadata file.
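The steps above can be sketched as an in-memory reference counter keyed by schema ID. This is a minimal, single-threaded sketch; the class and method names are hypothetical and not part of Ignite. Each method is labeled with the proposal step it models.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SchemaRefCounter {
    // schemaId -> number of stored records using that schema (kept in memory only, step 6).
    private final Map<Integer, Integer> refCounts = new HashMap<>();

    // Step 2: a newly written record references its schema.
    void onInsert(int schemaId) {
        refCounts.merge(schemaId, 1, Integer::sum);
    }

    // Step 3: an update that produced a different schema moves the reference.
    void onUpdate(int oldSchemaId, int newSchemaId) {
        if (oldSchemaId == newSchemaId) return;
        onInsert(newSchemaId);
        release(oldSchemaId);
    }

    // Step 4: deleting a record drops its reference.
    void onDelete(int schemaId) {
        release(schemaId);
    }

    // Step 5: returning null from computeIfPresent removes the entry once the count would hit 0.
    private void release(int schemaId) {
        refCounts.computeIfPresent(schemaId, (id, n) -> n > 1 ? n - 1 : null);
    }

    // Steps 6-7: on restart, rebuild counts from the schema ID of every stored record;
    // registered schemas absent from the result are unreferenced.
    static SchemaRefCounter rebuild(List<Integer> schemaIdsOfStoredRecords) {
        SchemaRefCounter counter = new SchemaRefCounter();
        for (int id : schemaIdsOfStoredRecords) counter.onInsert(id);
        return counter;
    }

    Set<Integer> liveSchemas() {
        return Set.copyOf(refCounts.keySet());
    }

    public static void main(String[] args) {
        SchemaRefCounter c = new SchemaRefCounter();
        c.onInsert(1);     // record X written with schema 1
        c.onUpdate(1, 2);  // X gains a field: schema 1 -> 2, schema 1 is released
        c.onInsert(2);     // record Y also uses schema 2
        c.onDelete(2);     // Y deleted
        System.out.println(c.liveSchemas()); // [2]
    }
}
```

A real cluster would also need these counts to stay consistent across nodes and to change atomically with the write itself; the sketch ignores both.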

The second is to periodically scan each cache and delete the BinarySchemas that are not referenced by any object.
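The periodic check is essentially a mark-and-sweep pass. A minimal sketch, assuming the scan can collect the schema ID of every stored record in a cache (how to read that ID from Ignite is not shown here); the class and method names are hypothetical, and the IDs reuse the field-set labels from the three-field example as stand-ins.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SchemaSweeper {
    // Mark: every schema ID seen on a stored record is in use.
    // Sweep: whatever is registered but unmarked is a deletion candidate.
    static Set<Integer> unreferencedSchemas(Set<Integer> registered, List<Integer> schemaIdsOfStoredRecords) {
        Set<Integer> unused = new HashSet<>(registered);
        unused.removeAll(new HashSet<>(schemaIdsOfStoredRecords));
        return unused;
    }

    public static void main(String[] args) {
        Set<Integer> registered = Set.of(1, 2, 3, 12, 13, 23, 123);
        List<Integer> inUse = List.of(123, 123, 12);
        System.out.println(unreferencedSchemas(registered, inUse).size()); // 5
    }
}
```

Compared with reference counting, this needs no bookkeeping on the write path, but deleting the returned schemas safely would still require that no concurrent write is about to use them.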

This is admittedly a bit complicated and only a proposal. But if implemented, it could give us more efficient sparse storage and more flexibility, and therefore better support for schemaless data.
