o2-czech-republic / proxima-platform
The Proxima platform.
License: Apache License 2.0
Currently, observers start their own threads outside of any thread pool or executor. We should use some Executor for that. This should be configured on the Repository and used platform-wide.
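As a sketch only (the builder entry point and the executor option below are assumptions, not an existing API):
// Hypothetical: a single, shared executor configured on the Repository
// and handed to all observers instead of ad-hoc thread creation.
Repository repo = Repository.Builder
    .of(ConfigFactory.load())                                    // assumed builder entry point
    .withExecutorFactory(() -> Executors.newCachedThreadPool())  // assumed option for the shared executor
    .build();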
Implement Google PubSub commitlog and partitionedview. The partitionedview will be implemented by Apache Beam Repartition.
ApproxPercentileMetric is flaky and sometimes causes test failures. Replace this implementation with T-digest.
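For reference, a minimal sketch of the replacement, assuming Ted Dunning's t-digest library (com.tdunning:t-digest) as the backing implementation:
// Sketch: approximate percentiles backed by a t-digest.
TDigest digest = TDigest.createDigest(100.0); // compression parameter
digest.add(12.3);                             // record observed values
digest.add(45.6);
double p99 = digest.quantile(0.99);           // read the 99th percentile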
Each attribute family must have an option to specify a generic transformation function that will convert a given entity to some other entity and/or attribute. This is to facilitate automatic entity transformation and migration of such transforming pipelines.
attributeFamilies.<family>.disabled = true
This class produces an enormous amount of logs when the DEBUG level is set. Is that necessary?
We need support for multi-master, multi-slave replication schemes across multiple Proxima deployments with eventual consistency. Required features:
Draft of configuration:
entities {
  original {
    attrA: { scheme: "bytes" }
    attrB: { scheme: "bytes" }
  }
}
replication {
  replicate-original-attrA {
    entity: original
    attributes: [ "attrA" ] # can be "*"
    # declare attribute families that will be written when update happens
    targets {
      first {
        // attribute family specifier
      }
      second {
        // attribute family specifier
      }
    }
    # declare single source family that will be read and written to local commit log
    source {
      // attribute family specifier
    }
  }
  replicate-original-all {
    entity: original
    attributes: [ "*" ] # replicate all attributes with some other settings
    targets { ... }
    source { ... }
  }
}
In order to accomplish this we need to:
- introduce new attributes: _<replication_name>_s_ for the source attribute family, _<replication_name>_t_<target_name>_ for the target attribute, _<replication_name>_ for direct writes and _<replication_name>_r_ for the replicated attribute containing all replicated data (which will be read instead of the original attribute, which becomes virtual)
- adjust AttributeProxy so that it can be used only for writes (i.e. the attribute is modified only on direct write, reads happen on the actual attribute) - this will be used to transform writes to attribute X into _<replication_name>_X
- add a transformation from _<replication_name>_X into X
- add a transformation from _<replication_name>_source_X into X
- add a transformation from _<replication_name>_X into each of _<replication_name>_target_<target_name>_X
Summarize design goals and value proposition.
It is necessary to allow reading from multiple random-access families by a single (wrapper) random access reader. The reader might be constructed using a builder:
RandomAccessReader multi = MultiRandomReader.newBuilder()
    .addReader(reader1, attr1, attr2, attr3)
    .addReader(reader2, attr4, attr5)
    ...
    .build();
Because of ease of deployment, we need to support Kafka Streams as a first-class executor for the processing API.
The current abstraction of observers is insufficient; we need to be able to consistently partition the elements in a commit log. In order to achieve this, we need to introduce a partitioned view for a commit-log attribute family. This view can then be observed via RemoteLogObserver, with the following features:
- the onNext method will take a Collector, which will collect results
- the Dataset returned from PartitionedView's observe method will be executed via Flow.submit()
- RemoteLogObserver needs to be Serializable.
We have encountered an outage of tests running on the stage environment. Logs are enclosed below.
Nov 9 08:38:41 mujsignal-205 pexeso-ingest[1144]: java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 1782419
Nov 9 08:38:41 mujsignal-205 pexeso-ingest[1144]: #011at cz.o2.proxima.storage.kafka.KafkaCommitLog$2.onPartitionsAssigned(KafkaCommitLog.java:417)
Nov 9 08:38:41 mujsignal-205 pexeso-ingest[1144]: #011at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:265)
Nov 9 08:38:41 mujsignal-205 pexeso-ingest[1144]: #011at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:367)
Nov 9 08:38:41 mujsignal-205 pexeso-ingest[1144]: #011at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:316)
Nov 9 08:38:41 mujsignal-205 pexeso-ingest[1144]: #011at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:297)
Nov 9 08:38:41 mujsignal-205 pexeso-ingest[1144]: #011at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1078)
Nov 9 08:38:41 mujsignal-205 pexeso-ingest[1144]: #011at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043)
Nov 9 08:38:41 mujsignal-205 pexeso-ingest[1144]: #011at cz.o2.proxima.storage.kafka.KafkaCommitLog.processConsumerWithObserver(KafkaCommitLog.java:563)
Nov 9 08:38:41 mujsignal-205 pexeso-ingest[1144]: #011at cz.o2.proxima.storage.kafka.KafkaCommitLog.processConsumer(KafkaCommitLog.java:525)
Nov 9 08:38:41 mujsignal-205 pexeso-ingest[1144]: #011at cz.o2.proxima.storage.kafka.KafkaCommitLog.lambda$observePartitionsBulk$7(KafkaCommitLog.java:432)
Nov 9 08:38:41 mujsignal-205 pexeso-ingest[1144]: #011at java.lang.Thread.run(Thread.java:748)
Nov 9 08:38:41 mujsignal-205 pexeso-ingest[1144]: Caused by: java.lang.ArrayIndexOutOfBoundsException: 1782419
Nov 9 08:38:41 mujsignal-205 pexeso-ingest[1144]: #011at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:76)
Nov 9 08:38:41 mujsignal-205 pexeso-ingest[1144]: #011at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:50)
Nov 9 08:38:41 mujsignal-205 pexeso-ingest[1144]: #011at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
Nov 9 08:38:41 mujsignal-205 pexeso-ingest[1144]: #011at org.apache.hadoop.io.SequenceFile$Writer.sync(SequenceFile.java:1299)
Nov 9 08:38:41 mujsignal-205 pexeso-ingest[1144]: #011at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.sync(SequenceFile.java:1539)
Nov 9 08:38:41 mujsignal-205 pexeso-ingest[1144]: #011at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.close(SequenceFile.java:1569)
Nov 9 08:38:41 mujsignal-205 pexeso-ingest[1144]: #011at cz.o2.proxima.storage.hdfs.HdfsDataAccessor.rollback(HdfsDataAccessor.java:125)
Nov 9 08:38:41 mujsignal-205 pexeso-ingest[1144]: #011at cz.o2.proxima.server.IngestServer$3.onRestart(IngestServer.java:885)
Nov 9 08:38:41 mujsignal-205 pexeso-ingest[1144]: #011at cz.o2.proxima.storage.kafka.KafkaCommitLog$2.onPartitionsAssigned(KafkaCommitLog.java:397)
Nov 9 08:38:41 mujsignal-205 pexeso-ingest[1144]: #011... 10 more
Create generic test suite for each common interface and make sure all implementations pass these tests.
Create PoC of ProximaIO which will create PCollection from given attributes and/or will be able to persist PCollection into given attribute(s).
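A rough usage sketch for the PoC; the ProximaIO transforms and their builder methods below are purely illustrative, not an existing API:
// Illustrative only: ProximaIO.read()/write() are assumptions for the PoC;
// repo, attrA and attrB stand for an existing Repository and attribute descriptors.
Pipeline pipeline = Pipeline.create();
PCollection<StreamElement> updates = pipeline.apply(
    ProximaIO.read()
        .withRepository(repo)
        .fromAttributes(attrA, attrB));   // attributes to consume
updates.apply(
    ProximaIO.write()
        .withRepository(repo));           // persist back into the platform
pipeline.run();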
Transformers need an option to be disabled.
When a misconfiguration occurs, we need to provide the user with a descriptive exception explaining the problem. Currently, when using type: primary for a storage without online writing capability, we get:
Exception in thread "main" java.lang.ClassCastException: cz.o2.proxima.storage.hdfs.HdfsDataAccessor cannot be cast to cz.o2.proxima.storage.OnlineAttributeWriter
at cz.o2.proxima.storage.AttributeWriterBase.online(AttributeWriterBase.java:47)
at cz.o2.proxima.repository.Repository.lambda$readAttributeFamilies$17(Repository.java:553)
at java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1691)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
at cz.o2.proxima.repository.Repository.readAttributeFamilies(Repository.java:543)
at cz.o2.proxima.repository.Repository.<init>(Repository.java:242)
at cz.o2.proxima.repository.Repository.<init>(Repository.java:58)
at cz.o2.proxima.repository.Repository$Builder.build(Repository.java:120)
at cz.o2.proxima.repository.Repository.of(Repository.java:67)
at cz.o2.proxima.server.IngestServer.<init>(IngestServer.java:440)
at cz.o2.proxima.server.IngestServer.main(IngestServer.java:84)
The correct explanation would be more like "Do not use bulk storages for primary storage".
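One possible shape of such a check, as a sketch (the accessor methods, the Type enum and the exact place of the validation are assumptions):
// Sketch: fail fast with a descriptive message instead of a ClassCastException.
if (family.getType() == Type.PRIMARY && !(writer instanceof OnlineAttributeWriter)) {
  throw new IllegalArgumentException(
      "Attribute family " + family.getName()
          + " uses a bulk-only storage; do not use bulk storages for primary storage.");
}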
In order to be able to perform at least basic analytical queries against the data stored in the platform we need to extend our query language to support batch queries.
Currently the query language is really basic and not flexible enough, e.g.
env.gateway.device.streamFromOldest()
  .filter({ it.attribute != "device.ZWAVE_1" && it.value != null })
  .windowAll()
  .reduce({ new Tuple(it.key, it.attribute) }, null, { a, b -> b })
  .filter({ it.second != null })
  .map({ it.second })
  .filter({ !it.value.payload.contains("Smoke Sensor") && !it.value.payload.contains("Door/Window Sensor") && !it.value.payload.contains("Power Meter") })
  .forEach({ println it.key + ": " + it.value.payload })
We need to develop a query language that is easy to use and expressive enough for analytical queries.
Some artifacts include conflicting dependencies (e.g. netty, commons, ...). This issue is about fixing these dependencies (by exclusions).
We have to split attrToReader into two maps, one of which will be used for listing primary keys and the other for listing wildcard attributes.
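A sketch of the intended split (field names are illustrative):
// Illustrative field names only.
Map<AttributeDescriptor<?>, RandomAccessReader> keyListingReaders; // listing of primary keys
Map<AttributeDescriptor<?>, RandomAccessReader> wildcardReaders;   // listing of wildcard attributes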
Interfaces that require closing must extend AutoCloseable
.
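For example (a sketch; the chosen interface is just an illustration):
// Readers and writers holding resources become usable in try-with-resources.
public interface RandomAccessReader extends AutoCloseable {
  // ... existing methods ...
}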
Only DataAccessor should be Serializable and all implementations should work as factories returning non-serializable readers and writers. This is because the clients are rarely Serializable, which makes implementing the Serializable interface complicated (requiring transient fields and lazy initialization).
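A sketch of the intended shape (the factory method names and return types below are hypothetical):
// Only the accessor itself is Serializable; readers and writers are
// created on demand and never serialized.
public interface DataAccessor extends Serializable {
  AttributeWriterBase newWriter();       // hypothetical factory method
  CommitLogReader newCommitLogReader();  // hypothetical factory method
}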
java.io.IOException: write beyond end of stream
at java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:201)
at java.util.zip.GZIPOutputStream.write(GZIPOutputStream.java:145)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
at cz.o2.proxima.gcloud.storage.BinaryBlob$Writer.writeBytes(BinaryBlob.java:69)
at cz.o2.proxima.gcloud.storage.BinaryBlob$Writer.write(BinaryBlob.java:86)
at cz.o2.proxima.gcloud.storage.BulkGCloudStorageWriter.write(BulkGCloudStorageWriter.java:120)
at cz.o2.proxima.server.IngestServer$3.lambda$writeInternal$1(IngestServer.java:870)
at net.jodah.failsafe.Functions$10.call(Functions.java:252)
at net.jodah.failsafe.SyncFailsafe.call(SyncFailsafe.java:145)
at net.jodah.failsafe.SyncFailsafe.run(SyncFailsafe.java:81)
at cz.o2.proxima.server.IngestServer$3.writeInternal(IngestServer.java:868)
at cz.o2.proxima.server.IngestServer$3.onNextInternal(IngestServer.java:859)
at cz.o2.proxima.storage.commitlog.RetryableBulkObserver.onNextInternal(RetryableBulkObserver.java:85)
at cz.o2.proxima.storage.commitlog.RetryableBulkObserver.onNext(RetryableBulkObserver.java:46)
at cz.o2.proxima.storage.pubsub.PubSubReader.lambda$observeBulk$1(PubSubReader.java:146)
at cz.o2.proxima.storage.pubsub.PubSubReader.lambda$consume$2(PubSubReader.java:232)
at cz.o2.proxima.pubsub.shaded.com.google.cloud.pubsub.v1.MessageDispatcher$4.run(MessageDispatcher.java:405)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Redesign Repository so that it contains separate descriptors of entities, attributes, attribute families and views, and has separate methods to retrieve all accessors needed at runtime (writers, readers, view accessors, scheme serializers, ...).
Extend the access setting in the attribute family specification to accept an array of strings.
We need to be able to correctly handle offset committing and rewinding in places where sources are converted to Euphoria. This applies currently to tools and partitionedview.
Add bulk IO for storing data to Google Cloud Storage.
Add a health check that will signal to the Ansible playbook that the instance is operating normally and that it can proceed to the next instance, to prevent the possibility of a rolling install bringing the whole application down.
These tests fail from time to time.
testWrite[0](cz.o2.proxima.gcloud.storage.BulkGCloudStorageWriterTest) Time elapsed: 6.134 sec <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 2000 milliseconds
at cz.o2.proxima.gcloud.storage.BulkGCloudStorageWriterTest.testWrite(BulkGCloudStorageWriterTest.java:137)
testWrite[1](cz.o2.proxima.gcloud.storage.BulkGCloudStorageWriterTest) Time elapsed: 6.117 sec <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 2000 milliseconds
at cz.o2.proxima.gcloud.storage.BulkGCloudStorageWriterTest.testWrite(BulkGCloudStorageWriterTest.java:137)
Document:
The platform's inputs have to have an option to consume a (somehow) partitioned view of updates to some attribute(s). We will achieve this by specifying an (optional) implementation on the level of an attribute family. The view will be updatable and will have callbacks to be called when the internal data are updated.
The example is not working properly when writing data to HDFS due to missing hadoop-native
. We should fix this.
Subissue of #94
In order to be able to consistently replicate data between two environments in master-master fashion, we need to be able to distinguish between inputs that have already been replicated, because otherwise we might end up in replication loops.
Example:
entities {
  event {
    attributes {
      click: { ... }
      view: { ... }
    }
  }
  eventReplica {
    from: event
  }
}
This will create two entities with identical attributes.
Create abstraction for views of commit log as RandomAccessReader.
Currently, each attribute family is persisted via a stateless element-wise (map) operation. For some use-cases, this is not sufficient (HFile bulk importing). We will relax this restriction by the following:
- a pipeline field which will declare a class name; the class will take a PCollection<StreamElement> as input and will persist it into the desired storage (see the sketch after this list)
- when an attribute family uses the pipeline keyword, loading of such config into the replication (ingest) server must fail
- a replication-pipeline module, which will create a single Beam Pipeline from the config that upon submission will perform all replications
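A hypothetical shape of the class referenced by the pipeline field (interface and method names are assumptions):
// Assumed interface; the class named in the pipeline field would implement it.
public interface FamilyPersistPipeline {
  // Consume all updates to the attribute family and persist them into the
  // target storage (e.g. by producing HFiles for bulk import).
  PDone persist(PCollection<StreamElement> input);
}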
When an attribute family is of type replica, add an optional from setting which will hold the name of the source attribute family.
Add source that reads streaming data from websocket (single partition, no commits or checkpoints).
Currently, the Cassandra random accessor does not implement the full contract; fix this.
A scheme reading raw JSONs and presenting the data as protobufs.
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: 2018-02-20 10:17:14 ERROR CassandraWriter:53 - Failed to ingest record StreamElement(uuid=9ed40715-15e8-4c68-b21b-8e6a04747c76, entityDesc=EntityDescriptor(comm_message), attributeDesc=AttributeDescriptor(entity=comm_message, name=audit), key=d6a75626-161e-11e8-aba4-00505602142b, attribute=audit, stamp=1519118234096, value.length=672) into cassandra
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: com.datastax.driver.core.exceptions.DriverInternalError: Unexpected exception thrown
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:39)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at com.datastax.driver.core.AbstractSession.prepare(AbstractSession.java:104)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at cz.o2.proxima.storage.cassandra.CacheableCQLFactory.prepare(CacheableCQLFactory.java:340)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at cz.o2.proxima.storage.cassandra.CacheableCQLFactory.getPreparedStatement(CacheableCQLFactory.java:150)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at cz.o2.proxima.storage.cassandra.DefaultCQLFactory.elementInsert(DefaultCQLFactory.java:164)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at cz.o2.proxima.storage.cassandra.DefaultCQLFactory.getWriteStatement(DefaultCQLFactory.java:110)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at cz.o2.proxima.storage.cassandra.CassandraWriter.write(CassandraWriter.java:47)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at cz.o2.proxima.server.IngestServer.ingestRequest(IngestServer.java:589)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at cz.o2.proxima.server.IngestServer.writeRequest(IngestServer.java:515)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at cz.o2.proxima.server.IngestServer.processSingleIngest(IngestServer.java:476)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at cz.o2.proxima.server.IngestServer.access$000(IngestServer.java:82)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at cz.o2.proxima.server.IngestServer$IngestService.ingest(IngestServer.java:110)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at cz.o2.pexeso.server.IngestServer$BackwardsCompatibleIngestService.ingest(IngestServer.java:50)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at cz.o2.pexeso.proto.service.IngestServiceGrpc$MethodHandlers.invoke(IngestServiceGrpc.java:330)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:174)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:271)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:616)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:107)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at java.lang.Thread.run(Thread.java:748)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: Caused by: java.util.concurrent.RejectedExecutionException: Task cz.o2.proxima.cassandra.shaded.com.google.common.util.concurrent.Futures$2$1@7e0ed157 rejected from java.util.concurrent.ThreadPoolExecutor@2c8fb64f[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at cz.o2.proxima.cassandra.shaded.com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:556)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at cz.o2.proxima.cassandra.shaded.com.google.common.util.concurrent.Futures$2.execute(Futures.java:1174)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at cz.o2.proxima.cassandra.shaded.com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:817)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at cz.o2.proxima.cassandra.shaded.com.google.common.util.concurrent.AbstractFuture.addListener(AbstractFuture.java:595)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at cz.o2.proxima.cassandra.shaded.com.google.common.util.concurrent.Futures.transformAsync(Futures.java:1153)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at com.datastax.driver.core.GuavaCompatibility$Version19OrHigher.transformAsync(GuavaCompatibility.java:210)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at com.datastax.driver.core.SessionManager.toPreparedStatement(SessionManager.java:196)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at com.datastax.driver.core.SessionManager.prepareAsync(SessionManager.java:157)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at com.datastax.driver.core.AbstractSession.prepareAsync(AbstractSession.java:126)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011at com.datastax.driver.core.AbstractSession.prepare(AbstractSession.java:102)
Feb 20 10:17:14 sb-ingest3 pexeso-ingest[1110]: #011... 20 more
It seems unavoidable to enable multidimensional wildcard attributes. Currently, the platform supports only a single dimension (i.e. attribute.<suffix>); we need to enable multiple dimensions via attribute.<first>.<second>.... with appropriate semantics, including prefix listing, etc.
RandomHBaseReader does not fully implement RandomAccessReader.
The config compiler should generate access methods for Datasets. Transformations applied to these datasets should then be executable on different executors using the Euphoria API. Basically, this means moving parts of the code from tools to core.
Under certain conditions the PubSubReader might throw the following exception:
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:657)
at java.util.ArrayList.get(ArrayList.java:433)
at cz.o2.proxima.storage.pubsub.PubSubReader.lambda$null$2(PubSubReader.java:197)
at java.util.concurrent.atomic.AtomicReference.updateAndGet(AtomicReference.java:179)
at cz.o2.proxima.storage.pubsub.PubSubReader.lambda$null$3cafaa0b$1(PubSubReader.java:181)
at cz.o2.proxima.storage.commitlog.BulkLogObserver$OffsetCommitter.confirm(BulkLogObserver.java:57)
at cz.o2.proxima.source.UnboundedStreamSource$1.lambda$commitOffset$0(UnboundedStreamSource.java:143)
at java.util.ArrayList.forEach(ArrayList.java:1255)
at cz.o2.proxima.source.UnboundedStreamSource$1.commitOffset(UnboundedStreamSource.java:143)
at cz.o2.proxima.source.UnboundedStreamSource$1.commitOffset(UnboundedStreamSource.java:103)
at cz.seznam.euphoria.executor.local.LocalExecutor$UnboundedPartitionSupplierStream.get(LocalExecutor.java:119)
at cz.seznam.euphoria.executor.local.LocalExecutor.lambda$execMap$7(LocalExecutor.java:628)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
In order to lower startup times, we need to initialize Repository only once and distribute it to workers via serialization.
We need to improve overall test coverage by incorporating an automatic test coverage report and failing the build on low-coverage conditions.
Subissue of #94.