
hydro-serving's People

Contributors

akastav, andriilatysh, beardedwhale, bulbawarrior, dmitriyisaev, dos65, github-actions[bot], gitter-badger, guzel738, hydrorobot, kineticcookie, kiriliakovliev, mitrofanov, mkf-simpson, nufusrufus, pyct, roymprog, spushkarev, tidylobster, valenzione, vixtir, zajs

hydro-serving's Issues

Oversized serving input

For a data.json file with 7.1 MB of content, the manager fails to serve the model.
Stacktrace:

[ERROR] i.h.s.m.c.ModelServiceController i.h.s.c.ServingDataDirectives$$anonfun$completeExecutionResult$2.apply.25 Serving failed
akka.http.scaladsl.model.EntityStreamSizeException: EntityStreamSizeException: actual entity size (Some(38515045)) exceeded content length limit (8388608 bytes)! You can configure this by setting `akka.http.[server|client].parsing.max-content-length` or calling `HttpEntity.withSizeLimit` before materializing the dataBytes stream.
	at akka.http.scaladsl.model.HttpEntity$Limitable$$anon$1.preStart(HttpEntity.scala:609) ~[akka-http-core_2.11-10.0.9.jar:?]
	at akka.stream.impl.fusing.GraphInterpreter.init(GraphInterpreter.scala:290) ~[akka-stream_2.11-2.5.3.jar:?]
	at akka.stream.impl.fusing.GraphInterpreterShell.init(ActorGraphInterpreter.scala:540) ~[akka-stream_2.11-2.5.3.jar:?]
	at akka.stream.impl.fusing.ActorGraphInterpreter.tryInit(ActorGraphInterpreter.scala:659) ~[akka-stream_2.11-2.5.3.jar:?]
	at akka.stream.impl.fusing.ActorGraphInterpreter.preStart(ActorGraphInterpreter.scala:707) ~[akka-stream_2.11-2.5.3.jar:?]
	at akka.actor.Actor$class.aroundPreStart(Actor.scala:521) ~[akka-actor_2.11-2.5.3.jar:?]
	at akka.stream.impl.fusing.ActorGraphInterpreter.aroundPreStart(ActorGraphInterpreter.scala:650) ~[akka-stream_2.11-2.5.3.jar:?]
	at akka.actor.ActorCell.create(ActorCell.scala:591) ~[akka-actor_2.11-2.5.3.jar:?]
	at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462) ~[akka-actor_2.11-2.5.3.jar:?]
	at akka.actor.ActorCell.systemInvoke(ActorCell.scala:484) ~[akka-actor_2.11-2.5.3.jar:?]
	at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:282) ~[akka-actor_2.11-2.5.3.jar:?]
	at akka.dispatch.Mailbox.run(Mailbox.scala:223) ~[akka-actor_2.11-2.5.3.jar:?]
	at akka.dispatch.Mailbox.exec(Mailbox.scala:234) ~[akka-actor_2.11-2.5.3.jar:?]
	at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [akka-actor_2.11-2.5.3.jar:?]
	at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [akka-actor_2.11-2.5.3.jar:?]
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [akka-actor_2.11-2.5.3.jar:?]
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [akka-actor_2.11-2.5.3.jar:?]

Note that the log says the manager received 38515045 bytes, while the data.json file itself is 7132595 bytes.

Update
This issue shows up when the model runtime (in this case a VGG 300 TF model) returns a huge response.

Fix

Set the akka.http.[server|client].parsing.max-content-length parameter to ~100 MB, as sketched below.
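
A minimal sketch of both knobs the error message points at, assuming the manager reads a standard Akka HTTP application.conf (the exact config file location is an assumption):

# application.conf
akka.http.server.parsing.max-content-length = 100m
akka.http.client.parsing.max-content-length = 100m

Or per request, before materializing the dataBytes stream:

import akka.http.scaladsl.model.{HttpEntity, HttpRequest}

// Raise the size limit for a single request's entity to ~100 MB.
def withRaisedLimit(request: HttpRequest): HttpEntity =
  request.entity.withSizeLimit(100L * 1024 * 1024)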

Can't create a service if the same model is used in another service.

[INFO ] c.s.d.c.DefaultDockerClient c.s.d.c.DefaultDockerClient.createContainer.635 Creating container with ContainerConfig: ContainerConfig{hostname=null, domainname=null, user=null, attachStdin=null, attachStdout=null, attachStderr=null, portSpecs=null, exposedPorts=null, tty=null, openStdin=null, stdinOnce=null, env=null, cmd=null, image=kek:1, volumes={/model={}}, workingDir=null, entrypoint=null, networkDisabled=null, onBuild=null, labels={MODEL_VERSION_ID=1, SERVICE_ID=0, MODEL_NAME=kek, MODEL_TYPE=spark:2.1, MODEL_VERSION=1, HS_SERVICE_MARKER=HS_SERVICE_MARKER, DEPLOYMENT_TYPE=MODEL}, macAddress=null, hostConfig=null, stopSignal=null, healthcheck=null, networkingConfig=null}
[ERROR] i.h.s.m.ManagerHttpApi i.h.s.m.ManagerHttpApi$$anonfun$1.applyOrElse.74 Request error: POST unix://localhost:80/containers/create?name=s0modelkek: 409, body: {"message":"Conflict. The container name \"/s0modelkek\" is already in use by container \"0905c30d604b1a869ba26404758bedbc121bb8bee2c1823edabe0c591b9270d3\". You have to remove (or rename) that container to be able to reuse that name."}

com.spotify.docker.client.exceptions.DockerRequestException: Request error: POST unix://localhost:80/containers/create?name=s0modelkek: 409, body: {"message":"Conflict. The container name \"/s0modelkek\" is already in use by container \"0905c30d604b1a869ba26404758bedbc121bb8bee2c1823edabe0c591b9270d3\". You have to remove (or rename) that container to be able to reuse that name."}

	at com.spotify.docker.client.DefaultDockerClient.propagate(DefaultDockerClient.java:2503) ~[docker-client-8.8.0.jar:8.8.0]
	at com.spotify.docker.client.DefaultDockerClient.request(DefaultDockerClient.java:2453) ~[docker-client-8.8.0.jar:8.8.0]
	at com.spotify.docker.client.DefaultDockerClient.createContainer(DefaultDockerClient.java:638) ~[docker-client-8.8.0.jar:8.8.0]
	at io.hydrosphere.serving.manager.service.clouddriver.LocalCloudDriverService.io$hydrosphere$serving$manager$service$clouddriver$LocalCloudDriverService$$startModel(LocalCloudDriverService.scala:47) ~[classes/:?]
	at io.hydrosphere.serving.manager.service.clouddriver.LocalCloudDriverService$$anonfun$deployService$1$$anonfun$3.apply(LocalCloudDriverService.scala:74) ~[classes/:?]
	at io.hydrosphere.serving.manager.service.clouddriver.LocalCloudDriverService$$anonfun$deployService$1$$anonfun$3.apply(LocalCloudDriverService.scala:74) ~[classes/:?]
	at scala.Option.map(Option.scala:146) ~[scala-library-2.11.11.jar:?]
	at io.hydrosphere.serving.manager.service.clouddriver.LocalCloudDriverService$$anonfun$deployService$1.apply(LocalCloudDriverService.scala:74) ~[classes/:?]
	at io.hydrosphere.serving.manager.service.clouddriver.LocalCloudDriverService$$anonfun$deployService$1.apply(LocalCloudDriverService.scala:72) ~[classes/:?]
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) ~[scala-library-2.11.11.jar:?]
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.run$$$capture(Future.scala:24) ~[scala-library-2.11.11.jar:?]
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala) ~[scala-library-2.11.11.jar:?]
	at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) [akka-actor_2.11-2.5.8.jar:?]
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:43) [akka-actor_2.11-2.5.8.jar:?]
	at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [akka-actor_2.11-2.5.8.jar:?]
	at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [akka-actor_2.11-2.5.8.jar:?]
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [akka-actor_2.11-2.5.8.jar:?]
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [akka-actor_2.11-2.5.8.jar:?]
Caused by: javax.ws.rs.ClientErrorException: HTTP 409 Conflict
	at org.glassfish.jersey.client.JerseyInvocation.createExceptionForFamily(JerseyInvocation.java:1044) ~[jersey-client-2.22.2.jar:?]
	at org.glassfish.jersey.client.JerseyInvocation.convertToException(JerseyInvocation.java:1027) ~[jersey-client-2.22.2.jar:?]
	at org.glassfish.jersey.client.JerseyInvocation.translate(JerseyInvocation.java:816) ~[jersey-client-2.22.2.jar:?]
	at org.glassfish.jersey.client.JerseyInvocation.access$700(JerseyInvocation.java:92) ~[jersey-client-2.22.2.jar:?]
	at org.glassfish.jersey.client.JerseyInvocation$5.completed(JerseyInvocation.java:773) ~[jersey-client-2.22.2.jar:?]
	at org.glassfish.jersey.client.ClientRuntime.processResponse(ClientRuntime.java:198) ~[jersey-client-2.22.2.jar:?]
	at org.glassfish.jersey.client.ClientRuntime.access$300(ClientRuntime.java:79) ~[jersey-client-2.22.2.jar:?]
	at org.glassfish.jersey.client.ClientRuntime$2.run(ClientRuntime.java:180) ~[jersey-client-2.22.2.jar:?]
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) ~[jersey-common-2.22.2.jar:?]
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) ~[jersey-common-2.22.2.jar:?]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:315) ~[jersey-common-2.22.2.jar:?]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:297) ~[jersey-common-2.22.2.jar:?]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:267) ~[jersey-common-2.22.2.jar:?]
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:340) ~[jersey-common-2.22.2.jar:?]
	at org.glassfish.jersey.client.ClientRuntime$3.run(ClientRuntime.java:210) ~[jersey-client-2.22.2.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call$$$capture(Executors.java:511) ~[?:1.8.0_131]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java) ~[?:1.8.0_131]
	at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266) ~[?:1.8.0_131]
	at java.util.concurrent.FutureTask.run(FutureTask.java) ~[?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_131]

Update readme on how to launch a demo

Step 1 in "How to launch demo" in the readme is not reproducible, since build.sh doesn't exist in the repo. We need to update this or add a link to RUN_DUMMY_DEMO.MD.

Vectors are returning blank

Using the 0.0.18 release, vectors come back blank. tfidf and cv_tf should both return vectors.

[
  {
    "filtered": ["foo"],
    "UU_EVAL_RESP_FTEXT": "foo",
    "tfiddf": [],
    "cv_tf": [],
    "tokens": ["foo"]
  }
]

Incorrect S3 source upload/cache

  • Occasional build errors when the model source is an S3 bucket.
    Sometimes the build service can't find the model directory and the build fails.

  • Need to keep the bucket and the local cache in sync.
    After a model is uploaded to the S3 source, the manager needs to copy it to the local cache on every upload request. Since we removed the SQS watcher service, the model state in S3 and in the local cache is sometimes inconsistent and out of sync (a naive sync pass is sketched below).
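
A minimal sketch of the naive sync pass mentioned above, assuming the AWS SDK for Java v1 is available; the bucket, prefix, and cache directory are assumptions:

import java.io.File

import com.amazonaws.services.s3.AmazonS3ClientBuilder
import com.amazonaws.services.s3.model.GetObjectRequest

import scala.collection.JavaConverters._

// One-way copy of everything under `prefix` into the local cache directory.
// A real implementation would page past 1000 keys and compare ETags/timestamps.
def syncFromS3(bucket: String, prefix: String, cacheDir: File): Unit = {
  val s3 = AmazonS3ClientBuilder.defaultClient()
  s3.listObjectsV2(bucket, prefix).getObjectSummaries.asScala.foreach { obj =>
    val target = new File(cacheDir, obj.getKey)
    target.getParentFile.mkdirs()
    s3.getObject(new GetObjectRequest(bucket, obj.getKey), target)
  }
}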

S3SourceWatcher initialisation error

[ERROR] a.a.OneForOneStrategy a.e.s.Slf4jLogger$$anonfun$receive$1$$anonfun$applyOrElse$1.apply$mcV$sp.72 key not found: Records
java.util.NoSuchElementException: key not found: Records
    at scala.collection.MapLike$class.default(MapLike.scala:228) ~[scala-library-2.11.11.jar:?]
    at scala.collection.AbstractMap.default(Map.scala:59) ~[scala-library-2.11.11.jar:?]
    at scala.collection.MapLike$class.apply(MapLike.scala:141) ~[scala-library-2.11.11.jar:?]
    at scala.collection.AbstractMap.apply(Map.scala:59) ~[scala-library-2.11.11.jar:?]
    at io.hydrosphere.serving.manager.actor.modelsource.S3SourceWatcher$SQSMessage$.fromJson(S3SourceWatcher.scala:81) ~[manager.jar:0.0.1]
    at io.hydrosphere.serving.manager.actor.modelsource.S3SourceWatcher$$anonfun$2.apply(S3SourceWatcher.scala:25) ~[manager.jar:0.0.1]
    at io.hydrosphere.serving.manager.actor.modelsource.S3SourceWatcher$$anonfun$2.apply(S3SourceWatcher.scala:25) ~[manager.jar:0.0.1]
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) ~[scala-library-2.11.11.jar:?]
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) ~[scala-library-2.11.11.jar:?]
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) ~[scala-library-2.11.11.jar:?]
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) ~[scala-library-2.11.11.jar:?]
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) ~[scala-library-2.11.11.jar:?]
    at scala.collection.AbstractTraversable.map(Traversable.scala:104) ~[scala-library-2.11.11.jar:?]
    at io.hydrosphere.serving.manager.actor.modelsource.S3SourceWatcher.onWatcherTick(S3SourceWatcher.scala:25) ~[manager.jar:0.0.1]
    at io.hydrosphere.serving.manager.actor.modelsource.SourceWatcher$$anonfun$watcherTick$1.applyOrElse(SourceWatcher.scala:38) ~[manager.jar:0.0.1]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) ~[scala-library-2.11.11.jar:?]
    at akka.actor.Actor$class.aroundReceive(Actor.scala:513) ~[akka-actor_2.11-2.5.3.jar:?]
    at io.hydrosphere.serving.manager.actor.modelsource.S3SourceWatcher.aroundReceive(S3SourceWatcher.scala:17) ~[manager.jar:0.0.1]
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:527) [akka-actor_2.11-2.5.3.jar:?]
    at akka.actor.ActorCell.invoke(ActorCell.scala:496) [akka-actor_2.11-2.5.3.jar:?]
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257) [akka-actor_2.11-2.5.3.jar:?]
    at akka.dispatch.Mailbox.run(Mailbox.scala:224) [akka-actor_2.11-2.5.3.jar:?]
    at akka.dispatch.Mailbox.exec(Mailbox.scala:234) [akka-actor_2.11-2.5.3.jar:?]
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [akka-actor_2.11-2.5.3.jar:?]
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [akka-actor_2.11-2.5.3.jar:?]
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [akka-actor_2.11-2.5.3.jar:?]
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [akka-actor_2.11-2.5.3.jar:?]
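
The actor dies because SQSMessage.fromJson looks up the "Records" key on every message, but not every SQS message has it (for example, the s3:TestEvent that S3 sends when notifications are first configured). A minimal defensive sketch, assuming the message body is parsed into a Map:

// Skip messages without a "Records" array instead of throwing NoSuchElementException.
def recordsOf(json: Map[String, Any]): Seq[Any] =
  json.get("Records") match {
    case Some(records: Seq[_]) => records
    case _                     => Seq.empty
  }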

managerui exited with code 1

manager | APP_OPTS=-Dapplication.grpcPort=9091 -Dapplication.port=9090 -DopenTracing.zipkin.enabled=false -DopenTracing.zipkin.port=9411 -DopenTracing.zipkin.host=zipkin -Dmanager.advertisedHost=manager -Dmanager.advertisedPort=9091 -Ddatabase.jdbcUrl=jdbc:postgresql://postgres:5432/docker -Ddatabase.username=docker -Ddatabase.password=docker -DcloudDriver.docker.networkName=demo_hydronet -DdockerRepository.type=local -Dapplication.shadowingOn=false -Dsidecar.adminPort=8082 -Dsidecar.ingressPort=8080 -Dsidecar.egressPort=8081 -Dsidecar.host=sidecar
postgres | LOG: database system is ready to accept connections
postgres | LOG: autovacuum launcher started
sidecar | [2018-07-26 08:52:31.494][1][info][main] source/server/server.cc:178] initializing epoch 0 (hot restart version=9.200.16384.227.options=capacity=16384, num_slots=8209 hash=228984379728933363)
manager | Archive: /hydro-serving/app/lib/jffi-1.2.9-native.jar
sidecar | [2018-07-26 08:52:31.532][1][warning][upstream] source/common/config/grpc_mux_impl.cc:205] gRPC config stream closed: 1,
sidecar | [2018-07-26 08:52:31.532][1][warning][upstream] source/common/config/grpc_mux_impl.cc:36] Unable to establish new stream
sidecar | [2018-07-26 08:52:31.532][1][info][config] source/server/configuration_impl.cc:52] loading 0 listener(s)
sidecar | [2018-07-26 08:52:31.532][1][info][config] source/server/configuration_impl.cc:92] loading tracing configuration
sidecar | [2018-07-26 08:52:31.532][1][info][config] source/server/configuration_impl.cc:119] loading stats sink configuration
sidecar | [2018-07-26 08:52:31.533][1][info][main] source/server/server.cc:353] starting main dispatch loop
sidecar | [2018-07-26 08:52:31.536][1][info][upstream] source/common/upstream/cluster_manager_impl.cc:127] cm init: initializing cds
manager | creating: jni/x86_64-Linux/
manager | inflating: jni/x86_64-Linux/libjffi-1.2.so
managerui | rm: can't remove '/etc/nginx/conf.d/default.conf': No such file or directory
managerui exited with code 1

Application's contract doesn't update properly

When the contract of a model changes (for example, a field is renamed) and we update the application that uses that model, the referenced contract example is not updated for the application (when clicking the Test button).

Referenced contract example:
[screenshot]

Model's actual contract:
[screenshot]

Can't create an Application

  1. Add custom runtime:
{
  "name": "hydrosphere/serving-grpc-runtime-spark-2_1",
  "version": "0.0.1",
  "modelTypes": [
    "spark:2.1"
  ],
  "tags": [
    "string"
  ],
  "configParams": {}
}
  2. Build a model (word2vec for Spark 2.1.0):
{
  "modelId": 14
}
  3. Build an Application:
{
  "id": 0,
  "name": "testapp",
  "executionGraph": {
    "stages": [
      {
        "services": [
          {
            "serviceDescription": {
              "runtimeId": 2,
              "modelVersionId": 1,
              "environmentId": 0
            },
            "weight": 100
          }
        ],
        "signatureName": "string"
      }
    ]
  }
}

Result:

[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null,Step 1/6 : FROM busybox:1.28.0,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null,
,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null, ---> 5b0d59026729
,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null,Step 2/6 : LABEL MODEL_TYPE=spark:2.1,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null,
,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null, ---> Running in b5f914e59880
,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null,Removing intermediate container b5f914e59880
,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null, ---> df1966d1a6e8
,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null,Step 3/6 : LABEL MODEL_NAME=word2vec,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null,
,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null, ---> Running in 2172c7a57456
,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null,Removing intermediate container 2172c7a57456
,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null, ---> 5d32ba9e6aff
,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null,Step 4/6 : LABEL MODEL_VERSION=None,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null,
,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null, ---> Running in 877af9e80f15
,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null,Removing intermediate container 877af9e80f15
,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null, ---> b007c4cec8f3
,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null,Step 5/6 : VOLUME /model,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null,
,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null, ---> Running in e430f440c583
,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null,Removing intermediate container e430f440c583
,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null, ---> ede9188f3e2a
,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null,Step 6/6 : ADD model /model,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null,
,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null, ---> 8739a8be79c8
,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null,null,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null,Successfully built 8739a8be79c8
,null,null,None)
[INFO ] i.h.s.m.s.ModelManagementServiceImpl i.h.s.m.s.ModelManagementServiceImpl$$anon$1.handle.257 ProgressMessage(null,null,Successfully tagged word2vec:1
,null,null,None)
[INFO ] c.s.d.c.DefaultDockerClient c.s.d.c.DefaultDockerClient.createContainer.635 Creating container with ContainerConfig: ContainerConfig{hostname=null, domainname=null, user=null, attachStdin=null, attachStdout=null, attachStderr=null, portSpecs=null, exposedPorts=null, tty=null, openStdin=null, stdinOnce=null, env=null, cmd=null, image=word2vec:1, volumes={/model={}}, workingDir=null, entrypoint=null, networkDisabled=null, onBuild=null, labels={MODEL_VERSION_ID=1, SERVICE_ID=1, MODEL_NAME=word2vec, MODEL_TYPE=spark:2.1, MODEL_VERSION=1, HS_SERVICE_MARKER=HS_SERVICE_MARKER, DEPLOYMENT_TYPE=MODEL}, macAddress=null, hostConfig=null, stopSignal=null, healthcheck=null, networkingConfig=null}
[INFO ] c.s.d.c.DefaultDockerClient c.s.d.c.DefaultDockerClient.createContainer.635 Creating container with ContainerConfig: ContainerConfig{hostname=null, domainname=null, user=null, attachStdin=null, attachStdout=null, attachStderr=null, portSpecs=null, exposedPorts=[9091], tty=null, openStdin=null, stdinOnce=null, env=[SIDECAR_PORT=8081, SERVICE_ID=1, MODEL_DIR=/model, SIDECAR_HOST=192.168.90.61, APP_PORT=9091], cmd=null, image=hydrosphere/serving-grpc-runtime-spark-2_1:0.0.1, volumes={}, workingDir=null, entrypoint=null, networkDisabled=null, onBuild=null, labels={SERVICE_ID=1, RUNTIME_ID=2, SERVICE_NAME=r2m1e0, HS_SERVICE_MARKER=HS_SERVICE_MARKER, DEPLOYMENT_TYPE=APP}, macAddress=null, hostConfig=HostConfig{binds=null, blkioWeight=null, blkioWeightDevice=null, blkioDeviceReadBps=null, blkioDeviceWriteBps=null, blkioDeviceReadIOps=null, blkioDeviceWriteIOps=null, containerIdFile=null, lxcConf=null, privileged=null, portBindings={9091=[PortBinding{hostIp=0.0.0.0, hostPort=}]}, links=null, publishAllPorts=null, dns=null, dnsOptions=null, dnsSearch=null, extraHosts=null, volumesFrom=[s1modelword2vec], capAdd=null, capDrop=null, networkMode=null, securityOpt=null, devices=null, memory=null, memorySwap=null, memorySwappiness=null, memoryReservation=null, nanoCpus=null, cpuPeriod=null, cpuShares=null, cpusetCpus=null, cpusetMems=null, cpuQuota=null, cgroupParent=null, restartPolicy=null, logConfig=null, ipcMode=null, ulimits=null, pidMode=null, shmSize=null, oomKillDisable=null, oomScoreAdj=null, autoRemove=null, pidsLimit=null, tmpfs=null, readonlyRootfs=null, storageOpt=null}, stopSignal=null, healthcheck=null, networkingConfig=null}
[INFO ] c.s.d.c.DefaultDockerClient c.s.d.c.DefaultDockerClient.startContainer.657 Starting container with Id: 853ad5441d894760c17d2aedea93a011c97da1c93d4a5033afd643bd2df2272b
[ERROR] i.h.s.m.ManagerHttpApi i.h.s.m.ManagerHttpApi$$anonfun$1.applyOrElse.74 empty.head
java.lang.UnsupportedOperationException: empty.head
	at scala.collection.immutable.Vector.head(Vector.scala:193) ~[scala-library-2.11.11.jar:?]
	at io.hydrosphere.serving.manager.service.ApplicationManagementServiceImpl$$anonfun$inferAppContract$1.apply(ApplicationManagementService.scala:306) ~[classes/:?]
	at io.hydrosphere.serving.manager.service.ApplicationManagementServiceImpl$$anonfun$inferAppContract$1.apply(ApplicationManagementService.scala:305) ~[classes/:?]
	at scala.util.Success$$anonfun$map$1.apply(Try.scala:237) ~[scala-library-2.11.11.jar:?]
	at scala.util.Try$.apply(Try.scala:192) ~[scala-library-2.11.11.jar:?]
	at scala.util.Success.map(Try.scala:237) ~[scala-library-2.11.11.jar:?]
	at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237) ~[scala-library-2.11.11.jar:?]
	at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237) ~[scala-library-2.11.11.jar:?]
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36) [scala-library-2.11.11.jar:?]
	at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55) [akka-actor_2.11-2.5.8.jar:?]
	at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91) [akka-actor_2.11-2.5.8.jar:?]
	at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91) [akka-actor_2.11-2.5.8.jar:?]
	at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91) [akka-actor_2.11-2.5.8.jar:?]
	at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) [scala-library-2.11.11.jar:?]
	at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90) [akka-actor_2.11-2.5.8.jar:?]
	at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) [akka-actor_2.11-2.5.8.jar:?]
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:43) [akka-actor_2.11-2.5.8.jar:?]
	at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [akka-actor_2.11-2.5.8.jar:?]
	at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [akka-actor_2.11-2.5.8.jar:?]
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [akka-actor_2.11-2.5.8.jar:?]
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [akka-actor_2.11-2.5.8.jar:?]

When I try to recreate the same application:

[ERROR] i.h.s.m.ManagerHttpApi i.h.s.m.ManagerHttpApi$$anonfun$1.applyOrElse.74 ERROR: duplicate key value violates unique constraint "service_service_name_key"
  Detail: Key (service_name)=(r2m1e0) already exists.
org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "service_service_name_key"
  Detail: Key (service_name)=(r2m1e0) already exists.
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2477) ~[postgresql-42.1.4.jar:42.1.4]
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2190) ~[postgresql-42.1.4.jar:42.1.4]
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:300) ~[postgresql-42.1.4.jar:42.1.4]
	at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428) ~[postgresql-42.1.4.jar:42.1.4]
	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354) ~[postgresql-42.1.4.jar:42.1.4]
	at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:169) ~[postgresql-42.1.4.jar:42.1.4]
	at org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:136) ~[postgresql-42.1.4.jar:42.1.4]
	at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61) ~[HikariCP-2.6.3.jar:?]
	at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeUpdate(HikariProxyPreparedStatement.java) ~[HikariCP-2.6.3.jar:?]
	at slick.jdbc.JdbcActionComponent$InsertActionComposerImpl$SingleInsertAction$$anonfun$run$8.apply(JdbcActionComponent.scala:509) ~[slick_2.11-3.2.1.jar:?]
	at slick.jdbc.JdbcActionComponent$InsertActionComposerImpl$SingleInsertAction$$anonfun$run$8.apply(JdbcActionComponent.scala:506) ~[slick_2.11-3.2.1.jar:?]
	at slick.jdbc.JdbcBackend$SessionDef$class.withPreparedInsertStatement(JdbcBackend.scala:378) ~[slick_2.11-3.2.1.jar:?]
	at slick.jdbc.JdbcBackend$BaseSession.withPreparedInsertStatement(JdbcBackend.scala:433) ~[slick_2.11-3.2.1.jar:?]
	at slick.jdbc.JdbcActionComponent$ReturningInsertActionComposerImpl.preparedInsert(JdbcActionComponent.scala:638) ~[slick_2.11-3.2.1.jar:?]
	at slick.jdbc.JdbcActionComponent$InsertActionComposerImpl$SingleInsertAction.run(JdbcActionComponent.scala:506) ~[slick_2.11-3.2.1.jar:?]
	at slick.jdbc.JdbcActionComponent$SimpleJdbcProfileAction.run(JdbcActionComponent.scala:29) ~[slick_2.11-3.2.1.jar:?]
	at slick.jdbc.JdbcActionComponent$SimpleJdbcProfileAction.run(JdbcActionComponent.scala:26) ~[slick_2.11-3.2.1.jar:?]
	at slick.basic.BasicBackend$DatabaseDef$$anon$2.liftedTree1$1(BasicBackend.scala:242) ~[slick_2.11-3.2.1.jar:?]
	at slick.basic.BasicBackend$DatabaseDef$$anon$2.run(BasicBackend.scala:242) ~[slick_2.11-3.2.1.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_131]
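
The first failure is an unguarded Vector.head in inferAppContract (ApplicationManagementService.scala:306); the retry then trips over the service row that was already written to the db. A minimal sketch of a guard for the first part (names are assumptions, not the actual manager code):

// Return a descriptive error instead of throwing UnsupportedOperationException
// when a stage has no signatures to infer the contract from.
def firstSignature[A](signatures: Vector[A]): Either[String, A] =
  signatures.headOption
    .toRight("Application stage has no signatures to infer a contract from")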

UI improvements

  • Return all applications for a specific ModelVersion

  • Return all signatures for a specific ModelVersion

  • Return all datatypes

Deprecated readme - cannot launch, incomplete docker-compose

Could you please update readme.md? It provides inconsistent information about how to launch ML Lambda. Your docker-compose file only contains postgres + manager, whereas the standalone commands start a few other services too. The docker-compose file should contain everything, so new users can get up and running quickly.

Also, you should probably use a docker network instead of exposing everything via the host network (why is the HOST_IP environment variable in the docker-compose.yml file?).

One last thing: you could push everything to Docker Hub, so new users are not forced to build everything on their machines.

Update model bug

After a model update, the API reports that nextVersion is available for all built models.

Getting an error when trying to launch the latest code

The code keeps hanging on this line:

package io.hydrosphere.serving.manager.service.clouddriver
val containerApp = map.getOrElse(DEPLOYMENT_TYPE_APP, throw new RuntimeException(s"Can't find APP for service $serviceId in $seq"))
val containerModel = map.get(DEPLOYMENT_TYPE_MODEL)

Here are the logs:

Detected a modification of naivebayes model ...

2018-02-02T02:04:08.904425412Z [ERROR] a.a.OneForOneStrategy a.e.s.Slf4jLogger$$anonfun$receive$1$$anonfun$applyOrElse$1.apply$mcV$sp.69 Can't find APP for service -20 in ArrayBuffer(Container{id=943746c801f22a7327046483fe2e902cd7a140e81d2c30b0b60237668662f6f5, names=[/manager], image=hydrosphere/serving-manager:latest, imageId=sha256:5b64769a1afaec201315228f129c68f9d748d17dcb8892336c1265b6d4995f5a, command=/hydro-serving/app/start.sh, created=1517537030, state=running, status=Up 15 seconds, ports=[PortMapping{privatePort=8080, publicPort=8080, type=tcp, ip=0.0.0.0}, PortMapping{privatePort=8081, publicPort=8081, type=tcp, ip=0.0.0.0}, PortMapping{privatePort=8082, publicPort=8082, type=tcp, ip=0.0.0.0}, PortMapping{privatePort=9090, publicPort=9090, type=tcp, ip=0.0.0.0}, PortMapping{privatePort=9091, publicPort=9091, type=tcp, ip=0.0.0.0}], labels={APP=dev, HS_SERVICE_MARKER=HS_SERVICE_MARKER, MODEL=dev, MODEL_NAME=manager, MODEL_VERSION=latest, RUNTIME_TYPE_NAME=hysroserving-java, RUNTIME_TYPE_VERSION=latest, com.docker.compose.config-hash=4f0db1769543bf239bc78c5d86bec95419c1333fcf20860c1c3d919a4b84c3a3, com.docker.compose.container-number=1, com.docker.compose.oneoff=False, com.docker.compose.project=automation, com.docker.compose.service=manager, com.docker.compose.version=1.18.0, hydroServingServiceId=-20}, sizeRw=null, sizeRootFs=null, networkSettings=NetworkSettings{ipAddress=null, ipPrefixLen=null, gateway=null, bridge=null, portMapping=null, ports={}, macAddress=null, networks={automation_extnet=AttachedNetwork{aliases=null, networkId=691539ea57f31850a906f68150bcfeb7b6df61185ed71fb22fe20a8e1979927d, endpointId=172441d34b4990d656433199d99d1fa8b2ddec1d2cbf1ab05d391939486c975f, gateway=172.18.0.1, ipAddress=172.18.0.3, ipPrefixLen=16, ipv6Gateway=, globalIPv6Address=, globalIPv6PrefixLen=0, macAddress=02:42:ac:12:00:03}, automation_hydronet=AttachedNetwork{aliases=null, networkId=0946e51564a0333f2444f003060769f38bb5fc2cc541e1199a82f3aeb4aefe9a, endpointId=edec4e1ab6ef73e279cd22dd9c785554c10ee8a5cd87f0bace72f2d899f30b81, gateway=172.16.0.1, ipAddress=172.16.0.5, ipPrefixLen=24, ipv6Gateway=, globalIPv6Address=, globalIPv6PrefixLen=0, macAddress=02:42:ac:10:00:05}}, endpointId=null, sandboxId=null, sandboxKey=null, hairpinMode=null, linkLocalIPv6Address=null, linkLocalIPv6PrefixLen=null, globalIPv6Address=null, globalIPv6PrefixLen=null, ipv6Gateway=null}, mounts=[ContainerMount{type=bind, name=null, source=/opt/hydro-serving/integrations/automation/hydro-serving-runtime/models, destination=/models, driver=null, mode=rw, rw=true, propagation=rprivate}, ContainerMount{type=bind, name=null, source=/var/run/docker.sock, destination=/var/run/docker.sock, driver=null, mode=rw, rw=true, propagation=rprivate}]})
2018-02-02T02:04:08.904474888Z akka.actor.ActorInitializationException: akka://manager/user/$e: exception during creation
2018-02-02T02:04:08.904478772Z at akka.actor.ActorInitializationException$.apply(Actor.scala:193) ~[akka-actor_2.11-2.5.8.jar:?]
2018-02-02T02:04:08.904481451Z at akka.actor.ActorCell.create(ActorCell.scala:608) ~[akka-actor_2.11-2.5.8.jar:?]
2018-02-02T02:04:08.904483738Z at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462) [akka-actor_2.11-2.5.8.jar:?]
2018-02-02T02:04:08.904485976Z at akka.actor.ActorCell.systemInvoke(ActorCell.scala:484) [akka-actor_2.11-2.5.8.jar:?]
2018-02-02T02:04:08.904488321Z at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:282) [akka-actor_2.11-2.5.8.jar:?]
2018-02-02T02:04:08.904490581Z at akka.dispatch.Mailbox.run(Mailbox.scala:223) [akka-actor_2.11-2.5.8.jar:?]
2018-02-02T02:04:08.904492783Z at akka.dispatch.Mailbox.exec(Mailbox.scala:234) [akka-actor_2.11-2.5.8.jar:?]
2018-02-02T02:04:08.904495002Z at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [akka-actor_2.11-2.5.8.jar:?]
2018-02-02T02:04:08.904497252Z at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [akka-actor_2.11-2.5.8.jar:?]
2018-02-02T02:04:08.904499506Z at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [akka-actor_2.11-2.5.8.jar:?]
2018-02-02T02:04:08.904501736Z at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [akka-actor_2.11-2.5.8.jar:?]

Can't start Gateway

I get this error over and over when trying to start the gateway service.

2018-01-22T05:45:32.966950578Z [ERROR] io.hydrosphere.serving.gateway.actor.PipelineSynchronizeActor Unsupported Content-Type, supported: application/json WARNING arguments left: 1

Error handling

In the current version, the manager doesn't have unified error handling.
It would be better to always return JSON (for now, it's mixed) and handle errors gracefully; a handler along those lines is sketched below.
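
A minimal sketch, assuming unified handling is done with a single Akka HTTP ExceptionHandler wrapped around the existing routes (the JSON shape and status mapping are assumptions):

import akka.http.scaladsl.model.{ContentTypes, HttpEntity, HttpResponse, StatusCodes}
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.ExceptionHandler

// Every unhandled exception becomes a JSON body instead of plain text.
// Note: in real code the message should be JSON-escaped.
val jsonErrorHandler = ExceptionHandler {
  case e: Exception =>
    complete(HttpResponse(
      status = StatusCodes.InternalServerError,
      entity = HttpEntity(ContentTypes.`application/json`,
        s"""{"error":"${e.getMessage}"}""")))
}

// Usage: handleExceptions(jsonErrorHandler) { existingRoutes }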

Application creation is not stable

If there are containers using already reserved names, the manager can't create an application. In addition, the manager creates db entries for the incorrect containers, which messes up the whole application API.

Won't build the model if the manager was interrupted during a previous build

There is a hanging build in the db with a running state, but the Future that actually builds it is no longer present. Thus, you can't build a model with that name and version.

We need to handle such cases somehow. Possible solutions:

  1. Start a new build with a version number one above the current running build. But the hanging build is still there.
  2. Perform an on-start cleanup of hanging builds (see the sketch after this list).
    2.1. How do we detect a hanging build?
    2.2. How do we delete it?
    2.3. If there are several instances, handling such cases would require additional effort.
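
A minimal sketch of option 2, assuming builds live in a table like model_build with a status column (both names are assumptions):

import java.sql.DriverManager

// On startup, any build still marked as running has lost its Future, so mark it failed.
def failHangingBuilds(jdbcUrl: String, user: String, password: String): Int = {
  val conn = DriverManager.getConnection(jdbcUrl, user, password)
  try {
    conn
      .prepareStatement("UPDATE model_build SET status = 'Failed' WHERE status = 'Running'")
      .executeUpdate()
  } finally conn.close()
}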

Model version number stuck on `2`.


[screenshot]

Infrastructure data

Manager information

  1. Manager version, or commit SHA.
  2. Manager logs.
    Second upload of model
[INFO ] i.h.s.m.s.m.b.InfoProgressHandler$ i.h.s.m.s.m.b.InfoProgressHandler$.handle.37 ProgressMessage(null,null, ---> 5608e9099050
,null,null,None)
[INFO ] i.h.s.m.s.m.b.InfoProgressHandler$ i.h.s.m.s.m.b.InfoProgressHandler$.handle.37 ProgressMessage(null,null,null,null,null,None)
[INFO ] i.h.s.m.s.m.b.InfoProgressHandler$ i.h.s.m.s.m.b.InfoProgressHandler$.handle.37 ProgressMessage(null,null,Successfully built 5608e9099050
,null,null,None)
[INFO ] i.h.s.m.s.m.b.InfoProgressHandler$ i.h.s.m.s.m.b.InfoProgressHandler$.handle.37 ProgressMessage(null,null,Successfully tagged ks-test:2
,null,null,None)

Third upload

[INFO ] i.h.s.m.s.m.b.InfoProgressHandler$ i.h.s.m.s.m.b.InfoProgressHandler$.handle.37 ProgressMessage(null,null,null,null,null,None)
[INFO ] i.h.s.m.s.m.b.InfoProgressHandler$ i.h.s.m.s.m.b.InfoProgressHandler$.handle.37 ProgressMessage(null,null,Successfully built fcfe8947eb60
,null,null,None)
[INFO ] i.h.s.m.s.m.b.InfoProgressHandler$ i.h.s.m.s.m.b.InfoProgressHandler$.handle.37 ProgressMessage(null,null,Successfully tagged ks-test:2
,null,null,None)
  3. Manager deployment command.
    hs upload --host localhost --port 9090
  4. Manager configuration.
    default

Sidecar information

  1. Sidecar version or commit SHA.
    hydrosphere/serving-sidecar:latest
  2. Sidecar logs.
[2018-06-08 08:40:55.761][1][info][main] source/server/drain_manager_impl.cc:63] shutting down parent after drain
[2018-06-08 08:42:18.424][1][warning][upstream] source/common/config/grpc_mux_impl.cc:205] gRPC config stream closed: 13, 
[2018-06-08 08:42:18.424][1][warning][config] bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_mux_subscription_lib/common/config/grpc_mux_subscription_impl.h:66] gRPC update for type.googleapis.com/envoy.api.v2.RouteConfiguration failed
[2018-06-08 08:42:18.424][1][warning][config] bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_mux_subscription_lib/common/config/grpc_mux_subscription_impl.h:66] gRPC update for type.googleapis.com/envoy.api.v2.Listener failed
[2018-06-08 08:42:18.424][1][warning][config] bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_mux_subscription_lib/common/config/grpc_mux_subscription_impl.h:66] gRPC update for type.googleapis.com/envoy.api.v2.ClusterLoadAssignment failed
[2018-06-08 08:42:18.424][1][warning][config] bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_mux_subscription_lib/common/config/grpc_mux_subscription_impl.h:66] gRPC update for type.googleapis.com/envoy.api.v2.ClusterLoadAssignment failed
[2018-06-08 08:42:18.424][1][warning][config] bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_mux_subscription_lib/common/config/grpc_mux_subscription_impl.h:66] gRPC update for type.googleapis.com/envoy.api.v2.ClusterLoadAssignment failed
[2018-06-08 08:42:18.424][1][warning][config] bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_mux_subscription_lib/common/config/grpc_mux_subscription_impl.h:66] gRPC update for type.googleapis.com/envoy.api.v2.ClusterLoadAssignment failed
[2018-06-08 08:42:18.424][1][warning][config] bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_mux_subscription_lib/common/config/grpc_mux_subscription_impl.h:66] gRPC update for type.googleapis.com/envoy.api.v2.ClusterLoadAssignment failed
[2018-06-08 08:42:18.424][1][warning][config] bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_mux_subscription_lib/common/config/grpc_mux_subscription_impl.h:66] gRPC update for type.googleapis.com/envoy.api.v2.Cluster failed
[2018-06-08 08:48:12.934][1][info][upstream] source/common/upstream/cluster_manager_impl.cc:388] add/update cluster r3m4e0 starting warming
[2018-06-08 08:48:12.936][1][info][upstream] source/common/upstream/cluster_manager_impl.cc:395] warming cluster r3m4e0 complete
[2018-06-08 08:50:51.626][1][info][upstream] source/common/upstream/cluster_manager_impl.cc:437] removing cluster r3m4e0
[2018-06-08 08:51:08.379][1][info][upstream] source/common/upstream/cluster_manager_impl.cc:388] add/update cluster r3m5e0 starting warming
[2018-06-08 08:51:08.382][1][info][upstream] source/common/upstream/cluster_manager_impl.cc:395] warming cluster r3m5e0 complete
[2018-06-08 08:56:18.631][1][info][upstream] source/common/upstream/cluster_manager_impl.cc:437] removing cluster r3m5e0
[2018-06-08 08:56:31.216][1][info][upstream] source/common/upstream/cluster_manager_impl.cc:388] add/update cluster r3m6e0 starting warming
[2018-06-08 08:56:31.218][1][info][upstream] source/common/upstream/cluster_manager_impl.cc:395] warming cluster r3m6e0 complete
[2018-06-08 08:57:12.003][1][info][upstream] source/common/upstream/cluster_manager_impl.cc:437] removing cluster r3m6e0


  3. Sidecar deployment command.
  4. Sidecar configuration.
docker run -e MANAGER_HOST=$HOST_IP -e HOST_PRIVATE_IP=$HOST_IP \
    -e MANAGER_PORT=9091 \
    -e SERVICE_ID=-20 \
    -e SERVICE_NAME="manager" \
    -p 8080:8080 -p 8081:8081 -p 8082:8082 \
    hydrosphere/serving-sidecar:latest

ML data

Runtime

  1. Runtime image (if you got it from public sources)
{
  "name": "hydrosphere/serving-runtime-python",
  "version": "3.6-latest",
  "modelTypes": [
    "string"
  ],
  "tags": [
    "string"
  ],
  "configParams": {}
}
  2. Repository with commit SHA or release tag (if you built it yourself)
  3. Runtime logs.
INFO:PythonRuntimeService:Received inference request: model_spec {
  name: "KblK"
  signature_name: "ks_test"
}
inputs {
  key: "distribution"
  value {
    dtype: DT_STRING
    string_val: "normal"
  }
}
inputs {
  key: "distributionSample"
  value {
    dtype: DT_DOUBLE
    tensor_shape {
      dim {
        size: -1
      }
    }
    double_val: 1.0
  }
}
inputs {
  key: "sample"
  value {
    dtype: DT_DOUBLE
    tensor_shape {
      dim {
        size: -1
      }
    }
    double_val: 1.0
  }
}

INFO:PythonRuntimeService:Answer: outputs {
  key: "ksStatistics"
  value {
    dtype: DT_DOUBLE
    double_val: 0.895
  }
}
outputs {
  key: "pValue"
  value {
    dtype: DT_DOUBLE
    double_val: 0.17860426969764479
  }
}
outputs {
  key: "rejectionLevel"
  value {
    dtype: DT_DOUBLE
    double_val: 1.3633957605919127
  }
}

port 9090 is not exposed

I was not able to reach the UI due to an unexposed port in serving-manager.
I added this fix locally:

ports:
  - "9090:9090"

Incorrect model copy

In some cases, files aren't copied to the runtime with the directory structure preserved.
In some cases, Spark detects checksum mismatches in the files. A structure-preserving copy is sketched below.
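
A minimal sketch of a structure-preserving copy (source and destination paths are assumptions; it copies byte-for-byte, which should also keep checksums stable):

import java.nio.file.{Files, Path, StandardCopyOption}

import scala.collection.JavaConverters._

// Recreate the source tree under dst, keeping every file's relative path.
def copyTree(src: Path, dst: Path): Unit =
  Files.walk(src).iterator().asScala.foreach { p =>
    val target = dst.resolve(src.relativize(p))
    if (Files.isDirectory(p)) Files.createDirectories(target)
    else Files.copy(p, target, StandardCopyOption.REPLACE_EXISTING)
  }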

URL encode signatures

Can't serve an application if the signature name contains /, e.g. tensorflow/serving/predict. A sketch of encoding the name follows.
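
A minimal sketch of the encoding side (the serve URL shape is an assumption):

// Percent-encode the signature name so "tensorflow/serving/predict" isn't
// split into path segments; "/" becomes "%2F".
def serveUrl(app: String, signature: String): String =
  s"http://localhost:9090/api/v1/applications/serve/$app/" +
    java.net.URLEncoder.encode(signature, "UTF-8")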

Model container has conflicting name after db cleanup

Steps to reproduce:

  • start manager and deploy some model
  • restart manager with empty db
  • try to deploy the same model as in the first step
com.spotify.docker.client.exceptions.DockerRequestException:
  Request error: POST unix://localhost:80/containers/create?name=binarizer_0-0-1: 409,
  body: {"message":"Conflict. The container name \"/binarizer_0-0-1\" is already in use by container \"d60aff912cdce7106249b7fc1d8b5707c398492bd3027d255bfae372bcab851e\".
  You have to remove (or rename) that container to be able to reuse that name."}
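
A minimal sketch of a startup cleanup, assuming the spotify docker-client the manager already uses and the HS_SERVICE_MARKER label it sets on its containers (removal policy is an assumption):

import com.spotify.docker.client.DefaultDockerClient
import com.spotify.docker.client.DockerClient.ListContainersParam

import scala.collection.JavaConverters._

// Remove leftover hydro-serving containers that an emptied db no longer tracks.
// Running containers would need to be stopped first; that step is omitted here.
def removeStaleContainers(): Unit = {
  val docker = DefaultDockerClient.fromEnv().build()
  try {
    docker
      .listContainers(
        ListContainersParam.allContainers(),
        ListContainersParam.withLabel("HS_SERVICE_MARKER"))
      .asScala
      .foreach(c => docker.removeContainer(c.id()))
  } finally docker.close()
}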

Dropping container when wrong runtime

If the runtime wasn't set correctly when creating an application, it can cause problems with the corresponding docker container. Invoking that application will likely fail, and moreover, the user will not know what happened, as there is no proper warning message. After the invocation, the runtime container will be dropped (removed).

Can't retry deploying a previously failed model

It seems that the manager writes the model info to the db before the container actually starts.
Failed deploy:

[ERROR] i.h.s.m.ManagerApi i.h.s.m.ManagerApi$$anonfun$1.applyOrElse.86 Request error: POST unix://localhost:80/containers/create?name=ssd_bulat_0.0.1: 409, body: {"message":"Conflict. The container name \"/ssd_bulat_0.0.1\" is already in use by container \"0ece243521fe3312233ce11c0509d547325c0db245a99875da6575bcdf7d985d\". You have to remove (or rename) that container to be able to reuse that name."}

com.spotify.docker.client.exceptions.DockerRequestException: Request error: POST unix://localhost:80/containers/create?name=ssd_bulat_0.0.1: 409, body: {"message":"Conflict. The container name \"/ssd_bulat_0.0.1\" is already in use by container \"0ece243521fe3312233ce11c0509d547325c0db245a99875da6575bcdf7d985d\". You have to remove (or rename) that container to be able to reuse that name."}

	at com.spotify.docker.client.DefaultDockerClient.propagate(DefaultDockerClient.java:2503) ~[docker-client-8.8.0.jar:8.8.0]

Failed retry:

[ERROR] i.h.s.m.ManagerApi i.h.s.m.ManagerApi$$anonfun$1.applyOrElse.86 ERROR: duplicate key value violates unique constraint "model_service_service_name_key"
  Detail: Key (service_name)=(ssd_bulat_0.0.1) already exists.
org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "model_service_service_name_key"
  Detail: Key (service_name)=(ssd_bulat_0.0.1) already exists.

Get all modelRuntimes by model id

Right now we can get the modelRuntimes for a given model using the /api/v1/modelRuntime/{modelId}/last endpoint, but it requires a 'maximum' query parameter, and since we don't know how many modelRuntimes exist for the model, we can't set it.
We need to either add some kind of pagination to this endpoint, or create a new endpoint that returns all modelRuntimes for a given model; a sketch of the latter follows.
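
A minimal sketch of the second option (the path and the repository call are assumptions, not the actual manager API):

import akka.http.scaladsl.server.Directives._

// Stand-in for the real repository query.
def listByModel(modelId: Long): List[String] = Nil

// GET /api/v1/modelRuntime/{modelId}/all returns every runtime for the model.
val allModelRuntimes =
  path("api" / "v1" / "modelRuntime" / LongNumber / "all") { modelId =>
    get {
      complete(listByModel(modelId).mkString("[", ",", "]"))
    }
  }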

gRPC issue when executing "docker-compose up"

Hello, I get an issue when executing "docker-compose up" on the manager project. Could you have a look? Thanks very much.

The logs are as follows:

source/common/upstream/cluster_manager_impl.cc:388] add/update cluster manager-http starting warming
manager | [2018-07-27 08:14:10.570][ERROR] a.a.OneForOneStrategy a.e.s.Slf4jLogger$$anonfun$receive$1.$anonfun$applyOrElse$1.69 null
manager | java.lang.NullPointerException: null
manager | at com.google.protobuf.Utf8.encodedLength(Utf8.java:251) ~[protobuf-java-3.5.1.jar:?]
manager | at com.google.protobuf.CodedOutputStream.computeStringSizeNoTag(CodedOutputStream.java:866) ~[protobuf-java-3.5.1.jar:?]
manager | at com.google.protobuf.CodedOutputStream.computeStringSize(CodedOutputStream.java:626) ~[protobuf-java-3.5.1.jar:?]
manager | at envoy.api.v2.core.SocketAddress.__computeSerializedValue(SocketAddress.scala:42) ~[envoy-data-plane-api_2.12-v1.6.0_1.jar:v1.6.0_1]
manager | at envoy.api.v2.core.SocketAddress.serializedSize(SocketAddress.scala:52) ~[envoy-data-plane-api_2.12-v1.6.0_1.jar:v1.6.0_1]
manager | at envoy.api.v2.core.Address.__computeSerializedValue(Address.scala:20) ~[envoy-data-plane-api_2.12-v1.6.0_1.jar:v1.6.0_1]
manager | at envoy.api.v2.core.Address.serializedSize(Address.scala:27) ~[envoy-data-plane-api_2.12-v1.6.0_1.jar:v1.6.0_1]
manager | at envoy.api.v2.endpoint.Endpoint.__computeSerializedValue(Endpoint.scala:18) ~[envoy-data-plane-api_2.12-v1.6.0_1.jar:v1.6.0_1]
manager | at envoy.api.v2.endpoint.Endpoint.serializedSize(Endpoint.scala:24) ~[envoy-data-plane-api_2.12-v1.6.0_1.jar:v1.6.0_1]
manager | at envoy.api.v2.endpoint.LbEndpoint.__computeSerializedValue(LbEndpoint.scala:49) ~[envoy-data-plane-api_2.12-v1.6.0_1.jar:v1.6.0_1]
manager | at envoy.api.v2.endpoint.LbEndpoint.serializedSize(LbEndpoint.scala:58) ~[envoy-data-plane-api_2.12-v1.6.0_1.jar:v1.6.0_1]
manager | at envoy.api.v2.endpoint.LocalityLbEndpoints.$anonfun$__computeSerializedValue$1(LocalityLbEndpoints.scala:57) ~[envoy-data-plane-api_2.12-v1.6.0_1.jar:v1.6.0_1]
manager | at envoy.api.v2.endpoint.LocalityLbEndpoints.$anonfun$__computeSerializedValue$1$adapted(LocalityLbEndpoints.scala:57) ~[envoy-data-plane-api_2.12-v1.6.0_1.jar:v1.6.0_1]
manager | at scala.collection.Iterator.foreach(Iterator.scala:944) ~[scala-library.jar:?]
manager | at scala.collection.Iterator.foreach$(Iterator.scala:944) ~[scala-library.jar:?]
manager | at scala.collection.AbstractIterator.foreach(Iterator.scala:1432) ~[scala-library.jar:?]
manager | at scala.collection.IterableLike.foreach(IterableLike.scala:71) ~[scala-library.jar:?]
manager | at scala.collection.IterableLike.foreach$(IterableLike.scala:70) ~[scala-library.jar:?]
manager | at scala.collection.AbstractIterable.foreach(Iterable.scala:54) ~[scala-library.jar:?]
manager | at envoy.api.v2.endpoint.LocalityLbEndpoints.__computeSerializedValue(LocalityLbEndpoints.scala:57) ~[envoy-data-plane-api_2.12-v1.6.0_1.jar:v1.6.0_1]
manager | at envoy.api.v2.endpoint.LocalityLbEndpoints.serializedSize(LocalityLbEndpoints.scala:65) ~[envoy-data-plane-api_2.12-v1.6.0_1.jar:v1.6.0_1]
manager | at envoy.api.v2.ClusterLoadAssignment.$anonfun$__computeSerializedValue$1(ClusterLoadAssignment.scala:38) ~[envoy-data-plane-api_2.12-v1.6.0_1.jar:v1.6.0_1]
manager | at envoy.api.v2.ClusterLoadAssignment.$anonfun$__computeSerializedValue$1$adapted(ClusterLoadAssignment.scala:38) ~[envoy-data-plane-api_2.12-v1.6.0_1.jar:v1.6.0_1]
manager | at scala.collection.immutable.List.foreach(List.scala:389) ~[scala-library.jar:?]
manager | at envoy.api.v2.ClusterLoadAssignment.__computeSerializedValue(ClusterLoadAssignment.scala:38) ~[envoy-data-plane-api_2.12-v1.6.0_1.jar:v1.6.0_1]
manager | at envoy.api.v2.ClusterLoadAssignment.serializedSize(ClusterLoadAssignment.scala:45) ~[envoy-data-plane-api_2.12-v1.6.0_1.jar:v1.6.0_1]
manager | at scalapb.GeneratedMessage.toByteString(GeneratedMessageCompanion.scala:140) ~[scalapb-runtime_2.12-0.7.4.jar:0.7.4]
manager | at scalapb.GeneratedMessage.toByteString$(GeneratedMessageCompanion.scala:139) ~[scalapb-runtime_2.12-0.7.4.jar:0.7.4]
manager | at envoy.api.v2.ClusterLoadAssignment.toByteString(ClusterLoadAssignment.scala:28) ~[envoy-data-plane-api_2.12-v1.6.0_1.jar:v1.6.0_1]
manager | at scalapb.AnyCompanionMethods.pack(AnyMethods.scala:35) ~[scalapb-runtime_2.12-0.7.4.jar:0.7.4]
manager | at scalapb.AnyCompanionMethods.pack$(AnyMethods.scala:30) ~[scalapb-runtime_2.12-0.7.4.jar:0.7.4]
manager | at com.google.protobuf.any.Any$.pack(Any.scala:183) ~[scalapb-runtime_2.12-0.7.4.jar:0.7.4]
manager | at scalapb.AnyCompanionMethods.pack(AnyMethods.scala:28) ~[scalapb-runtime_2.12-0.7.4.jar:0.7.4]
manager | at scalapb.AnyCompanionMethods.pack$(AnyMethods.scala:27) ~[scalapb-runtime_2.12-0.7.4.jar:0.7.4]
manager | at com.google.protobuf.any.Any$.pack(Any.scala:183) ~[scalapb-runtime_2.12-0.7.4.jar:0.7.4]
manager | at io.hydrosphere.serving.manager.service.envoy.xds.AbstractDSActor.$anonfun$sendToObserver$1(AbstractDSActor.scala:63) ~[manager.jar:0.0.24-SNAPSHOT]
manager | at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:234) ~[scala-library.jar:?]
manager | at scala.collection.immutable.List.foreach(List.scala:389) ~[scala-library.jar:?]
manager | at scala.collection.TraversableLike.map(TraversableLike.scala:234) ~[scala-library.jar:?]
manager | at scala.collection.TraversableLike.map$(TraversableLike.scala:227) ~[scala-library.jar:?]
manager | at scala.collection.immutable.List.map(List.scala:295) ~[scala-library.jar:?]
manager | at io.hydrosphere.serving.manager.service.envoy.xds.AbstractDSActor.io$hydrosphere$serving$manager$service$envoy$xds$AbstractDSActor$$sendToObserver(AbstractDSActor.scala:63) ~[manager.jar:0.0.24-SNAPSHOT]
manager | at io.hydrosphere.serving.manager.service.envoy.xds.AbstractDSActor$$anonfun$receive$1.applyOrElse(AbstractDSActor.scala:90) ~[manager.jar:0.0.24-SNAPSHOT]

Issue with docker compose.

I am using the latest code.

manager | [ERROR] i.h.s.m.ManagerHttpApi i.h.s.m.ManagerHttpApi$$anonfun$1.applyOrElse.76 null
manager | java.lang.RuntimeException: null
manager | at io.hydrosphere.serving.manager.service.prometheus.PrometheusMetricsServiceImpl.fetchServices(PrometheusMetricsService.scala:37) ~[manager.jar:latest]
manager | at io.hydrosphere.serving.manager.controller.prometheus.PrometheusMetricsController$$anonfun$getServices$1$$anonfun$apply$1$$anonfun$apply$2.apply(PrometheusMetricsController.scala:30) ~[manager.jar:latest]
manager | at io.hydrosphere.serving.manager.controller.prometheus.PrometheusMetricsController$$anonfun$getServices$1$$anonfun$apply$1$$anonfun$apply$2.apply(PrometheusMetricsController.scala:30) ~[manager.jar:latest]

sidecar | [2018-02-09 02:03:07.572][26][debug][main] source/server/connection_handler_impl.cc:129] [C10] new connection
sidecar | [2018-02-09 02:03:07.572][26][debug][http] source/common/http/conn_manager_impl.cc:181] [C10] new stream
sidecar | [2018-02-09 02:03:07.572][26][debug][http] source/common/http/conn_manager_impl.cc:439] [C10][S4601685897479926094] request headers complete (end_stream=true):
sidecar | [2018-02-09 02:03:07.572][26][debug][http] source/common/http/conn_manager_impl.cc:444] [C10][S4601685897479926094] ':authority':'hd-nd05.campus.utah.edu:8080'
sidecar | [2018-02-09 02:03:07.572][26][debug][http] source/common/http/conn_manager_impl.cc:444] [C10][S4601685897479926094] 'user-agent':'curl/7.52.1'
sidecar | [2018-02-09 02:03:07.572][26][debug][http] source/common/http/conn_manager_impl.cc:444] [C10][S4601685897479926094] 'accept':'*/*'
sidecar | [2018-02-09 02:03:07.572][26][debug][http] source/common/http/conn_manager_impl.cc:444] [C10][S4601685897479926094] ':path':'/v1/prometheus/services'
sidecar | [2018-02-09 02:03:07.572][26][debug][http] source/common/http/conn_manager_impl.cc:444] [C10][S4601685897479926094] ':method':'GET'
sidecar | [2018-02-09 02:03:07.572][26][debug][router] source/common/router/router.cc:239] [C10][S4601685897479926094] cluster 'manager-http' match for URL '/v1/prometheus/services'
sidecar | [2018-02-09 02:03:07.572][26][debug][router] source/common/router/router.cc:284] [C10][S4601685897479926094] ':authority':'hd-nd05.campus.utah.edu:8080'
sidecar | [2018-02-09 02:03:07.572][26][debug][router] source/common/router/router.cc:284] [C10][S4601685897479926094] 'user-agent':'curl/7.52.1'
sidecar | [2018-02-09 02:03:07.572][26][debug][router] source/common/router/router.cc:284] [C10][S4601685897479926094] 'accept':'*/*'
sidecar | [2018-02-09 02:03:07.572][26][debug][router] source/common/router/router.cc:284] [C10][S4601685897479926094] ':path':'/v1/prometheus/services'
sidecar | [2018-02-09 02:03:07.572][26][debug][router] source/common/router/router.cc:284] [C10][S4601685897479926094] ':method':'GET'
sidecar | [2018-02-09 02:03:07.572][26][debug][router] source/common/router/router.cc:284] [C10][S4601685897479926094] 'x-forwarded-proto':'http'
sidecar | [2018-02-09 02:03:07.572][26][debug][router] source/common/router/router.cc:284] [C10][S4601685897479926094] 'x-request-id':'35b54b31-9920-9176-9df7-30d0d1c3fc1e'
sidecar | [2018-02-09 02:03:07.572][26][debug][router] source/common/router/router.cc:284] [C10][S4601685897479926094] 'x-b3-traceid':'8e19a689e87492a5'
sidecar | [2018-02-09 02:03:07.572][26][debug][router] source/common/router/router.cc:284] [C10][S4601685897479926094] 'x-b3-spanid':'8e19a689e87492a5'
sidecar | [2018-02-09 02:03:07.572][26][debug][router] source/common/router/router.cc:284] [C10][S4601685897479926094] 'x-b3-sampled':'1'
sidecar | [2018-02-09 02:03:07.572][26][debug][router] source/common/router/router.cc:284] [C10][S4601685897479926094] 'x-ot-span-context':'8e19a689e87492a5;8e19a689e87492a5;0000000000000000'
sidecar | [2018-02-09 02:03:07.572][26][debug][router] source/common/router/router.cc:284] [C10][S4601685897479926094] 'x-envoy-expected-rq-timeout-ms':'15000'
sidecar | [2018-02-09 02:03:07.572][26][debug][router] source/common/router/router.cc:284] [C10][S4601685897479926094] ':scheme':'http'
sidecar | [2018-02-09 02:03:07.572][26][debug][pool] source/common/http/http1/conn_pool.cc:74] [C3] using existing connection
sidecar | [2018-02-09 02:03:07.572][26][debug][router] source/common/router/router.cc:902] [C10][S4601685897479926094] pool ready
sidecar | [2018-02-09 02:03:07.575][26][debug][router] source/common/router/router.cc:553] [C10][S4601685897479926094] upstream headers complete: end_stream=false
sidecar | [2018-02-09 02:03:07.575][26][debug][http] source/common/http/conn_manager_impl.cc:859] [C10][S4601685897479926094] encoding headers via codec (end_stream=false):
sidecar | [2018-02-09 02:03:07.575][26][debug][http] source/common/http/conn_manager_impl.cc:864] [C10][S4601685897479926094] 'server':'envoy'
sidecar | [2018-02-09 02:03:07.575][26][debug][http] source/common/http/conn_manager_impl.cc:864] [C10][S4601685897479926094] 'date':'Fri, 09 Feb 2018 02:03:07 GMT'
sidecar | [2018-02-09 02:03:07.575][26][debug][http] source/common/http/conn_manager_impl.cc:864] [C10][S4601685897479926094] 'content-type':'text/plain; charset=UTF-8'
sidecar | [2018-02-09 02:03:07.575][26][debug][http] source/common/http/conn_manager_impl.cc:864] [C10][S4601685897479926094] 'content-length':'13'
sidecar | [2018-02-09 02:03:07.575][26][debug][http] source/common/http/conn_manager_impl.cc:864] [C10][S4601685897479926094] ':status':'500'
sidecar | [2018-02-09 02:03:07.575][26][debug][http] source/common/http/conn_manager_impl.cc:864] [C10][S4601685897479926094] 'x-envoy-upstream-service-time':'2'
