
bazel-buildfarm's Introduction

Bazel Buildfarm


This repository hosts the Bazel remote caching and execution system.

Background information on the status of caching and remote execution in bazel can be found in the bazel documentation.

File issues here for bugs or feature requests, and ask questions via the build team Slack in the #buildfarm channel.

Buildfarm Docs

Usage

All command-line options override the corresponding config settings.

Redis

Run via

$ docker run -d --rm --name buildfarm-redis -p 6379:6379 redis:7.2.4
redis-cli config set stop-writes-on-bgsave-error no
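
If redis-cli is not installed on the host, the same setting can be applied inside the container instead; this is a minor variation on the commands above, assuming the container name used there:

$ docker exec buildfarm-redis redis-cli config set stop-writes-on-bgsave-error no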

Bazel Buildfarm Server

Run via

$ bazelisk run //src/main/java/build/buildfarm:buildfarm-server -- <logfile> <configfile>

Ex: bazelisk run //src/main/java/build/buildfarm:buildfarm-server -- --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties $PWD/examples/config.minimal.yml

The logging configuration file must be in the standard java.util.logging format and is passed as a --jvm_flag (-Dlogging.config=file:<path>, or -Djava.util.logging.config.file=<path> as in the example above); the config file must be in YAML format.
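
The examples/logging.properties file in the repository is the reference; as a rough sketch (standard java.util.logging keys with illustrative values, not the repository's exact contents), such a file looks like:

handlers = java.util.logging.ConsoleHandler
.level = INFO
java.util.logging.ConsoleHandler.level = FINE
java.util.logging.SimpleFormatter.format = %1$tF %1$tT %4$s %2$s %5$s%6$s%n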

Bazel Buildfarm Worker

Run via

$ bazelisk run //src/main/java/build/buildfarm:buildfarm-shard-worker -- <logfile> <configfile>

Ex: bazelisk run //src/main/java/build/buildfarm:buildfarm-shard-worker -- --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties $PWD/examples/config.minimal.yml

The logging configuration file must be in the standard java.util.logging format and is passed as a --jvm_flag (-Dlogging.config=file:<path>, or -Djava.util.logging.config.file=<path> as in the example above); the config file must be in YAML format.

Bazel Client

To use the example configured buildfarm with bazel (version 1.0 or higher), you can configure your .bazelrc as follows:

$ cat .bazelrc
build --remote_executor=grpc://localhost:8980

Then run your build as you would normally do.
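
If you only want remote caching with local execution, Bazel's --remote_cache flag can point at the same endpoint; this is a variation shown for illustration, not part of the repository's example configuration:

$ cat .bazelrc
build --remote_cache=grpc://localhost:8980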

Debugging

Buildfarm uses Java's Logging framework and outputs all routine behavior to the NICE Level.

You can use typical Java logging configuration to filter these results and observe the flow of executions through your running services. An example logging.properties file has been provided at examples/logging.properties for use as follows:

$ bazel run //src/main/java/build/buildfarm:buildfarm-server -- --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties $PWD/examples/config.minimal.yml

and

$ bazel run //src/main/java/build/buildfarm:buildfarm-shard-worker -- --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties $PWD/examples/config.minimal.yml

To attach a remote debugger, run the executable with the --debug=<PORT> flag. For example:

$ bazel run //src/main/java/build/buildfarm:buildfarm-server -- --debug=5005 $PWD/examples/config.minimal.yml
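
Assuming the --debug flag opens a standard JDWP socket on the given port, any Java debugger can then attach; for example, using the JDK's command-line debugger:

$ jdb -attach localhost:5005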

Third-party Dependencies

Most third-party dependencies (e.g. protobuf, gRPC, ...) are managed automatically via rules_jvm_external. These dependencies are enumerated in the WORKSPACE with a maven_install artifacts parameter.

Dependencies that rules_jvm_external cannot manage are imported as manually managed remote repositories via the WORKSPACE file.
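
For reference, a maven_install stanza in a WORKSPACE generally looks like the sketch below; the artifact coordinates are placeholders for illustration, not buildfarm's actual dependency list:

load("@rules_jvm_external//:defs.bzl", "maven_install")

maven_install(
    name = "maven",
    artifacts = [
        # Placeholder coordinates; buildfarm's real list lives in its WORKSPACE.
        "com.google.guava:guava:32.1.1-jre",
        "io.grpc:grpc-netty:1.56.0",
    ],
    repositories = [
        "https://repo1.maven.org/maven2",
    ],
)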

Deployments

Buildfarm can be used as an external repository for composition into a deployment of your choice.

Add the following to your WORKSPACE to get access to buildfarm targets, filling in the commit and sha256 values:

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

BUILDFARM_EXTERNAL_COMMIT = "<revision commit id>"
BUILDFARM_EXTERNAL_SHA256 = "<sha256 digest of url below>"

http_archive(
    name = "build_buildfarm",
    strip_prefix = "bazel-buildfarm-%s" % BUILDFARM_EXTERNAL_COMMIT,
    sha256 = BUILDFARM_EXTERNAL_SHA256,
    url = "https://github.com/bazelbuild/bazel-buildfarm/archive/%s.zip" % BUILDFARM_EXTERNAL_COMMIT,
)

load("@build_buildfarm//:deps.bzl", "buildfarm_dependencies")

buildfarm_dependencies()

load("@build_buildfarm//:defs.bzl", "buildfarm_init")

buildfarm_init()

load("@maven//:compat.bzl", "compat_repositories")

compat_repositories()

Optionally, if you want to use the buildfarm docker container image targets, you can add this:

load("@build_buildfarm//:images.bzl", "buildfarm_images")

buildfarm_images()
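
With the repository set up this way, buildfarm targets can be referenced like any other external target; for example, an illustrative invocation of the server target shown earlier, with your own config path:

$ bazel run @build_buildfarm//src/main/java/build/buildfarm:buildfarm-server -- /path/to/your/config.yml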

Helm Chart

To install the OCI-bundled Helm chart:

helm install \
  -n bazel-buildfarm \
  --create-namespace \
  bazel-buildfarm \
  oci://ghcr.io/bazelbuild/buildfarm \
  --version "0.2.4"
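
To inspect the chart's configurable values before installing, Helm's standard show values command works against the same OCI reference (assuming a Helm version with OCI registry support):

helm show values \
  oci://ghcr.io/bazelbuild/buildfarm \
  --version "0.2.4"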

bazel-buildfarm's People

Contributors

80degreeswest, abergmeier-dsfishlabs, amishra-u, andrewrothstein, angusdavis, buchgr, comius, cushon, dependabot[bot], edbaunton, edschouten, jacobmou, jasonschroeder-sfdc, jerrymarino, jiaquan1, keith, kekxv, krisstakos, lberki, luxe, meteorcloudy, philwo, shirchen, stefanobaghino, thna123459, tobbe76, ttsugriy, vladmos, werkt, wiwa


bazel-buildfarm's Issues

Server watchOperation NPE

I have not had time to investigate deeply, but I have observed this near the end of a build.

Jun 20, 2018 4:39:08 AM io.grpc.internal.SerializingExecutor run
SEVERE: Exception while executing runnable io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed@357c5d0
java.lang.NullPointerException
	at build.buildfarm.instance.memory.MemoryInstance.watchOperation(MemoryInstance.java:347)
	at build.buildfarm.server.WatcherService.watch(WatcherService.java:69)
	at com.google.watcher.v1.WatcherGrpc$MethodHandlers.invoke(WatcherGrpc.java:237)
	at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
	at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:33)
	at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
	at io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
	at io.grpc.util.TransmitStatusRuntimeExceptionInterceptor$1.onHalfClose(TransmitStatusRuntimeExceptionInterceptor.java:74)
	at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:271)
	at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:648)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Configurize Instance Digest Function

buildfarm is now broken for the default sha256 digest function in bazel

Incidentally, we were broken for the default md5 before this, but with bazel refusing to build without at least SHA1 for --remote_cache, most folks have declared a SHA1 default.

We should make the digest function configurable per instance, with sha256 as the default.

Error when build on darwin. SDKROOT and DEVELOPER_DIR not set

I built my iOS project with buildfarm, but it failed. I know the remote worker does not set SDKROOT and DEVELOPER_DIR appropriately. Do you have a plan to support this? My knowledge is limited and I have no idea how to fix this. Would writing a custom Apple toolchain fix this?

All does not build

Issuing a simple bazel build //... leads to:

no such package 'third_party/zlib': BUILD file not found on package path and referenced by '//third_party/grpc:grpc_unsecure'

Obvious question - do we want zlib in third_party or rather as a repo?

Try to use with bazel 0.9.0

Hello.

bazel

[jenkins@ci-slave-1:Ireland backend-ci]$ bazel version
Build label: 0.9.0- (@non-git)
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Dec 31 04:52:01 +49959 (1514418007921)
Build timestamp: 1514418007921
Build timestamp as int: 1514418007921

bazel-buildfarm

commit f9044b62f0943fc736f22961cf77baec50e03698 (HEAD -> master, origin/master, origin/HEAD)
Author: Ed Baunton <[email protected]>
Date:   Sat Dec 30 18:25:11 2017 +0000

    Add link to bazel docs in buildfarm readme (#72)

I built and ran the server and worker in the following way:

# 10.21.50.51
bazel-bin/src/main/java/build/buildfarm/buildfarm-server ./examples/server.config.example -p 8090
# 10.21.50.52
bazel-bin/src/main/java/build/buildfarm/buildfarm-worker ./examples/worker.config.example --root=/tmp/bazel_farm

In worker.config.example I made the following changes:

[oleg@ci-slave-2:Ireland bazel-buildfarm]$ git diff examples/
diff --git a/examples/worker.config.example b/examples/worker.config.example
index 25ec7aa..8710203 100644
--- a/examples/worker.config.example
+++ b/examples/worker.config.example
@@ -3,10 +3,10 @@
 instance_name: "default_memory_instance"

 # the endpoint used for all api requests
-operation_queue: "localhost:8980"
+operation_queue: "10.21.50.51:8090"

 # all content for the operations will be stored under this path
-root: "/tmp/worker"
+root: "/tmp/bazel_worker"

 # the local cache location relative to the 'root', or absolute
 cas_cache_directory: "cache"

Now I try to build my project with the following tools/bazel.rc:

build -c dbg --spawn_strategy=remote --remote_executor=10.21.50.51:8090

The build failed with the following error:

 Note: Remote connection/protocol failed with: execution failed: com.google.devtools.build.lib.remote.Retrier$RetryException: Out of retries or status not retriable.
	at com.google.devtools.build.lib.remote.ByteStreamUploader$1.failure(ByteStreamUploader.java:245)
	at com.google.devtools.build.lib.remote.ByteStreamUploader$AsyncUpload$1.onClose(ByteStreamUploader.java:362)
	at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:41)
	at io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:339)
	at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:443)
	at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:525)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:446)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:557)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:107)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: io.grpc.StatusException: INVALID_ARGUMENT: Digest mismatch c9733dbcf7a27ce9a82e9536b53656c22368a7ec/1235 <-> 4af5fa8e76d8fb38349ebd7b71abc17077ea09f3c5e9cca63d4539eb3ab8c38d/1235
	at io.grpc.Status.asException(Status.java:534)
	at com.google.devtools.build.lib.remote.ByteStreamUploader$1.failure(ByteStreamUploader.java:238)
	... 13 more

Need to support networking sandbox

I've got tests from third parties which won't pass without the network sandbox (they bind to localhost:xxx). Please support blocks-network

Mount tmpfs over /tmp/

We have third party tests which require write access to /tmp/. We currently do that with the --sandbox_tmpfs_path=/tmp/ flag.

Failure to automatically insert into the CAS

We've got an action with 1300 outputs totaling about 34 MB. It fails to build.

Server log:

austin[2047] dev-builder2 ((d14b903...)) ~/local/bazel-buildfarm
$ bazel-bin/src/main/java/build/buildfarm/buildfarm-server examples/server.config.example
Action Input Size 31984579: /bin/bash
3499275672526637: Operation default_memory_instance/operations/e62694ac-221d-4da2-95b0-7636f6065aaa was created

Worker log:

austin[2046] dev-builder2 ((d14b903...)) ~/local/bazel-buildfarm
$ bazel-bin/src/main/java/build/buildfarm/buildfarm-worker examples/worker.config.example 
Jan 18, 2018 8:04:48 PM build.buildfarm.worker.ExecuteActionStage runInterruptible
INFO: ExecuteActionStage: Waiting for input
Jan 18, 2018 8:05:01 PM build.buildfarm.worker.MatchStage lambda$match$0
INFO: execute: Starting operation: default_memory_instance/operations/e62694ac-221d-4da2-95b0-7636f6065aaa
Jan 18, 2018 8:05:01 PM build.buildfarm.worker.InputFetchStage tick
INFO: InputFetchStage: Fetching inputs
Jan 18, 2018 8:05:01 PM build.buildfarm.worker.InputFetchStage tick
INFO: InputFetchStage: inputs fetched: OK
Jan 18, 2018 8:05:01 PM build.buildfarm.worker.ExecuteActionStage spawn
INFO: ExecuteActionStage: spawn default_memory_instance/operations/e62694ac-221d-4da2-95b0-7636f6065aaa
ExecuteActionStage: 1/16
Jan 18, 2018 8:05:01 PM build.buildfarm.worker.ExecuteActionStage runInterruptible
INFO: ExecuteActionStage: Waiting for input
Jan 18, 2018 8:05:01 PM build.buildfarm.worker.Executor run
INFO: Executor: Updating operation default_memory_instance/operations/e62694ac-221d-4da2-95b0-7636f6065aaa
Jan 18, 2018 8:05:02 PM build.buildfarm.worker.Executor run
INFO: Executor: Executing command
Jan 18, 2018 8:05:03 PM build.buildfarm.worker.Executor lambda$run$0
INFO: Executor: poller: Completed Poll for default_memory_instance/operations/e62694ac-221d-4da2-95b0-7636f6065aaa: OK
Jan 18, 2018 8:05:04 PM build.buildfarm.worker.Executor lambda$run$0
INFO: Executor: poller: Completed Poll for default_memory_instance/operations/e62694ac-221d-4da2-95b0-7636f6065aaa: OK
Jan 18, 2018 8:05:04 PM build.buildfarm.worker.Executor run
INFO: Executor: Executed command: exit code 0
ExecuteActionStage: 0/16
Jan 18, 2018 8:05:04 PM build.buildfarm.worker.ReportResultStage tick
INFO: ReportResultStage: Operation default_memory_instance/operations/e62694ac-221d-4da2-95b0-7636f6065aaa
Jan 18, 2018 8:05:05 PM build.buildfarm.worker.ReportResultStage tick
INFO: ReportResultStage: Uploading outputs
Jan 18, 2018 8:05:06 PM build.buildfarm.worker.ReportResultStage lambda$tick$0
INFO: ReportResultStage: poller: Completed Poll for default_memory_instance/operations/e62694ac-221d-4da2-95b0-7636f6065aaa: OK
Jan 18, 2018 8:05:06 PM build.buildfarm.worker.ReportResultStage tick
INFO: ReportResultStage: Updating action cache
Exception in thread "Thread-4" io.grpc.StatusRuntimeException: CANCELLED: HTTP/2 error code: CANCEL
Received Rst Stream
        at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:221)
        at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:202)
        at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:131)
        at com.google.devtools.remoteexecution.v1test.ActionCacheGrpc$ActionCacheBlockingStub.updateActionResult(ActionCacheGrpc.java:343)
        at build.buildfarm.instance.stub.StubInstance.putActionResult(StubInstance.java:157)
        at build.buildfarm.worker.ReportResultStage.tick(ReportResultStage.java:156)
        at build.buildfarm.worker.AbstractPipelineStage.iterate(AbstractPipelineStage.java:63)
        at build.buildfarm.worker.ReportResultStage.run(ReportResultStage.java:70)
        at java.lang.Thread.run(Thread.java:748)
Jan 18, 2018 8:05:07 PM build.buildfarm.worker.ReportResultStage lambda$tick$0
INFO: ReportResultStage: poller: Completed Poll for default_memory_instance/operations/e62694ac-221d-4da2-95b0-7636f6065aaa: OK
Jan 18, 2018 8:05:08 PM build.buildfarm.worker.ReportResultStage lambda$tick$0
INFO: ReportResultStage: poller: Completed Poll for default_memory_instance/operations/e62694ac-221d-4da2-95b0-7636f6065aaa: OK
Jan 18, 2018 8:05:09 PM build.buildfarm.worker.ReportResultStage lambda$tick$0
INFO: ReportResultStage: poller: Completed Poll for default_memory_instance/operations/e62694ac-221d-4da2-95b0-7636f6065aaa: OK
Stage has exited at priority 4
Closing stage after exit at priority 4
Jan 18, 2018 8:05:10 PM build.buildfarm.worker.ReportResultStage lambda$tick$0
INFO: ReportResultStage: poller: Completed Poll for default_memory_instance/operations/e62694ac-221d-4da2-95b0-7636f6065aaa: OK
Jan 18, 2018 8:05:11 PM build.buildfarm.worker.ReportResultStage lambda$tick$0
INFO: ReportResultStage: poller: Completed Poll for default_memory_instance/operations/e62694ac-221d-4da2-95b0-7636f6065aaa: OK
Closing stage at priority 3
Interrupting unterminated closed thread at priority 3
Stage has exited at priority 3
Jan 18, 2018 8:05:12 PM build.buildfarm.worker.ReportResultStage lambda$tick$0
INFO: ReportResultStage: poller: Completed Poll for default_memory_instance/operations/e62694ac-221d-4da2-95b0-7636f6065aaa: OK
Jan 18, 2018 8:05:13 PM build.buildfarm.worker.ReportResultStage lambda$tick$0
INFO: ReportResultStage: poller: Completed Poll for default_memory_instance/operations/e62694ac-221d-4da2-95b0-7636f6065aaa: OK
Closing stage at priority 2
Interrupting unterminated closed thread at priority 2
Jan 18, 2018 8:05:14 PM build.buildfarm.worker.ReportResultStage lambda$tick$0
INFO: ReportResultStage: poller: Completed Poll for default_memory_instance/operations/e62694ac-221d-4da2-95b0-7636f6065aaa: OK
Stage has exited at priority 2
Closing stage at priority 1
Interrupting unterminated closed thread at priority 1
Exception in thread "Thread-1" io.grpc.StatusRuntimeException: CANCELLED: Call was interrupted
        at io.grpc.Status.asRuntimeException(Status.java:517)
        at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:128)
        at build.buildfarm.v1test.OperationQueueGrpc$OperationQueueBlockingStub.take(OperationQueueGrpc.java:278)
        at build.buildfarm.instance.stub.StubInstance.match(StubInstance.java:345)
        at build.buildfarm.worker.MatchStage.match(MatchStage.java:57)
        at build.buildfarm.worker.MatchStage.run(MatchStage.java:42)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at io.grpc.stub.ClientCalls$ThreadlessExecutor.waitAndDrain(ClientCalls.java:622)
        at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:122)
        ... 5 more
Stage has exited at priority 1
Jan 18, 2018 8:05:15 PM build.buildfarm.worker.ReportResultStage lambda$tick$0
INFO: ReportResultStage: poller: Completed Poll for default_memory_instance/operations/e62694ac-221d-4da2-95b0-7636f6065aaa: OK
Jan 18, 2018 8:05:16 PM build.buildfarm.worker.ReportResultStage lambda$tick$0
INFO: ReportResultStage: poller: Completed Poll for default_memory_instance/operations/e62694ac-221d-4da2-95b0-7636f6065aaa: OK
Jan 18, 2018 8:05:17 PM build.buildfarm.worker.ReportResultStage lambda$tick$0
INFO: ReportResultStage: poller: Completed Poll for default_memory_instance/operations/e62694ac-221d-4da2-95b0-7636f6065aaa: OK
Jan 18, 2018 8:05:18 PM build.buildfarm.worker.ReportResultStage lambda$tick$0
INFO: ReportResultStage: poller: Completed Poll for default_memory_instance/operations/e62694ac-221d-4da2-95b0-7636f6065aaa: OK
Jan 18, 2018 8:05:19 PM build.buildfarm.worker.ReportResultStage lambda$tick$0
INFO: ReportResultStage: poller: Completed Poll for default_memory_instance/operations/e62694ac-221d-4da2-95b0-7636f6065aaa: OK
Jan 18, 2018 8:05:20 PM build.buildfarm.worker.ReportResultStage lambda$tick$

When I set file_cas_control.limit to 1 on the worker, my action succeeds.

Worker linkInputs stuck in fatal state

Occasionally the Worker seems to get stuck in a state it cannot get out of, requiring the server to be brought down and the build retried. Any insight into how to investigate this further?

linkFile /tmp/worker/default_memory_instance/operations/1c51acae-d278-44cc-9bea-69d966327681/external/go1_8_3_linux_amd64/src/go/build/zcgo.go -> /tmp/worker/cache/c0aa8a0d1f26b0a64015c48d988bb94e499f4d0834ac1b3a933374b5e28fe2db_771
Exception in thread "Thread-2" io.grpc.StatusRuntimeException: NOT_FOUND
	at io.grpc.Status.asRuntimeException(Status.java:526)
	at io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:556)
	at com.google.common.collect.TransformedIterator.hasNext(TransformedIterator.java:43)
	at build.buildfarm.instance.stub.ByteStringIteratorInputStream.advance(ByteStringIteratorInputStream.java:92)
	at build.buildfarm.instance.stub.ByteStringIteratorInputStream.read(ByteStringIteratorInputStream.java:72)
	at java.io.InputStream.read(InputStream.java:101)
	at java.nio.file.Files.copy(Files.java:2908)
	at java.nio.file.Files.copy(Files.java:3027)
	at build.buildfarm.worker.CASFileCache.put(CASFileCache.java:331)
	at build.buildfarm.worker.operationqueue.Worker.linkInputs(Worker.java:204)
	at build.buildfarm.worker.operationqueue.Worker.linkInputs(Worker.java:221)
	at build.buildfarm.worker.operationqueue.Worker.linkInputs(Worker.java:221)
	at build.buildfarm.worker.operationqueue.Worker.linkInputs(Worker.java:221)
	at build.buildfarm.worker.operationqueue.Worker.linkInputs(Worker.java:221)
	at build.buildfarm.worker.operationqueue.Worker.linkInputs(Worker.java:221)
	at build.buildfarm.worker.operationqueue.Worker.linkInputs(Worker.java:221)
	at build.buildfarm.worker.operationqueue.Worker.linkInputs(Worker.java:221)
	at build.buildfarm.worker.operationqueue.Worker.fetchInputs(Worker.java:186)
	at build.buildfarm.worker.operationqueue.Worker.access$300(Worker.java:76)
	at build.buildfarm.worker.operationqueue.Worker$2.createActionRoot(Worker.java:370)
	at build.buildfarm.worker.InputFetchStage.tick(InputFetchStage.java:50)
	at build.buildfarm.worker.PipelineStage.iterate(PipelineStage.java:61)
	at build.buildfarm.worker.PipelineStage.runInterruptible(PipelineStage.java:42)
	at build.buildfarm.worker.PipelineStage.run(PipelineStage.java:49)
	at java.lang.Thread.run(Thread.java:748)
Stage has exited at priority 2
Closing stage at priority 4
Interrupting unterminated closed thread at priority 4

INVALID_ARGUMENT: Digest mismatch

Having configured Bazel to use SHA256

startup --host_jvm_args=-Dbazel.DigestFunction=SHA256

and the buildfarm server to use SHA256 by adding

hash_function = SHA256

I'm getting the following exception when trying to build remotely:

ERROR: /private/var/tmp/_bazel_ttsugrii/98bcb5c7cfb9e117ec3600033a1548b4/external/com_google_protobuf/BUILD:274:1: C++ compilation of rule '@com_google_protobuf//:protoc_lib' failed (Exit 34). Note: Remote connection/protocol failed with: execution failed: com.google.devtools.build.lib.remote.Retrier$RetryException: Out of retries or status not retriable.
	at com.google.devtools.build.lib.remote.ByteStreamUploader$1.failure(ByteStreamUploader.java:245)
	at com.google.devtools.build.lib.remote.ByteStreamUploader$AsyncUpload$1.onClose(ByteStreamUploader.java:362)
	at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:41)
	at io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:339)
	at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:443)
	at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:525)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:446)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:557)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:107)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: io.grpc.StatusException: INVALID_ARGUMENT: Digest mismatch c2e7bd71bf22cc3aa5b3f24693fb6b9c84ca50cb/2035 <-> 57ad52caebab9999562fde1414d8c3f65d72d993094534f2ccc4e67a5842b84d/2035
	at io.grpc.Status.asException(Status.java:534)
	at com.google.devtools.build.lib.remote.ByteStreamUploader$1.failure(ByteStreamUploader.java:238)
	... 13 more
Target //src/main/java/build/buildfarm:buildfarm-server failed to build

The issue is caused by the hard-coded SHA1 algorithm used in the DigestUtil::compute function. The fix is trivial, and I'm working on a PR with proper tests to prevent future regressions like this.

worker throws java.nio.file.NoSuchFileException in spite of the fact that the file exists

The buildfarm-worker threw the following exception.

Nov 02, 2018 9:24:06 PM build.buildfarm.worker.Executor runInterruptible
WARNING: Executor::run(default_memory_instance/operations/32b67bbb-2dbe-4047-abf4-8a2fcda6ac96): could not transition to EXECUTING
java.nio.file.NoSuchFileException: /tmp/worker/default_memory_instance/operations/47790f92-c7f3-4908-accd-8181b21266fc/external/tdl/third_party/folly/folly/tracing/StaticTracepoint.h
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:244)
        at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
        at java.nio.file.Files.delete(Files.java:1126)
        at build.buildfarm.worker.CASFileCache$2.visitFile(CASFileCache.java:219)
        at build.buildfarm.worker.CASFileCache$2.visitFile(CASFileCache.java:216)
        at java.nio.file.Files.walkFileTree(Files.java:2670)
        at java.nio.file.Files.walkFileTree(Files.java:2742)
        at build.buildfarm.worker.CASFileCache.removeDirectory(CASFileCache.java:216)
        at build.buildfarm.worker.operationqueue.Worker$2.createActionRoot(Worker.java:396)
        at build.buildfarm.worker.InputFetchStage.tick(InputFetchStage.java:58)
        at build.buildfarm.worker.PipelineStage.iterate(PipelineStage.java:69)
        at build.buildfarm.worker.PipelineStage.runInterruptible(PipelineStage.java:45)
        at build.buildfarm.worker.PipelineStage.run(PipelineStage.java:52)
        at java.lang.Thread.run(Thread.java:748)

However, the file in question does exist and is readable.

$ ls -l /tmp/worker/default_memory_instance/operations/47790f92-c7f3-4908-accd-8181b21266fc/external/tdl/third_party/folly/folly/tracing/StaticTracepoint.h
-rwxrw-r-- 44 ubuntu ubuntu 1167 Nov  2 17:58 /tmp/worker/default_memory_instance/operations/47790f92-c7f3-4908-accd-8181b21266fc/external/tdl/third_party/folly/folly/tracing/StaticTracepoint.h

client: bazel 0.18
bazel-buildfarm SHA1: ec7a053

Prometheus Metrics

Prometheus metrics regarding time of action execution and cache hit rates are important. Creating a ticket here to track it, not sure if it is on your radar @werkt.

(FYI in my (somewhat limited) experience Buildfarm is the most stable/reliable REAPI implementation, so anyway nice job, this project is awesome)

FAILED_PRECONDITION: A requested input (or the `Command` of the `Action`) was not found in the CAS.

...
(18:10:47) ERROR: /home/mark/.../BUILD.bazel:31:1: Linking of rule '//:media_service' failed (Exit 34). Note: Remote connection/protocol failed with: execution failed: com.google.devtools.build.lib.remote.Retrier$RetryException: Call failed with not retriable error: io.grpc.StatusRuntimeException: FAILED_PRECONDITION: A requested input (or the `Command` of the `Action`) was not found in the CAS.
        at com.google.devtools.build.lib.remote.Retrier.execute(Retrier.java:238)
        at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:101)
        at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.executeRemotely(GrpcRemoteExecutor.java:124)
        at com.google.devtools.build.lib.remote.RemoteSpawnRunner.exec(RemoteSpawnRunner.java:203)
        at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:95)
        at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:63)
        at com.google.devtools.build.lib.exec.SpawnActionContextMaps$ProxySpawnActionContext.exec(SpawnActionContextMaps.java:362)
        at com.google.devtools.build.lib.rules.cpp.CppLinkAction.execute(CppLinkAction.java:323)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeActionTask(SkyframeActionExecutor.java:962)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.prepareScheduleExecuteAndCompleteAction(SkyframeActionExecutor.java:893)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.access$900(SkyframeActionExecutor.java:116)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.call(SkyframeActionExecutor.java:748)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.call(SkyframeActionExecutor.java:702)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:443)
        at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:504)
        at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:224)
        at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:382)
        at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:355)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: io.grpc.StatusRuntimeException: FAILED_PRECONDITION: A requested input (or the `Command` of the `Action`) was not found in the CAS.
        at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:221)
        at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:202)
        at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:131)
        at com.google.devtools.remoteexecution.v1test.ExecutionGrpc$ExecutionBlockingStub.execute(ExecutionGrpc.java:370)
        at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.lambda$executeRemotely$0(GrpcRemoteExecutor.java:127)
        at com.google.devtools.build.lib.remote.Retrier.execute(Retrier.java:220)
        at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:101)
        at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.lambda$executeRemotely$2(GrpcRemoteExecutor.java:127)
        at com.google.devtools.build.lib.remote.Retrier.execute(Retrier.java:220)
        ... 21 more
Target //:media_service failed to build
...

Bazel build fails with remote exec errors

Carrying over from the Slack channel: I am running into weird gRPC errors on the buildfarm server while using it for remote execution with Bazel. All of the actions sent to the server error out while running GetActionResult.

Logs from the buildfarm server:


07:19:29.943 [grpc-default-worker-ELG-1-1] DEBUG io.grpc.netty.NettyServerHandler - [id: 0xc6f64687, L:/127.0.0.1:8980 - R:/127.0.0.1:49953] INBOUND DATA: streamId=19 padding=0 endStream=true length=51 bytes=000000002e122c0a2861326466643638346165383464623230653133336433356631656635646439393134373932306237105e
07:19:29.943 [grpc-default-worker-ELG-1-1] DEBUG io.grpc.netty.NettyServerHandler - [id: 0xc6f64687, L:/127.0.0.1:8980 - R:/127.0.0.1:49953] OUTBOUND RST_STREAM: streamId=19 errorCode=8
07:19:29.944 [grpc-default-worker-ELG-1-1] DEBUG io.grpc.netty.NettyServerHandler - [id: 0xc6f64687, L:/127.0.0.1:8980 - R:/127.0.0.1:49953] OUTBOUND HEADERS: streamId=19 headers=GrpcHttp2OutboundHeaders[:status: 200, content-type: application/grpc, grpc-status: 13, grpc-message: Half-closed without a request] streamDependency=0 weight=16 exclusive=false padding=0 endStream=true
07:19:29.944 [grpc-default-worker-ELG-1-1] DEBUG io.grpc.netty.NettyServerHandler - [id: 0xc6f64687, L:/127.0.0.1:8980 - R:/127.0.0.1:49953] INBOUND RST_STREAM: streamId=19 errorCode=5
07:19:29.946 [grpc-default-worker-ELG-1-1] DEBUG io.grpc.netty.NettyServerHandler - [id: 0xc6f64687, L:/127.0.0.1:8980 - R:/127.0.0.1:49953] INBOUND RST_STREAM: streamId=19 errorCode=8
07:19:30.319 [grpc-default-worker-ELG-1-1] DEBUG io.grpc.netty.NettyServerHandler - [id: 0xc6f64687, L:/127.0.0.1:8980 - R:/127.0.0.1:49953] INBOUND HEADERS: streamId=21 headers=GrpcHttp2RequestHeaders[:path: /build.bazel.remote.execution.v2.ActionCache/GetActionResult, :authority: localhost:8980, :method: POST, :scheme: http, te: trailers, content-type: application/grpc, user-agent: grpc-java-netty/1.10.0, build.bazel.remote.execution.v2.requestmetadata-bin: ChsKBWJhemVsEhIwLjE5LjAtIChAbm9uLWdpdCkSKGEyZGZkNjg0YWU4NGRiMjBlMTMzZDM1ZjFlZjVkZDk5MTQ3OTIwYjcaJGU0ZDBiOTRlLTY4MTktNDJlMS04YzRlLTIwMGZmNjhiNmY4ZSIkYTJkNGYxYzgtZjNiYy00ZWM3LTk0YzItNjQ0NzNmYmM5NTlk, grpc-accept-encoding: gzip, grpc-timeout: 59999851u, grpc-trace-bin: ] streamDependency=0 weight=16 exclusive=false padding=0 endStream=false
07:19:30.319 [grpc-default-worker-ELG-1-1] DEBUG io.grpc.netty.NettyServerHandler - [id: 0xc6f64687, L:/127.0.0.1:8980 - R:/127.0.0.1:49953] INBOUND DATA: streamId=21 padding=0 endStream=true length=51 bytes=000000002e122c0a2861326466643638346165383464623230653133336433356631656635646439393134373932306237105e
07:19:30.320 [grpc-default-worker-ELG-1-1] DEBUG io.grpc.netty.NettyServerHandler - [id: 0xc6f64687, L:/127.0.0.1:8980 - R:/127.0.0.1:49953] OUTBOUND RST_STREAM: streamId=21 errorCode=8
07:19:30.320 [grpc-default-worker-ELG-1-1] DEBUG io.grpc.netty.NettyServerHandler - [id: 0xc6f64687, L:/127.0.0.1:8980 - R:/127.0.0.1:49953] OUTBOUND HEADERS: streamId=21 headers=GrpcHttp2OutboundHeaders[:status: 200, content-type: application/grpc, grpc-status: 13, grpc-message: Half-closed without a request] streamDependency=0 weight=16 exclusive=false padding=0 endStream=true
07:19:30.321 [grpc-default-worker-ELG-1-1] DEBUG io.grpc.netty.NettyServerHandler - [id: 0xc6f64687, L:/127.0.0.1:8980 - R:/127.0.0.1:49953] INBOUND RST_STREAM: streamId=21 errorCode=5
07:19:30.324 [grpc-default-worker-ELG-1-1] DEBUG io.grpc.netty.NettyServerHandler - [id: 0xc6f64687, L:/127.0.0.1:8980 - R:/127.0.0.1:49953] INBOUND RST_STREAM: streamId=21 errorCode=8
07:19:31.109 [grpc-default-worker-ELG-1-1] DEBUG io.grpc.netty.NettyServerHandler - [id: 0xc6f64687, L:/127.0.0.1:8980 - R:/127.0.0.1:49953] INBOUND HEADERS: streamId=23 headers=GrpcHttp2RequestHeaders[:path: /build.bazel.remote.execution.v2.ActionCache/GetActionResult, :authority: localhost:8980, :method: POST, :scheme: http, te: trailers, content-type: application/grpc, user-agent: grpc-java-netty/1.10.0, build.bazel.remote.execution.v2.requestmetadata-bin: ChsKBWJhemVsEhIwLjE5LjAtIChAbm9uLWdpdCkSKGEyZGZkNjg0YWU4NGRiMjBlMTMzZDM1ZjFlZjVkZDk5MTQ3OTIwYjcaJGU0ZDBiOTRlLTY4MTktNDJlMS04YzRlLTIwMGZmNjhiNmY4ZSIkYTJkNGYxYzgtZjNiYy00ZWM3LTk0YzItNjQ0NzNmYmM5NTlk, grpc-accept-encoding: gzip, grpc-timeout: 59999849u, grpc-trace-bin: ] streamDependency=0 weight=16 exclusive=false padding=0 endStream=false
07:19:31.110 [grpc-default-worker-ELG-1-1] DEBUG io.grpc.netty.NettyServerHandler - [id: 0xc6f64687, L:/127.0.0.1:8980 - R:/127.0.0.1:49953] INBOUND DATA: streamId=23 padding=0 endStream=true length=51 bytes=000000002e122c0a2861326466643638346165383464623230653133336433356631656635646439393134373932306237105e
07:19:31.111 [grpc-default-worker-ELG-1-1] DEBUG io.grpc.netty.NettyServerHandler - [id: 0xc6f64687, L:/127.0.0.1:8980 - R:/127.0.0.1:49953] OUTBOUND RST_STREAM: streamId=23 errorCode=8
07:19:31.111 [grpc-default-worker-ELG-1-1] DEBUG io.grpc.netty.NettyServerHandler - [id: 0xc6f64687, L:/127.0.0.1:8980 - R:/127.0.0.1:49953] OUTBOUND HEADERS: streamId=23 headers=GrpcHttp2OutboundHeaders[:status: 200, content-type: application/grpc, grpc-status: 13, grpc-message: Half-closed without a request] streamDependency=0 weight=16 exclusive=false padding=0 endStream=true
07:19:31.112 [grpc-default-worker-ELG-1-1] DEBUG io.grpc.netty.NettyServerHandler - [id: 0xc6f64687, L:/127.0.0.1:8980 - R:/127.0.0.1:49953] INBOUND RST_STREAM: streamId=23 errorCode=5
07:19:31.113 [grpc-default-worker-ELG-1-1] DEBUG io.grpc.netty.NettyServerHandler - [id: 0xc6f64687, L:/127.0.0.1:8980 - R:/127.0.0.1:49953] INBOUND RST_STREAM: streamId=23 errorCode=8
07:19:32.764 [grpc-default-worker-ELG-1-1] DEBUG io.grpc.netty.NettyServerHandler - [id: 0xc6f64687, L:/127.0.0.1:8980 - R:/127.0.0.1:49953] INBOUND HEADERS: streamId=25 headers=GrpcHttp2RequestHeaders[:path: /build.bazel.remote.execution.v2.ActionCache/GetActionResult, :authority: localhost:8980, :method: POST, :scheme: http, te: trailers, content-type: application/grpc, user-agent: grpc-java-netty/1.10.0, build.bazel.remote.execution.v2.requestmetadata-bin: ChsKBWJhemVsEhIwLjE5LjAtIChAbm9uLWdpdCkSKGEyZGZkNjg0YWU4NGRiMjBlMTMzZDM1ZjFlZjVkZDk5MTQ3OTIwYjcaJGU0ZDBiOTRlLTY4MTktNDJlMS04YzRlLTIwMGZmNjhiNmY4ZSIkYTJkNGYxYzgtZjNiYy00ZWM3LTk0YzItNjQ0NzNmYmM5NTlk, grpc-accept-encoding: gzip, grpc-timeout: 59999842u, grpc-trace-bin: ] streamDependency=0 weight=16 exclusive=false padding=0 endStream=false
07:19:32.765 [grpc-default-worker-ELG-1-1] DEBUG io.grpc.netty.NettyServerHandler - [id: 0xc6f64687, L:/127.0.0.1:8980 - R:/127.0.0.1:49953] INBOUND DATA: streamId=25 padding=0 endStream=true length=51 bytes=000000002e122c0a2861326466643638346165383464623230653133336433356631656635646439393134373932306237105e
07:19:32.766 [grpc-default-worker-ELG-1-1] DEBUG io.grpc.netty.NettyServerHandler - [id: 0xc6f64687, L:/127.0.0.1:8980 - R:/127.0.0.1:49953] OUTBOUND RST_STREAM: streamId=25 errorCode=8
07:19:32.766 [grpc-default-worker-ELG-1-1] DEBUG io.grpc.netty.NettyServerHandler - [id: 0xc6f64687, L:/127.0.0.1:8980 - R:/127.0.0.1:49953] OUTBOUND HEADERS: streamId=25 headers=GrpcHttp2OutboundHeaders[:status: 200, content-type: application/grpc, grpc-status: 13, grpc-message: Half-closed without a request] streamDependency=0 weight=16 exclusive=false padding=0 endStream=true

Logs from the bazel client with verbose logging:

SUBCOMMAND: # //my-annotations:my_my_annotations_main [action 'Building my-annotations/libmy_my_annotations_main.jar ()']
(cd /local/data/scratch/.cache/bazel/_bazel_p2dpgnp/b6669dedfcd2eab0c289aa50224e42cf/execroot/my_root && \
  exec env - \
    LC_CTYPE=en_US.UTF-8 \
  external/embedded_jdk/bin/java -XX:+TieredCompilation '-XX:TieredStopAtLevel=1' -Xbootclasspath/p:/sw/external/jdk-1.8.0_60_x64/jre/lib/tools.jar -jar external/bazel_tools/tools/jdk/VanillaJavaBuilder_deploy.jar @bazel-out/k8-fastbuild/bin/my-annotations/libmy_my_annotations_main.jar-0.params)
ERROR: /local/data/scratch/royrah2/bazel10/my-annotations/BUILD:43:1: Couldn't build file my-annotations/libmy_my_annotations_main.jar: Building my-annotations/libmy_my_annotations_main.jar () failed (Exit 34). Note: Remote connection/protocol failed with: execution failed Call failed after 5 retry attempts: io.grpc.StatusRuntimeException: CANCELLED: HTTP/2 error code: CANCEL
Received Rst Stream: com.google.devtools.build.lib.remote.Retrier$RetryException: Call failed after 5 retry attempts: io.grpc.StatusRuntimeException: CANCELLED: HTTP/2 error code: CANCEL
Received Rst Stream
        at com.google.devtools.build.lib.remote.Retrier.execute(Retrier.java:290)
        at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:121)
        at com.google.devtools.build.lib.remote.GrpcRemoteCache.getCachedActionResult(GrpcRemoteCache.java:373)
        at com.google.devtools.build.lib.remote.RemoteSpawnRunner.exec(RemoteSpawnRunner.java:174)
        at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:106)
        at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:75)
        at com.google.devtools.build.lib.exec.SpawnActionContextMaps$ProxySpawnActionContext.exec(SpawnActionContextMaps.java:362)
        at com.google.devtools.build.lib.analysis.actions.SpawnAction.internalExecute(SpawnAction.java:288)
        at com.google.devtools.build.lib.analysis.actions.SpawnAction.execute(SpawnAction.java:295)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeActionTask(SkyframeActionExecutor.java:1001)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.prepareScheduleExecuteAndCompleteAction(SkyframeActionExecutor.java:930)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.access$800(SkyframeActionExecutor.java:121)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.call(SkyframeActionExecutor.java:770)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.call(SkyframeActionExecutor.java:725)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:478)
        at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:519)
        at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:216)
        at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:422)
        at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:368)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: io.grpc.StatusRuntimeException: CANCELLED: HTTP/2 error code: CANCEL
Received Rst Stream
        at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:221)
        at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:202)
        at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:131)
        at build.bazel.remote.execution.v2.ActionCacheGrpc$ActionCacheBlockingStub.getActionResult(ActionCacheGrpc.java:347)
        at com.google.devtools.build.lib.remote.GrpcRemoteCache.lambda$getCachedActionResult$2(GrpcRemoteCache.java:376)
        at com.google.devtools.build.lib.remote.Retrier.execute(Retrier.java:268)
        ... 22 more
SUBCOMMAND: # //my-annotations:my_my_annotations_java_main [action 'Building my-annotations/libmy_my_annotations_java_main.jar (8 source files)']
(cd /local/data/scratch/.cache/bazel/_bazel_p2dpgnp/b6669dedfcd2eab0c289aa50224e42cf/execroot/my_root && \
  exec env - \
    LC_CTYPE=en_US.UTF-8 \
  external/embedded_jdk/bin/java -XX:+TieredCompilation '-XX:TieredStopAtLevel=1' -Xbootclasspath/p:/sw/external/jdk-1.8.0_60_x64/jre/lib/tools.jar -jar external/bazel_tools/tools/jdk/VanillaJavaBuilder_deploy.jar @bazel-out/k8-fastbuild/bin/my-annotations/libmy_my_annotations_java_main.jar-0.params)
ERROR: /local/data/scratch/royrah2/bazel10/my-annotations/BUILD:15:1: Couldn't build file my-annotations/libmy_my_annotations_java_main.jar: Building my-annotations/libmy_my_annotations_java_main.jar (8 source files) failed (Exit 34). Note: Remote connection/protocol failed with: execution failed Call failed after 5 retry attempts: io.grpc.StatusRuntimeException: INTERNAL: Half-closed without a request: com.google.devtools.build.lib.remote.Retrier$RetryException: Call failed after 5 retry attempts: io.grpc.StatusRuntimeException: INTERNAL: Half-closed without a request
        at com.google.devtools.build.lib.remote.Retrier.execute(Retrier.java:290)
        at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:121)
        at com.google.devtools.build.lib.remote.GrpcRemoteCache.getCachedActionResult(GrpcRemoteCache.java:373)
        at com.google.devtools.build.lib.remote.RemoteSpawnRunner.exec(RemoteSpawnRunner.java:174)
        at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:106)
        at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:75)
        at com.google.devtools.build.lib.exec.SpawnActionContextMaps$ProxySpawnActionContext.exec(SpawnActionContextMaps.java:362)
        at com.google.devtools.build.lib.analysis.actions.SpawnAction.internalExecute(SpawnAction.java:288)
        at com.google.devtools.build.lib.analysis.actions.SpawnAction.execute(SpawnAction.java:295)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeActionTask(SkyframeActionExecutor.java:1001)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.prepareScheduleExecuteAndCompleteAction(SkyframeActionExecutor.java:930)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.access$800(SkyframeActionExecutor.java:121)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.call(SkyframeActionExecutor.java:770)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.call(SkyframeActionExecutor.java:725)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:478)
        at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:519)
        at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:216)
        at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:422)
        at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:368)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: io.grpc.StatusRuntimeException: INTERNAL: Half-closed without a request
        at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:221)
        at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:202)
        at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:131)
        at build.bazel.remote.execution.v2.ActionCacheGrpc$ActionCacheBlockingStub.getActionResult(ActionCacheGrpc.java:347)
        at com.google.devtools.build.lib.remote.GrpcRemoteCache.lambda$getCachedActionResult$2(GrpcRemoteCache.java:376)
        at com.google.devtools.build.lib.remote.Retrier.execute(Retrier.java:268)
        ... 22 more
Target //my-annotations:my_my_annotations_main failed to build
INFO: Elapsed time: 7.022s, Critical Path: 3.24s, Remote (0.00% of the time): [queue: 0.00%, setup: 0.00%, process: 0.00%]
INFO: 0 processes.
FAILED: Build did NOT complete successfully

This is my bazelrc, in case it helps:

startup --host_jvm_args=-Dbazel.DigestFunction=SHA1 --host_jvm_args=-Djsi.branch='~ROYRAH!BAZELFS_CLEAN;SOURCE' --host_jvm_args=-Dworkspace.dir='/local/data/scratch/royrah2/bazel10/'  --host_jvm_args=-Dfs.enable.jsi='true'
build --nojava_header_compilation --jobs=1 --profile=/local/data/scratch/foo.profile --keep_going --java_toolchain=:custom_jdk --spawn_strategy=remote --genrule_strategy=remote --strategy=Javac=remote --strategy=Closure=remote --remote_executor=localhost:8980 --remote_cache=localhost:8980 --verbose_failures  --auth_enabled=false --tls_enabled=false

java.lang.NullPointerException from MemoryInstance.outstandingOperations.put

The buildfarm server failed with the following exception.

SEVERE: Exception while executing runnable io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed@63c18849
java.lang.NullPointerException
        at java.util.TreeMap.rotateLeft(TreeMap.java:2224)
        at java.util.TreeMap.fixAfterInsertion(TreeMap.java:2291)
        at java.util.TreeMap.put(TreeMap.java:582)
        at build.buildfarm.instance.memory.MemoryInstance$OutstandingOperations.put(MemoryInstance.java:120)
        at build.buildfarm.instance.AbstractServerInstance.updateOperationWatchers(AbstractServerInstance.java:693)
        at build.buildfarm.instance.memory.MemoryInstance.updateOperationWatchers(MemoryInstance.java:277)
        at build.buildfarm.instance.AbstractServerInstance.putOperation(AbstractServerInstance.java:673)
        at build.buildfarm.instance.memory.MemoryInstance.putOperation(MemoryInstance.java:354)
        at build.buildfarm.instance.AbstractServerInstance.execute(AbstractServerInstance.java:541)
        at build.buildfarm.server.ExecutionService.execute(ExecutionService.java:99)
        at build.bazel.remote.execution.v2.ExecutionGrpc$MethodHandlers.invoke(ExecutionGrpc.java:497)
        at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
        at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:33)
        at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
        at io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
        at io.grpc.util.TransmitStatusRuntimeExceptionInterceptor$1.onHalfClose(TransmitStatusRuntimeExceptionInterceptor.java:74)
        at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:271)
        at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:648)
        at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

buildfarm revision: e9befb2 + pull/179 + pull/182
bazel revision: 3fc3ddb
server config:

  • max_size_bytes: 10737418240 # 10 G

worker config:

  • cas_cache_max_size_bytes: 107374182400 # 100G
  • execute_stage_width: 32

extra command line option for server:

  • --jvm_flag=-Xmx170g

Create instance identifier for configs

An instance cannot be identified without specifying both a name and a hash_function for communicating with it. Switch the configuration system to define an instance universally in terms of an identifier.

Java/proto project fails

I'm trying to get build-farm running on Ubuntu.

I successfully installed all required dependencies and Bazel 0.19.2, cloned the repo and started server and worker with the example configuration, each in its own screen.

This failed until I added a wrapper script exporting BAZEL_SH and JAVA_HOME.
Now both start up.

As demonstrated by the local build of build-farm itself, I can build java projects.

When I start the build from a remote machine, I get loads of errors like this:

ERROR: /home/user/.cache/bazel/_bazel_user/f00d0a9338d336ba97aeac71037e0456/external/io_grpc_grpc_java/compiler/BUILD.bazel:1:1: Couldn't build file external/io_grpc_grpc_java/compiler/_objs/grpc_java_plugin/java_generator.o: undeclared inclusion(s) in rule '@io_grpc_grpc_java//compiler:grpc_java_plugin':
this rule is missing dependency declarations for the following files included by 'external/io_grpc_grpc_java/compiler/src/java_plugin/cpp/java_generator.cpp':
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/stddef.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/stdarg.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/stdint.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/x86intrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/ia32intrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/mmintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/xmmintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/mm_malloc.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/pmmintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/tmmintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/ammintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/smmintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/popcntintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/wmmintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/immintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/avxintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/avx2intrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512fintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512erintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512pfintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512cdintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512bwintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512dqintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlbwintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vldqintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512ifmaintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512ifmavlintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vbmiintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vbmivlintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/shaintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/lzcntintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/bmiintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/bmi2intrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/fmaintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/f16cintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/rtmintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/xtestintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/mm3dnow.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/prfchwintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/fma4intrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/xopintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/lwpintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/tbmintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/rdseedintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/fxsrintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/xsaveintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/xsaveoptintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/adxintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/clwbintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/pcommitintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/clflushoptintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/xsavesintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/xsavecintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include/mwaitxintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include-fixed/limits.h'
  '/usr/lib/gcc/x86_64-linux-gnu/5/include-fixed/syslimits.h'

With logging enabled, all looks fine to me on both server and worker:

The server has lots of log lines like these:

FEIN: default_memory_instance::execute(2d1874ae7b2bf4f84fde94b97fc8e6f5ceeaf1f5751b093d6d2fa6b8a69f5108/142): default_memory_instance/operations/f72047dd-cefe-425e-9d0d-936ffa45c870 [Di Nov 27 17:14:23 MEZ 2018]

The worker log lines come in blocks and look like this:

FEIN: InputFetchStage::iterate(default_memory_instance/operations/4ba6f2f0-efca-4a38-aec2-33c7ebd00b03): Starting [Di Nov 27 17:14:37 MEZ 2018]
FEIN: Executor::executeCommand(default_memory_instance/operations/f72047dd-cefe-425e-9d0d-936ffa45c870): Completed command: exit code 0 [Di Nov 27 17:14:37 MEZ 2018]
FEIN: ExecuteActionStage::iterate(default_memory_instance/operations/f72047dd-cefe-425e-9d0d-936ffa45c870): 5,23900ms (0,0260000ms stalled) exit code: 0, 0/1 [Di Nov 27 17:14:37 MEZ 2018]
FEIN: ReportResultStage::iterate(default_memory_instance/operations/5c18d9b3-654d-47cd-90f2-1991813126f9): 6,18000ms (0,00300000ms stalled) Success [Di Nov 27 17:14:37 MEZ 2018]

I'm at a loss, and I'd really love to get this running. If you need more info, I'll do my best to assist.

Basic documentation for logging / debugging

Is there any standard out / file logging? And if so, is there any documentation on how to enable it?

The use case: I enabled remote execution, and since neither the worker nor the server outputs anything, it is difficult to determine whether, and how many of, the targets are executed remotely. Looking through the code base, I could find almost no mention of the words log or logging...
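For what it's worth, the FINE/SEVERE lines quoted in other reports here are java.util.logging output, so a standard JUL properties file passed to the JVM (via -Djava.util.logging.config.file=<path>) turns it on. A minimal sketch follows; the build.buildfarm logger prefix is an assumption based on the package names in the stack traces:

# Sketch of a java.util.logging configuration; adjust levels to taste.
handlers = java.util.logging.ConsoleHandler
java.util.logging.ConsoleHandler.level = FINE
java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter
java.util.logging.SimpleFormatter.format = %4$s: %5$s [%1$tc]%n

# Assumed prefix: buildfarm classes live under build.buildfarm.*
build.buildfarm.level = FINE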

Documentation Request: More info about the role of the server vs the worker

Thanks for the great default example. It works well as a starting point for basic experimentation.

That said, I think it would be very helpful to have some more background info about the role of the server vs. the worker.

Specifically, I would love some advice on how to configure this on a larger scale. How do you scale this across multiple machines? Do you run a single server with multiple workers per box?

java.lang.NullPointerException from MemoryInstance.putOperation

The buildfarm server failed with the following exception.

SEVERE: Exception while executing runnable io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed@4b94e81d
java.lang.NullPointerException
        at build.buildfarm.instance.memory.MemoryInstance.putOperation(MemoryInstance.java:371)
        at build.buildfarm.server.OperationQueueService.put(OperationQueueService.java:102)
        at build.buildfarm.v1test.OperationQueueGrpc$MethodHandlers.invoke(OperationQueueGrpc.java:369)
        at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
        at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:33)
        at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
        at io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
        at io.grpc.util.TransmitStatusRuntimeExceptionInterceptor$1.onHalfClose(TransmitStatusRuntimeExceptionInterceptor.java:74)
        at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:271)
        at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:648)
        at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Excerpt from MemoryInstance.java near line 371:

  @Override
  public boolean putOperation(Operation operation) throws InterruptedException {
    if (!super.putOperation(operation)) {
      return false;
    }
    String operationName = operation.getName();
    if (operation.getDone()) {
      // destroy requeue timer                                                                                                             
      Watchdog requeuer = requeuers.remove(operationName);
      if (requeuer != null) {
        requeuer.stop();
      }
      // destroy action timed out failure                                                                                                  
      Watchdog operationTimeoutDelay =
          operationTimeoutDelays.remove(operationName);
      if (operationTimeoutDelay != null) {
        operationTimeoutDelay.stop();
      }
    } else if (isExecuting(operation)) {
      requeuers.get(operationName).pet();  // This is line 371

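The trace points at requeuers.get(operationName) returning null. A defensive variant of that step (illustrative only, mirroring the names in the excerpt above, not the project's actual fix) could look like:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative stand-in for buildfarm's Watchdog; only pet() matters here.
interface Watchdog {
  void pet();
  void stop();
}

class Requeuers {
  private final Map<String, Watchdog> requeuers = new ConcurrentHashMap<>();

  // Sketch: tolerate an entry that was removed concurrently (e.g. the
  // operation completed or was requeued) instead of dereferencing null.
  void petIfPresent(String operationName) {
    Watchdog requeuer = requeuers.get(operationName);
    if (requeuer != null) {
      requeuer.pet();
    }
  }
}
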
buildfarm revision: e9befb2 + pull/179 + pull/182
bazel revision: 3fc3ddb
server config:

  • max_size_bytes: 139586437120 # 130G

worker config:

  • cas_cache_max_size_bytes: 107374182400 # 100G
  • execute_stage_width: 32

extra command line option for server:

  • --jvm_flag=-Xmx170g

High availability configuration

Is it possible to configure bazel-buildfarm such that multiple buildfarm-server instances can use the same buildfarm-worker instances?

If so, how would this be configured?

Also, how would bazel be invoked such that it can choose among the buildfarm-server instances?


Or, does the speed with which the server and worker processes come up mean that any interruptions are short enough to not be a concern in the design of bazel-buildfarm?

Setup CI server.

It would be great if there were a Travis CI or Circle CI setup configured for PRs, so that reviewers could immediately see build and test results.

Report action executions errors through `ExecuteResponse`

In line with the changes from @ola-rozenfeld:

  // Errors discovered during creation of the `Operation` will be reported
  // as gRPC Status errors, while errors that occurred while running the
  // action will be reported in the `status` field of the `ExecuteResponse`. The
  // server MUST NOT set the `error` field of the `Operation` proto.

We must change our results reporting for all non-creation-related error cases.
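
A sketch of what that could look like with the v2 protos (proto and field names follow the public remote execution API; the surrounding buildfarm plumbing here is assumed):

import build.bazel.remote.execution.v2.ExecuteResponse;
import com.google.longrunning.Operation;
import com.google.protobuf.Any;
import com.google.rpc.Code;
import com.google.rpc.Status;

// Sketch: report an execution-time failure via ExecuteResponse.status and pack
// it into Operation.response, leaving Operation.error unset as the API requires.
class ExecutionErrorReporting {
  static Operation completeWithExecutionError(Operation operation, String message) {
    ExecuteResponse response = ExecuteResponse.newBuilder()
        .setStatus(Status.newBuilder()
            .setCode(Code.FAILED_PRECONDITION.getNumber())
            .setMessage(message))
        .build();
    return operation.toBuilder()
        .setDone(true)
        .setResponse(Any.pack(response))  // not setError(...)
        .build();
  }
}

Creation-time failures (e.g. rejecting a malformed request) would still surface as gRPC status errors on the call itself.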

Compatibility with rules_go?

I'm just now experimenting with buildfarm and it seems to be incompatible with rules_go, even if the platforms are the same (at least I think they are the same). My experiments running a single server process and a single worker process on localhost halt with:

java.lang.IllegalArgumentException: outputDir specified: [bazel-out/host/bin/external/io_bazel_rules_go/linux_amd64_stripped/stdlib~/pkg]
	at build.buildfarm.worker.operationqueue.Worker.parseOutputDirectories(Worker.java:291)
	at build.buildfarm.worker.operationqueue.Worker.access$200(Worker.java:77)
	at build.buildfarm.worker.operationqueue.Worker$2.createActionRoot(Worker.java:412)
	at build.buildfarm.worker.InputFetchStage.tick(InputFetchStage.java:50)
	at build.buildfarm.worker.PipelineStage.iterate(PipelineStage.java:61)
	at build.buildfarm.worker.PipelineStage.runInterruptible(PipelineStage.java:42)
	at build.buildfarm.worker.PipelineStage.run(PipelineStage.java:49)
	at java.lang.Thread.run(Thread.java:748)
Stage has exited at priority 2
Closing stage at priority 4
Interrupting unterminated closed thread at priority 4
Stage has exited at priority 4
Closing stage at priority 3
Interrupting unterminated closed thread at priority 3
Stage has exited at priority 3

In this case the Action protobuf argument to parseOutputDirectories:

  • has an empty output_files list
  • has a non-empty output_directories with a single entry bazel-out/host/bin/external/io_bazel_rules_go/linux_amd64_stripped/stdlib~/pkg.

This always leads to an exception being thrown in:

private OutputDirectory parseOutputDirectories(Iterable<String> outputFiles, Iterable<String> outputDirs) {
  OutputDirectory outputDirectory = new OutputDirectory();
  Stack<OutputDirectory> stack = new Stack<>();
  OutputDirectory currentOutputDirectory = outputDirectory;
  String prefix = "";
  for (String outputFile : outputFiles) {
    while (!outputFile.startsWith(prefix)) {
      currentOutputDirectory = stack.pop();
      int upPathSeparatorIndex = prefix.lastIndexOf('/', prefix.length() - 2);
      prefix = prefix.substring(0, upPathSeparatorIndex + 1);
    }
    String prefixedFile = outputFile.substring(prefix.length());
    int separatorIndex = prefixedFile.indexOf('/');
    while (separatorIndex >= 0) {
      if (separatorIndex == 0) {
        throw new IllegalArgumentException("double separator in output file");
      }
      String directoryName = prefixedFile.substring(0, separatorIndex);
      prefix += directoryName + '/';
      prefixedFile = prefixedFile.substring(separatorIndex + 1);
      stack.push(currentOutputDirectory);
      OutputDirectory nextOutputDirectory = new OutputDirectory();
      currentOutputDirectory.directories.put(directoryName, nextOutputDirectory);
      currentOutputDirectory = nextOutputDirectory;
      separatorIndex = prefixedFile.indexOf('/');
    }
  }
  if (!Iterables.isEmpty(outputDirs)) {
    throw new IllegalArgumentException("outputDir specified: " + outputDirs);
  }
  return outputDirectory;
}

Not sure if this is the same issue as @jmillikin-stripe is reporting in bazelbuild/rules_go#1507.

Or perhaps it is related to #128.

Undeclared inclusion when worker is running on Windows

Trying to compile a small sample project (one .cc file and one .h file) with a worker running on Windows:

BUILD file:

cc_binary(
    name = "product",
    srcs = ["main.cc",
            "main.h"],
)

When using a Windows worker, I get the following error:

INFO: Analysed target //main:product (0 packages loaded).
INFO: Found 1 target...
ERROR: C:/git/example/main/BUILD:3:1: undeclared inclusion(s) in rule '//main:product':
this rule is missing dependency declarations for the following files included by 'main/main.cc':
  'C:/bazel/build/default_memory_instance/operations/415c9ac2-20e2-40b3-9815-3d72b873245b/main/main.h'
Target //main:product failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 1,172s, Critical Path: 0,84s
INFO: 0 processes.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully

Building the same with a Linux (Ubuntu) worker works:

Starting local Bazel server and connecting to it...
INFO: Analysed target //main:product (8 packages loaded).
INFO: Found 1 target...
Target //main:product up-to-date:
  bazel-bin/main/product
INFO: Elapsed time: 7.017s, Critical Path: 0.51s
INFO: 3 processes: 3 remote cache hit.
INFO: Build completed successfully, 7 total actions

I also tried the following combinations:

  • Windows server - Windows worker [Fail]
  • Linux server - Linux worker [Success]
  • Linux server - Windows worker [Fail]

client: bazel 0.18
bazel-buildfarm SHA1: ec7a053

java.lang.IllegalStateException thrown from MemoryInstance.java

Hi,

buildfarm-server gave me the following error. The source revision I used was 8cdd8f0 (master) with pull/179 applied on top of it.

SEVERE: Exception while executing runnable io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed@76245ff9
java.lang.IllegalStateException
        at build.buildfarm.instance.memory.MemoryInstance.updateOperationWatchers(MemoryInstance.java:167)
        at build.buildfarm.instance.AbstractServerInstance.putOperation(AbstractServerInstance.java:640)
        at build.buildfarm.instance.memory.MemoryInstance.putOperation(MemoryInstance.java:221)
        at build.buildfarm.server.OperationQueueService.put(OperationQueueService.java:91)
        at build.buildfarm.v1test.OperationQueueGrpc$MethodHandlers.invoke(OperationQueueGrpc.java:369)
        at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
        at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:33)
        at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
        at io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
        at io.grpc.util.TransmitStatusRuntimeExceptionInterceptor$1.onHalfClose(TransmitStatusRuntimeExceptionInterceptor.java:74)
        at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:271)
        at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:648)
        at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

The error reported by bazel (0.16) is this:

ERROR: /home/ubuntu/driving/src/program/plant_evaluator/plant_only/BUILD:36:1: C++ compilation of rule '//program/plant_evaluator/plant_only:plant_evaluator_test' failed (Exit 34). Note: Remote connection/protocol failed with: execution failed: com.google.devtools.build.lib.remote.Retrier$RetryException: Call failed with not retriable error: java.lang.IllegalStateException: Unexpected result of remote execution: no output files.
	at com.google.devtools.build.lib.remote.Retrier.execute(Retrier.java:261)
	at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:113)
	at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.executeRemotely(GrpcRemoteExecutor.java:132)
	at com.google.devtools.build.lib.remote.RemoteSpawnRunner.exec(RemoteSpawnRunner.java:212)
	at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:95)
	at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:63)
	at com.google.devtools.build.lib.exec.SpawnActionContextMaps$ProxySpawnActionContext.exec(SpawnActionContextMaps.java:362)
	at com.google.devtools.build.lib.rules.cpp.SpawnGccStrategy.execWithReply(SpawnGccStrategy.java:66)
	at com.google.devtools.build.lib.rules.cpp.CppCompileAction.execute(CppCompileAction.java:1024)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeActionTask(SkyframeActionExecutor.java:978)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.prepareScheduleExecuteAndCompleteAction(SkyframeActionExecutor.java:910)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.access$900(SkyframeActionExecutor.java:120)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.call(SkyframeActionExecutor.java:763)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.call(SkyframeActionExecutor.java:718)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:457)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:513)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:227)
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:400)
	at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:355)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Unexpected result of remote execution: no output files.
	at com.google.common.base.Preconditions.checkState(Preconditions.java:504)
	at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.getOperationResponse(GrpcRemoteExecutor.java:97)
	at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.lambda$executeRemotely$1(GrpcRemoteExecutor.java:173)
	at com.google.devtools.build.lib.remote.Retrier.execute(Retrier.java:243)
	at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:113)
	at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.lambda$executeRemotely$2(GrpcRemoteExecutor.java:143)
	at com.google.devtools.build.lib.remote.Retrier.execute(Retrier.java:243)

I used the following parameters:

  • cas_max_size_bytes: 107374182400 (100G) in server.config.example
  • cas_cache_max_size_bytes: 107374182400 (100G) in worker.config.example
  • execute_stage_width: 32 in worker.config.example

All the other parameters were left untouched.

I also used --jvm_flag=-Xmx120g to give the server enough heap memory.

I do have a remote grpc log for this, but it's probably too large (700MB) to attach here.

Inconsistent action timeouts

if (timeout.getSeconds() > maximum.getSeconds() ||
    (timeout.getSeconds() == maximum.getSeconds() && timeout.getNanos() > maximum.getNanos())) {
  return false;
}

Preconditions.checkState(
    timeout.getSeconds() < maximum.getSeconds() ||
    (timeout.getSeconds() == maximum.getSeconds() && timeout.getNanos() < maximum.getNanos()));
}

Notice that the second code sample means an "enormous/eternal" test in bazel will be refused by bazel-buildfarm when using the sample bazel-buildfarm configuration: its 3600-second timeout equals maximum_action_timeout exactly, so the first check passes but the checkState fails, as reported in #198.

# a limit on the action timeout specified in the action, above which
# the operation will report a failed result immediately
maximum_action_timeout: {
  seconds: 3600
  nanos: 0
}

I suggest that in the second instance, the code is changed to read timeout.getNanos() <= maximum.getNanos() (add equals).
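
A sketch of the suggested precondition, assuming the same protobuf Duration accessors as in the excerpt above:

import com.google.common.base.Preconditions;
import com.google.protobuf.Duration;

class TimeoutCheck {
  // Sketch of the proposed fix: accept timeout == maximum, so this check agrees
  // with the earlier test, which only rejects timeouts strictly above the maximum.
  static void checkTimeout(Duration timeout, Duration maximum) {
    Preconditions.checkState(
        timeout.getSeconds() < maximum.getSeconds()
            || (timeout.getSeconds() == maximum.getSeconds()
                && timeout.getNanos() <= maximum.getNanos()));
  }
}

With that change, a test whose timeout equals maximum_action_timeout exactly would pass both checks consistently.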

Remove getTree implementation/usage

The getTree method has been deprecated and should no longer be used.

Adapt the OperationQueue::take response to include both the Action and Directory Tree list

Handle Server errors

When I start the Worker with an instanceName not known to the Server, I get:

Exception in thread "main" io.grpc.StatusRuntimeException: UNKNOWN
	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:227)
	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:208)
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:141)
	at build.buildfarm.v1test.OperationQueueGrpc$OperationQueueBlockingStub.take(OperationQueueGrpc.java:209)
	at build.buildfarm.instance.stub.StubInstance.match(StubInstance.java:330)
	at build.buildfarm.worker.Worker.start(Worker.java:171)
	at build.buildfarm.worker.Worker.main(Worker.java:536)

(Server throws exception)

I'm not sure how you are supposed to pass/handle errors with gRPC, but the current approach is probably not the best we can do.
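
For comparison, a minimal sketch (not buildfarm's current code, and NOT_FOUND is just one plausible status choice) of surfacing an unknown instance as a descriptive gRPC status rather than a bare UNKNOWN:

import io.grpc.Status;
import io.grpc.StatusRuntimeException;
import io.grpc.stub.StreamObserver;

class InstanceErrors {
  // Sketch: a NOT_FOUND status with a message tells the worker what went wrong
  // instead of an unexplained UNKNOWN.
  static StatusRuntimeException unknownInstance(String instanceName) {
    return Status.NOT_FOUND
        .withDescription("unknown instance: " + instanceName)
        .asRuntimeException();
  }

  static <T> void reject(StreamObserver<T> responseObserver, String instanceName) {
    responseObserver.onError(unknownInstance(instanceName));
  }
}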

server and worker java_binary targets public

Currently one cannot use the output artifacts in other bazel rules.

Would you be open to a PR that makes the @build_buildfarm//src/main/java/build/buildfarm:buildfarm-server and @build_buildfarm//src/main/java/build/buildfarm:buildfarm-worker targets declare public visibility?

Example .bazelrc suggests wrong hash function

I've just tried using the instructions in the README.md to build java-tutorial from the Bazel examples. The server throws this error:

java.lang.NumberFormatException: [4bcb5221829460f4ad398ec9085d25d95cd18204] is not a valid SHA256 hash.

I think that's because the example .bazelrc contains startup --host_jvm_args=-Dbazel.DigestFunction=SHA1. Changing that to SHA256 enabled me to do a remote build.
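
For reference, the startup line that worked would then read:

startup --host_jvm_args=-Dbazel.DigestFunction=SHA256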

Document default settings

Since Protocol Buffers 3 no longer supports custom default values, the defaults are controlled solely by the implementation. The default/fallback values for the config fields should be documented.

Windows: Crash when setting PosixFilePermissions

Hi there,

I tried running the buildfarm on Windows, got both the server and the client to build and run, but encountered a worker crash when the CAS cache sets posix file permissions, see CASFileCache.java:353.

Terminal output:

Exception in thread "Thread-2" java.lang.UnsupportedOperationException
        at java.nio.file.Files.setPosixFilePermissions(Files.java:2044)
        at build.buildfarm.worker.CASFileCache.setPermissions(CASFileCache.java:355)
        at build.buildfarm.worker.CASFileCache.put(CASFileCache.java:339)
        at build.buildfarm.worker.operationqueue.Worker.linkInputs(Worker.java:196)
        at build.buildfarm.worker.operationqueue.Worker.linkInputs(Worker.java:212)
        at build.buildfarm.worker.operationqueue.Worker.linkInputs(Worker.java:212)
        at build.buildfarm.worker.operationqueue.Worker.linkInputs(Worker.java:212)
        at build.buildfarm.worker.operationqueue.Worker.linkInputs(Worker.java:212)
        at build.buildfarm.worker.operationqueue.Worker.fetchInputs(Worker.java:178)
        at build.buildfarm.worker.operationqueue.Worker.access$300(Worker.java:71)
        at build.buildfarm.worker.operationqueue.Worker$2.createActionRoot(Worker.java:392)
        at build.buildfarm.worker.InputFetchStage.tick(InputFetchStage.java:50)
        at build.buildfarm.worker.PipelineStage.iterate(PipelineStage.java:61)
        at build.buildfarm.worker.PipelineStage.runInterruptible(PipelineStage.java:42)
        at build.buildfarm.worker.PipelineStage.run(PipelineStage.java:49)
        at java.lang.Thread.run(Thread.java:748)

Happy to contribute patches if needed. Is Windows support generally planned?
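
A guard along these lines (a sketch; buildfarm may well want a different abstraction) would skip the POSIX call on file systems that do not support it:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Set;

class Permissions {
  // Sketch: only call setPosixFilePermissions when the file system exposes the
  // "posix" attribute view, which it does not on Windows/NTFS.
  static void setPermissionsIfSupported(Path path, Set<PosixFilePermission> perms)
      throws IOException {
    if (path.getFileSystem().supportedFileAttributeViews().contains("posix")) {
      Files.setPosixFilePermissions(path, perms);
    }
    // On Windows, executable/read-only bits would need the DOS or ACL views instead.
  }
}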

UNIMPLEMENTED: Method not found: google.devtools.remoteexecution.v1test.ActionCache/GetActionResult

After upgrading to latest master I started seeing the following error when building:

[0 / 761] [-----] BazelWorkspaceStatusAction stable-status.txt
react    | [15 / 791] [-----] Writing file build-info-redacted.properties
react    | [4,876 / 13,281] [-----] Writing file src/frontend/module2/mod37/friends/friend5/src_es5_tsconfig.json
react    | ERROR: /root/.cache/bazel/_bazel_root/71bb567c2ca1228670a0714ec6ce73d9/external/bazel_tools/tools/jdk/BUILD:192:1: Executing genrule @bazel_tools//tools/jdk:platformclasspath failed (Exit 34). Note: Remote connection/protocol failed with: executionfailed: com.google.devtools.build.lib.remote.Retrier$RetryException: Call failed with not retriable error: io.grpc.StatusRuntimeException: UNIMPLEMENTED: Method not found: google.devtools.remoteexecution.v1test.ActionCache/GetActionResult
react    |      at com.google.devtools.build.lib.remote.Retrier.execute(Retrier.java:261)
react    |      at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:113)
react    |      at com.google.devtools.build.lib.remote.GrpcRemoteCache.getCachedActionResult(GrpcRemoteCache.java:356)
react    |      at com.google.devtools.build.lib.remote.RemoteSpawnRunner.exec(RemoteSpawnRunner.java:168)
react    |      at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:95)
react    |      at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:63)
react    |      at com.google.devtools.build.lib.exec.SpawnActionContextMaps$ProxySpawnActionContext.exec(SpawnActionContextMaps.java:362)
react    |      at com.google.devtools.build.lib.analysis.actions.SpawnAction.internalExecute(SpawnAction.java:287)
react    |      at com.google.devtools.build.lib.rules.genrule.GenRuleAction.internalExecute(GenRuleAction.java:83)
react    |      at com.google.devtools.build.lib.analysis.actions.SpawnAction.execute(SpawnAction.java:294)
react    |      at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeActionTask(SkyframeActionExecutor.java:978)
react    |      at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.prepareScheduleExecuteAndCompleteAction(SkyframeActionExecutor.java:910)
react    |      at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.access$900(SkyframeActionExecutor.java:120)
react    |      at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.call(SkyframeActionExecutor.java:763)
react    |      at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.call(SkyframeActionExecutor.java:718)
react    |      at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
react    |      at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:457)
react    |      at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:513)
react    |      at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:227)
react    |      at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:400)
react    |      at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:355)
react    |      at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
react    |      at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
react    |      at java.base/java.lang.Thread.run(Unknown Source)
react    | Caused by: io.grpc.StatusRuntimeException: UNIMPLEMENTED: Method not found: google.devtools.remoteexecution.v1test.ActionCache/GetActionResult
react    |      at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:221)
react    |      at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:202)
react    |      at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:131)
react    |      at com.google.devtools.remoteexecution.v1test.ActionCacheGrpc$ActionCacheBlockingStub.getActionResult(ActionCacheGrpc.java:335)
react    |      at com.google.devtools.build.lib.remote.GrpcRemoteCache.lambda$getCachedActionResult$2(GrpcRemoteCache.java:359)
react    |      at com.google.devtools.build.lib.remote.Retrier.execute(Retrier.java:243)
react    |      ... 23 more

I am assuming the error is on my side, but I did see a commit related to a v2 version of the remote execution API.

Is this perhaps an issue where certain rules are not compatible with the new API, given the UNIMPLEMENTED: Method not found errors?

Add example with two worker instances

I have been trying to get two default worker instances (default_memory_instance and another_memory_instance) to run on two different machines.

In my setup default_memory_instance runs on the same computer as the server instance.

another_memory_instance runs on a different computer, so I have updated the config to point to the remote server instance.

Both workers spin up without error, but from what I can tell, only the worker designated as default_instance_name receives traffic. I can redirect builds to either worker by changing the default instance, but I can't seem to scale the build by running both workers in parallel.

Changing it to default_instance_name: "" as suggested in the config makes the build error out.
The only error message I see is: Note: Remote connection/protocol failed with: execution failed

Is there an example available with a working multi worker/multi server setup?

Handle and make build identification (headers) queryable

Recent changes have augmented most remote execution calls with headers that identify and group requests under builds, along with additional metadata. This information should be queryable, likely through filters on listOperations, and possibly through other API mechanisms to be defined by remote execution/buildfarm.
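
One way this could start (a sketch only; the header key and any persistence are assumptions, not buildfarm's implementation) is a server interceptor that pulls the RequestMetadata out of the incoming headers so it can be stored alongside the operation and filtered on later:

import build.bazel.remote.execution.v2.RequestMetadata;
import io.grpc.Metadata;
import io.grpc.ServerCall;
import io.grpc.ServerCallHandler;
import io.grpc.ServerInterceptor;
import io.grpc.protobuf.ProtoUtils;

class RequestMetadataInterceptor implements ServerInterceptor {
  // Assumed header name for the binary-encoded RequestMetadata message.
  private static final Metadata.Key<RequestMetadata> REQUEST_METADATA_KEY =
      Metadata.Key.of(
          "build.bazel.remote.execution.v2.requestmetadata-bin",
          ProtoUtils.metadataMarshaller(RequestMetadata.getDefaultInstance()));

  @Override
  public <ReqT, RespT> ServerCall.Listener<ReqT> interceptCall(
      ServerCall<ReqT, RespT> call, Metadata headers, ServerCallHandler<ReqT, RespT> next) {
    RequestMetadata requestMetadata = headers.get(REQUEST_METADATA_KEY);
    if (requestMetadata != null) {
      // e.g. record requestMetadata.getToolInvocationId() and
      // getCorrelatedInvocationsId() with the operation for later filtering.
    }
    return next.startCall(call, headers);
  }
}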

java.nio.file.DirectoryNotEmptyException at build.buildfarm.worker.CASFileCache$2.postVisitDirectory(CASFileCache.java:225); java.lang.NullPointerException at build.buildfarm.worker.CASFileCache.decrementReferencesSynchronized(CASFileCache.java:166)

Hi, I started BuildFarm on my local machine, using the example configs, and pointed a test //... run of https://github.com/jdanekrh/activemq-artemis/tree/bazel at it. I was running the coordinator server without logging; I had debug logging on my workers.

I ran build //... first, which passed fine. Then I ran test //... and went to sleep ;P Today, I found bazel finished with

Executed 19 out of 648 tests: 15 tests pass, 4 fail remotely and 629 were skipped.
FAILED: Build did NOT complete successfully

and both workers were repeatedly printing into log

WARNING: Executor::run(default_memory_instance/operations/4631350f-02fd-4c18-8c15-8fa09d5160bd): could not transition to EXECUTING [Sat Nov 24 12:03:19 CET 2018]
FINE: ExecuteActionStage::iterate(default_memory_instance/operations/4631350f-02fd-4c18-8c15-8fa09d5160bd): 2.17600ms (0.00000ms stalled) exit code: -1, 0/1 [Sat Nov 24 12:03:19 CET 2018]
FINE: MatchStage::iterate(default_memory_instance/operations/12b0c40d-5ea5-41f4-9974-e35bcccb63b8): Starting [Sat Nov 24 12:03:19 CET 2018]
FINE: MatchStage::iterate(default_memory_instance/operations/12b0c40d-5ea5-41f4-9974-e35bcccb63b8): 23.3410ms (12.3980ms stalled) Success [Sat Nov 24 12:03:19 CET 2018]
FINE: InputFetchStage::iterate(default_memory_instance/operations/12b0c40d-5ea5-41f4-9974-e35bcccb63b8): Starting [Sat Nov 24 12:03:19 CET 2018]
FINE: InputFetchStage::iterate(default_memory_instance/operations/12b0c40d-5ea5-41f4-9974-e35bcccb63b8): 11.9310ms (0.0150000ms stalled) Success [Sat Nov 24 12:03:19 CET 2018]
FINE: ExecuteActionStage::iterate(default_memory_instance/operations/12b0c40d-5ea5-41f4-9974-e35bcccb63b8): 1/1 [Sat Nov 24 12:03:19 CET 2018]
FINE: MatchStage::iterate(): Starting [Sat Nov 24 12:03:19 CET 2018]
WARNING: Executor::run(default_memory_instance/operations/12b0c40d-5ea5-41f4-9974-e35bcccb63b8): could not transition to EXECUTING [Sat Nov 24 12:03:19 CET 2018]
FINE: ExecuteActionStage::iterate(default_memory_instance/operations/12b0c40d-5ea5-41f4-9974-e35bcccb63b8): 1.09100ms (0.00000ms stalled) exit code: -1, 0/1 [Sat Nov 24 12:03:19 CET 2018]
FINE: MatchStage::iterate(default_memory_instance/operations/e4bd4d90-52c7-42a2-a92c-e5ee5a9d5c3d): Starting [Sat Nov 24 12:03:28 CET 2018]
FINE: MatchStage::iterate(default_memory_instance/operations/e4bd4d90-52c7-42a2-a92c-e5ee5a9d5c3d): 9319.59ms (12.4410ms stalled) Success [Sat Nov 24 12:03:28 CET 2018]
FINE: InputFetchStage::iterate(default_memory_instance/operations/e4bd4d90-52c7-42a2-a92c-e5ee5a9d5c3d): Starting [Sat Nov 24 12:03:28 CET 2018]
FINE: InputFetchStage::iterate(default_memory_instance/operations/e4bd4d90-52c7-42a2-a92c-e5ee5a9d5c3d): 14.1470ms (0.0240000ms stalled) Success [Sat Nov 24 12:03:28 CET 2018]
FINE: MatchStage::iterate(): Starting [Sat Nov 24 12:03:28 CET 2018]
FINE: ExecuteActionStage::iterate(default_memory_instance/operations/e4bd4d90-52c7-42a2-a92c-e5ee5a9d5c3d): 1/1 [Sat Nov 24 12:03:28 CET 2018]
WARNING: Executor::run(default_memory_instance/operations/e4bd4d90-52c7-42a2-a92c-e5ee5a9d5c3d): could not transition to EXECUTING [Sat Nov 24 12:03:29 CET 2018]

I restarted one worker and let the other one sit around, continuing to print that "could not transition to EXECUTING" message.

Then I have one more bazel run in the log, which ended with bazel printing:

ERROR: /home/jdanek/projects/tmp/activemq-artemis/tests/integration-tests/BUILD:186:1:  failed (Exit 34). Note: Remote connection/protocol failed with: execution failed Call failed with not retriable error: io.grpc.StatusRuntimeException: FAILED_PRECONDITION: Call failed with not retriable error: io.grpc.StatusRuntimeException: FAILED_PRECONDITION

I do not have worker logs for this, sorry.... I honestly forgot the details of this run.

Anyway, I started bazel one more time, and my worker then died with this exception:

[...]
WARNING: Executor::run(default_memory_instance/operations/3a7762c7-c6a5-4974-9511-aad9e5598b95): could not transition to EXECUTING [Sat Nov 24 11:10:17 CET 2018]
FINE: ExecuteActionStage::iterate(default_memory_instance/operations/3a7762c7-c6a5-4974-9511-aad9e5598b95): 1.38500ms (0.00000ms stalled) exit code: -1, 0/1 [Sat Nov 24 11:10:17 CET 2018]
FINE: MatchStage::iterate(default_memory_instance/operations/218e1e88-798c-4482-b60c-e71e3efc1358): 217.134ms (213.143ms stalled) Success [Sat Nov 24 11:10:17 CET 2018]
FINE: InputFetchStage::iterate(default_memory_instance/operations/218e1e88-798c-4482-b60c-e71e3efc1358): Starting [Sat Nov 24 11:10:17 CET 2018]
java.nio.file.DirectoryNotEmptyException: /tmp/worker/default_memory_instance/operations/218e1e88-798c-4482-b60c-e71e3efc1358/bazel-out/k8-fastbuild/bin/tests/integration-tests/src/test/java/org/apache/activemq/artemis/tests/integration/client/ConcurrentCreateDeleteProduceTest.runfiles/__main__/target/tmp/junit6779563616686032929/page0-L/14b68833-efd1-11e8-bad1-62a6d3e94057
	at sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:242)
	at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
	at java.nio.file.Files.delete(Files.java:1126)
	at build.buildfarm.worker.CASFileCache$2.postVisitDirectory(CASFileCache.java:225)
	at build.buildfarm.worker.CASFileCache$2.postVisitDirectory(CASFileCache.java:216)
	at java.nio.file.Files.walkFileTree(Files.java:2688)
	at java.nio.file.Files.walkFileTree(Files.java:2742)
	at build.buildfarm.worker.CASFileCache.removeDirectory(CASFileCache.java:216)
	at build.buildfarm.worker.operationqueue.Worker$2.createActionRoot(Worker.java:396)
	at build.buildfarm.worker.InputFetchStage.tick(InputFetchStage.java:58)
	at build.buildfarm.worker.PipelineStage.iterate(PipelineStage.java:69)
	at build.buildfarm.worker.PipelineStage.runInterruptible(PipelineStage.java:45)
	at build.buildfarm.worker.PipelineStage.run(PipelineStage.java:52)
	at java.lang.Thread.run(Thread.java:748)
Exception in thread "Thread-2" java.lang.NullPointerException
	at build.buildfarm.worker.CASFileCache.decrementReferencesSynchronized(CASFileCache.java:166)
	at build.buildfarm.worker.CASFileCache.decrementReferences(CASFileCache.java:158)
	at build.buildfarm.worker.operationqueue.Worker$2.destroyActionRoot(Worker.java:421)
	at build.buildfarm.worker.InputFetchStage.tick(InputFetchStage.java:68)
	at build.buildfarm.worker.PipelineStage.iterate(PipelineStage.java:69)
	at build.buildfarm.worker.PipelineStage.runInterruptible(PipelineStage.java:45)
	at build.buildfarm.worker.PipelineStage.run(PipelineStage.java:52)
	at java.lang.Thread.run(Thread.java:748)
INFO: Stage has exited at priority 1 [Sat Nov 24 11:10:18 CET 2018]
INFO: Stage has exited at priority 2 [Sat Nov 24 11:10:18 CET 2018]
INFO: Closing stage at priority 4 [Sat Nov 24 11:10:19 CET 2018]
INFO: Interrupting unterminated closed thread at priority 4 [Sat Nov 24 11:10:19 CET 2018]
INFO: Stage has exited at priority 4 [Sat Nov 24 11:10:19 CET 2018]
INFO: Closing stage at priority 3 [Sat Nov 24 11:10:20 CET 2018]
INFO: Interrupting unterminated closed thread at priority 3 [Sat Nov 24 11:10:20 CET 2018]
INFO: Stage has exited at priority 3 [Sat Nov 24 11:10:20 CET 2018]

and bazel again finished prematurely but cleanly (maybe because I only had that one worker active, and the coordinator server ended the build when there weren't workers available?)

//tests/integration-tests:src/test/java/org/apache/activemq/artemis/tests/integration/rest/RestDeserializationTest FAILED in 17.2s
  /home/jdanek/.cache/bazel/_bazel_jdanek/bd71841a1890c0ee9828b055e789fb65/execroot/__main__/bazel-out/k8-fastbuild/testlogs/tests/integration-tests/src/test/java/org/apache/activemq/artemis/tests/integration/rest/RestDeserializationTest/test.log

Executed 218 out of 648 tests: 216 tests pass, 8 fail remotely and 424 were skipped.
FAILED: Build did NOT complete successfully

I am running bazel from the RHEL 7 package:

% bazel version
Build label: 0.19.2- (@non-git)
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Mon Nov 19 18:07:49 2018 (1542650869)
Build timestamp: 1542650869
Build timestamp as int: 1542650869

and I have buildfarm running on the same machine, compiled from master ec7a053

Error building the samples on latest version of Bazel (v0.15)

After upgrading to the latest version of Bazel (version 0.15) on OSX I am seeing an error when building the sample:

Running

bazel build //src/main/java/build/buildfarm:buildfarm-server && bazel-bin/src/main/java/build/buildfarm/buildfarm-server examples/server.config.example --verbose_failures

Here is the error I am seeing:

INFO: Analysed target //src/main/java/build/buildfarm:buildfarm-server (0 packages loaded).
INFO: Found 1 target...
ERROR: /private/var/tmp/_bazel_tor/d8f7af2381d65437490eef72d27a0bdb/external/com_google_protobuf/BUILD:260:1: Linking of rule '@com_google_protobuf//:js_embed' failed (Exit 1)
ld: unknown option: -no-as-needed
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Target //src/main/java/build/buildfarm:buildfarm-server failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 0.438s, Critical Path: 0.28s
INFO: 0 processes.
FAILED: Build did NOT complete successfully

Could this mean that something is passing a linker option (-no-as-needed) that the macOS linker does not support?

IllegalArgumentException: Cannot put empty blob

Hi,

I got the following IllegalArgumentException from buildfarm-server:

Jun 22, 2018 9:37:52 AM io.grpc.internal.SerializingExecutor run
SEVERE: Exception while executing runnable io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed@59e50929
java.lang.IllegalArgumentException: Cannot put empty blob
	at build.buildfarm.instance.memory.MemoryLRUContentAddressableStorage.put(MemoryLRUContentAddressableStorage.java:76)
	at build.buildfarm.instance.memory.DelegateCASMap.put(DelegateCASMap.java:51)
	at build.buildfarm.instance.memory.DelegateCASMap.put(DelegateCASMap.java:31)
	at build.buildfarm.instance.AbstractServerInstance.putActionResult(AbstractServerInstance.java:120)
	at build.buildfarm.server.ActionCacheService.updateActionResult(ActionCacheService.java:70)
	at com.google.devtools.remoteexecution.v1test.ActionCacheGrpc$MethodHandlers.invoke(ActionCacheGrpc.java:443)
	at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
	at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:33)
	at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
	at io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
	at io.grpc.util.TransmitStatusRuntimeExceptionInterceptor$1.onHalfClose(TransmitStatusRuntimeExceptionInterceptor.java:74)
	at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:271)
	at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:648)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

The exception happened with a3ecfde (latest as of 6/22/2018).
I can provide a remote grpc log, if that helps.

Someone forgot to commit src/main/protobuf/BUILD

$ bazel build src/main/java/build/buildfarm

src/main/java/build/buildfarm/BUILD:64:1: no such package 'src/main/protobuf': BUILD file not found on package path and referenced by '//src/main/java/build/buildfarm:buildfarm'

Remote connection/protocol failed with: execution failed Call failed with not retriable error: io.grpc.StatusRuntimeException: FAILED_PRECONDITION

bazel test //... dies for me with

ERROR: /home/fedora/activemq-artemis/tests/integration-tests/BUILD:186:1:  failed (Exit 34). Note: Remote connection/protocol failed with: execution failed Call failed with not retriable error: io.grpc.StatusRuntimeException: FAILED_PRECONDITION: Call failed with not retriable error: io.grpc.StatusRuntimeException: FAILED_PRECONDITION

I tried adding debugging options to the command line, which did not help much (I could not figure out anything from the output). The options I tried were --experimental_remote_grpc_log=grpc_log.bin --explain explanations.log --verbose_explanations --jobs=1, and the logs produced are attached.

When I put e.printStackTrace() at the appropriate place in the buildfarm-server source, I got:

java.lang.IllegalStateException
        at com.google.common.base.Preconditions.checkState(Preconditions.java:485)
        at build.buildfarm.instance.memory.MemoryInstance.validateAction(MemoryInstance.java:335)
        at build.buildfarm.instance.AbstractServerInstance.execute(AbstractServerInstance.java:536)
        at build.buildfarm.server.ExecutionService.execute(ExecutionService.java:99)
        at build.bazel.remote.execution.v2.ExecutionGrpc$MethodHandlers.invoke(ExecutionGrpc.java:497)
        at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
        at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:33)
        at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
        at io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
        at io.grpc.util.TransmitStatusRuntimeExceptionInterceptor$1.onHalfClose(TransmitStatusRuntimeExceptionInterceptor.java:74)
        at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:271)
        at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:648)
        at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

How does Worker tie together with Server

I don't really understand the role of the Workers. On the one hand, they are independent binaries.
On the other, you have to configure the instance on the Server, so when adding a Worker you have to restart the Server, which seems odd to me.

"Is a directory" failure

Hello,

I tried to use buildfarm to build my project.
With the notes from #76 it works, except for the following problem:

java.io.IOException: Is a directory
	at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
	at sun.nio.ch.FileDispatcherImpl.read(FileDispatcherImpl.java:46)
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
	at sun.nio.ch.IOUtil.read(IOUtil.java:197)
	at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:159)
	at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
	at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
	at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
	at com.google.protobuf.ByteString.readChunk(ByteString.java:493)
	at com.google.protobuf.ByteString.readFrom(ByteString.java:467)
	at com.google.protobuf.ByteString.readFrom(ByteString.java:429)
	at build.buildfarm.worker.Worker.execute(Worker.java:247)
	at build.buildfarm.worker.Worker.lambda$start$0(Worker.java:167)
	at build.buildfarm.instance.stub.StubInstance.match(StubInstance.java:330)
	at build.buildfarm.worker.Worker.start(Worker.java:165)
	at build.buildfarm.worker.Worker.main(Worker.java:528)

rules_go declares some outputs as files (via "ctx.actions.declare_file") but actually produces directories.
The same problem occurs with our in-house rule for creating a virtualenv.

It is a hack, but it works locally.

Without this hack it would be too complex to support tools like virtualenv or the Go compiler.

Can buildfarm allow behavior like this?
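
One possible direction, sketched here under the assumption that the worker could simply tolerate a directory where a file was declared (this is an illustration, not the hack referenced above):

import com.google.protobuf.ByteString;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class OutputReader {
  // Sketch: only read regular files as blobs; declared outputs that turn out to
  // be directories are returned as null so the caller can upload them as trees
  // or report a precondition failure instead of crashing.
  static ByteString readDeclaredOutputFile(Path outputPath) throws IOException {
    if (Files.isDirectory(outputPath)) {
      return null;
    }
    try (InputStream in = Files.newInputStream(outputPath)) {
      return ByteString.readFrom(in);
    }
  }
}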
