
java-bigtable-hbase's Introduction


Google Cloud Bigtable is Google's NoSQL Big Data database service. It's the same database that powers many core Google services, including Search, Analytics, Maps, and Gmail.

Bigtable is designed to handle massive workloads at consistent low latency and high throughput, so it's a great choice for both operational and analytical applications, including IoT, user analytics, and financial data analysis.

Bigtable provisions and scales to hundreds of petabytes automatically, and can smoothly handle millions of operations per second. Changes to the deployment configuration are immediate, so there is no downtime during reconfiguration.

Bigtable integrates easily with popular Big Data tools like Hadoop, as well as Google Cloud Platform products like Cloud Dataflow and Dataproc. Plus, Bigtable supports the open-source, industry-standard HBase API, which makes it easy for development teams to get started.

Note: These artifacts are meant to wrap HBase over the Bigtable API. If you are looking for a Java client to access Bigtable APIs directly, please use google-cloud-bigtable.

Project setup, installation, and configuration

Prerequisites

Using the Java client

  • Add the appropriate Cloud Bigtable artifact dependencies to your Maven project.
    • bigtable-hbase-1.x: use for standalone applications where you are in control of your dependencies.
    • bigtable-hbase-1.x-hadoop: use in hadoop environments.
    • bigtable-hbase-1.x-mapreduce: use for map/reduce utilities.
    • bigtable-hbase-1.x-shaded: use in environments (other than hadoop) that require older versions of protobuf, guava, etc.
    • bigtable-hbase-2.x: use for standalone applications where you are in control of your dependencies. This includes an HBase async client.
    • bigtable-hbase-2.x-hadoop: use in hadoop environments.
    • bigtable-hbase-2.x-shaded: use in environments (other than hadoop) that require older versions of protobuf, guava, etc.

Maven:

<dependency>
  <groupId>com.google.cloud.bigtable</groupId>
  <artifactId>bigtable-hbase-1.x</artifactId>
  <version>2.6.5</version>
</dependency>

Gradle:

compile 'com.google.cloud.bigtable:bigtable-hbase-1.x:2.6.5'

SBT:

libraryDependencies += "com.google.cloud.bigtable" % "bigtable-hbase-1.x" % "2.6.5"
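
After adding the dependency, a minimal sketch like the one below (not from this README; the project ID, instance ID, table, and column family names are placeholders) connects through the HBase API and writes and reads a single cell:

// Minimal usage sketch for bigtable-hbase-1.x; all identifiers below are placeholders.
import com.google.cloud.bigtable.hbase.BigtableConfiguration;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class QuickstartSketch {
    public static void main(String[] args) throws Exception {
        // BigtableConfiguration.connect returns a standard HBase Connection backed by Bigtable.
        try (Connection connection = BigtableConfiguration.connect("your-project-id", "your-instance-id");
             Table table = connection.getTable(TableName.valueOf("my-table"))) {

            // Write one cell.
            Put put = new Put(Bytes.toBytes("row-1"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qualifier"), Bytes.toBytes("value"));
            table.put(put);

            // Read it back.
            Result result = table.get(new Get(Bytes.toBytes("row-1")));
            System.out.println(Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("qualifier"))));
        }
    }
}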

Enabling Client Side Metrics

The Cloud Bigtable client supports publishing client-side metrics to Cloud Monitoring under the bigtable.googleapis.com/client namespace.

This feature is available in version 2.6.4 and later. To enable it, follow the guide at https://cloud.google.com/bigtable/docs/client-side-metrics-setup.

Note: Beam / Dataflow integration is currently not supported.

OpenCensus Integration

Tracing

The code example below shows how to enable tracing. For more details, see here.

Maven Setup

If you are not using the shaded Bigtable HBase Client artifact, you need to define the OpenCensus dependencies.

<!-- OpenCensus dependencies -->
<dependency>
    <groupId>com.google.cloud.bigtable</groupId>
    <artifactId>bigtable-hbase-1.x</artifactId>
    <version>2.6.5</version>
</dependency>
<dependency>
    <groupId>io.opencensus</groupId>
    <artifactId>opencensus-impl</artifactId>
    <version>0.31.1</version>
    <scope>runtime</scope>
</dependency>
<dependency>
    <groupId>io.opencensus</groupId>
    <artifactId>opencensus-exporter-trace-stackdriver</artifactId>
    <version>0.31.1</version>
    <exclusions>
        <exclusion>
            <groupId>io.grpc</groupId>
            <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.google.auth</groupId>
            <artifactId>*</artifactId>
        </exclusion>
    </exclusions>
</dependency>

If you are using the shaded Bigtable HBase Client artifact, the OpenCensus dependencies are embedded in the shaded artifact, so there is nothing additional for you to do.

<!-- OpenCensus dependencies -->
<dependency>
    <groupId>com.google.cloud.bigtable</groupId>
    <artifactId>bigtable-hbase-1.x-shaded</artifactId>
    <version>2.6.5</version>
</dependency>

Java Example

// For the non-shaded client, remove the package prefix "com.google.bigtable.repackaged."
import com.google.bigtable.repackaged.io.opencensus.exporter.trace.stackdriver.StackdriverTraceConfiguration;
import com.google.bigtable.repackaged.io.opencensus.exporter.trace.stackdriver.StackdriverTraceExporter;
import com.google.bigtable.repackaged.io.opencensus.trace.Tracing;
import com.google.bigtable.repackaged.io.opencensus.trace.samplers.Samplers;

import java.io.IOException;

public class OpenCensusExample {
    String projectId = "your-project-id";

    void setupTracing() throws Exception {
        // Setup tracing.
        StackdriverTraceExporter.createAndRegister(
                StackdriverTraceConfiguration.builder()
                        .setProjectId(projectId)
                        .build()
        );
        Tracing.getTraceConfig().updateActiveTraceParams(
                Tracing.getTraceConfig().getActiveTraceParams().toBuilder()
                        // Adjust the sampling rate as you see fit.
                        .setSampler(Samplers.probabilitySampler(0.01))
                        .build()
        );
    }
}
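
Once the exporter and sampler are registered, Bigtable RPCs are traced when they run under a sampled span. Below is a small usage sketch; the class and span names are illustrative and not part of the client API:

// For the non-shaded client, remove the package prefix "com.google.bigtable.repackaged."
import com.google.bigtable.repackaged.io.opencensus.common.Scope;
import com.google.bigtable.repackaged.io.opencensus.trace.Tracer;
import com.google.bigtable.repackaged.io.opencensus.trace.Tracing;

public class TracedCallExample {
    private static final Tracer tracer = Tracing.getTracer();

    void doTracedWork() {
        // Calls made inside this scope (e.g. HBase Table.get/put backed by Bigtable)
        // are recorded as children of the "my-bigtable-operation" span.
        try (Scope ignored = tracer.spanBuilder("my-bigtable-operation").startScopedSpan()) {
            // ... perform Bigtable operations here ...
        }
    }
}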

Stats


Note: We recommend enabling the client-side built-in metrics if you want to view your metrics in Cloud Monitoring. This OpenCensus integration is only for exporting the metrics to a third-party dashboard.

The code example below shows how to enable metrics. For more details, see the gRPC Java Guide.

Maven Setup

If you are not using the shaded Bigtable HBase Client artifact, you need to define the OpenCensus dependencies.

<!-- OpenCensus dependencies -->
<dependency>
    <groupId>com.google.cloud.bigtable</groupId>
    <artifactId>bigtable-hbase-1.x</artifactId>
    <version>2.6.5</version>
</dependency>
<dependency>
    <groupId>io.opencensus</groupId>
    <artifactId>opencensus-impl</artifactId>
    <version>0.31.1</version>
    <scope>runtime</scope>
</dependency>
<dependency>
    <groupId>io.opencensus</groupId>
    <artifactId>opencensus-exporter-stats-stackdriver</artifactId>
    <version>0.31.1</version>
    <exclusions>
        <exclusion>
            <groupId>io.grpc</groupId>
            <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.google.auth</groupId>
            <artifactId>*</artifactId>
        </exclusion>
    </exclusions>
</dependency>

If you are using the shaded Bigtable HBase Client artifact, the OpenCensus dependencies are embedded in the shaded artifact, so there is nothing additional for you to do.

<!-- OpenCensus dependencies -->
<dependency>
    <groupId>com.google.cloud.bigtable</groupId>
    <artifactId>bigtable-hbase-1.x-shaded</artifactId>
    <version>2.6.5</version>
</dependency>

Java Example

// For the non-shaded client, remove the package prefix "com.google.bigtable.repackaged."
import com.google.bigtable.repackaged.com.google.api.MonitoredResource;
import com.google.bigtable.repackaged.io.opencensus.contrib.grpc.metrics.RpcViews;
import com.google.bigtable.repackaged.io.opencensus.exporter.stats.stackdriver.StackdriverStatsConfiguration;
import com.google.bigtable.repackaged.io.opencensus.exporter.stats.stackdriver.StackdriverStatsExporter;

import java.net.InetAddress;

public class OpenCensusExample {
    String projectId = "your-project-id";

    void setupStatsExport() throws Exception {
        // Option 1: Automatic Configuration (from GCP Resources only):
        // If you are running from a GCP Resource (e.g. a GCE VM), the Stackdriver metrics are automatically
        // configured to upload to your resource.
        // For examples of monitored resources, see here: https://cloud.google.com/monitoring/api/resources
        StackdriverStatsExporter.createAndRegister();

        // Then register your gRPC views in OpenCensus.
        RpcViews.registerClientGrpcViews();
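        // Note: only one Stackdriver stats exporter can be registered per process.
        // Use either Option 1 above or Option 2 below in a given run, not both.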


        // Option 2: Manual Configuration
        // If you are not running from a GCP Resource (e.g. if you are running on-prem), then you should
        // configure the monitored resource yourself.
        // Use the code snippet below as a starting example.
        // For examples of monitored resources, see here: https://cloud.google.com/monitoring/api/resources
        StackdriverStatsExporter.createAndRegister(
                StackdriverStatsConfiguration.builder()
                        .setProjectId(projectId)
                        // This example uses generic_node as the MonitoredResource, with your host name as the node ID.
                        .setMonitoredResource(MonitoredResource.newBuilder()
                                .setType("generic_node")
                                .putLabels("project_id", projectId)
                                .putLabels("location", "us-west1-b")  // Specify the region in which your service is running (e.g. us-west1-b). 
                                .putLabels("namespace", "anyNamespaceYouChoose")
                                .putLabels("node_id", InetAddress.getLocalHost().getHostName())  // Specify any node you choose (e.g. the local hostname).
                                .build())
                        .build()
        );
        
        // Then register your gRPC views in OpenCensus.
        RpcViews.registerClientGrpcViews();
    }
}

Viewing Your Metrics in Google Cloud Console

The above steps will expose Bigtable's gRPC metrics under the custom.googleapis.com/opencensus/grpc.io/client prefix.

Follow these instructions for viewing the metrics in Google Cloud Console.

Be sure to choose your Resource Type as the one you defined in your Stackdriver configuration in the code.

Questions and discussions

If you have questions or run into issues with Google Cloud Bigtable or the client libraries, use any of the following forums:

You can also subscribe to the google-cloud-bigtable-announce@ list to receive infrequent product and client library announcements.

Contributing

Contributions to this library are always welcome and highly encouraged.

See CONTRIBUTING for more information on how to get started.

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms. See Code of Conduct for more information.

License

Apache 2.0 - See LICENSE for more information.

CI Status

Java Version      Status
Java 8            Kokoro CI
Java 8 OSX        Kokoro CI
Java 11           Kokoro CI
Integration       Kokoro CI

Java is a registered trademark of Oracle and/or its affiliates.

java-bigtable-hbase's People

Contributors

ajaaym, angusdavis, astath, billyjacobson, carterpage, chingor13, dmitry-fa, dmmcerlean, elisheva-qlogic, garye, gcf-owl-bot[bot], hegemonic, igorbernstein2, kevinsi4508, kolea2, mbrukman, mgarolera, mutianf, neenu1995, pawan-qlogic, rahulkql, rameshdharan, release-please[bot], renovate-bot, rupeshitpatekarq, sduskis, spollapally, sushanb, vermas2012, yoshi-automation


java-bigtable-hbase's Issues

Too many open files

Hello,
With a fresh Bigtable database, we serve put requests that create a table when it is not found (exception case).

Under heavy load against an emptied Bigtable, we create a lot of tables and eventually hit a "too many open files" error.

lsof shows a very large number of open file descriptors.

[FATAL] 16-09-2015 16:12:43,670 org.apache.hadoop.conf.Configuration loadResource - error parsing conf hbase-site.xml
java.io.FileNotFoundException: /tmp/jetty-0.0.0.0-8080-globulus.war-_globulus-any-7404471105146041798.dir/webapp/WEB-INF/classes/hbase-site.xml (Too many open files)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.(FileInputStream.java:138)
at java.io.FileInputStream.(FileInputStream.java:93)
at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
at java.net.URL.openStream(URL.java:1038)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2171)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2242)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2205)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2112)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:858)
at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:877)
at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1278)
at org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:67)
at org.apache.hadoop.hbase.HBaseConfiguration.addHbaseResources(HBaseConfiguration.java:81)
at org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:96)
at org.apache.hadoop.hbase.HColumnDescriptor.(HColumnDescriptor.java:147)
at org.apache.hadoop.hbase.HTableDescriptor.(HTableDescriptor.java:1456)
at com.accengage.globulus.util.bigtable.BigtableHelper.createTable(BigtableHelper.java:78)
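
For reference, here is a rough sketch of the pattern described above (create the table on demand before a put), assuming a single shared Connection and an in-memory cache of known tables so the create path is hit at most once per table; all class, table, and column family names are illustrative, not the reporter's actual code:

// Illustrative sketch only; names are placeholders, not the reporter's code.
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;

public class PutWithCreateOnDemand {
    private final Connection connection;  // shared, long-lived connection
    private final Set<TableName> knownTables = ConcurrentHashMap.newKeySet();

    public PutWithCreateOnDemand(Connection connection) {
        this.connection = connection;
    }

    void put(TableName tableName, Put put) throws IOException {
        ensureTable(tableName);
        try (Table table = connection.getTable(tableName)) {
            table.put(put);
        }
    }

    private void ensureTable(TableName tableName) throws IOException {
        if (knownTables.contains(tableName)) {
            return;  // already verified or created this table
        }
        try (Admin admin = connection.getAdmin()) {
            if (!admin.tableExists(tableName)) {
                HTableDescriptor descriptor = new HTableDescriptor(tableName);
                descriptor.addFamily(new HColumnDescriptor("cf"));
                admin.createTable(descriptor);
            }
        }
        knownTables.add(tableName);
    }
}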

Use GrpcSslContexts.forClient()

Currently, our configuration for GrpcSslContexts.forClient() breaks the client by not finding an appropriate cipher. We should fix the issue so that GrpcSslContexts.forClient() can be used in BigtableSession.

BigTable should repackage io.grpc ?

It appears that the hbase bigtable jar is shipping a version of io.grpc.protobuf.ProtoUtils that has been shaded to operate on com.google.bigtable.repackaged.com.google.protobuf.* messages.

This makes it impossible to build applications that use both BigTable and gRPC (since our own classes need a version of ProtoUtils that works on com.google.protobuf.Message).

My impression is that BigTable's ProtoUtils should also be repackaged into a private package?

RetriesExhaustedWithDetailsException within DataFlow pipeline causing effectively a halt of the pipeline

This is a followup to http://stackoverflow.com/questions/32486421/exceptions-in-googles-dataflow-pipelines-to-bigtable:

org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 129 actions: StatusRuntimeException: 129 times, 
   at com.google.cloud.bigtable.hbase.BigtableBufferedMutator.handleExceptions(BigtableBufferedMutator.java:408) 
   at com.google.cloud.bigtable.hbase.BigtableBufferedMutator.doFlush(BigtableBufferedMutator.java:285) 
   at com.google.cloud.bigtable.hbase.BigtableBufferedMutator.close(BigtableBufferedMutator.java:258) 
   at org.apache.hadoop.hbase.client.AbstractBigtableConnection$2.close(AbstractBigtableConnection.java:181) 
   at com.google.cloud.bigtable.dataflow.CloudBigtableIO$CloudBigtableSingleTableWriteFn.finishBundle(CloudBigtableIO.java:613)

The exception appears both with our own DoFn and with the BigTableClient DoFn code shown below. We'd like to use the BigTableClient DoFn, as it performs better:

Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
Pipeline p = Pipeline.create(options);
CloudBigtableIO.initializeForWrite(p)
    .apply(TextIO.Read.from(options.getInputFile()))
    .apply(ParDo.named("Parse Fix").of(new DoFn<String, Mutation>() {
        @Override
        public void processElement(DoFn<String, Mutation>.ProcessContext c) {
            List<Put> putStatements = createPutStatements();

            for (Mutation mutation : putStatements) {
                c.output(mutation);
            }
    }}))
    .apply(CloudBigtableIO.writeToTable(CloudBigtableTableConfiguration.fromCBTOptions(options)));

p.run();

This seems to happen in parallel for all of our workers: the count of 129 matches the number of workers we have.
Looking at the Bigtable metrics in parallel, the write throughput plummets, which looks like no further mutations are being pushed to Bigtable.
It appeared 20 minutes after starting the Dataflow job. Expected execution time was around 60 minutes.

We are using a cluster of 60 Bigtable servers.
Yesterday (same input data), with at most 27 Bigtable servers, the issue did not appear.

ResultScannerBlocking

request for method:

Could we add a bool mightBlock() to the ResultScanner?

Why? I would like to be able to wrap the scanner with something like a Spool[T] from twitter-util, which is essentially a linked list with the tail as a Future[T]. The trick to keeping this efficient is to only create a new deferred task when one is actually needed. An example:

https://github.com/websudos/phantom/blob/develop/phantom-dsl/src/main/scala/com/websudos/phantom/iteratee/ResultSpool.scala

(NB: I am actually the author of that code see here)

I think for the current implementation, backed by a blocking queue, this is as simple as adding a method that calls isEmpty() on the underlying queue (see the sketch below). I'd be happy to contribute a patch if you're OK with the idea in principle.
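
A hypothetical sketch of the requested addition; this method does not exist in the current API, and the interface shape here is simplified:

// Hypothetical API sketch only; not part of the current client.
public interface ResultScanner<T> {
    // Existing behavior: returns the next row, blocking until one is available.
    T next() throws java.io.IOException;

    // Proposed: returns true if a call to next() might block waiting for more data.
    // For the blocking-queue-backed implementation, this could simply check whether
    // the underlying queue is currently empty.
    boolean mightBlock();
}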

TTL not working, still able to get value after TTL reached

Hello,

We found an issue with Bigtable where the table has a TTL of one minute, but the inserted value is not deleted.

'testBBO', {NAME => 'cd', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => '60 SECONDS (1 MINUTE)', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
150 : 1
10361 : 1
20470 : 1
30594 : 1
40670 : 1
50780 : 1
60811 : 1 // should be deleted
70857 : 1
80980 : 1
91104 : 1
101225 : 1
111294 : 1
121351 : 1
131366 : 1
...
10 minutes later, the value is still there.

Here is the code used:

package com.accengage.bigdata.test.bigtable;

import java.io.IOException;

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.AfterClass;
import org.junit.Assert;
import org.junit.BeforeClass;
import org.junit.Test;

import com.accengage.bigdata.db.bigtable.Bigtable;

public class BigtableTtl {

    private static TableName tableName = TableName.valueOf("testBBO");
    private static Admin admin;
    private static long debut;

    /**
     * delete test table, close hbase connection
     *
     * @throws IOException
     */
    @AfterClass
    public static void clean() throws IOException {
        deleteTestTable();
        Assert.assertFalse(admin.tableExists(tableName));
        admin.close();
        Bigtable.getInstance().close();
    }

    /**
     * create hbase connection, create test table, fill data
     *
     * @throws IOException
     */
    @BeforeClass
    public static void setup() throws IOException {
        final Bigtable bigtable = Bigtable.getInstance();
        admin = bigtable.getAdmin();
        deleteTestTable();
        Assert.assertFalse(admin.tableExists(tableName));
        HColumnDescriptor hcolDesc = new HColumnDescriptor(Bytes.toBytes("cd"));
        hcolDesc.setTimeToLive(60);
        final HTableDescriptor htableDescriptor = new HTableDescriptor(tableName);
        htableDescriptor.addFamily(hcolDesc);
        admin.createTable(htableDescriptor);
        Assert.assertTrue(admin.tableExists(tableName));
        fillTable();
    }

    private static void deleteTestTable() throws IOException {
        if (admin.tableExists(tableName)) {
            if (admin.isTableEnabled(tableName)) {
                admin.disableTable(tableName);
            }
            admin.deleteTable(tableName);
        }
    }

    private static void fillTable() throws IOException {
        final Bigtable bigtable = Bigtable.getInstance();
        Table table = bigtable.getTable(tableName);
        Put put = new Put(Bytes.fromHex("abcdef"));
        put.addColumn(Bytes.toBytes("cd"), Bytes.toBytes("counter"), Bytes.toBytes(1L));
        table.put(put);
        debut = System.currentTimeMillis();
        System.out.println(admin.getTableDescriptor(tableName));
    }

    @Test
    public void testTtl() throws IOException, InterruptedException {
        Get get = new Get(Bytes.fromHex("abcdef"));
        Table table = Bigtable.getInstance().getTable(tableName);
        for (int i = 0; i < 60; i++) {
            Result result = table.get(get);
            byte[] v = result.getValue(Bytes.toBytes("cd"), Bytes.toBytes("counter"));
            System.out.println((System.currentTimeMillis() - debut) + " : " + (v != null ? Bytes.toLong(v) : "null"));
            synchronized (this) {
                wait(10000);
            }
        }
    }
}

We are using driver version 0.2.1 with HBase 1.1.1.

Thank you

FuzzyRowFilterAdapter is missing from FilterAdapter

I've been trying to get FuzzyRowFilter to work, but out of the box I get the following exception:

Caused by: com.google.cloud.bigtable.hbase.adapters.filters.UnsupportedFilterException: Unsupported filters encountered: FilterSupportStatus{isSupported=false, reason='Don't know how to adapt Filter class 'class org.apache.hadoop.hbase.filter.FuzzyRowFilter''}

I can see that there's an adapter for it in the codebase, but it's not in the filter map within FilterAdapter. Putting it in the map makes the previous exception go away, but then I get a dependency conflict:

Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hbase.protobuf.generated.HBaseProtos$BytesBytesPair.getFirst()Lcom/google/bigtable/repackaged/com/google/protobuf/ByteString;
    at com.google.cloud.bigtable.hbase.adapters.filters.FuzzyRowFilterAdapter.extractFuzzyRowFilterPairs(FuzzyRowFilterAdapter.java:90)
    at com.google.cloud.bigtable.hbase.adapters.filters.FuzzyRowFilterAdapter.adapt(FuzzyRowFilterAdapter.java:48)
    at com.google.cloud.bigtable.hbase.adapters.filters.FuzzyRowFilterAdapter.adapt(FuzzyRowFilterAdapter.java:39)
    at com.google.cloud.bigtable.hbase.adapters.filters.SingleFilterAdapter.adapt(SingleFilterAdapter.java:56)
    at com.google.cloud.bigtable.hbase.adapters.filters.FilterAdapter.adaptFilter(FilterAdapter.java:143)
    at com.google.cloud.bigtable.hbase.adapters.ScanAdapter.createUserFilter(ScanAdapter.java:116)
    at com.google.cloud.bigtable.hbase.adapters.ScanAdapter.buildFilter(ScanAdapter.java:77)
    at com.google.cloud.bigtable.hbase.adapters.ScanAdapter.adapt(ScanAdapter.java:94)
    at com.google.cloud.bigtable.hbase.adapters.HBaseRequestAdapter.adapt(HBaseRequestAdapter.java:70)
    at com.google.cloud.bigtable.hbase.BigtableTable.getScanner(BigtableTable.java:206)

This seems to be a subtle dependency issue due to the mismatch between protobuf versions used by HBase and BigTable but all shading attempts on my part have been futile so far. Any idea how I can side-step this issue?

Shouldn't Cluster.hdd_bytes and Cluster.ssd_bytes be in a oneof block?

I'm implementing the Python library for the API and noticed that both hdd_bytes and ssd_bytes can be set on a CreateCluster call (and they persist when calling GetCluster).

Shouldn't the values (in the Cluster message class) be in a oneof block:

  // The maximum HDD storage usage allowed in this cluster, in bytes.
  int64 hdd_bytes = 6;

  // The maximum SSD storage usage allowed in this cluster, in bytes.
  int64 ssd_bytes = 7;

list hangs.

./quickstart.sh 
Project ID= mmmm-1141
Cluster ID= claster
Zone=       europe-west1-c
[INFO] Scanning for projects...
[INFO] 
[INFO] Using the builder org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder with a thread count of 1
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building testing 0.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ testing ---
[INFO] Deleting /home/egnyte/pro/bigtable/quickstart/target
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ testing ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 2 resources
[INFO] 
[INFO] --- maven-compiler-plugin:2.5.1:compile (default-compile) @ testing ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ testing ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/egnyte/pro/bigtable/quickstart/src/test/resources
[INFO] 
[INFO] --- maven-compiler-plugin:2.5.1:testCompile (default-testCompile) @ testing ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-surefire-plugin:2.12.4:test (default-test) @ testing ---
[INFO] No tests to run.
[INFO] 
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ testing ---
[INFO] Building jar: /home/egnyte/pro/bigtable/quickstart/target/testing-0.1-SNAPSHOT.jar
[INFO] 
[INFO] --- exec-maven-plugin:1.4.0:exec (default-cli) @ testing ---
2015-11-26 16:54:26,698 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-11-26 16:54:26,805 INFO  [main] grpc.BigtableSession: Opening connection for projectId mmmm-1141, zoneId europe-west1-c, clusterId claster, on data host bigtable.googleapis.com, table admin host bigtabletableadmin.googleapis.com.
2015-11-26 16:54:26,839 INFO  [BigtableSession-startup-0] grpc.BigtableSession: gRPC is using the JDK provider (alpn-boot jar)
2015-11-26 16:54:27,231 INFO  [bigtable-connection-shared-executor-pool1-t2] io.RefreshingOAuth2CredentialsInterceptor: Refreshing the OAuth token
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2, rcc2b70cf03e3378800661ec5cab11eb43fafe0fc, Wed Aug 26 20:17:13 PDT 2015

hbase(main):001:0> list
list
TABLE

java.io.IOException: The Application Default Credentials are not available in DataFlow pipeline

Create a simple File/GCS to BigTable Dataflow pipeline:

        Pipeline p = Pipeline.create(options);

        CloudBigtableIO.initializeForWrite(p)
                .apply(TextIO.Read.from(options.getInputFile()))
                .apply(ParDo.named("Parse").of(new DoFn<String, Mutation>() {
                    @Override
                    public void processElement(DoFn<String, Mutation>.ProcessContext c) {
                            List<Put> putStatements = generateStatements();

                            for (Mutation mutation : putStatements) {
                                c.output(mutation);
                            }
                    }}))
                .apply(CloudBigtableIO.writeToTable(CloudBigtableTableConfiguration.fromCBTOptions(options)));

        p.run();

The job is executed on 10 files of around 3 GB (+/- 1.5 GB) each, with autoscaling, which jumps to 100 and then 328 workers within 4 minutes.

We saw the following many times, and it looks like the same number of messages did not make it to Bigtable (more precise investigation required):

Sep 9, 2015, 6:43:37 PM
(d8a351d35056269a): java.io.IOException: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information. 
at com.google.auth.oauth2.DefaultCredentialsProvider.getDefaultCredentials(DefaultCredentialsProvider.java:62) 
at com.google.auth.oauth2.GoogleCredentials.getApplicationDefault(GoogleCredentials.java:66) 
at com.google.cloud.bigtable.config.CredentialFactory.getApplicationDefaultCredential(CredentialFactory.java:173) 
at com.google.cloud.bigtable.config.CredentialFactory.getCredentials(CredentialFactory.java:100) 
at com.google.cloud.bigtable.grpc.BigtableSession$4.call(BigtableSession.java:249) 
at com.google.cloud.bigtable.grpc.BigtableSession$4.call(BigtableSession.java:246) 
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
at java.lang.Thread.run(Thread.java:745)

GET errors due to Connection reset under load

While running a large test with a cluster of 60, we see these exceptions bubbling up to our user code:

 2015-09-14 16:49:12,018 WARN  | [pool-1-thread-8] (c.s.a.b.LifecycleBuilder:139) | An IOException has occurred Failed to perform operation. Operation='get', projectId='demo', tableName='someTable', rowKey='adbb735f-4718-45d9-badf-b106a61f8ed0|UTXQ'
 java.io.IOException: Failed to perform operation. Operation='get', projectId='sungard-cat-demo', tableName='someTable', rowKey='adbb735f-4718-45d9-badf-b106a61f8ed0|UTXQ'
 at com.google.cloud.bigtable.hbase.BigtableTable.get(BigtableTable.java:199) ~[bigtable-hbase-1.0-0.2.2-SJ-UlknvKCKvgD59j6kCsTcCg.jar:na]
 at com.sungard.advtech.bigtable_df.LifecycleBuilder.processElement(LifecycleBuilder.java:95) ~[classes-KrHrvliWr0XWi3H_iWOlEg.zip:na]
 at com.google.cloud.dataflow.sdk.util.DoFnRunner.invokeProcessElement(DoFnRunner.java:189) [google-cloud-dataflow-java-sdk-all-1.0.0-rHz39Me5Bgx6ma4iSO7HHQ.jar:na]
 at com.google.cloud.dataflow.sdk.util.DoFnRunner.processElement(DoFnRunner.java:171) [google-cloud-dataflow-java-sdk-all-1.0.0-rHz39Me5Bgx6ma4iSO7HHQ.jar:na]
 at com.google.cloud.dataflow.sdk.runners.worker.ParDoFnBase.processElement(ParDoFnBase.java:193) [google-cloud-dataflow-java-sdk-all-1.0.0-rHz39Me5Bgx6ma4iSO7HHQ.jar:na]
 at com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:52) [google-cloud-dataflow-java-sdk-all-1.0.0-rHz39Me5Bgx6ma4iSO7HHQ.jar:na]
 at com.google.cloud.dataflow.sdk.util.common.worker.OutputReceiver.process(OutputReceiver.java:52) [google-cloud-dataflow-java-sdk-all-1.0.0-rHz39Me5Bgx6ma4iSO7HHQ.jar:na]
 at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:171) [google-cloud-dataflow-java-sdk-all-1.0.0-rHz39Me5Bgx6ma4iSO7HHQ.jar:na]
 at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.start(ReadOperation.java:117) [google-cloud-dataflow-java-sdk-all-1.0.0-rHz39Me5Bgx6ma4iSO7HHQ.jar:na]
 at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:66) [google-cloud-dataflow-java-sdk-all-1.0.0-rHz39Me5Bgx6ma4iSO7HHQ.jar:na]
 at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:234) [google-cloud-dataflow-java-sdk-all-1.0.0-rHz39Me5Bgx6ma4iSO7HHQ.jar:na]
 at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:171) [google-cloud-dataflow-java-sdk-all-1.0.0-rHz39Me5Bgx6ma4iSO7HHQ.jar:na]
 at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:137) [google-cloud-dataflow-java-sdk-all-1.0.0-rHz39Me5Bgx6ma4iSO7HHQ.jar:na]
 at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:147) [google-cloud-dataflow-java-sdk-all-1.0.0-rHz39Me5Bgx6ma4iSO7HHQ.jar:na]
 at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:132) [google-cloud-dataflow-java-sdk-all-1.0.0-rHz39Me5Bgx6ma4iSO7HHQ.jar:na]
 at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_60-ea]
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60-ea]
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60-ea]
 at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60-ea]
 Caused by: com.google.cloud.bigtable.grpc.io.IOExceptionWithStatus: Error in response stream
 at com.google.cloud.bigtable.grpc.scanner.ResultQueueEntry.getResponseOrThrow(ResultQueueEntry.java:66) ~[bigtable-hbase-1.0-0.2.2-SJ-UlknvKCKvgD59j6kCsTcCg.jar:na]
 at com.google.cloud.bigtable.grpc.scanner.StreamingBigtableResultScanner$ResponseQueueReader.getNextMergedRow(StreamingBigtableResultScanner.java:75) ~[bigtable-hbase-1.0-0.2.2-SJ-UlknvKCKvgD59j6kCsTcCg.jar:na]
 at com.google.cloud.bigtable.grpc.scanner.StreamingBigtableResultScanner.next(StreamingBigtableResultScanner.java:136) ~[bigtable-hbase-1.0-0.2.2-SJ-UlknvKCKvgD59j6kCsTcCg.jar:na]
 at com.google.cloud.bigtable.grpc.scanner.StreamingBigtableResultScanner.next(StreamingBigtableResultScanner.java:32) ~[bigtable-hbase-1.0-0.2.2-SJ-UlknvKCKvgD59j6kCsTcCg.jar:na]
 at com.google.cloud.bigtable.grpc.scanner.ResumingStreamingResultScanner.next(ResumingStreamingResultScanner.java:90) ~[bigtable-hbase-1.0-0.2.2-SJ-UlknvKCKvgD59j6kCsTcCg.jar:na]
 at com.google.cloud.bigtable.grpc.scanner.ResumingStreamingResultScanner.next(ResumingStreamingResultScanner.java:43) ~[bigtable-hbase-1.0-0.2.2-SJ-UlknvKCKvgD59j6kCsTcCg.jar:na]
 at com.google.cloud.bigtable.hbase.BigtableTable.get(BigtableTable.java:195) ~[bigtable-hbase-1.0-0.2.2-SJ-UlknvKCKvgD59j6kCsTcCg.jar:na]
 ... 18 common frames omitted
 Caused by: io.grpc.StatusRuntimeException: UNKNOWN
 at io.grpc.Status.asRuntimeException(Status.java:428) ~[bigtable-hbase-1.0-0.2.2-SJ-UlknvKCKvgD59j6kCsTcCg.jar:na]
 at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:264) ~[bigtable-hbase-1.0-0.2.2-SJ-UlknvKCKvgD59j6kCsTcCg.jar:na]
 at io.grpc.ClientCallImpl$ClientStreamListenerImpl$3.run(ClientCallImpl.java:293) ~[bigtable-hbase-1.0-0.2.2-SJ-UlknvKCKvgD59j6kCsTcCg.jar:na]
 at io.grpc.internal.SerializingExecutor$TaskRunner.run(SerializingExecutor.java:154) ~[bigtable-hbase-1.0-0.2.2-SJ-UlknvKCKvgD59j6kCsTcCg.jar:na]
 ... 3 common frames omitted
 Caused by: java.io.IOException: Connection reset by peer
 at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.8.0_60-ea]
 at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[na:1.8.0_60-ea]
 at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.8.0_60-ea]
 at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[na:1.8.0_60-ea]
 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[na:1.8.0_60-ea]
 at com.google.bigtable.repackaged.io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311) ~[bigtable-hbase-1.0-0.2.2-SJ-UlknvKCKvgD59j6kCsTcCg.jar:na]
 at com.google.bigtable.repackaged.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:854) ~[bigtable-hbase-1.0-0.2.2-SJ-UlknvKCKvgD59j6kCsTcCg.jar:na]
 at com.google.bigtable.repackaged.io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242) ~[bigtable-hbase-1.0-0.2.2-SJ-UlknvKCKvgD59j6kCsTcCg.jar:na]
 at com.google.bigtable.repackaged.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:115) ~[bigtable-hbase-1.0-0.2.2-SJ-UlknvKCKvgD59j6kCsTcCg.jar:na]
 at com.google.bigtable.repackaged.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:510) ~[bigtable-hbase-1.0-0.2.2-SJ-UlknvKCKvgD59j6kCsTcCg.jar:na]
 at com.google.bigtable.repackaged.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:467) ~[bigtable-hbase-1.0-0.2.2-SJ-UlknvKCKvgD59j6kCsTcCg.jar:na]
 at com.google.bigtable.repackaged.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:381) ~[bigtable-hbase-1.0-0.2.2-SJ-UlknvKCKvgD59j6kCsTcCg.jar:na]
 at com.google.bigtable.repackaged.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:353) ~[bigtable-hbase-1.0-0.2.2-SJ-UlknvKCKvgD59j6kCsTcCg.jar:na]
 at com.google.bigtable.repackaged.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:703) ~[bigtable-hbase-1.0-0.2.2-SJ-UlknvKCKvgD59j6kCsTcCg.jar:na]
 ... 1 common frames omitted

Just as a note:
bigtable-hbase-1.0-0.2.2-SJ.jar is master + #480 + #481

DataFlow CloudBigtableIO leaks threads

We have a DataFlow job that's structurally very similar to the WordCount example, reading from pubsub, producing per X minute windowed counts that are written to CBT.

We're using 0.2.3-SNAPSHOT of bigtable-hbase-dataflow and bigtable-hbase-1.1 in order to work around #613.

When deploying our job late last week we discovered that it was failing with the DF workers unable to create more threads.

java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:714)
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368)
    at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
    at com.google.cloud.bigtable.hbase.BigtableBufferedMutator.initializeAsyncMutators(BigtableBufferedMutator.java:206)
    at com.google.cloud.bigtable.hbase.BigtableBufferedMutator.<init>(BigtableBufferedMutator.java:201)
    at com.google.cloud.bigtable.hbase.BigtableBufferedMutator.<init>(BigtableBufferedMutator.java:174)
    at org.apache.hadoop.hbase.client.AbstractBigtableConnection$2.<init>(AbstractBigtableConnection.java:193)
    at org.apache.hadoop.hbase.client.AbstractBigtableConnection.getBufferedMutator(AbstractBigtableConnection.java:192)
    at com.google.cloud.bigtable.dataflow.CloudBigtableIO$CloudBigtableSingleTableWriteFn.getBufferedMutator(CloudBigtableIO.java:615)
    at com.google.cloud.bigtable.dataflow.CloudBigtableIO$CloudBigtableSingleTableWriteFn.processElement(CloudBigtableIO.java:640)

We SSH'd to one of the DF worker hosts and saw that the worker JVM process had some 32k+ threads. We attempted to redeploy our job a few times, each time hitting the same issue.

The same job deployed the week before had functioned normally so we figured that maybe this issue was caused by some recent change to the CBT client and redeployed our job with an earlier CBT client commit (6cefa81) instead, which works fine.

Looking at the recent commit log I wonder if maybe 1433a52 could be related to this.

We have not attempted to repro outside DataFlow.

Threads hang on put in closed/closing/opening

I have 3 threads stuck with the same stack trace.
Version is 0.2.2

I launched a process inserting data into one table over one connection with 6 threads.
Right after that, I truncated the table with the HBase shell.
I saw complaints in the log that the table was not found.

exception #1548 from bigtable.googleapis.com for \x00\x04\x1A\x8F\x00\x00\x01H\x06\xD0}\x80A\x95
io.grpc.StatusRuntimeException: NOT_FOUND: Failed to read table: table1
        at io.grpc.Status.asRuntimeException(Status.java:430)
        at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:266)
        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$3.run(ClientCallImpl.java:320)
        at io.grpc.internal.SerializingExecutor$TaskRunner.run(SerializingExecutor.java:154)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
exception #1549 from bigtable.googleapis.com for \x00\x04\x1A\x8F\x00\x00\x01H\x06\xD0}\x80EU
io.grpc.StatusRuntimeException: UNKNOWN
        at io.grpc.Status.asRuntimeException(Status.java:430)
        at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:266)
        at io.grpc.ClientInterceptors$CheckedForwardingClientCall.start(ClientInterceptors.java:172)
        at io.grpc.stub.ClientCalls.startCall(ClientCalls.java:193)
        at io.grpc.stub.ClientCalls.asyncUnaryRequestCall(ClientCalls.java:173)
        at io.grpc.stub.ClientCalls.asyncUnaryRequestCall(ClientCalls.java:163)
        at io.grpc.stub.ClientCalls.asyncUnaryCall(ClientCalls.java:68)
        at com.google.cloud.bigtable.grpc.io.ClientCallService$1.listenableAsyncCall(ClientCallService.java:94)
        at com.google.cloud.bigtable.grpc.BigtableDataGrpcClient.mutateRowAsync(BigtableDataGrpcClient.java:150)
        at com.google.cloud.bigtable.grpc.async.AsyncExecutor$1.call(AsyncExecutor.java:58)
        at com.google.cloud.bigtable.grpc.async.AsyncExecutor$1.call(AsyncExecutor.java:55)
--
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Channel is closed
        at com.google.cloud.bigtable.grpc.io.ReconnectingChannel$DelayingCall.start(ReconnectingChannel.java:88)
        at com.google.cloud.bigtable.grpc.io.ChannelPool$1.checkedStart(ChannelPool.java:97)
        at io.grpc.ClientInterceptors$CheckedForwardingClientCall.start(ClientInterceptors.java:164)
        ... 42 more

After one thread got RetriesExhaustedWithDetailsException it reopened the connection.

org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 27 actions: StatusRuntimeException: 27 times,
        at com.google.cloud.bigtable.hbase.BatchExecutor.batch(BatchExecutor.java:294)
        at com.google.cloud.bigtable.hbase.BatchExecutor.batch(BatchExecutor.java:318)
        at com.google.cloud.bigtable.hbase.BigtableTable.put(BigtableTable.java:280)

The remaining threads accessed the closed/closing/opening connection. Maybe that was not a good idea,
but I think the threads shouldn't hang forever.

Stacktrace of hanging threads:

"pool-12-thread-5" prio=10 tid=0x00002aaab85c6800 nid=0x325c waiting on condition [0x0000000042eb7000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x000000079a5b03b8> (a com.google.bigtable.repackaged.com.google.common.util.concurrent.AbstractFuture$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
        at com.google.bigtable.repackaged.com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
        at com.google.bigtable.repackaged.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
        at com.google.cloud.bigtable.hbase.BatchExecutor.batch(BatchExecutor.java:280)
        at com.google.cloud.bigtable.hbase.BatchExecutor.batch(BatchExecutor.java:318)
        at com.google.cloud.bigtable.hbase.BigtableTable.put(BigtableTable.java:280)

Some guava classes are not repackaged (version 0.2.2)

I use Guava 18.0, which has com/google/thirdparty/publicsuffix/PublicSuffixPatterns.class.

The client ships a different version of this class that doesn't have the EXACT public static field, which is used by another part of Guava. So I get a NoSuchFieldException.

Setting DataHost is not possible in DataFlow

Setting the DataHost is not possible in DataFlow:
CloudBigtableConfiguration and its derived classes do not provide such a field.
The methods com.google.cloud.bigtable.dataflow.CloudBigtableConfiguration.toBigtableOptions() and com.google.cloud.bigtable.dataflow.CloudBigtableConfiguration.toHBaseConfig() do not set the required property.
Even adding it there seems to be ignored.

Only changing com.google.cloud.bigtable.config.BigtableOptions.BIGTABLE_DATA_HOST_DEFAULT has the desired effect, which requires a binary release of a custom build of this library just to use a different data host (e.g. for testing or load testing).

Javadoc build target is not working correctly

For some reason (I can't figure out why), commit 3eb991f broke the "offline links" functionality in the Javadoc build target. As a result, the API docs no longer link to the HBase and Dataflow docs. The API docs also show the fully qualified name of each HBase and Dataflow class, which ends up looking silly.

For example, here's how the CloudBigtableIO method summary is supposed to look:
[screenshot: method summary with short class names, 2015-09-01]

Here's how it looks after 3eb991f (there's some slight wonkiness because I had to assemble the image in Photoshop; the table doesn't all fit on one screen):
[screenshot: method summary with fully qualified class names, 2015-09-01]

StatusRuntimeException: INTERNAL: Invalid protobuf byte sequence

Hello,

I launch a scan on a column family of a million cells.

I have this error:
août 18, 2015 3:21:15 PM io.grpc.SerializingExecutor$TaskRunner run
GRAVE: Exception while executing runnable io.grpc.ChannelImpl$CallImpl$ClientStreamListenerImpl$2@6965a9c6
io.grpc.StatusRuntimeException: INTERNAL: Invalid protobuf byte sequence
at io.grpc.Status.asRuntimeException(Status.java:428)
at io.grpc.protobuf.ProtoUtils$1.parse(ProtoUtils.java:64)
at io.grpc.protobuf.ProtoUtils$1.parse(ProtoUtils.java:52)
at io.grpc.MethodDescriptor.parseResponse(MethodDescriptor.java:105)
at io.grpc.ChannelImpl$CallImpl$ClientStreamListenerImpl$2.run(ChannelImpl.java:384)
at io.grpc.SerializingExecutor$TaskRunner.run(SerializingExecutor.java:154)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.bigtable.repackaged.com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit.
at com.google.bigtable.repackaged.com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
at com.google.bigtable.repackaged.com.google.protobuf.CodedInputStream.tryRefillBuffer(CodedInputStream.java:1131)
at com.google.bigtable.repackaged.com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:1081)
at com.google.bigtable.repackaged.com.google.protobuf.CodedInputStream.ensureAvailable(CodedInputStream.java:1068)
at com.google.bigtable.repackaged.com.google.protobuf.CodedInputStream.readRawBytesSlowPath(CodedInputStream.java:1203)
at com.google.bigtable.repackaged.com.google.protobuf.CodedInputStream.readBytes(CodedInputStream.java:517)
at com.google.bigtable.v1.Column.(Column.java:52)
at com.google.bigtable.v1.Column.(Column.java:13)
at com.google.bigtable.v1.Column$1.parsePartialFrom(Column.java:822)
at com.google.bigtable.v1.Column$1.parsePartialFrom(Column.java:816)
at com.google.bigtable.repackaged.com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:495)
at com.google.bigtable.v1.Family.(Family.java:61)
at com.google.bigtable.v1.Family.(Family.java:13)
at com.google.bigtable.v1.Family$1.parsePartialFrom(Family.java:923)
at com.google.bigtable.v1.Family$1.parsePartialFrom(Family.java:917)
at com.google.bigtable.repackaged.com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:495)
at com.google.bigtable.v1.ReadRowsResponse$Chunk.(ReadRowsResponse.java:185)
at com.google.bigtable.v1.ReadRowsResponse$Chunk.(ReadRowsResponse.java:145)
at com.google.bigtable.v1.ReadRowsResponse$Chunk$1.parsePartialFrom(ReadRowsResponse.java:904)
at com.google.bigtable.v1.ReadRowsResponse$Chunk$1.parsePartialFrom(ReadRowsResponse.java:898)
at com.google.bigtable.repackaged.com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:495)
at com.google.bigtable.v1.ReadRowsResponse.(ReadRowsResponse.java:60)
at com.google.bigtable.v1.ReadRowsResponse.(ReadRowsResponse.java:13)
at com.google.bigtable.v1.ReadRowsResponse$1.parsePartialFrom(ReadRowsResponse.java:1651)
at com.google.bigtable.v1.ReadRowsResponse$1.parsePartialFrom(ReadRowsResponse.java:1645)
at com.google.bigtable.repackaged.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:192)
at com.google.bigtable.repackaged.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:209)
at com.google.bigtable.repackaged.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:215)
at com.google.bigtable.repackaged.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
at io.grpc.protobuf.ProtoUtils$1.parse(ProtoUtils.java:61)
... 7 more

CredentialOptions#jsonCredentials(InputStream) requires the InputStream to be open until it's used in BigtableSession

When setting up credentials with CredentialOptions#jsonCredentials(InputStream), I was surprised to see that the resulting CredentialOptions requires the InputStream to remain open.

My code uses an abstraction to set up CredentialOptions instances. The class itself does not expose any statefulness (i.e. isReadable(), close()), meaning there's no reasonable way for a user of a CredentialOptions instance to check whether it's still readable.

I'd prefer it if CredentialOptions.jsonCredentials(InputStream) fully read and parsed the given InputStream and stored the credentials in memory; this would allow patterns like the following to work safely without leaking an open InputStream.

public CredentialOptions buildCredentials() throws IOException {
    try (final InputStream in = Files.newInputStream(...)) {
        return CredentialOptions.jsonCredentials(in);
    }
}

The kind of exception you'll see otherwise is the following:

java.nio.channels.ClosedChannelException
        at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110)
        at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:147)
        at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
        at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
        at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
        at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.ensureLoaded(ByteSourceJsonBootstrapper.java:489)
        at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.detectEncoding(ByteSourceJsonBootstrapper.java:126)
        at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.constructParser(ByteSourceJsonBootstrapper.java:215)
        at com.fasterxml.jackson.core.JsonFactory._createParser(JsonFactory.java:1240)
        at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:802)
        at com.fasterxml.jackson.core.JsonFactory.createJsonParser(JsonFactory.java:972)
        at com.google.api.client.json.jackson2.JacksonFactory.createJsonParser(JacksonFactory.java:93)
        at com.google.api.client.json.JsonObjectParser.parseAndClose(JsonObjectParser.java:85)
        at com.google.api.client.json.JsonObjectParser.parseAndClose(JsonObjectParser.java:81)
        at com.google.auth.oauth2.GoogleCredentials.fromStream(GoogleCredentials.java:101)
        at com.google.cloud.bigtable.config.CredentialFactory.getInputStreamCredential(CredentialFactory.java:182)
        at com.google.cloud.bigtable.config.CredentialFactory.getCredentials(CredentialFactory.java:108)
        at com.google.cloud.bigtable.grpc.BigtableSession$4.call(BigtableSession.java:249)
        at com.google.cloud.bigtable.grpc.BigtableSession$4.call(BigtableSession.java:246)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
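
Until such a change lands, a possible workaround sketch is to read the key file fully into memory and hand the client a ByteArrayInputStream, so the file handle can be closed immediately (the path handling here is illustrative):

// Workaround sketch: the in-memory stream stays readable until BigtableSession consumes it.
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

import com.google.cloud.bigtable.config.CredentialOptions;

public class CredentialsWorkaround {
    public static CredentialOptions buildCredentials(String path) throws IOException {
        // Read the JSON key fully into memory and close the file right away.
        byte[] json = Files.readAllBytes(Paths.get(path));
        return CredentialOptions.jsonCredentials(new ByteArrayInputStream(json));
    }
}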

BigtableSession deadlocks when given a thread pool smaller than 3

BigtableSession currently deadlocks when given a thread pool with fewer than 3 available live threads.

I can always reproduce this.

Test Case

public class BigtableSessionTest {
    public static void main(String[] args) throws Exception {
        CredentialOptions credentials = /* .. */;
        final MySession session = /* .. */;

        final ExecutorService tasks = Executors.newFixedThreadPool(1);

        for (int count = 1; count <= 8; count++) {
            final ExecutorService executorService = Executors.newFixedThreadPool(count);
            final CountDownLatch latch = new CountDownLatch(1);

            final AtomicReference<Throwable> errored = new AtomicReference<Throwable>();

            // thread that builds and closes the BigtableSession...
            final Thread thread = new Thread() {
                public void run() {
                    try {
                        session.build(executorService).close();
                    } catch (Exception e) {
                        errored.set(e);
                    }

                    latch.countDown();
                };
            };

            System.out.println(String.format("%d: starting...", count));
            thread.start();

            if (!latch.await(1, TimeUnit.SECONDS)) {
                System.out.println(String.format("%d: seems to have deadlocked :(", count));
                thread.interrupt();
            }

            // wait one more time in case of deadlock...
            if (!latch.await(1, TimeUnit.SECONDS)) {
                System.out.println(String.format("%d: did not recover, exiting D:", count));
                break;
            }

            if (errored.get() != null) {
                System.out.println(String.format("%d: Task failed:", count));
                errored.get().printStackTrace(System.out);
            } else {
                System.out.println(String.format("%d: Task OK", count));
            }
        }

        System.out.println("bye, bye");
        System.exit(0);
    }

    static class MySession {
        public static final String USER_AGENT = "test";

        final String project;
        final String zone;
        final String cluster;
        final CredentialOptions credentials;

        public MySession(final String project, final String zone, final String cluster, CredentialOptions credentials) {
            this.project = project;
            this.zone = zone;
            this.cluster = cluster;
            this.credentials = credentials;
        }

        public BigtableSession build(final ExecutorService executorService) throws IOException {
            final BigtableOptions options = new BigtableOptions.Builder().setProjectId(project).setZoneId(zone)
                    .setClusterId(cluster).setUserAgent(USER_AGENT).setCredentialOptions(credentials).build();

            return new BigtableSession(options, executorService);
        }
    }
}

Output

1: starting...
Aug 07, 2015 6:04:24 PM com.google.cloud.bigtable.config.Logger info
INFO: Opening connection for ...
1: seems to have deadlocked :(
1: Task failed:
java.io.IOException: Could not initialize the data API client
    at com.google.cloud.bigtable.grpc.BigtableSession.get(BigtableSession.java:352)
    at com.google.cloud.bigtable.grpc.BigtableSession.<init>(BigtableSession.java:300)
    at com.google.cloud.bigtable.grpc.BigtableSession.<init>(BigtableSession.java:252)
    at com.spotify.bigtable.BigtableSessionTest$MySession.build(BigtableSessionTest.java:89)
    at com.spotify.bigtable.BigtableSessionTest$1.run(BigtableSessionTest.java:37)
Caused by: java.lang.InterruptedException
    at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
    at java.util.concurrent.FutureTask.get(FutureTask.java:191)
    at com.google.cloud.bigtable.grpc.BigtableSession.get(BigtableSession.java:350)
    ... 4 more
2: starting...
Aug 07, 2015 6:04:25 PM com.google.cloud.bigtable.config.Logger info
INFO: Opening connection for ...
2: seems to have deadlocked :(
2: Task failed:
java.io.IOException: Could not initialize the data API client
    at com.google.cloud.bigtable.grpc.BigtableSession.get(BigtableSession.java:352)
    at com.google.cloud.bigtable.grpc.BigtableSession.<init>(BigtableSession.java:300)
    at com.google.cloud.bigtable.grpc.BigtableSession.<init>(BigtableSession.java:252)
    at com.spotify.bigtable.BigtableSessionTest$MySession.build(BigtableSessionTest.java:89)
    at com.spotify.bigtable.BigtableSessionTest$1.run(BigtableSessionTest.java:37)
Caused by: java.lang.InterruptedException
    at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
    at java.util.concurrent.FutureTask.get(FutureTask.java:191)
    at com.google.cloud.bigtable.grpc.BigtableSession.get(BigtableSession.java:350)
    ... 4 more
3: starting...
Aug 07, 2015 6:04:26 PM com.google.cloud.bigtable.config.Logger info
INFO: Opening connection for ...
3: Task OK
4: starting...
Aug 07, 2015 6:04:26 PM com.google.cloud.bigtable.config.Logger info
INFO: Opening connection for ...
4: Task OK
5: starting...
Aug 07, 2015 6:04:26 PM com.google.cloud.bigtable.config.Logger info
INFO: Opening connection for ...
5: Task OK
6: starting...
Aug 07, 2015 6:04:26 PM com.google.cloud.bigtable.config.Logger info
INFO: Opening connection for ...
6: Task OK
7: starting...
Aug 07, 2015 6:04:26 PM com.google.cloud.bigtable.config.Logger info
INFO: Opening connection for ...
7: Task OK
8: starting...
Aug 07, 2015 6:04:26 PM com.google.cloud.bigtable.config.Logger info
INFO: Opening connection for ...
8: Task OK
bye, bye

Allow user-specification of TLS provider

#501 adds support for using OpenSSL for TLS instead of the JDK (via Netty ALPN). The current PR tries OpenSSL first, falling back to the JDK if OpenSSL support isn't available.

It would be helpful to have a configuration value that specifies which TLS transport should be used, raising an exception if it's unavailable (rather than silently falling back to a slower provider).
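
For illustration, a minimal sketch of what such a setting could look like. The bigtable.tls.provider property name and the TlsProviderSelector class are hypothetical, not part of the client; the sketch only assumes the standard io.grpc.netty and Netty SSL APIs.

import io.grpc.netty.GrpcSslContexts;
import io.netty.handler.ssl.OpenSsl;
import io.netty.handler.ssl.SslContext;
import io.netty.handler.ssl.SslProvider;
import javax.net.ssl.SSLException;

public final class TlsProviderSelector {
    public static SslContext forClient() throws SSLException {
        // Hypothetical setting: "openssl", "jdk", or "auto" (current behavior).
        String requested = System.getProperty("bigtable.tls.provider", "auto");
        if ("openssl".equalsIgnoreCase(requested)) {
            if (!OpenSsl.isAvailable()) {
                // Fail fast instead of silently falling back to the JDK provider.
                throw new IllegalStateException(
                        "OpenSSL requested but netty-tcnative is not available",
                        OpenSsl.unavailabilityCause());
            }
            return GrpcSslContexts.forClient().sslProvider(SslProvider.OPENSSL).build();
        }
        if ("jdk".equalsIgnoreCase(requested)) {
            return GrpcSslContexts.forClient().sslProvider(SslProvider.JDK).build();
        }
        // "auto": try OpenSSL first and fall back to the JDK, as #501 does today.
        SslProvider provider = OpenSsl.isAvailable() ? SslProvider.OPENSSL : SslProvider.JDK;
        return GrpcSslContexts.forClient().sslProvider(provider).build();
    }
}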

NoSuchMethodError occurs while trying to connect to Cloud Bigtable from a Cloud Dataflow worker

I set up a Cloud Dataflow Pipeline in accordance with this article: https://cloud.google.com/bigtable/docs/dataflow-hbase

When I submitted it to the Cloud Dataflow managed service, I got the following error on a Cloud Dataflow worker:

Uncaught exception in main thread. Exiting with status code 1.
java.lang.NoSuchMethodError: io.grpc.netty.GrpcSslContexts.forClient()Lcom/google/bigtable/repackaged/io/netty/handler/ssl/SslContextBuilder;
at com.google.cloud.bigtable.grpc.BigtableSession.createSslContext(BigtableSession.java:98)
at com.google.cloud.bigtable.grpc.BigtableSession.access$000(BigtableSession.java:82)
at com.google.cloud.bigtable.grpc.BigtableSession$1.run(BigtableSession.java:151)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

How should I handle this problem?

My Cloud Dataflow Pipeline source code is the following:

package mypackage;

import com.google.cloud.bigtable.dataflow.CloudBigtableIO;
import com.google.cloud.bigtable.dataflow.CloudBigtableOptions;
import com.google.cloud.bigtable.dataflow.CloudBigtableTableConfiguration;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.transforms.Create;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.client.Put;

public class Main {
    // Create a DoFn that creates a Put or Delete.  MUTATION_TRANSFORM is a simplistic example.
    static final DoFn<String, Mutation> MUTATION_TRANSFORM = new DoFn<String, Mutation>() {
        @Override
        public void processElement(DoFn<String, Mutation>.ProcessContext c) throws Exception {
            c.output(new Put(c.element().getBytes()).addColumn("v".getBytes(), "v".getBytes(), "value".getBytes()));
        }
    };

    public static void main(String[] args) {
        // CloudBigtableOptions is one way to retrieve the options.  It's not required to use this
        // specific PipelineOptions extension; CloudBigtableOptions is there as a convenience.
        CloudBigtableOptions options =
                PipelineOptionsFactory.fromArgs(args).withValidation().as(CloudBigtableOptions.class);

        // CloudBigtableTableConfiguration contains the project, zone, cluster and table to connect to
        CloudBigtableTableConfiguration config = CloudBigtableTableConfiguration.fromCBTOptions(options);

        Pipeline p = Pipeline.create(options);
        // This sets up serialization for Puts and Deletes so that Dataflow can potentially move them through
        // the network
        CloudBigtableIO.initializeForWrite(p);

        p
                .apply(Create.of("Hello", "World"))
                .apply(ParDo.of(MUTATION_TRANSFORM))
                .apply(CloudBigtableIO.writeToTable(config));

        p.run();
    }
}
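
The NoSuchMethodError above usually indicates that two incompatible copies of the gRPC/Netty classes ended up on the worker classpath, for example an unshaded io.grpc/Netty dependency alongside the repackaged classes bundled in the Bigtable artifact. Before adjusting the dependency tree, a small diagnostic sketch like the one below (the ClasspathProbe class is illustrative, not part of the client) can be run on the worker to show which jar each class is actually loaded from:

import java.security.CodeSource;

public class ClasspathProbe {
    public static void main(String[] args) {
        String[] names = {
                "io.grpc.netty.GrpcSslContexts",
                "com.google.bigtable.repackaged.io.netty.handler.ssl.SslContextBuilder"
        };
        for (String name : names) {
            try {
                Class<?> clazz = Class.forName(name);
                CodeSource src = clazz.getProtectionDomain().getCodeSource();
                System.out.println(name + " -> "
                        + (src != null ? src.getLocation() : "bootstrap/unknown"));
            } catch (ClassNotFoundException e) {
                System.out.println(name + " -> not on the classpath");
            }
        }
    }
}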

Support hbase 1.1

This is a bit problematic. HBase 1.1 adds some methods to Admin, and one of them has a signature that isn't backwards compatible...

QuotaRetriever getQuotaRetriever(QuotaFilter filter)

QuotaRetriever and QuotaFilter are new in HBase 1.1, so we can't implement this method and still remain backwards compatible with earlier HBase versions.
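
For a build that targets HBase 1.1 only, one option (a sketch, not a decision) would be to implement the new method as an explicit unsupported stub, since the adapter has nothing on the Bigtable side to map quotas to:

import java.io.IOException;

import org.apache.hadoop.hbase.quotas.QuotaFilter;
import org.apache.hadoop.hbase.quotas.QuotaRetriever;

public abstract class QuotaStubAdmin /* would extend the Bigtable Admin adapter */ {
    public QuotaRetriever getQuotaRetriever(QuotaFilter filter) throws IOException {
        throw new UnsupportedOperationException("Cloud Bigtable does not expose HBase quotas");
    }
}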

Docker & OpenSSL

Hello,

We want to replace ALPN with OpenSSL using Netty's tcnative support.

It works on my Ubuntu dev machine, but not inside a Docker container.

Inside the container, ALPN mode is still activated, so I get an error saying the ALPN library is missing.

The Dockerfile is:
FROM ubuntu:wily
RUN apt-get update && apt-get install -y openjdk-8-jre-headless openssl ssl-cert
ENV dir '/opt'
RUN mkdir -p $dir
WORKDIR $dir
COPY gceOAuth.json /etc/gceOAuth.json

ENV LD_LIBRARY_PATH /usr/lib

ENV GOOGLE_APPLICATION_CREDENTIALS /etc/gceOAuth.json
COPY bigtable.jar $dir/bigtable.jar
CMD java -Xmx2048m -Dtcnative.classifier=linux-x86_64 -jar bigtable.jar
EXPOSE 8080

I tried LD_LIBRARY_PATH=/usr/lib and LD_LIBRARY_PATH=/usr/local/lib, both as an environment variable and prepended to the command.

Reference:
https://cloud.google.com/bigtable/docs/using-maven#openssl-encryption

I hope you have managed to get this working and can help.

Thank you.
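
One way to see why OpenSSL is not being picked up inside the container is to ask Netty directly. A minimal diagnostic sketch, assuming netty-handler and the netty-tcnative jar with the matching classifier are on the classpath (the OpenSslCheck class is illustrative only):

import io.netty.handler.ssl.OpenSsl;

public class OpenSslCheck {
    public static void main(String[] args) {
        System.out.println("OpenSSL available: " + OpenSsl.isAvailable());
        if (!OpenSsl.isAvailable()) {
            // Prints the underlying UnsatisfiedLinkError / missing-library cause.
            Throwable cause = OpenSsl.unavailabilityCause();
            if (cause != null) {
                cause.printStackTrace(System.out);
            }
        }
    }
}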

Authentication fails

Hi,

I started the same project again after the weekend and discovered that the tests had started to fail.

The exception below is thrown again and again; it loops somewhere.

2015-12-14 17:32:21,993 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-12-14 17:32:22,115 INFO  [main] grpc.BigtableSession: Opening connection for projectId audit-bt-dev-qa-1099, zoneId europe-west1-c, clusterId audit-qa, on data host bigtable.googleapis.com, table admin host bigtabletableadmin.googleapis.com.
2015-12-14 17:32:22,134 INFO  [BigtableSession-startup-0] grpc.BigtableSession: gRPC is using the JDK provider (alpn-boot jar)
2015-12-14 17:32:22,429 INFO  [bigtable-connection-shared-executor-pool1-t2] io.RefreshingOAuth2CredentialsInterceptor: Refreshing the OAuth token
2015-12-14 17:32:22,935 WARN  [bigtable-connection-shared-executor-pool1-t2] io.RefreshingOAuth2CredentialsInterceptor: Got an unexpected IOException when refreshing google credentials.
java.io.IOException: Error getting access token for service account: 
    at com.google.auth.oauth2.ServiceAccountCredentials.refreshAccessToken(ServiceAccountCredentials.java:224)
    at com.google.cloud.bigtable.grpc.io.RefreshingOAuth2CredentialsInterceptor.refreshCredentialsWithRetry(RefreshingOAuth2CredentialsInterceptor.java:268)
    at com.google.cloud.bigtable.grpc.io.RefreshingOAuth2CredentialsInterceptor.doRefresh(RefreshingOAuth2CredentialsInterceptor.java:238)
    at com.google.cloud.bigtable.grpc.io.RefreshingOAuth2CredentialsInterceptor$1.run(RefreshingOAuth2CredentialsInterceptor.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.bigtable.repackaged.com.google.api.client.http.HttpResponseException: 400 Bad Request
{
  "error" : "invalid_grant"
}
    at com.google.bigtable.repackaged.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1054)
    at com.google.auth.oauth2.ServiceAccountCredentials.refreshAccessToken(ServiceAccountCredentials.java:222)
    ... 6 more
2015-12-14 17:32:22,947 INFO  [bigtable-connection-shared-executor-pool1-t2] io.RefreshingOAuth2CredentialsInterceptor: Refreshing the OAuth token
2015-12-14 17:32:23,021 WARN  [bigtable-connection-shared-executor-pool1-t2] io.RefreshingOAuth2CredentialsInterceptor: Got an unexpected IOException when refreshing google credentials.
java.io.IOException: Error getting access token for service account: 
    at com.google.auth.oauth2.ServiceAccountCredentials.refreshAccessToken(ServiceAccountCredentials.java:224)
    at com.google.cloud.bigtable.grpc.io.RefreshingOAuth2CredentialsInterceptor.refreshCredentialsWithRetry(RefreshingOAuth2CredentialsInterceptor.java:268)
    at com.google.cloud.bigtable.grpc.io.RefreshingOAuth2CredentialsInterceptor.doRefresh(RefreshingOAuth2CredentialsInterceptor.java:238)
    at com.google.cloud.bigtable.grpc.io.RefreshingOAuth2CredentialsInterceptor$1.run(RefreshingOAuth2CredentialsInterceptor.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.bigtable.repackaged.com.google.api.client.http.HttpResponseException: 400 Bad Request
{
  "error" : "invalid_grant"
}
    at com.google.bigtable.repackaged.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1054)
    at com.google.auth.oauth2.ServiceAccountCredentials.refreshAccessToken(ServiceAccountCredentials.java:222)
    ... 6 more
2015-12-14 17:32:23,029 INFO  [bigtable-connection-shared-executor-pool1-t2] io.RefreshingOAuth2CredentialsInterceptor: Refreshing the OAuth token
2015-12-14 17:32:23,098 WARN  [bigtable-connection-shared-executor-pool1-t2] io.RefreshingOAuth2CredentialsInterceptor: Got an unexpected IOException when refreshing google credentials.

Backup

How can I back up the data stored in Bigtable?

Refactor BigtableChannels and BigtableOptionsFactory

The options configuration mechanism isn't ideal. User configuration options are mixed with runtime concerns like executors, as well as with configuration defaults. We need to refactor the code to make this process clearer and more extensible.

We propose refactoring BigtableOptions/ChannelOptions/*Options to contain only primitives. BigtableOptions.Builder should supply sensible defaults. There should be a BigtableSession object, or something like it, that manages executors and *GrpcClient lifecycles; a usage sketch follows below.
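
A rough sketch of what the proposed shape could look like. The no-executor BigtableSession constructor and the defaulted builder fields are assumptions of this proposal, not the current API, and imports of the Bigtable classes are omitted on purpose:

public class ProposedUsageSketch {
    public static void main(String[] args) throws Exception {
        // Options hold only primitive configuration; anything unset falls back
        // to defaults supplied by the Builder.
        BigtableOptions options = new BigtableOptions.Builder()
                .setProjectId("my-project")
                .setZoneId("us-central1-b")
                .setClusterId("my-cluster")
                .build();

        // The session owns its executors and *GrpcClient instances; closing it
        // releases everything it created.
        try (BigtableSession session = new BigtableSession(options)) {
            // use the session's data/admin clients here
        }
    }
}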

Add mock tests in CloudBigtableIOTest

Add tests for read, splitIntoBundles, and other operations in the Source/Sink classes. Can you mock CloudBigTableConnection for this? Please see DatastoreIOTest and FileBasedSourceTest for examples.

Check version of HBase

Hello,

We encounter some errors, mostly when we create a table or use the admin API.

The question is: why is there a version check in the Bigtable client?
I tried many versions of the Bigtable client and was able to reproduce this error with each of them.

I hope you can help with this, because it blocks our access to Bigtable.

Thank you

java.lang.RuntimeException: hbase-default.xml file seems to be for an older version of HBase (1.1.1), this version is Unknown
at org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:71)
at org.apache.hadoop.hbase.HBaseConfiguration.addHbaseResources(HBaseConfiguration.java:81)
at org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:96)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:90)
at com.accengage.globulus.util.bigtable.BigTableConnectionFactory.create(BigTableConnectionFactory.java:25)
at com.accengage.globulus.util.bigtable.BigtableHelper.getConnection(BigtableHelper.java:109)
at com.accengage.globulus.util.bigtable.BigtableHelper.getAdmin(BigtableHelper.java:528)
at com.accengage.globulus.util.bigtable.IndexesHelper.<init>(IndexesHelper.java:53)
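
A common workaround for this particular exception is to tell HBaseConfiguration to skip the hbase-default.xml version check, either by setting hbase.defaults.for.version.skip to true in hbase-site.xml or programmatically before the HBase resources are added. A minimal sketch of the programmatic variant, assuming nothing beyond the standard HBase client API:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class SkipVersionCheck {
    public static Connection connect() throws IOException {
        Configuration conf = new Configuration();
        // Must be set before addHbaseResources(), which performs the version check
        // that fails with "this version is Unknown".
        conf.setBoolean("hbase.defaults.for.version.skip", true);
        conf = HBaseConfiguration.addHbaseResources(conf);
        return ConnectionFactory.createConnection(conf);
    }
}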

Make AbstractCloudBigtableTableWriteFn public & rename

The class AbstractCloudBigtableTableWriteFn provides connection pool management for situations that go beyond batch reading/writing.

Right now the class is package-private and has 'Write' in its name, even though it has no write-specific functionality.
It would be great to have this class public and generically named, so it can serve as a starting point for complex DoFns, such as processing the output of a Scan while aggregating additional data from Gets.

SingleColumnValueFilter unit testing

Hello,

I've written a JUnit test of SingleColumnValueFilter on Long values in a Scan against Bigtable.

Only EQUAL works; for LESS I have 10 values, and for GREATER I have 0 values.

I am using HBase 1.1.1 and the Bigtable client 0.2.1 with Maven.

Thank you for the help.

package com.accengage.globulus.test.segment;

import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.AfterClass;
import org.junit.Assert;
import org.junit.BeforeClass;
import org.junit.Test;

import io.grpc.StatusRuntimeException;

public class SupInfEqDiffHbaseTest {

    private static Connection connection;
    private static TableName tableName = TableName.valueOf("testBBO");
    private static Table table;
    private static Admin admin;
    private static int nValue = 10;
    private static Map<String, Long> result = new HashMap<>();

    @AfterClass
    public static void clean() throws IOException {
        SupInfEqDiffHbaseTest.deleteTestTable();
        Assert.assertFalse(SupInfEqDiffHbaseTest.admin.tableExists(SupInfEqDiffHbaseTest.tableName));
        SupInfEqDiffHbaseTest.admin.close();
        SupInfEqDiffHbaseTest.connection.close();
    }

    private static void deleteTestTable() throws IOException {
        if (SupInfEqDiffHbaseTest.admin.tableExists(SupInfEqDiffHbaseTest.tableName)) {
            if (SupInfEqDiffHbaseTest.admin.isTableEnabled(SupInfEqDiffHbaseTest.tableName)) {
                SupInfEqDiffHbaseTest.admin.disableTable(SupInfEqDiffHbaseTest.tableName);
            }
            SupInfEqDiffHbaseTest.admin.deleteTable(SupInfEqDiffHbaseTest.tableName);
        }
    }

    private static void fillTable() throws IOException {
        table = connection.getTable(tableName);
        final int n = SupInfEqDiffHbaseTest.nValue;
        for (long i = 0; i < n; i++) {
            final UUID rowKey = UUID.randomUUID();
            byte[] row = Bytes.toBytes(rowKey.toString());
            Put put = new Put(row);
            put.addColumn(Bytes.toBytes("cd"), Bytes.toBytes("value"), Bytes.toBytes(i));
            result.put(rowKey.toString(), i);
            table.put(put);
        }
    }

    @BeforeClass
    public static void setup() throws IOException {
        SupInfEqDiffHbaseTest.connection = ConnectionFactory.createConnection();
        SupInfEqDiffHbaseTest.admin = SupInfEqDiffHbaseTest.connection.getAdmin();
        SupInfEqDiffHbaseTest.deleteTestTable();
        Assert.assertFalse(SupInfEqDiffHbaseTest.admin.tableExists(SupInfEqDiffHbaseTest.tableName));
        createTable(SupInfEqDiffHbaseTest.admin, SupInfEqDiffHbaseTest.tableName);
        Assert.assertTrue(SupInfEqDiffHbaseTest.admin.tableExists(SupInfEqDiffHbaseTest.tableName));
        SupInfEqDiffHbaseTest.fillTable();
    }

    @Test
    public void testInf() throws IOException {
        Scan scan = new Scan();
        SingleColumnValueFilter filter = new SingleColumnValueFilter(Bytes.toBytes("cd"), Bytes.toBytes("value"),
                CompareOp.LESS_OR_EQUAL, Bytes.toBytes(2L));
        scan.setFilter(filter);
        Set<String> rowKeys = getResult(scan);
        Assert.assertEquals(3, rowKeys.size());
        for (final String k : rowKeys) {
            Long v = result.get(k);
            Assert.assertNotNull(v);
            Assert.assertTrue(v <= 2);
        }
    }

    private Set<String> getResult(Scan scan) throws IOException {
        Set<String> rowKeys = new HashSet<>();
        ResultScanner scanner = table.getScanner(scan);
        for (Result result = scanner.next(); result != null; result = scanner.next()) {
            rowKeys.add(Bytes.toString(result.getRow()));
        }
        return rowKeys;
    }

    @Test
    public void testStrictInf() throws IOException {
        Scan scan = new Scan();
        SingleColumnValueFilter filter = new SingleColumnValueFilter(Bytes.toBytes("cd"), Bytes.toBytes("value"),
                CompareOp.LESS, Bytes.toBytes(4L));
        scan.setFilter(filter);
        Set<String> rowKeys = getResult(scan);
        Assert.assertEquals(4, rowKeys.size());
        for (final String k : rowKeys) {
            Long v = result.get(k);
            Assert.assertNotNull(v);
            Assert.assertTrue(v < 4);
        }
    }

    @Test
    public void testStrictSup() throws IOException {
        Scan scan = new Scan();
        SingleColumnValueFilter filter = new SingleColumnValueFilter(Bytes.toBytes("cd"), Bytes.toBytes("value"),
                CompareOp.GREATER, Bytes.toBytes(4L));
        scan.setFilter(filter);
        Set<String> rowKeys = getResult(scan);
        Assert.assertEquals(5, rowKeys.size());
        for (final String k : rowKeys) {
            Long v = result.get(k);
            Assert.assertNotNull(v);
            Assert.assertTrue(v > 4);
        }
    }

    @Test
    public void testEqual() throws IOException {
        Scan scan = new Scan();
        SingleColumnValueFilter filter = new SingleColumnValueFilter(Bytes.toBytes("cd"), Bytes.toBytes("value"),
                CompareOp.EQUAL, Bytes.toBytes(4L));
        scan.setFilter(filter);
        Set<String> rowKeys = getResult(scan);
        Assert.assertEquals(1, rowKeys.size());
        for (final String k : rowKeys) {
            Long v = result.get(k);
            Assert.assertNotNull(v);
            Assert.assertTrue(v == 4);
        }
    }

    @Test
    public void testSup() throws IOException {
        Scan scan = new Scan();
        SingleColumnValueFilter filter = new SingleColumnValueFilter(Bytes.toBytes("cd"), Bytes.toBytes("value"),
                CompareOp.GREATER_OR_EQUAL, Bytes.toBytes(5L));
        scan.setFilter(filter);
        Set<String> rowKeys = getResult(scan);
        Assert.assertEquals(5, rowKeys.size());
        for (final String k : rowKeys) {
            Long v = result.get(k);
            Assert.assertNotNull(v);
            Assert.assertTrue(v >= 5);
        }
    }

    @Test
    public void testNot() throws IOException {
        Scan scan = new Scan();
        SingleColumnValueFilter filter = new SingleColumnValueFilter(Bytes.toBytes("cd"), Bytes.toBytes("value"),
                CompareOp.NOT_EQUAL, Bytes.toBytes(5L));
        scan.setFilter(filter);
        Set<String> rowKeys = getResult(scan);
        Assert.assertEquals(9, rowKeys.size());
        for (final String k : rowKeys) {
            Long v = result.get(k);
            Assert.assertNotNull(v);
            Assert.assertTrue(v != 5);
        }
    }

    private static void createTable(final Admin admin, final TableName tableName) throws IOException {
        if (!admin.tableExists(tableName)) {
            final HTableDescriptor htableDescriptor = new HTableDescriptor(tableName);
            HColumnDescriptor hcolDesc = new HColumnDescriptor(Bytes.toBytes("cd"));
            htableDescriptor.addFamily(hcolDesc);
            try {
                admin.createTable(htableDescriptor);
            } catch (final StatusRuntimeException e) {
                if (e.getCause() == null || !e.getCause().getMessage().startsWith("ALREADY_EXISTS")) {
                    throw e;
                }
            }
        }
    }
}

BigtableSession closes provided ExecutorService on #close() which is prone to deadlocks

I'm attempting to incorporate BigtableSession into a service which uses a shared ExecutorService for common tasks and establishes multiple bigtable backends.

Currently I am required to create an ExecutorService per BigtableSession instance because the BigtableSession#close() method aggressively attempts to assert that the provided ExecutorService has been shut down.

I initially ended up queuing up a task to close the BigtableSession on the same thread pool, causing BigtableSession#close() to either deadlock or wait for a long time in an ExecutorService#isTerminated() loop (see the test below).

I believe that either 1) the behavior should be modified so that BigtableSession does not shut down a user-provided ExecutorService, or 2) it should not allow the user to provide a thread pool at all.
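
For illustration, the same pattern reproduces without Bigtable at all. A minimal, library-independent sketch of a task that shuts down and then waits for the pool it is running on:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SelfShutdownDeadlock {
    public static void main(String[] args) {
        final ExecutorService pool = Executors.newFixedThreadPool(1);
        pool.submit(new Runnable() {
            @Override
            public void run() {
                // Close-from-inside: shut the pool down and wait for it to terminate.
                pool.shutdown();
                try {
                    // This never succeeds: the waiting task occupies the pool's only
                    // worker thread, so the pool cannot reach termination.
                    for (int i = 0; i < 5 && !pool.awaitTermination(1, TimeUnit.SECONDS); i++) {
                        System.out.println("still waiting for the pool to terminate...");
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                System.out.println("giving up; this is analogous to the wait inside BigtableSession#close()");
                System.exit(0);
            }
        });
    }
}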

Test Case

This test case always deadlocks for me.

public class BigtableSessionTest {
    public static final String USER_AGENT = "test";

    public static void main(String[] args) throws Exception {
        CredentialOptions credentials = /* ... */;
        final MySession builder = /* ... */;

        final ExecutorService executorService = Executors.newFixedThreadPool(4);

        final Future<Void> future = executorService.submit(new Callable<Void>() {
            @Override
            public Void call() throws Exception {
                // attempt to close on the same thread pool...
                builder.build(executorService).close();
                return null;
            }
        });

        try {
            future.get(30, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            System.out.println("Timeout :(");
            e.printStackTrace();
        }

        final ExecutorService executorService2 = Executors.newFixedThreadPool(4);

        final BigtableSession session = builder.build(executorService2);

        System.out.println("Main closing...");
        session.close();
        System.out.println("Closed without problems!");

        System.out.println("bye, bye");
        System.exit(0);
    }

    static class MySession {
        final String project;
        final String zone;
        final String cluster;
        final CredentialOptions credentials;

        public MySession(final String project, final String zone, final String cluster, CredentialOptions credentials) {
            this.project = project;
            this.zone = zone;
            this.cluster = cluster;
            this.credentials = credentials;
        }

        public BigtableSession build(final ExecutorService executorService) throws IOException {
            final BigtableOptions options = new BigtableOptions.Builder().setProjectId(project).setZoneId(zone)
                    .setClusterId(cluster).setUserAgent(USER_AGENT).setCredentialOptions(credentials).build();

            return new BigtableSession(options, executorService);
        }
    }
}

Output

Aug 07, 2015 6:36:21 PM com.google.cloud.bigtable.config.Logger info
INFO: Opening connection for ...
Timeout :(
java.util.concurrent.TimeoutException
    at java.util.concurrent.FutureTask.get(FutureTask.java:205)
    at com.spotify.bigtable.BigtableSessionTest.main(BigtableSessionTest.java:40)
Aug 07, 2015 6:36:51 PM com.google.cloud.bigtable.config.Logger info
INFO: Opening connection for ...
Main thread closeing...
Closed without problems!
bye, bye
