
Comments (12)

sduskis commented on June 24, 2024

#626 is definitely related to this. The Dataflow Pub/Sub setup is extremely tricky to get right. It creates a large number of CloudBigtableSingleTableWriteFn instances, each with its own BufferedMutator. Apparently, there are also cases where Dataflow never cleans up a WriteFn.

I'll work on this ASAP.

from java-bigtable-hbase.

sduskis commented on June 24, 2024

I was hoping to finish everything by EOD today, but I still have a couple of things to do. I fixed some of the issues; #635 should be a decent band-aid for now.

sduskis commented on June 24, 2024

I just deployed a new snapshot with a large batch of changes linked from this issue. Can you please take it for a test drive?

danielnorberg commented on June 24, 2024

Took the current 0.2.3-SNAPSHOT for a spin and things look a lot better. Thread count is high at ~7k but doesn't seem to be growing.

Ran jstack on the Dataflow (DF) worker Java process. Here's a rough breakdown of the thread counts:

  • 3600 bigtable-connection-shared-executor-poolY-tX threads
  • 1800 bigtable-grpc-elg-X threads
  • 2100 reconnection-async-close-X threads
  • ~300 threads blocked while loading the HBase/Bigtable configuration, all with the stack trace below:
"Thread-313" #350 daemon prio=1 os_prio=0 tid=0x00007f93482b7000 nid=0x193 waiting for monitor entry [0x00007f920a7e3000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at java.util.zip.ZipFile.getEntry(ZipFile.java:308)
    - waiting to lock <0x0000000083309e70> (a java.util.jar.JarFile)
    at java.util.jar.JarFile.getEntry(JarFile.java:240)
    at java.util.jar.JarFile.getJarEntry(JarFile.java:223)
    at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:1005)
    at sun.misc.URLClassPath$JarLoader.findResource(URLClassPath.java:983)
    at sun.misc.URLClassPath$1.next(URLClassPath.java:240)
    at sun.misc.URLClassPath$1.hasMoreElements(URLClassPath.java:250)
    at java.net.URLClassLoader$3$1.run(URLClassLoader.java:601)
    at java.net.URLClassLoader$3$1.run(URLClassLoader.java:599)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader$3.next(URLClassLoader.java:598)
    at java.net.URLClassLoader$3.hasMoreElements(URLClassLoader.java:623)
    at sun.misc.CompoundEnumeration.next(CompoundEnumeration.java:45)
    at sun.misc.CompoundEnumeration.hasMoreElements(CompoundEnumeration.java:54)
    at java.util.ServiceLoader$LazyIterator.hasNextService(ServiceLoader.java:354)
    at java.util.ServiceLoader$LazyIterator.hasNext(ServiceLoader.java:393)
    at java.util.ServiceLoader$1.hasNext(ServiceLoader.java:474)
    at javax.xml.parsers.FactoryFinder$1.run(FactoryFinder.java:293)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:289)
    at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:267)
    at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:120)
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2218)
    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2195)
    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2112)
    - locked <0x00000007145de8c8> (a org.apache.hadoop.conf.Configuration)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:989)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:961)
    at com.google.cloud.bigtable.dataflow.CloudBigtableConfiguration.toHBaseConfig(CloudBigtableConfiguration.java:182)
    at com.google.cloud.bigtable.dataflow.AbstractCloudBigtableTableDoFn.getConnection(AbstractCloudBigtableTableDoFn.java:49)
    - eliminated <0x00000007145dba80> (a com.google.cloud.bigtable.dataflow.CloudBigtableIO$CloudBigtableSingleTableWriteFn)
    at com.google.cloud.bigtable.dataflow.CloudBigtableIO$CloudBigtableSingleTableWriteFn.getBufferedMutator(CloudBigtableIO.java:615)
    - locked <0x00000007145dba80> (a com.google.cloud.bigtable.dataflow.CloudBigtableIO$CloudBigtableSingleTableWriteFn)
    at com.google.cloud.bigtable.dataflow.CloudBigtableIO$CloudBigtableSingleTableWriteFn.processElement(CloudBigtableIO.java:640)
    at com.google.cloud.dataflow.sdk.util.DoFnRunner.invokeProcessElement(DoFnRunner.java:189)
    at com.google.cloud.dataflow.sdk.util.DoFnRunner.processElement(DoFnRunner.java:171)
...
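The breakdown above came from grouping a jstack dump by hand. Here is a minimal sketch that produces the same kind of summary from inside the worker JVM. It uses plain JDK and is not part of java-bigtable-hbase. It reads live threads with `Thread.getAllStackTraces()` and collapses every run of digits to `N`, so `bigtable-grpc-elg-17` and `bigtable-grpc-elg-42` land in the same `bigtable-grpc-elg-N` bucket. The class name `ThreadCensus` is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Groups live JVM threads by name pattern, similar to summarizing a jstack dump by hand. */
final class ThreadCensus {
  private ThreadCensus() {}

  /** Replaces every run of digits with "N", e.g. "bigtable-grpc-elg-17" -> "bigtable-grpc-elg-N". */
  static String normalize(String threadName) {
    return threadName.replaceAll("\\d+", "N");
  }

  /** Counts names per normalized pattern; TreeMap keeps the output sorted for readability. */
  static Map<String, Integer> countByPattern(Iterable<String> threadNames) {
    Map<String, Integer> counts = new TreeMap<>();
    for (String name : threadNames) {
      counts.merge(normalize(name), 1, Integer::sum);
    }
    return counts;
  }

  /** Snapshot of the threads currently alive in this JVM, grouped by pattern. */
  static Map<String, Integer> liveThreadCounts() {
    List<String> names = new ArrayList<>();
    for (Thread t : Thread.getAllStackTraces().keySet()) {
      names.add(t.getName());
    }
    return countByPattern(names);
  }
}
```

If you log `liveThreadCounts()` periodically, you can tell a high but stable thread count (like the ~7k here) apart from one that keeps growing.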

sduskis commented on June 24, 2024

That's a lot of connections. We have 2 reconnection-async-close threads per connection, so you have over 1,000 connections open at once, which is not good. I did some work to reduce the number of Connections to 1 for all writes on a single VM. I see you're doing writes; are you also doing reads? Do you open your own Connections?
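Here is the arithmetic behind that estimate: ~2,100 `reconnection-async-close` threads at 2 per connection is roughly 1,050 connections. A common way to get down to one connection per VM is a static, reference-counted holder that every WriteFn instance in the JVM shares. The sketch below is hypothetical. `SharedResource` is not a java-bigtable-hbase class, and the `Supplier` stands in for whatever actually creates the HBase `Connection`. That call can throw `IOException`, so a real factory would need to wrap it.

```java
import java.util.function.Supplier;

/**
 * Hypothetical reference-counted holder: the first acquire() creates the resource,
 * later callers reuse the same instance, and the last release() closes it.
 */
final class SharedResource<T extends AutoCloseable> {
  private final Supplier<T> factory;
  private T instance;   // guarded by this
  private int refCount; // guarded by this

  SharedResource(Supplier<T> factory) {
    this.factory = factory;
  }

  synchronized T acquire() {
    if (instance == null) {
      instance = factory.get(); // created lazily, once per JVM while in use
    }
    refCount++;
    return instance;
  }

  synchronized void release() {
    if (refCount == 0) {
      throw new IllegalStateException("release() without matching acquire()");
    }
    refCount--;
    if (refCount == 0) {
      T toClose = instance;
      instance = null;
      try {
        toClose.close(); // closed under the lock so no caller can grab it mid-close
      } catch (Exception e) {
        throw new RuntimeException("failed to close shared resource", e);
      }
    }
  }

  synchronized int refCount() {
    return refCount;
  }
}
```

Each WriteFn would call `acquire()` when it first needs a connection and `release()` when it is torn down. As noted in the first comment, Dataflow does not always tear WriteFns down. In that case at most one shared connection stays open, rather than one per instance.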

sduskis commented on June 24, 2024

I'll see if I can fix the Configuration issue. We shouldn't be reading XML files in Dataflow.
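In the trace above, every `getConnection()` call goes through `CloudBigtableConfiguration.toHBaseConfig()`. Loading the resulting Hadoop `Configuration` then looks up an XML parser through `ServiceLoader`, which waits on a shared `JarFile` lock. One mitigation is to build the configuration once per distinct set of settings and reuse it. This is a hedged, generic sketch: `ConfigCache` is a hypothetical name, and the builder function stands in for the real conversion. Separately, calling Hadoop's `Configuration(boolean loadDefaults)` constructor with `false` skips default resources such as `core-default.xml` and `core-site.xml`, which may avoid the XML parsing altogether.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical per-JVM cache: builds an expensive configuration object once per
 * distinct set of settings and hands the same instance to every later caller.
 */
final class ConfigCache<C> {
  private final ConcurrentHashMap<Map<String, String>, C> cache = new ConcurrentHashMap<>();
  private final Function<Map<String, String>, C> builder;

  ConfigCache(Function<Map<String, String>, C> builder) {
    this.builder = builder;
  }

  C get(Map<String, String> settings) {
    // Copy into an immutable key; Map equality ignores the input map's ordering.
    Map<String, String> key = Collections.unmodifiableMap(new TreeMap<>(settings));
    // computeIfAbsent runs the builder at most once per key, even under concurrent calls.
    return cache.computeIfAbsent(key, builder);
  }

  int size() {
    return cache.size();
  }
}
```

Callers must treat the cached object as read-only. A Hadoop `Configuration` is mutable, so calling `set()` on a shared instance would affect every other user.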

danielnorberg commented on June 24, 2024

We're only doing writes, applying a CloudBigtableIO.writeToTable(cbtConfig) to a PCollection<Mutation> of Put operations, pretty much exactly like in this example. We're not opening our own connections.

sduskis commented on June 24, 2024

Thanks for your patience with this. I just built a -SNAPSHOT that should fix the issue related to this problem:

   java.lang.Thread.State: BLOCKED (on object monitor)
    at java.util.zip.ZipFile.getEntry(ZipFile.java:308)
    - waiting to lock <0x0000000083309e70> (a java.util.jar.JarFile)
    at java.util.jar.JarFile.getEntry(JarFile.java:240)
    at java.util.jar.JarFile.getJarEntry(JarFile.java:223)
    at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:1005)
   ...

I'm not sure why we're creating so many connections. I'd like to figure out why that's happening.

In the meantime, I'll put in a fix to share threading resources across connections, since the thread pools auto-expand anyway.
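Here is a minimal sketch of sharing threading resources across connections. It assumes a hypothetical `PooledClient` that stands in for a connection. All instances submit work to one JVM-wide pool from `Executors.newCachedThreadPool`, a standard JDK pool that creates threads on demand and retires threads idle for 60 seconds. With this layout, the thread count tracks concurrent work rather than the number of connections.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a connection that runs its async work on one JVM-wide pool. */
final class PooledClient implements AutoCloseable {
  private static final AtomicInteger THREAD_IDS = new AtomicInteger();

  // Shared by every PooledClient. A cached pool creates threads only when no idle
  // thread is available and retires threads that have been idle for 60 seconds.
  static final ExecutorService SHARED_POOL =
      Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "shared-client-pool-t" + THREAD_IDS.incrementAndGet());
        t.setDaemon(true); // idle pool threads should not keep the JVM alive
        return t;
      });

  private final String name;

  PooledClient(String name) {
    this.name = name;
  }

  /** Runs a (trivial) request on the shared pool instead of a per-client pool. */
  CompletableFuture<String> call(String request) {
    return CompletableFuture.supplyAsync(() -> name + " handled " + request, SHARED_POOL);
  }

  @Override
  public void close() {
    // Nothing to shut down per client: the shared pool outlives individual clients.
  }
}
```

Creating a hundred `PooledClient` instances starts no threads by itself. Threads appear only while calls are in flight, which is the property the fix above is after.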

danielnorberg commented on June 24, 2024

Are you able to reproduce the issue of many connections being created?

sduskis commented on June 24, 2024

Other users were able to reproduce the problem in dataflow with a pub-sub source. I have not tried that specific scenario yet. I'm working with a test that creates a whole bunch of Connections to see the effects. We have some performance testing as well with a lot of connections. I think that I'm going to address the underlying issues first, and then look at the Dataflow components.

Do you have any objections to that?
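The comment above mentions a test that creates many Connections to observe the effects. Below is a hedged, self-contained sketch of that kind of check; it is not the project's actual test. `ConnectionChurnCheck.Client` is a hypothetical connection that owns a small thread pool. The harness counts live threads by name prefix while all clients are open, then again after closing them. A non-zero leftover count would point to resources that are not being released, as in the earlier case of a WriteFn that is never cleaned up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical churn check: open many pool-owning clients, close them, count leftover threads. */
final class ConnectionChurnCheck {
  static final String PREFIX = "churn-client-";

  /** Stand-in for a connection that owns its own executor (not a real Bigtable class). */
  static final class Client implements AutoCloseable {
    private final ExecutorService pool;

    Client(int id) {
      pool = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r, PREFIX + id);
        t.setDaemon(true);
        return t;
      });
      pool.execute(() -> { }); // starts one worker thread for this client
    }

    @Override
    public void close() {
      pool.shutdown();
      try {
        pool.awaitTermination(5, TimeUnit.SECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }

  /** Number of live threads whose names start with PREFIX. */
  static long liveThreads() {
    return Thread.getAllStackTraces().keySet().stream()
        .filter(t -> t.getName().startsWith(PREFIX))
        .count();
  }

  /** Returns {threads while all n clients are open, threads left after closing them}. */
  static long[] run(int n) {
    List<Client> clients = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      clients.add(new Client(i));
    }
    long whileOpen = liveThreads();
    for (Client c : clients) {
      c.close();
    }
    // A worker thread can still be exiting right after awaitTermination(), so poll briefly.
    long deadline = System.currentTimeMillis() + 5_000;
    long leftover = liveThreads();
    while (leftover > 0 && System.currentTimeMillis() < deadline) {
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
      leftover = liveThreads();
    }
    return new long[] {whileOpen, leftover};
  }
}
```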

danielnorberg commented on June 24, 2024

Sounds good to me 👍

sduskis commented on June 24, 2024

This was fixed a while ago. Closing.
