libp2p's Issues

Optimize thread pool and thread name in Libp2p

Rationale

Background

At present, the libp2p thread name is not clearly defined, and the thread names in the log are confusing. A function may be called by multiple threads; when a problem occurs, it is impossible to quickly find out which task triggered the corresponding log, which is very inconvenient for finding the problem.

Specification of creating a single thread

Thread resources must be provided through the thread pool, and it is not allowed to explicitly create threads in an application.

Description: Using a thread pool reduces the time spent creating and destroying threads and lowers system overhead, which helps avoid resource exhaustion. Without a thread pool, the system may create a large number of threads of the same type, resulting in high memory consumption or excessive context switching.

Specification of creating a thread pool

Thread pools must not be created through Executors; create them through ThreadPoolExecutor instead. This makes the running rules of the thread pool explicit to other developers and avoids the risk of resource exhaustion.

Description: The disadvantages of each method of Executors:

  1. newFixedThreadPool and newSingleThreadExecutor: the work queue is unbounded, so queued requests may accumulate and consume a very large amount of memory or even cause an OOM error.
  2. newCachedThreadPool and newScheduledThreadPool: the maximum number of threads is Integer.MAX_VALUE. With newCachedThreadPool this may create a very large number of threads or even cause an OOM error; with newScheduledThreadPool the delayed work queue is also unbounded, so pending tasks can accumulate without limit.
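
A minimal sketch of the difference, using only java.util.concurrent. The class name BoundedPoolSketch, the pool size of 1, and the queue capacity of 2 are illustrative assumptions, not values from libp2p. With a bounded queue and AbortPolicy, overload surfaces as a RejectedExecutionException instead of unbounded memory growth:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedPoolSketch {

  /** Submits blocking tasks to a 1-thread pool with a queue of 2 and counts the rejections. */
  static int countRejected(int tasks) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        1, 1, 0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(2),
        new ThreadPoolExecutor.AbortPolicy());
    CountDownLatch release = new CountDownLatch(1);
    int rejected = 0;
    try {
      for (int i = 0; i < tasks; i++) {
        try {
          // Every task blocks until released, so the single worker stays busy.
          pool.execute(() -> {
            try {
              release.await();
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt();
            }
          });
        } catch (RejectedExecutionException e) {
          rejected++; // worker busy and queue full: the pool refuses the task
        }
      }
    } finally {
      release.countDown(); // let the accepted tasks finish
      pool.shutdown();
    }
    return rejected;
  }

  public static void main(String[] args) {
    // 1 running + 2 queued are accepted, the remaining 2 are rejected.
    System.out.println("rejected: " + countRejected(5)); // prints "rejected: 2"
  }
}
```

Whether rejecting, blocking the caller, or running the task in the caller's thread is the right overload policy depends on the service; the point is only that the limit is explicit.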

Specification of thread’s name

At present, most threads do not specify a thread name related to their logic, so the default thread names are used. Compared with a default name, a meaningful thread name makes it easier to trace and quickly troubleshoot problems when an exception occurs.

Example: A NioEventLoopGroup thread pool created by java-tron does not specify a thread name, and the default name is nioEventLoopGroup-X-Y:

18:17:48.177 INFO  [nioEventLoopGroup-6-1] [net](P2pEventHandlerImpl.java:212) Message processing costs 85 ms, peer: /132.145.129.253:14774, type: BLOCK, time tag: ~~
18:17:48.181 INFO  [nioEventLoopGroup-6-32] [net](P2pEventHandlerImpl.java:167) Receive message from  peer: /45.33.38.111:46800, type: INVENTORY 
invType: BLOCK, size: 1, First hash: 000000000328fcefa48204c354fbf5caa2483e13c0f61ffa54188fd69aa33003

The meaning of the first serial number X: if there are multiple NioEventLoopGroup thread pools with the default thread name in an application, this represents the serial number of the thread pool, starting from 1.

The meaning of the second serial number Y: the serial number of a thread in a NioEventLoopGroup thread pool, starting from 1.

The source code of NioEventLoopGroup shows how the default name is constructed when no thread name format is specified:

[Image: source of "nioEventLoopGroup"]

[Image: source of "nioEventLoopGroup-X-"]

Such thread names make log analysis in java-tron difficult, because users cannot tell which part of the logic created the thread pool.
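
To make the two counters concrete, here is a simplified imitation of a "poolName-X-Y" naming scheme written against the JDK only. It is not Netty's code: the class PoolNumberedThreadFactory and its fields are hypothetical, and the sketch only assumes that X comes from a counter shared by all pools and Y from a counter owned by each pool.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Simplified imitation of a "poolName-X-Y" naming scheme; not Netty's actual code. */
final class PoolNumberedThreadFactory implements ThreadFactory {
  // Shared by all factories: yields X, the serial number of the pool.
  private static final AtomicInteger POOL_ID = new AtomicInteger();
  // Owned by one factory: yields Y, the serial number of a thread within the pool.
  private final AtomicInteger nextId = new AtomicInteger();
  private final String prefix;

  PoolNumberedThreadFactory(String poolName) {
    this.prefix = poolName + "-" + POOL_ID.incrementAndGet() + "-";
  }

  @Override
  public Thread newThread(Runnable r) {
    return new Thread(r, prefix + nextId.incrementAndGet());
  }
}

final class PoolNumberedDemo {
  public static void main(String[] args) {
    PoolNumberedThreadFactory first = new PoolNumberedThreadFactory("nioEventLoopGroup");
    PoolNumberedThreadFactory second = new PoolNumberedThreadFactory("nioEventLoopGroup");
    // Both pools share the same base name, so only the numbers tell them apart.
    System.out.println(first.newThread(() -> { }).getName());
    System.out.println(first.newThread(() -> { }).getName());
    System.out.println(second.newThread(() -> { }).getName());
  }
}
```

Because X depends only on creation order, the same number can refer to different pools across releases or configurations, which is why a name tied to the business logic is easier to follow in logs.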

Implementation

For the current situation, there are mainly two aspects to improve, including optimizing the logic of creating a thread pool and specifying the thread name related to the scene.

Optimize the creation logic of the thread pool

Correct example 1: (single thread)

// BasicThreadFactory is org.apache.commons.lang3.concurrent.BasicThreadFactory
ScheduledExecutorService executorService = new ScheduledThreadPoolExecutor(1,
    new BasicThreadFactory.Builder()
        .namingPattern("example-schedule-pool-%d")
        .daemon(true)
        .build());

Correct example 2: (multithreading)

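// ThreadFactoryBuilder is com.google.common.util.concurrent.ThreadFactoryBuilder (Guava)
ThreadFactory namedThreadFactory = new ThreadFactoryBuilder()
    .setNameFormat("demo-pool-%d").build();
// Generic thread pool: 5 core threads, at most 200 threads, bounded queue of 1024 tasks
ExecutorService pool = new ThreadPoolExecutor(5, 200,
    0L, TimeUnit.MILLISECONDS,
    new LinkedBlockingQueue<>(1024), namedThreadFactory, new ThreadPoolExecutor.AbortPolicy());
pool.execute(() -> System.out.println(Thread.currentThread().getName()));
pool.shutdown(); // graceful shutdown: queued tasks still run, new tasks are rejected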

Correct example 3:

// Please introduce a thread factory object with factory name already set

//in code
userThreadPool.execute(thread); 

Specify the thread’s name related to the scene

For a single-threaded thread pool, use the format .namingPattern("example-schedule-pool"); for a multi-threaded thread pool, use the format .namingPattern("example-schedule-pool-%d"). The first letter of all custom thread names should be lowercase uniformly.
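
A minimal sketch of this convention using only the JDK, for cases where neither BasicThreadFactory nor ThreadFactoryBuilder is available. The class SceneThreadFactory is hypothetical, and starting the counter at 1 is an assumption of this sketch:

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical factory that applies a naming pattern and enforces a lowercase first letter. */
final class SceneThreadFactory implements ThreadFactory {
  private final String pattern;
  private final AtomicInteger counter = new AtomicInteger();

  SceneThreadFactory(String pattern) {
    if (pattern.isEmpty() || !Character.isLowerCase(pattern.charAt(0))) {
      throw new IllegalArgumentException(
          "thread name must start with a lowercase letter: " + pattern);
    }
    this.pattern = pattern;
  }

  @Override
  public Thread newThread(Runnable r) {
    // String.format ignores the surplus argument when the pattern contains no %d,
    // so a single-threaded pool simply gets the fixed name.
    return new Thread(r, String.format(pattern, counter.incrementAndGet()));
  }
}

final class SceneThreadFactoryDemo {
  public static void main(String[] args) {
    ThreadFactory single = new SceneThreadFactory("example-schedule-pool");
    ThreadFactory multi = new SceneThreadFactory("example-schedule-pool-%d");
    System.out.println(single.newThread(() -> { }).getName()); // example-schedule-pool
    System.out.println(multi.newThread(() -> { }).getName());  // example-schedule-pool-1
  }
}
```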

Stop submitting new tasks when closing libp2p

System information

Libp2p version: v2.0.0
OS & Version: Linux & macOS

Expected Behavior

When libp2p is closing, it should stop submitting new tasks to any ScheduledExecutorService. For example, ConnPoolService should not trigger a new TCP connection.

Actual Behavior

When libp2p is closing, ConnPoolService executes close, which closes all active channels in turn. Each close event is caught, and a task that initiates a TCP connection is then added to the ScheduledExecutorService poolLoopExecutor of ConnPoolService through the method triggerConnect of ChannelManager. As a result, a java.util.concurrent.RejectedExecutionException is raised and logged.

This is not expected. Once libp2p starts closing, no new TCP connection should be triggered. Another example is NodeDetectService, which tries to submit a new detect task while libp2p is closing.
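
One possible shape for such a guard, sketched with JDK classes only. GuardedScheduler is a hypothetical stand-in, not the ConnPoolService code: a closed flag is set before the executor is stopped, and it is checked both when a task is submitted and again when the task starts running.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a service that schedules connection attempts. */
final class GuardedScheduler {
  private final ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
  private final AtomicBoolean closed = new AtomicBoolean(false);

  /** Schedules the task unless close() has started; returns whether it was accepted. */
  boolean trigger(Runnable task, long delayMs) {
    if (closed.get()) {
      return false; // closing: drop the request instead of submitting it
    }
    try {
      executor.schedule(() -> {
        // Re-check at run time: close() may have started while the task was waiting.
        if (!closed.get()) {
          task.run();
        }
      }, delayMs, TimeUnit.MILLISECONDS);
      return true;
    } catch (RejectedExecutionException e) {
      // close() won the race between the flag check and schedule().
      return false;
    }
  }

  void close() {
    closed.set(true);       // set the flag first so close-event handlers see it
    executor.shutdownNow(); // then stop the executor and discard pending tasks
  }
}
```

Setting the flag before closing any channel matters here, because the close events are exactly what would otherwise trigger the new connection tasks.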

Steps to Reproduce the Behavior

While a java-tron full node with libp2p is running, use the command kill -15 <pid> to shut down the node.

Backtrace

Exception log 1:

13:08:45.935 ERROR [pool-64-thread-1] [i.n.u.c.D.rejectedExecution](Slf4JLogger.java:181) Failed to submit a listener notification task. Event loop shut down?
java.util.concurrent.RejectedExecutionException: event executor terminated
    at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:842)
    at io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:328)
    at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:321)
    at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:765)
    at io.netty.util.concurrent.DefaultPromise.safeExecute(DefaultPromise.java:768)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:432)
    at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:987)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.register(AbstractChannel.java:491)
    at io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:80)
    at io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:74)
    at io.netty.channel.MultithreadEventLoopGroup.register(MultithreadEventLoopGroup.java:86)
    at io.netty.bootstrap.AbstractBootstrap.initAndRegister(AbstractBootstrap.java:333)
    at io.netty.bootstrap.Bootstrap.doResolveAndConnect(Bootstrap.java:163)
    at io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:119)
    at org.tron.p2p.connection.socket.PeerClient.connectAsync(PeerClient.java:92)
    at org.tron.p2p.connection.socket.PeerClient.connectAsync(PeerClient.java:62)
    at org.tron.p2p.connection.business.pool.ConnPoolService.lambda$connect$4(ConnPoolService.java:189)
    at java.util.ArrayList.forEach(ArrayList.java:1257)
    at org.tron.p2p.connection.business.pool.ConnPoolService.connect(ConnPoolService.java:187)
    at org.tron.p2p.connection.business.pool.ConnPoolService.lambda$triggerConnect$9(ConnPoolService.java:267)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Exception log 2:

11:35:22.699 ERROR [pool-65-thread-1] [i.n.u.c.D.rejectedExecution](Slf4JLogger.java:181) Failed to submit a listener notification task. Event loop shut down?
java.util.concurrent.RejectedExecutionException: event executor terminated
        at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:842)
        at io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:328)
        at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:321)
        at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:765)
        at io.netty.util.concurrent.DefaultPromise.safeExecute(DefaultPromise.java:768)
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:432)
        at io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:162)
        at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:95)
        at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:30)
        at org.tron.p2p.connection.socket.PeerClient.connectAsync(PeerClient.java:66)
        at org.tron.p2p.connection.business.detect.NodeDetectService.detect(NodeDetectService.java:131)
        at org.tron.p2p.connection.business.detect.NodeDetectService.work(NodeDetectService.java:93)
        at org.tron.p2p.connection.business.detect.NodeDetectService.lambda$init$0(NodeDetectService.java:58)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Exception log 3:

18:38:30.815 WARN  [pool-63-thread-1] [i.n.c.AbstractChannel](AbstractChannel.java:490) Force-closing a channel whose registration task was not accepted by an event loop: [id: 0x7b14b9c9]
java.util.concurrent.RejectedExecutionException: event executor terminated
        at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:934)
        at io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:351)
        at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:344)
        at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:836)
        at io.netty.util.concurrent.SingleThreadEventExecutor.execute0(SingleThreadEventExecutor.java:827)
        at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:817)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.register(AbstractChannel.java:483)
        at io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:87)
        at io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:81)
        at io.netty.channel.MultithreadEventLoopGroup.register(MultithreadEventLoopGroup.java:86)
        at io.netty.bootstrap.AbstractBootstrap.initAndRegister(AbstractBootstrap.java:323)
        at io.netty.bootstrap.Bootstrap.doResolveAndConnect(Bootstrap.java:155)
        at io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:116)
        at org.tron.p2p.connection.socket.PeerClient.connectAsync(PeerClient.java:97)
        at org.tron.p2p.connection.socket.PeerClient.connectAsync(PeerClient.java:64)
        at org.tron.p2p.connection.business.pool.ConnPoolService.lambda$connect$4(ConnPoolService.java:190)
        at java.util.ArrayList.forEach(ArrayList.java:1257)
        at org.tron.p2p.connection.business.pool.ConnPoolService.connect(ConnPoolService.java:188)
        at org.tron.p2p.connection.business.pool.ConnPoolService.lambda$triggerConnect$9(ConnPoolService.java:269)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Optimize p2p handshake logic

Rationale

Whether the peers use message compression is settled only once the handshake is completed.

No other message should be exchanged before the handshake is completed; otherwise message parsing may fail and the connection may be dropped.

Background

After the TCP connection is established, network jitter may stretch the handshake beyond 20s. The channel keep-alive function is then triggered and sends a ping message. Because the handshake is not yet completed, this ping is sent as an uncompressed packet. If the handshake then settles on compressed communication, the uncompressed message fails to parse and the connection is dropped.

Implementation

In the current version, release_v4.7.2, the channel keep-alive message is the only message that can be exchanged between TCP connection establishment and completion of the p2p handshake. The ping message should therefore be sent only after the handshake is completed; otherwise the connection may be dropped.
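
A minimal sketch of that rule, assuming the keep-alive timer calls a hook on every tick. KeepAliveGate and its method names are hypothetical and do not come from the libp2p code base; the sketch only shows the ping being suppressed until the handshake has finished.

```java
/** Hypothetical sketch: suppresses keep-alive pings until the handshake has finished. */
final class KeepAliveGate {
  private boolean handshakeFinished = false;
  private int pingsSent = 0;

  /** Called once the handshake, including the compression decision, is complete. */
  synchronized void onHandshakeFinished() {
    handshakeFinished = true;
  }

  /** Invoked by the keep-alive timer; returns true only if a ping was actually sent. */
  synchronized boolean onKeepAliveTick() {
    if (!handshakeFinished) {
      return false; // compression mode is not settled yet, so stay silent
    }
    pingsSent++; // a real implementation would write the ping to the channel here
    return true;
  }

  synchronized int pingsSent() {
    return pingsSent;
  }
}
```

A slow handshake then simply delays the first ping rather than producing a message the peer cannot parse.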

Optimize Disconnect Logic of Libp2p

Rationale

1. Disconnect logic

When libp2p fails to process a peer's message, or when specific business logic actively triggers a disconnection, it closes the channel at once without telling the peer why. This makes it hard for the peer to analyze the problem. This article analyzes the disconnection logic in libp2p and java-tron and proposes sending a disconnect message to the peer before closing the channel, which improves the user experience.

2. Scene analysis of disconnection

We analyzed the disconnection scenarios as follows:

[Image: when does libp2p disconnect from peers?]

Most disconnections happen when a P2pException occurs. The red part of the figure marks the cases where the TypeEnum of the P2pException is not specified. We add a disconnect message, which is sent over TCP before the channel is actually closed.

Implementation

1. Define DisconnectReason

Append the following types to protos/Connect.proto of libp2p:

enum DisconnectReason {
  PEER_QUITING = 0x00;
  BAD_PROTOCOL = 0x01;
  TOO_MANY_PEERS = 0x02;
  DUPLICATE_PEER = 0x03;
  DIFFERENT_VERSION = 0x04;
  RANDOM_ELIMINATION = 0x05;
  EMPTY_MESSAGE = 0x06;
  PING_TIMEOUT = 0x07;
  DISCOVER_MODE = 0x08;
  DETECT_COMPLETE = 0x09;
  NO_SUCH_MESSAGE = 0x0A;
  BAD_MESSAGE = 0x0B;
  TOO_MANY_PEERS_WITH_SAME_IP = 0x0C;
  RECENT_DISCONNECT = 0x0D;
  DUP_HANDSHAKE = 0x0E;
  UNKNOWN = 0xFF;
}

message P2pDisconnectMessage {
  DisconnectReason reason = 1;
}

Convert the TypeEnum of P2pException to DisconnectReason:

    P2pException pe = (P2pException) e;
    DisconnectReason disconnectReason;
    switch (pe.getType()) {
      case EMPTY_MESSAGE:
        disconnectReason = DisconnectReason.EMPTY_MESSAGE;
        break;
      case BAD_PROTOCOL:
        disconnectReason = DisconnectReason.BAD_PROTOCOL;
        break;
      case NO_SUCH_MESSAGE:
        disconnectReason = DisconnectReason.NO_SUCH_MESSAGE;
        break;
      case BAD_MESSAGE:
      case PARSE_MESSAGE_FAILED:
      case MESSAGE_WITH_WRONG_LENGTH:
      case TYPE_ALREADY_REGISTERED:
        disconnectReason = DisconnectReason.BAD_MESSAGE;
        break;
      default:
        disconnectReason = DisconnectReason.UNKNOWN;
    }

2. Send disconnect messages

Before the channel actively disconnects, make a best effort to send a P2pDisconnectMessage over TCP to inform the peer of the disconnection reason:

channel.send(new P2pDisconnectMessage(disconnectReason));
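
A rough sketch of what "best effort" can mean in practice: the send is attempted, but a failure to send never prevents the close. PeerChannel and BestEffortDisconnect are hypothetical names introduced for illustration, and the reason is passed as a plain string rather than as the protobuf message.

```java
/** Hypothetical minimal channel abstraction; libp2p's real Channel class is not shown here. */
interface PeerChannel {
  void send(String message);

  void close();
}

final class BestEffortDisconnect {
  /** Tries to tell the peer why, then closes the channel regardless of the send outcome. */
  static void disconnect(PeerChannel channel, String reason) {
    try {
      channel.send("DISCONNECT:" + reason);
    } catch (RuntimeException e) {
      // Best effort only: a failed send must not prevent the close below.
    } finally {
      channel.close();
    }
  }
}
```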

Compatibility

For libp2p nodes that do not support P2pDisconnectMessage, a P2pException of NO_SUCH_MESSAGE will be thrown after receiving the message, and then the channel will be disconnected as well.

Optimize Logic of Acquiring Internet IP

Rationale

Background

The Libp2p module used in java-tron might start slowly in some network environments, sometimes as long as ten seconds. It is necessary to boost the startup speed to optimize the user experience.

Cause Analysis

Currently, libp2p obtains the internet IPv4 address by visiting http://checkip.amazonaws.com and the internet IPv6 address through https://v6.ident.me. Depending on the user's region, requests to these two domains may take up to ten seconds or may fail entirely. Because the IP lookup runs on the main thread, startup is slow in such cases.

Implementation

Design

  1. Several candidate domains for IPv4 and IPv6, located in different regions around the world, are provided. The candidates for each protocol are requested concurrently; whichever returns first is adopted, and the other requests are ignored.
  2. If the external IP has already been provided, there is no need to obtain it again when P2pConfig initializes.

Solution

Three domains are provided for IPv4: "http://checkip.amazonaws.com", "https://ifconfig.me/", and "https://4.ipw.cn/"; two domains are provided for IPv6: "https://v6.ident.me" and "http://6.ipw.cn/". ExecutorCompletionService is then used to take the first result. The load of IP acquisition is thus spread across several services, and the fastest one determines the response time.

The core logic is as follows:

private static String getIp(List<String> multiSrcUrls) {
  ExecutorService executor = Executors.newCachedThreadPool();
  CompletionService<String> completionService = new ExecutorCompletionService<>(executor);

  // One lookup task per candidate URL.
  List<Callable<String>> tasks = new ArrayList<>();
  multiSrcUrls.forEach(url -> tasks.add(() -> getExternalIp(url)));

  // Start all lookups concurrently.
  for (Callable<String> task : tasks) {
    completionService.submit(task);
  }

  Future<String> future;
  String result = null;
  try {
    // take() blocks until the first task completes, whatever its outcome.
    future = completionService.take();
    result = future.get();
  } catch (InterruptedException | ExecutionException e) {
    // ignore: result stays null
  } finally {
    // Cancel the lookups that are still running.
    executor.shutdownNow();
  }

  return result;
}
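
As a possible variation, not the code used in libp2p, ExecutorService.invokeAny returns the result of the first task that completes successfully, so a candidate that fails quickly does not decide the outcome, and a timeout bounds the total wait. FirstSuccess, its parameters, and the sample address are hypothetical. The sketch assumes each lookup throws on failure, since a lookup that returns null would count as a success.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class FirstSuccess {

  /** Returns the first successful lookup result, or null if all fail or the timeout expires. */
  static String firstSuccessful(List<Callable<String>> lookups, long timeoutMs) {
    if (lookups.isEmpty()) {
      return null; // invokeAny rejects an empty task list
    }
    int n = lookups.size();
    // One thread per candidate so that all lookups really run concurrently.
    ExecutorService executor = new ThreadPoolExecutor(
        n, n, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>(n));
    try {
      return executor.invokeAny(lookups, timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    } catch (ExecutionException | TimeoutException e) {
      return null; // every lookup failed, or none finished in time
    } finally {
      executor.shutdownNow(); // interrupt lookups that are still running
    }
  }

  public static void main(String[] args) {
    List<Callable<String>> lookups = new ArrayList<>();
    // Stand-ins for HTTP lookups: one fails at once, one succeeds after a delay.
    lookups.add(() -> {
      throw new IllegalStateException("service unreachable");
    });
    lookups.add(() -> {
      Thread.sleep(50);
      return "203.0.113.7"; // documentation-range address used as a placeholder
    });
    System.out.println(firstSuccessful(lookups, 2000));
  }
}
```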

Testing

Time to obtain IPv4 and IPv6 addresses:

Region              Host                  Stack        Before (ms)  Optimized (ms)
Beijing, China      testgroup022          IPv4 + IPv6  3497         486
Hong Kong, China    testgroup026-hk       IPv4         661          360
Virginia, USA       pubchain-hadoop009    IPv4         198          188
Frankfurt, Germany  depubchain-hadoop009  IPv4         293          269
