Find a way to track the `struct rpma_conn` from which a `struct rpma_completion` comes (rpma issue, CLOSED)

pmem commented on July 16, 2024

Find a way to track the `struct rpma_conn` from which a `struct rpma_completion` comes

from rpma.

Comments (8)

ultra-nexus commented on July 16, 2024

Hello. I have a question. The qp_num in ibv_wc is used to identify a QP uniquely. It seems the most useful use case is when a single CQ is used for multiple QPs. I can't understand why this member is relevant for Receive Work Completions that are associated with an SRQ. I can't figure out the relationship between qp_num and SRQ.

Back to the question. I think the second approach would be more suitable because it's convenient for API users.

janekmi commented on July 16, 2024

@ultra-nexus:

> The qp_num in ibv_wc is used to identify a QP uniquely. It seems the most useful use case is when a single CQ is used for multiple QPs. I can't understand why this member is relevant for Receive Work Completions that are associated with an SRQ. I can't figure out the relationship between qp_num and SRQ.

In general, a QP can have two CQs (for send completions and recv completions). These CQs can be a single CQ (as it is currently in librpma) or two separate CQs (#737). Each of these CQs might be connected to one or more connections at will.

SRQ is a feature that allows sharing RQ (part of QP) between QPs. Having a shared RQ makes sharing a recv CQ between connections potentially a very common use-case. I think this is why qp_num is especially relevant for SRQ use-cases. Nonetheless, I hope it is available all the time. :-)

> I think the second approach would be more suitable because it's convenient for API users.

I do not like the second idea (allowing users to get the struct rpma_conn related to a particular struct rpma_completion directly) for exactly the same reason I voted against the internal SRQ table maintained by the librpma library (#737 (comment)). It has to be MT-safe and very efficient, since processing completions is the performance-critical path, and MT-safety does not go well with efficiency. As of now, I think the application is better equipped to address this issue in an optimal way.

ultra-nexus commented on July 16, 2024

> SRQ is a feature that allows sharing RQ (part of QP) between QPs. Having a shared RQ makes sharing a recv CQ between connections potentially a very common use-case. I think this is why qp_num is especially relevant for SRQ use-cases. Nonetheless, I hope it is available all the time. :-)

Thanks for the explanation. Does this mean we first need to make sure that qp_num is available in the non-SRQ scenario?

> > I think the second approach would be more suitable because it's convenient for API users.
>
> I do not like the second idea (allowing users to get the struct rpma_conn related to a particular struct rpma_completion directly) for exactly the same reason I voted against the internal SRQ table maintained by the librpma library (#737 (comment)). It has to be MT-safe and very efficient, since processing completions is the performance-critical path, and MT-safety does not go well with efficiency. As of now, I think the application is better equipped to address this issue in an optimal way.

But the first approach also needs a data structure, such as a hash table, to build the relationship between qp_num and rpma_conn. Is this data structure also created and managed by user applications? If it is managed by rpma, we also need to make it MT-safe and efficient enough.

janekmi commented on July 16, 2024

> Does this mean we first need to make sure that qp_num is available in the non-SRQ scenario?

It is very important information for this discussion. :-)

> the first approach also needs a data structure, such as a hash table, to build the relationship between qp_num and rpma_conn

No, it does not. Imagine an application with only a few connections: looking up a connection might be a lot simpler using, e.g., a table or a set of comparisons if the number of connections is known upfront.
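As a rough illustration of the "table" option mentioned above, here is a minimal sketch of an application-side lookup from qp_num to a connection. All names (`conn_table`, `conn_entry`, `MAX_CONNS`, `conn_table_add`, `conn_table_find`) are hypothetical and not part of librpma; the connection is kept as an opaque pointer, and the table is assumed to be owned by a single thread, so no locking is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CONNS 4 /* assumption: the number of connections is known upfront */

/* Hypothetical application-side record; 'conn' would hold a struct rpma_conn *. */
struct conn_entry {
    uint32_t qp_num;
    void *conn;
};

struct conn_table {
    struct conn_entry entries[MAX_CONNS];
    size_t count;
};

/* Register a connection under its QP number; returns 0, or -1 if the table is full. */
int conn_table_add(struct conn_table *t, uint32_t qp_num, void *conn)
{
    assert(t != NULL);
    if (t->count == MAX_CONNS)
        return -1;
    t->entries[t->count].qp_num = qp_num;
    t->entries[t->count].conn = conn;
    t->count++;
    return 0;
}

/* Linear scan over the few known connections; returns NULL if qp_num is unknown. */
void *conn_table_find(const struct conn_table *t, uint32_t qp_num)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++) {
        if (t->entries[i].qp_num == qp_num)
            return t->entries[i].conn;
    }
    return NULL;
}
```

With a handful of connections, a linear scan like this is likely cheap enough; an application with many connections could swap in whatever structure suits its own threading model.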

ultra-nexus commented on July 16, 2024

> > Does this mean we first need to make sure that qp_num is available in the non-SRQ scenario?
>
> It is very important information for this discussion. :-)

After reading the implementation of ibv_cq, I found that there are two APIs for polling a CQ: ibv_poll_cq and ibv_start_poll. The latter calls mlx5_parse_cqe() with the last parameter set to 1:

```c
mlx5_parse_cqe(cq, cqe64, cqe, &cq->cur_rsc, &cq->cur_srq, NULL, cqe_ver, 1);
```

If the last argument is 1, mlx5_parse_cqe will use lazy mode which won't fill wc->qp_num.

```c
qpn = be32toh(cqe64->sop_drop_qpn) & 0xffffff;
if (lazy) {
        cq->cqe64 = cqe64;
        cq->flags &= (~MLX5_CQ_LAZY_FLAGS);
} else {
        wc->wc_flags = 0;
        wc->qp_num = qpn; // qp num
}
```

The lazy mode is introduced in this patch. I haven't understood it completely but it seems qp_num may not be reliable.
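For reference, the first line of the snippet above extracts the QP number from a big-endian CQE field: QP numbers are 24 bits wide in InfiniBand, so the top byte is masked off. A minimal, portable sketch of the same decoding follows. The helper name `qpn_from_be_bytes`, the `QPN_MASK` macro, and the byte-array input are assumptions made for illustration; this is not rdma-core code.

```c
#include <stdint.h>

#define QPN_MASK 0xffffffu /* a QP number occupies the low 24 bits */

/*
 * Decode a 32-bit big-endian field (given as raw bytes) into host order
 * and keep only the QP number, mirroring be32toh(x) & 0xffffff.
 */
uint32_t qpn_from_be_bytes(const uint8_t raw[4])
{
    uint32_t host = ((uint32_t)raw[0] << 24) |
                    ((uint32_t)raw[1] << 16) |
                    ((uint32_t)raw[2] << 8) |
                    (uint32_t)raw[3];
    return host & QPN_MASK;
}
```

For example, the bytes `AB 12 34 56` decode to QP number `0x123456`; the leading `AB` byte is discarded by the mask.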

janekmi commented on July 16, 2024

You are right that the MLX implementation of mlx5_parse_cqe() may or may not set qp_num, depending on the lazy argument. But, as you have also noticed, lazy == 1 only for mlx5_parse_lazy_cqe(), which is used only by mlx5_start_poll() and mlx5_next_poll(); these are NOT part of the standard ibv_poll_cq() implementation.

The mlx5_start_poll() and mlx5_next_poll() functions are part of an MLX-specific interface, which we should not use, at least until we decide to incorporate MLX-specific optimizations. This is not the case right now.

Ref: https://manpages.debian.org/stretch/libibverbs-dev/ibv_create_cq_ex.3.en.html

TL;DR: Don't worry about it. qp_num is set on the standard path: ibv_poll_cq() -> mlx5_poll_cq() -> poll_cq() -> mlx5_poll_one().

yangx-jy commented on July 16, 2024

Hi @janekmi

How about the following design:

```c
/* get qp_num from rpma_conn object */
int rpma_conn_get_qp_num(const struct rpma_conn *conn, uint32_t *qp_num)
{
    if (conn == NULL || qp_num == NULL)
        return RPMA_E_INVAL;

    *qp_num = conn->id->qp->qp_num;

    return 0;
}

/* add qp_num in struct rpma_completion */
struct rpma_completion {
    ...
    uint32_t qp_num;
};

/* copy the qp_num from struct ibv_wc to struct rpma_completion */
int rpma_cq_get_completion(struct rpma_cq *cq, struct rpma_completion *cmpl)
{
    ...
    cmpl->qp_num = wc.qp_num;
}
```

Note: let applications compare two qp_num by themselves.

grom72 commented on July 16, 2024

Resolved by #1087
