
psi's Introduction

SecretFlow PSI Library


The repo of Private Set Intersection (PSI) and Private Information Retrieval (PIR) from SecretFlow.

This repo was formerly the psi/pir part of the secretflow/spu repo.

Note
We invite you to try Easy PSI, a standalone PSI product powered by this library.

PSI Quick Start with v2 API

For the PSI v1 API and PIR, please check the documentation.

Release Docker

In the following example, we are going to run PSI on a single host.

  1. Check the official release docker image at Docker Hub. We also have a mirror at Alibaba Cloud: secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/psi-anolis8.

  2. Prepare data and config.

receiver.config:

{
    "psi_config": {
        "protocol_config": {
            "protocol": "PROTOCOL_KKRT",
            "role": "ROLE_RECEIVER",
            "broadcast_result": true
        },
        "input_config": {
            "type": "IO_TYPE_FILE_CSV",
            "path": "/root/receiver/receiver_input.csv"
        },
        "output_config": {
            "type": "IO_TYPE_FILE_CSV",
            "path": "/root/receiver/receiver_output.csv"
        },
        "keys": [
            "id0",
            "id1"
        ],
        "debug_options": {
            "trace_path": "/root/receiver/receiver.trace"
        }
    },
    "self_link_party": "receiver",
    "link_config": {
        "parties": [
            {
                "id": "receiver",
                "host": "127.0.0.1:5300"
            },
            {
                "id": "sender",
                "host": "127.0.0.1:5400"
            }
        ]
    }
}

sender.config:

{
    "psi_config": {
        "protocol_config": {
            "protocol": "PROTOCOL_KKRT",
            "role": "ROLE_SENDER",
            "broadcast_result": true
        },
        "input_config": {
            "type": "IO_TYPE_FILE_CSV",
            "path": "/root/sender/sender_input.csv"
        },
        "output_config": {
            "type": "IO_TYPE_FILE_CSV",
            "path": "/root/sender/sender_output.csv"
        },
        "keys": [
            "id0",
            "id1"
        ],
        "debug_options": {
            "trace_path": "/root/sender/sender.trace"
        }
    },
    "self_link_party": "sender",
    "link_config": {
        "parties": [
            {
                "id": "receiver",
                "host": "127.0.0.1:5300"
            },
            {
                "id": "sender",
                "host": "127.0.0.1:5400"
            }
        ]
    }
}
File Name          Location                          Description
receiver.config    /tmp/receiver/receiver.config     Config for receiver.
sender.config      /tmp/sender/sender.config         Config for sender.
receiver_input.csv /tmp/receiver/receiver_input.csv  Input for receiver. Make sure the file contains the two id keys, id0 and id1.
sender_input.csv   /tmp/sender/sender_input.csv      Input for sender. Make sure the file contains the two id keys, id0 and id1.
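For reference, a minimal receiver_input.csv with the two key columns could look like the following (illustrative values; sender_input.csv uses the same layout):

id0,id1
user001,2024001
user002,2024002
user003,2024003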
  3. Run PSI

In the first terminal, run the following command:

docker run -it  --rm  --network host --mount type=bind,source=/tmp/receiver,target=/root/receiver --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=NET_ADMIN --privileged=true secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/psi-anolis8:latest --config receiver/receiver.config

In the other terminal, run the following command simultaneously.

docker run -it  --rm  --network host --mount type=bind,source=/tmp/sender,target=/root/sender  --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=NET_ADMIN --privileged=true secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/psi-anolis8:latest --config sender/sender.config
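When both runs finish, the intersection result is written to the configured output paths. With the mounts above, the receiver's result can be inspected on the host, e.g.:

cat /tmp/receiver/receiver_output.csv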

You could also pass a minified JSON config directly. A minified JSON is a compact one without whitespace or line breaks.

e.g.

docker run -it  --rm  --network host --mount type=bind,source=/tmp/sender,target=/root/sender  --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=NET_ADMIN --privileged=true secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/psi-anolis8:latest --json '{"psi_config":{"protocol_config":{"protocol":"PROTOCOL_KKRT","role":"ROLE_RECEIVER","broadcast_result":true},"input_config":{"type":"IO_TYPE_FILE_CSV","path":"/root/receiver/receiver_input.csv"},"output_config":{"type":"IO_TYPE_FILE_CSV","path":"/root/receiver/receiver_output.csv"},"keys":["id0","id1"],"debug_options":{"trace_path":"/root/receiver/receiver.trace"}},"self_link_party":"receiver","link_config":{"parties":[{"id":"receiver","host":"127.0.0.1:5300"},{"id":"sender","host":"127.0.0.1:5400"}]}}'
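A minified config can be produced from the pretty-printed file with a tool such as jq (assuming jq is installed):

jq -c . /tmp/receiver/receiver.config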

Building SecretFlow PSI Library

System Setup

Dev Docker

We use the secretflow/ubuntu-base-ci docker image. You may check it at Docker Hub.

# start container
docker run -d -it --name psi-dev-$(whoami) \
         --mount type=bind,source="$(pwd)",target=/home/admin/dev/ \
         -w /home/admin/dev \
         --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
         --cap-add=NET_ADMIN \
         --privileged=true \
         --entrypoint="bash" \
         secretflow/ubuntu-base-ci:latest

# attach to build container
docker exec -it psi-dev-$(whoami) bash

Linux

Install gcc>=11.2, cmake>=3.26, ninja, nasm>=2.15, python>=3.8, bazel, golang, xxd, and lld.

Note
Please install bazel with the version specified in .bazelversion, or use bazelisk.
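bazelisk reads the version pinned in .bazelversion and downloads the matching bazel release automatically, so the build commands below can be run unchanged by substituting bazelisk for bazel (a sketch, assuming bazelisk is installed on your PATH):

bazelisk build //... -c opt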

Build & UnitTest

# build as debug
bazel build //... -c dbg

# build as release
bazel build //... -c opt

# test
bazel test //...

Trace

We use Perfetto from Google for tracing.

Please use the debug_options.trace_path field in PsiConfig to modify the trace file path. The default path is /tmp/psi.trace.
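For example, the receiver.config shown earlier sets the trace path through the nested debug_options object:

"debug_options": {
    "trace_path": "/root/receiver/receiver.trace"
}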

After running the psi binaries, please check the trace by using Trace Viewer. If this is not applicable, please check this link to deploy your own website.

An alternative way to visualize the trace is to use chrome://tracing:

  1. Download perfetto assets from https://github.com/google/perfetto/releases/tag/v37.0
  2. You should find the traceconv binary in the assets folder.
  3. Convert the trace file to JSON format:

chmod +x traceconv

./traceconv json [trace file path] [json file path]

  4. Open chrome://tracing in Chrome and load the JSON file.
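For instance, with the default trace location, the conversion could look like this (illustrative paths):

./traceconv json /tmp/psi.trace /tmp/psi.json

The resulting /tmp/psi.json can then be loaded in chrome://tracing.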

PSI V2 Benchmark

Please refer to the PSI V2 Benchmark.

psi's People

Contributors

6fj, anakinxc, greyjeremyji, icavan, jamie-cui, qxzhou1010, renovate[bot], tarantula-leo, zhangwfjh


psi's Issues

Problem when running cache-based ECDH_OPRF_UB_PSI

Hello, sorry to bother you.
I ran into a problem while trying to implement cache-based ECDH_OPRF_UB_PSI.
My approach is to use ECDH_OPRF_UB_PSI_2PC_GEN_CACHE as the offline phase and then ECDH_OPRF_UB_PSI_2PC_TRANSFER_CACHE as the online phase. Currently an error is raised when running ECDH_OPRF_UB_PSI_2PC_TRANSFER_CACHE. Below are the scripts run by the two parties and the resulting errors.
Sender (Server):

import secretflow as sf
import spu
import time

cluster_config = {
    'parties' : {
        'alice': {
            'address': '127.0.0.1:59179',
            'listen_addr': '0.0.0.0:59179'
        },
        'bob': {
            'address': '127.0.0.1:53341',
            'listen_addr': '0.0.0.0:53341'
        }
    },
    'self_party': 'alice'
}
sf.shutdown()
sf.init(address='local', cluster_config=cluster_config)
cluster_def = {
    "nodes": [
        {
            "party": "alice",
            "address": "127.0.0.1:45413"
        },
        {
            "party": "bob",
            "address": "127.0.0.1:47480"
        },
    ],
    "runtime_config": {
        "protocol": spu.spu_pb2.SEMI2K,
        "field": spu.spu_pb2.FM128
    },
}

spu = sf.SPU(
    cluster_def,
    link_desc={
        "connect_retry_times": 60,
        "connect_retry_interval_ms": 1000,
    }
)

alice, bob = sf.PYU('alice'), sf.PYU('bob')
spu.psi_csv(
    key={alice:['name'], bob:['name']}, 
    input_path={alice:'/root/project/psi1/alice_exactpsi_1e6_unique.csv', bob:'/root/project/psi2/bob_exactpsi_1e6_to_1e2_unique.csv'}, 
    output_path={alice:'/root/project/psi1/bob_exactpsi_1e6_to_1e2_unique_output_ubpsi_cache.csv', bob:'/root/project/psi2/bob_exactpsi_1e6_to_1e2_unique_output_ubpsi_cache.csv'}, 
    receiver='bob', 
    broadcast_result=False, 
    protocol='ECDH_OPRF_UB_PSI_2PC_GEN_CACHE', 
    preprocess_path='preprocess_cache',
    ecdh_secret_key_path="/root/project/psi1/alice_oprf_key",
    curve_type='CURVE_FOURQ', 
)
print("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
print("Complete offline phase")
print("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
spu.psi_csv(
    key={alice:['name'], bob:['name']}, 
    input_path={alice:'/root/project/psi1/alice_exactpsi_1e6_unique.csv', bob:'/root/project/psi2/bob_exactpsi_1e6_to_1e2_unique.csv'}, 
    output_path={alice:'/root/project/psi1/bob_exactpsi_1e6_to_1e2_unique_output_ubpsi_cache.csv', bob:'/root/project/psi2/bob_exactpsi_1e6_to_1e2_unique_output_ubpsi_cache.csv'}, 
    receiver='bob', 
    broadcast_result=False, 
    protocol='ECDH_OPRF_UB_PSI_2PC_TRANSFER_CACHE', 
    preprocess_path='preprocess_cache',
    ecdh_secret_key_path="/root/project/psi1/alice_oprf_key",
    curve_type='CURVE_FOURQ', 
)
print("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
print("Complete online phase")
print("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")

Output:

/root/anaconda3/envs/psi/lib/python3.10/subprocess.py:1796: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
  self.pid = _posixsubprocess.fork_exec(
/root/anaconda3/envs/psi/lib/python3.10/subprocess.py:1796: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
  self.pid = _posixsubprocess.fork_exec(
2024-05-23 15:49:58,963 INFO worker.py:1621 -- Started a local Ray instance.
2024-05-23 15:49:59.719 INFO api.py:233 [alice] -- [Anonymous_job] Started rayfed with {'CLUSTER_ADDRESSES': {'alice': '0.0.0.0:59179', 'bob': '127.0.0.1:53341'}, 'CURRENT_PARTY_NAME': 'alice', 'TLS_CONFIG': {}}
2024-05-23 15:50:00.424 INFO barriers.py:284 [alice] -- [Anonymous_job] Succeeded to create receiver proxy actor.
(ReceiverProxyActor pid=19280) 2024-05-23 15:50:00.419 INFO grpc_proxy.py:359 [alice] -- [Anonymous_job] ReceiverProxy binding port 59179, options: (('grpc.enable_retries', 1), ('grpc.so_reuseport', 0), ('grpc.max_send_message_length', 524288000), ('grpc.max_receive_message_length', 524288000), ('grpc.service_config', '{"methodConfig": [{"name": [{"service": "GrpcService"}], "retryPolicy": {"maxAttempts": 5, "initialBackoff": "5s", "maxBackoff": "30s", "backoffMultiplier": 2, "retryableStatusCodes": ["UNAVAILABLE"]}}]}'))...
(ReceiverProxyActor pid=19280) 2024-05-23 15:50:00.422 INFO grpc_proxy.py:379 [alice] -- [Anonymous_job] Successfully start Grpc service without credentials.
2024-05-23 15:50:01.120 INFO barriers.py:333 [alice] -- [Anonymous_job] SenderProxyActor has successfully created.
2024-05-23 15:50:01.121 INFO barriers.py:520 [alice] -- [Anonymous_job] Try ping ['bob'] at 0 attemp, up to 3600 attemps.
2024-05-23 15:50:04.124 INFO barriers.py:520 [alice] -- [Anonymous_job] Try ping ['bob'] at 1 attemp, up to 3600 attemps.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Complete offline phase
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
(SPURuntime(device_id=None, party=alice) pid=19752) [2024-05-23 15:50:08.337] [info] [launch.cc:164] LEGACY PSI config: {"psi_type":"ECDH_OPRF_UB_PSI_2PC_TRANSFER_CACHE","receiver_rank":1,"input_params":{"path":"/root/project/psi1/alice_exactpsi_1e6_unique.csv","select_fields":["name"],"precheck":true},"output_params":{"path":"/root/project/psi1/bob_exactpsi_1e6_to_1e2_unique_output_ubpsi_cache.csv","need_sort":true},"curve_type":"CURVE_FOURQ","bucket_size":1048576}
(SPURuntime(device_id=None, party=alice) pid=19752) [2024-05-23 15:50:08.337] [info] [bucket_psi.cc:400] bucket size set to 1048576
(SPURuntime(device_id=None, party=alice) pid=19752) [2024-05-23 15:50:08.337] [info] [bucket_psi.cc:425] Run psi protocol=8, self_items_count=0
(SPURuntime(device_id=None, party=alice) pid=19752) [2024-05-23 15:50:08.338] [info] [bucket_ub_psi.cc:93] input file path:/root/project/psi1/alice_exactpsi_1e6_unique.csv
(SPURuntime(device_id=None, party=alice) pid=19752) [2024-05-23 15:50:08.338] [info] [bucket_ub_psi.cc:94] output file path:/root/project/psi1/bob_exactpsi_1e6_to_1e2_unique_output_ubpsi_cache.csv
(SPURuntime(device_id=None, party=alice) pid=19752) [2024-05-23 15:50:08.338] [info] [ecdh_oprf_selector.cc:76] use fourq
Traceback (most recent call last):
  File "/root/project/psi1/sf_connect_ub.py", line 60, in <module>
    spu.psi_csv(
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/secretflow/device/device/spu.py", line 1848, in psi_csv
    return dispatch(
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/secretflow/device/device/register.py", line 111, in dispatch
    return _registrar.dispatch(self.device_type, name, self, *args, **kwargs)
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/secretflow/device/device/register.py", line 80, in dispatch
    return self._ops[device_type][name](*args, **kwargs)
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/secretflow/device/kernels/spu.py", line 321, in psi_csv
    return sfd.get(res)
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/secretflow/distributed/primitive.py", line 156, in get
    return fed.get(object_refs)
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/fed/api.py", line 602, in get
    values = ray.get(ray_refs)
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/ray/_private/worker.py", line 2524, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(MemoryError): ray::SPURuntime.psi_csv() (pid=19752, ip=192.168.15.7, actor_id=8c7c48f35d3db7ae1ba590c601000000, repr=SPURuntime(device_id=None, party=alice))
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/secretflow/device/device/spu.py", line 866, in psi_csv
    report = psi.bucket_psi(
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/spu/psi.py", line 69, in bucket_psi
    report_str = libpsi.libs.bucket_psi(
MemoryError: std::bad_alloc
2024-05-23 15:50:08.424 WARNING cleanup.py:154 [alice] -- [Anonymous_job] Failed to send ObjectRef(85748392bcd969ccc2dc0ecdcc67afbe6255b5ff0100000001000000) to bob, error: ray::SenderProxyActor.send() (pid=19322, ip=192.168.15.7, actor_id=c2dc0ecdcc67afbe6255b5ff01000000, repr=<fed.proxy.barriers.SenderProxyActor object at 0x7fb88c2b5ed0>)
  At least one of the input arguments for this task could not be computed:
ray.exceptions.RayTaskError: ray::SPURuntime.psi_csv() (pid=19752, ip=192.168.15.7, actor_id=8c7c48f35d3db7ae1ba590c601000000, repr=SPURuntime(device_id=None, party=alice))
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/secretflow/device/device/spu.py", line 866, in psi_csv
    report = psi.bucket_psi(
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/spu/psi.py", line 69, in bucket_psi
    report_str = libpsi.libs.bucket_psi(
MemoryError: std::bad_alloc,upstream_seq_id: 8#0, downstream_seq_id: 10.
2024-05-23 15:50:08.424 INFO cleanup.py:161 [alice] -- [Anonymous_job] Sending error std::bad_alloc to bob.
2024-05-23 15:50:08.425 WARNING cleanup.py:127 [alice] -- [Anonymous_job] Signal SIGINT to exit.
2024-05-23 15:50:08.425 WARNING api.py:60 [alice] -- [Anonymous_job] Stop signal received (e.g. via SIGINT/Ctrl+C), try to shutdown fed. Press CTRL+C (or send SIGINT/SIGKILL/SIGTERM) to skip.
2024-05-23 15:50:08.426 WARNING api.py:325 [alice] -- [Anonymous_job] Shutdowning rayfed unintendedly...
2024-05-23 15:50:08.426 ERROR api.py:330 [alice] -- [Anonymous_job] Cross-silo sending error occured. ray::SenderProxyActor.send() (pid=19322, ip=192.168.15.7, actor_id=c2dc0ecdcc67afbe6255b5ff01000000, repr=<fed.proxy.barriers.SenderProxyActor object at 0x7fb88c2b5ed0>)
  At least one of the input arguments for this task could not be computed:
ray.exceptions.RayTaskError: ray::SPURuntime.psi_csv() (pid=19752, ip=192.168.15.7, actor_id=8c7c48f35d3db7ae1ba590c601000000, repr=SPURuntime(device_id=None, party=alice))
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/secretflow/device/device/spu.py", line 866, in psi_csv
    report = psi.bucket_psi(
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/spu/psi.py", line 69, in bucket_psi
    report_str = libpsi.libs.bucket_psi(
MemoryError: std::bad_alloc
2024-05-23 15:50:08.426 INFO api.py:337 [alice] -- [Anonymous_job] No wait for data sending.
2024-05-23 15:50:08.427 INFO message_queue.py:70 [alice] -- [Anonymous_job] Notify message polling thread[DataSendingQueueThread] to exit.
2024-05-23 15:50:08.427 INFO message_queue.py:70 [alice] -- [Anonymous_job] Notify message polling thread[ErrorSendingQueueThread] to exit.
2024-05-23 15:50:08.427 INFO api.py:352 [alice] -- [Anonymous_job] Shutdowned rayfed.
2024-05-23 15:50:08.427 CRITICAL api.py:356 [alice] -- [Anonymous_job] Exit now due to the previous error.
Exception ignored in: <module 'threading' from '/root/anaconda3/envs/psi/lib/python3.10/threading.py'>
Traceback (most recent call last):
  File "/root/anaconda3/envs/psi/lib/python3.10/threading.py", line 1567, in _shutdown
    lock.acquire()
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/fed/api.py", line 65, in _signal_handler
    _shutdown(intended=False)
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/fed/api.py", line 357, in _shutdown
    sys.exit(1)
SystemExit: 1

Receiver (Client):

import secretflow as sf
import spu
import time

cluster_config = {
    'parties' : {
        'alice': {
            'address': '127.0.0.1:59179',
            'listen_addr': '0.0.0.0:59179'
        },
        'bob': {
            'address': '127.0.0.1:53341',
            'listen_addr': '0.0.0.0:53341'
        }
    },
    'self_party': 'bob'
}
sf.shutdown()
sf.init(address='local', cluster_config=cluster_config)
cluster_def = {
    "nodes": [
        {
            "party": "alice",
            "address": "127.0.0.1:45413"
        },
        {
            "party": "bob",
            "address": "127.0.0.1:47480"
        },
    ],
    "runtime_config": {
        "protocol": spu.spu_pb2.SEMI2K,
        "field": spu.spu_pb2.FM128
    },
}

spu = sf.SPU(
    cluster_def,
    link_desc={
        "connect_retry_times": 60,
        "connect_retry_interval_ms": 1000,
    }
)

alice, bob = sf.PYU('alice'), sf.PYU('bob')
spu.psi_csv(
    key={alice:['name'], bob:['name']}, 
    input_path={alice:'/root/project/psi1/alice_exactpsi_1e6_unique.csv', bob:'/root/project/psi2/bob_exactpsi_1e6_to_1e2_unique.csv'}, 
    output_path={alice:'/root/project/psi1/bob_exactpsi_1e6_to_1e2_unique_output_ubpsi_cache.csv', bob:'/root/project/psi2/bob_exactpsi_1e6_to_1e2_unique_output_ubpsi_cache.csv'}, 
    receiver='bob', 
    broadcast_result=False, 
    protocol='ECDH_OPRF_UB_PSI_2PC_GEN_CACHE', 
    preprocess_path='preprocess_cache',
    ecdh_secret_key_path="/root/project/psi1/alice_oprf_key",
    curve_type='CURVE_FOURQ', 
)
print("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
print("Complete offline phase")
print("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
spu.psi_csv(
    key={alice:['name'], bob:['name']}, 
    input_path={alice:'/root/project/psi1/alice_exactpsi_1e6_unique.csv', bob:'/root/project/psi2/bob_exactpsi_1e6_to_1e2_unique.csv'}, 
    output_path={alice:'/root/project/psi1/bob_exactpsi_1e6_to_1e2_unique_output_ubpsi_cache.csv', bob:'/root/project/psi2/bob_exactpsi_1e6_to_1e2_unique_output_ubpsi_cache.csv'}, 
    receiver='bob', 
    broadcast_result=False, 
    protocol='ECDH_OPRF_UB_PSI_2PC_TRANSFER_CACHE', 
    preprocess_path='preprocess_cache',
    ecdh_secret_key_path="/root/project/psi1/alice_oprf_key",
    curve_type='CURVE_FOURQ', 
)
print("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
print("Complete online phase")
print("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")

Output:

/root/anaconda3/envs/psi/lib/python3.10/subprocess.py:1796: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
  self.pid = _posixsubprocess.fork_exec(
/root/anaconda3/envs/psi/lib/python3.10/subprocess.py:1796: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
  self.pid = _posixsubprocess.fork_exec(
2024-05-23 15:50:01,526 INFO worker.py:1621 -- Started a local Ray instance.
2024-05-23 15:50:02.104 INFO api.py:233 [bob] -- [Anonymous_job] Started rayfed with {'CLUSTER_ADDRESSES': {'alice': '127.0.0.1:59179', 'bob': '0.0.0.0:53341'}, 'CURRENT_PARTY_NAME': 'bob', 'TLS_CONFIG': {}}
2024-05-23 15:50:02.743 INFO barriers.py:284 [bob] -- [Anonymous_job] Succeeded to create receiver proxy actor.
(ReceiverProxyActor pid=19639) 2024-05-23 15:50:02.740 INFO grpc_proxy.py:359 [bob] -- [Anonymous_job] ReceiverProxy binding port 53341, options: (('grpc.enable_retries', 1), ('grpc.so_reuseport', 0), ('grpc.max_send_message_length', 524288000), ('grpc.max_receive_message_length', 524288000), ('grpc.service_config', '{"methodConfig": [{"name": [{"service": "GrpcService"}], "retryPolicy": {"maxAttempts": 5, "initialBackoff": "5s", "maxBackoff": "30s", "backoffMultiplier": 2, "retryableStatusCodes": ["UNAVAILABLE"]}}]}'))...
(ReceiverProxyActor pid=19639) 2024-05-23 15:50:02.742 INFO grpc_proxy.py:379 [bob] -- [Anonymous_job] Successfully start Grpc service without credentials.
2024-05-23 15:50:03.339 INFO barriers.py:333 [bob] -- [Anonymous_job] SenderProxyActor has successfully created.
2024-05-23 15:50:03.339 INFO barriers.py:520 [bob] -- [Anonymous_job] Try ping ['alice'] at 0 attemp, up to 3600 attemps.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Complete offline phase
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
(SPURuntime(device_id=None, party=bob) pid=19724) [2024-05-23 15:50:08.320] [info] [launch.cc:164] LEGACY PSI config: {"psi_type":"ECDH_OPRF_UB_PSI_2PC_GEN_CACHE","receiver_rank":1,"input_params":{"path":"/root/project/psi2/bob_exactpsi_1e6_to_1e2_unique.csv","select_fields":["name"],"precheck":true},"output_params":{"path":"/root/project/psi2/bob_exactpsi_1e6_to_1e2_unique_output_ubpsi_cache.csv","need_sort":true},"curve_type":"CURVE_FOURQ","bucket_size":1048576,"ecdh_secret_key_path":"/root/project/psi1/alice_oprf_key"}
(SPURuntime(device_id=None, party=bob) pid=19724) [2024-05-23 15:50:08.320] [info] [bucket_psi.cc:425] Run psi protocol=7, self_items_count=0
(SPURuntime(device_id=None, party=bob) pid=19724) [2024-05-23 15:50:08.321] [info] [bucket_ub_psi.cc:93] input file path:/root/project/psi2/bob_exactpsi_1e6_to_1e2_unique.csv
(SPURuntime(device_id=None, party=bob) pid=19724) [2024-05-23 15:50:08.321] [info] [bucket_ub_psi.cc:94] output file path:/root/project/psi2/bob_exactpsi_1e6_to_1e2_unique_output_ubpsi_cache.csv
(SPURuntime(device_id=None, party=bob) pid=19724) [2024-05-23 15:50:08.321] [info] [ecdh_oprf_selector.cc:33] use fourq
(SPURuntime(device_id=None, party=bob) pid=19724) [2024-05-23 15:50:08.321] [info] [batch_provider.cc:328] ReadAndShuffle start, idx:0, provider_batch_size:1048576
(SPURuntime(device_id=None, party=bob) pid=19724) [2024-05-23 15:50:08.321] [info] [batch_provider.cc:350] ReadAndShuffle end, idx:0 , size:100
(SPURuntime(device_id=None, party=bob) pid=19724) [2024-05-23 15:50:08.321] [info] [ecdh_oprf_psi.cc:108] omp_get_num_threads:1 cpus:8
(SPURuntime(device_id=None, party=bob) pid=19724) [2024-05-23 15:50:08.325] [info] [batch_provider.cc:318] cursor_index_:0, bucket_index_:0, 0-0
(SPURuntime(device_id=None, party=bob) pid=19724) [2024-05-23 15:50:08.325] [info] [batch_provider.cc:240] cursor_index_:0, bucket_index_:0, 0-0
(SPURuntime(device_id=None, party=bob) pid=19724) [2024-05-23 15:50:08.327] [info] [ecdh_oprf_psi.cc:192] FullEvaluate finished, batch_count=1 items_count=100
(SPURuntime(device_id=None, party=bob) pid=19724) [2024-05-23 15:50:08.331] [info] [launch.cc:164] LEGACY PSI config: {"psi_type":"ECDH_OPRF_UB_PSI_2PC_TRANSFER_CACHE","receiver_rank":1,"input_params":{"path":"/root/project/psi2/bob_exactpsi_1e6_to_1e2_unique.csv","select_fields":["name"],"precheck":true},"output_params":{"path":"/root/project/psi2/bob_exactpsi_1e6_to_1e2_unique_output_ubpsi_cache.csv","need_sort":true},"curve_type":"CURVE_FOURQ","bucket_size":1048576,"preprocess_path":"preprocess_cache"}
(SPURuntime(device_id=None, party=bob) pid=19724) [2024-05-23 15:50:08.331] [info] [bucket_psi.cc:400] bucket size set to 1048576
(SPURuntime(device_id=None, party=bob) pid=19724) [2024-05-23 15:50:08.337] [info] [bucket_psi.cc:425] Run psi protocol=8, self_items_count=0
(SPURuntime(device_id=None, party=bob) pid=19724) [2024-05-23 15:50:08.337] [info] [bucket_ub_psi.cc:93] input file path:/root/project/psi2/bob_exactpsi_1e6_to_1e2_unique.csv
(SPURuntime(device_id=None, party=bob) pid=19724) [2024-05-23 15:50:08.337] [info] [bucket_ub_psi.cc:94] output file path:/root/project/psi2/bob_exactpsi_1e6_to_1e2_unique_output_ubpsi_cache.csv
(SPURuntime(device_id=None, party=bob) pid=19724) [2024-05-23 15:50:08.337] [info] [bucket_ub_psi.cc:186] Start Sync
2024-05-23 15:50:08.430 WARNING api.py:607 [bob] -- [Anonymous_job] Encounter RemoteError happend in other parties, error message: FedRemoteError occurred at alice
Traceback (most recent call last):
  File "/root/project/psi2/sf_connect_ub.py", line 60, in <module>
    spu.psi_csv(
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/secretflow/device/device/spu.py", line 1848, in psi_csv
    return dispatch(
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/secretflow/device/device/register.py", line 111, in dispatch
    return _registrar.dispatch(self.device_type, name, self, *args, **kwargs)
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/secretflow/device/device/register.py", line 80, in dispatch
    return self._ops[device_type][name](*args, **kwargs)
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/secretflow/device/kernels/spu.py", line 321, in psi_csv
    return sfd.get(res)
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/secretflow/distributed/primitive.py", line 156, in get
    return fed.get(object_refs)
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/fed/api.py", line 613, in get
    raise e
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/fed/api.py", line 602, in get
    values = ray.get(ray_refs)
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/ray/_private/worker.py", line 2524, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(FedRemoteError): ray::ReceiverProxyActor.get_data() (pid=19639, ip=192.168.15.7, actor_id=d23873c80f4140328261fbb701000000, repr=<fed.proxy.barriers.ReceiverProxyActor object at 0x7ff0b8729c90>)
  File "/root/anaconda3/envs/psi/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/root/anaconda3/envs/psi/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/fed/proxy/barriers.py", line 236, in get_data
    raise data
fed.exceptions.FedRemoteError: FedRemoteError occurred at alice

Thank you for your help!

[Bug]: psi_test data volume issue

Issue Type

Usability

Modules Involved

PSI

Have you reproduced the bug with SPU HEAD?

Yes

Have you searched existing issues?

Yes

SPU Version

spu 0.7.0b0

OS Platform and Distribution

centos 7

Python Version

3.8

Compiler Version

No response

Current Behavior?

I modified bob.csv to enlarge the data volume, and the test then failed at self.assertEqual (value1 != value2). I followed the code from .psi_pb2 import (BucketPsiConfig, CurveType, InputParams, MemoryPsiConfig, OutputParams, PsiResultReport, PsiType)  # type: ignore, but could not find this psi_pb2 file.

Standalone code to reproduce the issue

print("A bug")

Relevant log output

No response

Ray error when running the 星河杯 (XingHe Cup) hidden-query (PIR) example code

Issue Type

Bug

Source

binary

Secretflow Version

secretflow 1.0.0b3

OS Platform and Distribution

Asianux

Python version

3.8.16

Bazel version

No response

GCC/Compiler version

No response

What happened and what you expected to happen.

Running the 星河杯 (XingHe Cup) hidden-query (PIR) example code fails with an error (data volume: 10 million rows); the message says the node was running low on memory and the worker process was killed.
Environment: 4 cores, 8 GB RAM
swap_space: 20G
How should the Ray cluster be configured to meet this workload?

Reproduction code to reproduce the issue.

ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
class name: SPURuntime


The actor is dead because its worker process has died. Worker exit type: NODE_OUT_OF_MEMORY Worker exit detail: Task was killed due to the node running low on memory.
Refer to the documentation on how to address the out of memory issue: https://docs.ray.io/en/latest/ray-core/scheduling/ray-oom-prevention.html.
Consider provisioning more memory on this node or reducing task parallelism by requesting more CPUs per task.
Set max restarts and max task retries to enable retry when the task crashes due to OOM.
To adjust the kill threshold, set the environment variable 'RAY_memory_usage_threshold' when starting Ray.
To disable worker killing, set the environment variable 'RAY_memory_monitor_refresh_ms' to zero.

Effect of the number of feature columns on intersection speed

I ran intersection tests using the Kuscia API in P2P mode. Intersecting 1M rows with 1M rows with no features (only a label y) finished in 31 s, while intersecting 500K rows with 500K rows, each side carrying 450 feature columns, took more than 7 minutes. Clearly the number of feature columns in the raw data affects intersection speed, yet logically the intersection should only involve computation on the key columns. Why is this?

Label PSI: serializing BinBundle to reduce the large cache memory footprint

Hello, I see that Label PSI serializes BinBundle objects for storage in order to reduce the memory taken up by the whole bundle cache. In the query phase, does the entire serialized data still have to be deserialized back into the sender DB? Also, what information do these three member variables store?
std::shared_ptr<yacl::io::KVStore> meta_info_store_;
std::vector<std::shared_ptr<yacl::io::IndexStore>> bundles_store_;
std::vector<size_t> bundles_store_idx_;
The code is a bit confusing to me.

[Feature]: PIR protocol inquiry

Related problem

Does the PIR module currently support the following two protocols? What is the first one? Is there a detailed description anywhere?
(screenshot omitted)

Solution

PIR inquiry.

Alternatives

PIR inquiry.

Additional context

PIR inquiry.

[Bug]: PIR tasks run with SPU always hang when reusing an old setup directory

Issue Type

Support

Modules Involved

PIR

Have you reproduced the bug with SPU HEAD?

Yes

Have you searched existing issues?

Yes

SPU Version

0.6.0b0

OS Platform and Distribution

CentOS Linux 7

Python Version

3.8

Compiler Version

No response

Current Behavior?

When running PIR, if new setup files are generated each time, the PIR task works fine.
If old setup files are reused, both parties hang; the alice side never enters the pir_query method.
Ray startup command:
ray start --head --node-ip-address=$nodeip --port=$nodeport --include-dashboard=False --disable-usage-stats

Standalone code to reproduce the issue

print("A bug")

Relevant log output

No response

The following logs appear when doing the intersection

Describe the bug

(screenshot of the logs omitted)
What are these logs doing?

Steps To Reproduce

I want to understand the intersection process.

Expected behavior

I want to understand the intersection process.

Version

sf 1.3

Operating system

Anolis OS 8.8

Hardware Resources

8c16g

Problem in the PIR setup phase

I deployed following the process at https://www.secretflow.org.cn/en/docs/psi/v0.4.0beta/user_guide/pir. When executing
docker run -it --rm --network host --mount type=bind,source=/tmp/server,target=/root/server --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=NET_ADMIN --privileged=true secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/psi-anolis8:0.1.0beta --config server/apsi_server_setup.json
the following problem occurred:
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "--config": executable file not found in $PATH: unknown.

[Bug]: RR22_LOWCOMM_PSI_2PC is highly sensitive to jitter

Issue Type

Usability

Modules Involved

PSI

Have you reproduced the bug with SPU HEAD?

No

Have you searched existing issues?

Yes

SPU Version

spu 0.8.0b0

OS Platform and Distribution

Centos

Python Version

3.9

Compiler Version

No response

Current Behavior?

While testing how different PSI algorithms behave under constrained network conditions, I found that RR22_LOWCOMM seems sensitive to jitter. At 10 Mbps, 180 ms latency plus 45 ms jitter causes the intersection task to establish the link but then fail during data transfer. Is this an implementation issue or a limitation of the algorithm?
I will run further tests with different latency and jitter settings.

Standalone code to reproduce the issue

As above.

Relevant log output

As above.

Question about the ECDH-OPRF protocol implemented in SecretFlow

The documentation says the ECDH-OPRF protocol is based on Section 3 of RA18.
(screenshot omitted)

However, the algorithm implemented in Section 3 of RA18 does not use an OPRF.
(screenshot omitted)

Was this added on your side, or is it based on some other paper?

[Bug]: In psi_test, running the unbalanced ecdh_oprf_unbalanced test on two machines reports that data cannot be fetched

Issue Type

Usability

Modules Involved

SPU runtime, PSI

Have you reproduced the bug with SPU HEAD?

Yes

Have you searched existing issues?

Yes

SPU Version

spu 0.7.0b0

OS Platform and Distribution

centos 7

Python Version

3.8

Compiler Version

No response

Current Behavior?

I split the ecdh_oprf_unbalanced test out of psi_test, changed alice's and bob's ranks to 0 and 1, and tested them separately on two machines. The connection is established, but after Start Sync it reports: Get data timeout, key=1709604787477:1:ALLGATHER.

Standalone code to reproduce the issue

Here is alice's code:

import subprocess
import time
import unittest
from socket import socket

import multiprocess

import spu.libspu.link as link
import spu.psi as psi
from spu.utils.simulation import PropagatingThread


def get_free_port():
    with socket() as s:
        s.bind(("localhost", 0))
        return s.getsockname()[1]


def wc_count(file_name):
    out = subprocess.getoutput("wc -l %s" % file_name)
    return int(out.split()[0])


class UnitTests(unittest.TestCase):
    # def run_psi(self, fn):
    #     # wsize = 2
    #     wsize = 1
    #     print("used")

    #     lctx_desc = link.Desc()
    #     for rank in range(wsize):
    #         lctx_desc.add_party(f"id_{rank}", f"thread_{rank}")
    #         # print("wsize = ",wsize)

    #     def wrap(rank):
    #         lctx = link.create_mem(lctx_desc, rank)
    #         # print("rank = ",rank)
    #         return fn(lctx)

    #     jobs = [PropagatingThread(target=wrap, args=(rank,)) for rank in range(wsize)]

    #     [job.start() for job in jobs]
    #     [job.join() for job in jobs]
   

    def test_ecdh_oprf_unbalanced(self):
        print("----------test_ecdh_oprf_unbalanced-------------")

        offline_path = ["", "/home/spu/spu_new/spu/tests/data/bob.csv"]
        online_path = ["/home/spu/spu_new/spu/tests/data/alice.csv", "/home/spu/spu_new/spu/tests/data/bob.csv"]
        outputs = ["./alice-aliceout.csv", "./bob-bobout.csv"]
        preprocess_path = ["./alice-preprocess-disout.csv", ""]
        secret_key_path = ["", "./secret_key.bin"]
        selected_fields = ["id"]

        with open(secret_key_path[1], 'wb') as f:
            f.write(
                bytes.fromhex(
                    "000102030405060708090a0b0c0d0e0ff0e0d0c0b0a090807060504030201000"
                )
            )

        time_stamp = time.time()
        lctx_desc = link.Desc()
        lctx_desc.id = str(round(time_stamp * 1000))

        # for rank in range(1):   # 2->1
        #     port = get_free_port()
        #     lctx_desc.add_party(f"id_{rank}", f"127.0.0.1:{port}")

        # alice_port = get_free_port()
        alice_port = 17788
        print("alice port is ",alice_port)
        lctx_desc.add_party("id_0", f"10.19.93.78:{alice_port}")
       
      
        bob_port = 17788
        lctx_desc.add_party("id_1", f"10.19.93.80:{bob_port}")



        receiver_rank = 0
        server_rank = 1
        client_rank = 0
        # one-way PSI, just one party get result
        broadcast_result = False

        precheck_input = False
        server_cache_path = "server_cache.bin"

        def wrap(
            rank,#alice rank is 0
            offline_path,
            online_path,
            out_path,
            preprocess_path,
            ub_secret_key_path,
        ):
            print("alice rank is ",rank)
            link_ctx = link.create_brpc(lctx_desc, rank,log_details=True)
            # link_ctx = link.create_brpc(lctx_desc, 1-rank)
            
###
            print("used")
            print("===== offline phase =====")
            offline_config = psi.BucketPsiConfig(
                psi_type=psi.PsiType.Value('ECDH_OPRF_UB_PSI_2PC_OFFLINE'),
                broadcast_result=broadcast_result,
                receiver_rank=client_rank,
                input_params=psi.InputParams(
                    path=offline_path,
                    select_fields=selected_fields,
                    precheck=precheck_input,
                ),
                output_params=psi.OutputParams(path="./fake.out", need_sort=False),
                bucket_size=1000000,
                curve_type=psi.CurveType.CURVE_FOURQ,
            )

            if client_rank == link_ctx.rank:
                print("dummy") ########
                offline_config.preprocess_path = preprocess_path
                offline_config.input_params.path = "./dummy.csv"
            else:
                print("else dummy")
                offline_config.ecdh_secret_key_path = ub_secret_key_path

            start = time.time()
            offline_report = psi.bucket_psi(link_ctx, offline_config)

            if receiver_rank != link_ctx.rank:
                server_source_count = wc_count(offline_path)
                self.assertEqual(offline_report.original_count, server_source_count - 1)

            print(f"offline cost time: {time.time() - start}")
            print(
                f"offline: rank: {rank} original_count: {offline_report.original_count}"
            )
            print(
                f"offline: rank: {rank} intersection_count: {offline_report.intersection_count}"
            )
            
            print("===== online phase =====")
            online_config = psi.BucketPsiConfig(
                psi_type=psi.PsiType.Value('ECDH_OPRF_UB_PSI_2PC_ONLINE'),
                broadcast_result=broadcast_result,
                receiver_rank=client_rank,
                input_params=psi.InputParams(
                    path=online_path,
                    select_fields=selected_fields,
                    precheck=precheck_input,
                ),
                output_params=psi.OutputParams(path=out_path, need_sort=False),
                bucket_size=300000,
                curve_type=psi.CurveType.CURVE_FOURQ,
            )

            if receiver_rank == link_ctx.rank:
                online_config.preprocess_path = preprocess_path
            else:
                online_config.ecdh_secret_key_path = ub_secret_key_path
                online_config.input_params.path = "dummy.csv"

            start = time.time()
            report_online = psi.bucket_psi(link_ctx, online_config)

            if receiver_rank == link_ctx.rank:
                client_source_count = wc_count(online_path)
                self.assertEqual(report_online.original_count, client_source_count - 1)

            print(f"online cost time: {time.time() - start}")
            print(f"online: rank:{rank} original_count: {report_online.original_count}")
            print(f"intersection_count: {report_online.intersection_count}")
            
            link_ctx.stop_link()

        # launch with multiprocess
        jobs = [
            multiprocess.Process(
                target=wrap,
                args=(
                    rank,
                    offline_path[rank],
                    online_path[rank],
                    outputs[rank],
                    preprocess_path[rank],
                    secret_key_path[rank],
                ),
            )
            for rank in range(0,1)    # 0
            # for rank in range(1,2)    # 1
        ]
        [job.start() for job in jobs]
        for job in jobs:
            job.join()
            self.assertEqual(job.exitcode, 0)

if __name__ == '__main__':
    unittest.main()
Here is bob's code:

import subprocess
import time
import unittest
from socket import socket

import multiprocess

import spu.libspu.link as link
import spu.psi as psi
from spu.utils.simulation import PropagatingThread


def get_free_port():
    with socket() as s:
        s.bind(("localhost", 0))
        return s.getsockname()[1]


def wc_count(file_name):
    out = subprocess.getoutput("wc -l %s" % file_name)
    return int(out.split()[0])


class UnitTests(unittest.TestCase):
    # def run_psi(self, fn):
    #     # wsize = 2
    #     wsize = 1
    #     print("used")

    #     lctx_desc = link.Desc()
    #     for rank in range(wsize):
    #         lctx_desc.add_party(f"id_{rank}", f"thread_{rank}")
    #         # print("wsize = ",wsize)

    #     def wrap(rank):
    #         lctx = link.create_mem(lctx_desc, rank)
    #         # print("rank = ",rank)
    #         return fn(lctx)

    #     jobs = [PropagatingThread(target=wrap, args=(rank,)) for rank in range(wsize)]

    #     [job.start() for job in jobs]
    #     [job.join() for job in jobs]
   

    def test_ecdh_oprf_unbalanced(self):
        print("----------test_ecdh_oprf_unbalanced-------------")

        offline_path = ["", "/home/spu_new/spu/tests/data/bob.csv"]
        online_path = ["/home/spu_new/spu/tests/data/alice.csv", "/home/spu_new/spu/tests/data/bob.csv"]
        outputs = ["./alice-aliceout.csv", "./bob-bobout.csv"]
        preprocess_path = ["./alice-preprocess-disout.csv", ""]
        secret_key_path = ["", "./secret_key.bin"]
        selected_fields = ["id"]

        with open(secret_key_path[1], 'wb') as f:
            f.write(
                bytes.fromhex(
                    "000102030405060708090a0b0c0d0e0ff0e0d0c0b0a090807060504030201000"
                )
            )

        time_stamp = time.time()
        lctx_desc = link.Desc()
        lctx_desc.id = str(round(time_stamp * 1000))

        # for rank in range(1):   # 2->1
        #     port = get_free_port()
        #     lctx_desc.add_party(f"id_{rank}", f"127.0.0.1:{port}")

        # lctx_desc.add_party(f"id_0", "10.19.93.78:12345")
        # lctx_desc.add_party(f"id_1", "10.19.93.80:12345")
        


        # # alice_port = input("input alice's port: ")
        alice_port = 17788
        lctx_desc.add_party("id_0", f"10.19.93.78:{alice_port}")

        # bob_port = get_free_port()
        bob_port = 17788
        print("bob_port port is ",bob_port)
        lctx_desc.add_party("id_1", f"10.19.93.80:{bob_port}")
       

        receiver_rank = 0
        server_rank = 1
        client_rank = 0
        # one-way PSI, just one party get result
        broadcast_result = False

        precheck_input = False
        server_cache_path = "server_cache.bin"

        def wrap(
            rank,#bob rank is 1
            offline_path,
            online_path,
            out_path,
            preprocess_path,
            ub_secret_key_path,
        ):
            print("bob rank is ",rank)
            link_ctx = link.create_brpc(lctx_desc, rank,log_details=True)
            # link_ctx = link.create_brpc(lctx_desc, 1-rank)
            
###
            print("used")
            print("===== offline phase =====")
            offline_config = psi.BucketPsiConfig(
                psi_type=psi.PsiType.Value('ECDH_OPRF_UB_PSI_2PC_OFFLINE'),
                broadcast_result=broadcast_result,
                receiver_rank=client_rank,
                input_params=psi.InputParams(
                    path=offline_path,
                    select_fields=selected_fields,
                    precheck=precheck_input,
                ),
                output_params=psi.OutputParams(path="./fake.out", need_sort=False),
                bucket_size=1000000,
                curve_type=psi.CurveType.CURVE_FOURQ,
            )

            if client_rank == link_ctx.rank:
                print("dummy")
                offline_config.preprocess_path = preprocess_path
                offline_config.input_params.path = "./dummy.csv"
            else:
                print("else dummy")############
                offline_config.ecdh_secret_key_path = ub_secret_key_path

            start = time.time()
            offline_report = psi.bucket_psi(link_ctx, offline_config) #######################

            if receiver_rank != link_ctx.rank:
                server_source_count = wc_count(offline_path)
                self.assertEqual(offline_report.original_count, server_source_count - 1)

            print(f"offline cost time: {time.time() - start}")
            print(
                f"offline: rank: {rank} original_count: {offline_report.original_count}"
            )
            print(
                f"offline: rank: {rank} intersection_count: {offline_report.intersection_count}"
            )
            
            print("===== online phase =====")
            online_config = psi.BucketPsiConfig(
                psi_type=psi.PsiType.Value('ECDH_OPRF_UB_PSI_2PC_ONLINE'),
                broadcast_result=broadcast_result,
                receiver_rank=client_rank,
                input_params=psi.InputParams(
                    path=online_path,
                    select_fields=selected_fields,
                    precheck=precheck_input,
                ),
                output_params=psi.OutputParams(path=out_path, need_sort=False),
                bucket_size=300000,
                curve_type=psi.CurveType.CURVE_FOURQ,
            )

            if receiver_rank == link_ctx.rank:
                online_config.preprocess_path = preprocess_path
            else:
                online_config.ecdh_secret_key_path = ub_secret_key_path
                online_config.input_params.path = "dummy.csv"

            start = time.time()
            report_online = psi.bucket_psi(link_ctx, online_config)

            if receiver_rank == link_ctx.rank:
                client_source_count = wc_count(online_path)
                self.assertEqual(report_online.original_count, client_source_count - 1)

            print(f"online cost time: {time.time() - start}")
            print(f"online: rank:{rank} original_count: {report_online.original_count}")
            print(f"intersection_count: {report_online.intersection_count}")
            
            link_ctx.stop_link()

        # launch with multiprocess
        jobs = [
            multiprocess.Process(
                target=wrap,
                args=(
                    rank,
                    offline_path[rank],
                    online_path[rank],
                    outputs[rank],
                    preprocess_path[rank],
                    secret_key_path[rank],
                ),
            )
            for rank in range(1,2)    # 1
            # for rank in range(0,1)    # 0
        ]
        [job.start() for job in jobs]
        for job in jobs:
            job.join()
            self.assertEqual(job.exitcode, 0)#0

if __name__ == '__main__':
    unittest.main()

Relevant log output

Here is the log output:
----------test_ecdh_oprf_unbalanced-------------
bob_port port is  17788
bob rank is  1
I0305 10:13:07.500635 36827 external/com_github_brpc_brpc/src/brpc/server.cpp:1158] Server[yacl::link::transport::internal::ReceiverServiceImpl] is serving on port=17788.
W0305 10:13:07.500801 36827 external/com_github_brpc_brpc/src/brpc/server.cpp:1164] Builtin services are disabled according to ServerOptions.has_builtin_services
[2024-03-05 10:13:07.509] [info] [context.cc:141] connecting to mesh, id=1709604787477, self=1
[2024-03-05 10:13:07.509] [info] [context.cc:146] attempt to connect to rank=0
I0305 10:13:07.611303 36883 external/com_github_brpc_brpc/src/brpc/socket.cpp:2466] Checking Socket{id=0 addr=10.19.93.78:17788} (0x39362c0)
[2024-03-05 10:13:16.554] [info] [context.cc:149] try connect to rank=0 failed with error [external/yacl/yacl/link/transport/interconnection_link.cc:56] cntl ErrorCode '112', http status code '0', response header '', response body '', error msg '[E112]Not connected to 10.19.93.78:17788 yet, server_id=0'
[2024-03-05 10:13:16.555] [info] [context.cc:168] try_connect to rank 0 not succeed, sleep_for 1000ms and retry.
I0305 10:13:16.615165 36834 external/com_github_brpc_brpc/src/brpc/socket.cpp:2526] Revived Socket{id=0 addr=10.19.93.78:17788} (0x39362c0) (Connectable)
[2024-03-05 10:13:17.555] [info] [context.cc:146] attempt to connect to rank=0
[2024-03-05 10:13:17.557] [info] [context.cc:188] connecting to mesh, all partners launched
[2024-03-05 10:13:17.557] [info] [context.cc:198] connected to mesh, id=1709604787477, self=1
used
===== offline phase =====
else dummy
[2024-03-05 10:13:17.559] [info] [bucket_psi.cc:401] bucket size set to 1000000
[2024-03-05 10:13:17.561] [info] [bucket_psi.cc:426] Run psi protocol=9, self_items_count=0
[2024-03-05 10:13:17.567] [info] [bucket_ub_psi.cc:93] input file path:/home/spu_new/spu/tests/data/bob.csv
[2024-03-05 10:13:17.567] [info] [bucket_ub_psi.cc:94] output file path:./fake.out
[2024-03-05 10:13:17.568] [info] [ecdh_oprf_selector.cc:33] use fourq
[2024-03-05 10:13:17.568] [info] [batch_provider.cc:328] ReadAndShuffle start, idx:0, provider_batch_size:1000000
[2024-03-05 10:13:17.579] [info] [batch_provider.cc:350] ReadAndShuffle end, idx:0 , size:10001
[2024-03-05 10:13:17.579] [info] [bucket_ub_psi.cc:326] Start sync
Process Process-1:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "psi-bob.py", line 148, in wrap
    offline_report = psi.bucket_psi(link_ctx, offline_config) #######################
  File "/usr/local/lib/python3.8/site-packages/spu/psi.py", line 68, in bucket_psi
    report_str = libpsi.libs.bucket_psi(
RuntimeError: what: 
        [external/yacl/yacl/link/transport/channel.cc:411] Get data timeout, key=1709604787477:1:ALLGATHER
stacktrace: 
#0 yacl::link::Context::RecvInternal()+0x7efbe66dd067
#1 yacl::link::AllGatherImpl<>()+0x7efbe66d7bc1
#2 yacl::link::AllGather()+0x7efbe66d8053
#3 psi::psi::AllGatherItemsSize()+0x7efbe66d5ca5
#4 psi::psi::UbPsiServerOffline()+0x7efbe53681e5
#5 psi::psi::UbPsi()+0x7efbe536c69b
#6 psi::psi::BucketPsi::RunPsi()+0x7efbe5360458
#7 psi::psi::BucketPsi::Run()+0x7efbe53629b0
#8 psi::BindLibs()::{lambda()#2}::operator()()+0x7efbe52aba32
#9 pybind11::cpp_function::initialize<>()::{lambda()#3}::_FUN()+0x7efbe52ac052
#10 pybind11::cpp_function::dispatcher()+0x7efbe528eb73
#11 PyCFunction_Call+0x43bcda


I0305 10:13:47.610506 36827 external/com_github_brpc_brpc/src/brpc/server.cpp:1218] Server[yacl::link::transport::internal::ReceiverServiceImpl] is going to quit
[2024-03-05 10:13:47.611] [warning] [channel.h:160] Channel destructor is called before WaitLinkTaskFinish, try stop send thread
F
======================================================================
FAIL: test_ecdh_oprf_unbalanced (__main__.UnitTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "psi-bob.py", line 215, in test_ecdh_oprf_unbalanced
    self.assertEqual(job.exitcode, 0)#0
AssertionError: 1 != 0

----------------------------------------------------------------------
Ran 1 test in 40.150s

FAILED (failures=1)

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Repository problems

These problems occurred while renovating this repository. View logs.

  • WARN: File contents are invalid JSON but parse using JSON5. Support for this will be removed in a future release so please change to a support .json5 file name or ensure correct JSON syntax.

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Ignored or Blocked

These are blocked by an existing closed PR and will not be recreated unless you click a checkbox below.

Detected dependencies

bazel
bazel/repositories.bzl
  • yacl 0.4.5b4_nightly_20240731
  • platforms 0.0.8
  • com_github_facebook_zstd v1.5.5
  • com_github_emptoolkit_emp_tool 0.2.5
  • com_github_emptoolkit_emp_ot 0.2.4
  • com_github_intel_ipp ippcp_2021.8
  • com_github_emptoolkit_emp_zk 0.2.1
  • com_github_microsoft_seal v4.1.1
  • com_github_microsoft_apsi v0.11.0
  • com_github_microsoft_gsl v4.0.0
  • com_github_microsoft_kuku v2.1.0
  • com_google_flatbuffers v24.3.25
  • org_apache_arrow apache-arrow-10.0.0
  • com_github_grpc_grpc v1.51.0
  • com_github_tencent_rapidjson v1.1.0
  • brotli v1.1.0
  • org_apache_thrift v0.19.0
  • com_google_double_conversion v3.3.0
  • com_github_google_snappy 1.1.9
  • com_github_google_perfetto v41.0
  • com_github_floodyberry_curve25519_donna 2fe66b65ea1acb788024f40a3373b8b3e6f4bbb2
  • com_github_ridiculousfish_libdivide 5.0
  • com_github_sparsehash_sparsehash sparsehash-2.0.4
  • com_github_zeromq_cppzmq v4.10.0
  • com_github_zeromq_libzmq v4.3.5
  • com_github_log4cplus_log4cplus REL_2_1_1
  • com_github_open_source_parsers_jsoncpp 1.9.5
bazelisk
.bazelversion
circleci
.circleci/config.yml
  • path-filtering 1.0.0
  • continuation 1.0.0
.circleci/release.yml
  • cimg/deploy 2023.06.1
.circleci/test.yml
dockerfile
docker/Dockerfile
  • openanolis/anolisos 8.8
  • openanolis/anolisos 8.8
github-actions
.github/workflows/buildifier.yml
.github/workflows/cla.yml
.github/workflows/clang-format.yml
.github/workflows/publish_docker_image.yml
  • CircleCI-Public/trigger-circleci-pipeline-action v1.1.0
.github/workflows/scorecard.yml
  • actions/checkout v4.1.1@b4ffde65f46336ab88eb53be808477a3936bae11
  • ossf/scorecard-action v2.3.1@0864cf19026789058feabb7e87baa5f140aac736
  • github/codeql-action v3.23.1@0b21cf2492b6b02c465a3e5d7c473717ad7721ba
.github/workflows/stale.yml
.github/workflows/yaml-linter.yml
pip_requirements
docs/requirements.txt
  • nbsphinx ==0.8.9
  • sphinx ==5.3.0
  • myst-parser ==0.18.1
  • sphinx-intl ==2.1.0

  • Check this box to trigger a request for Renovate to run again on this repository

Question about the implementation of EcdhOprfPsiServer::FullEvaluate in unbalanced PSI

EcdhOprfPsiServer::FullEvaluate is a function in the unbalanced PSI call chain. I noticed that it calls oprf_server_->SimpleEvaluate(batch_items[0]) to compute the final OPRF values. I would like to know whether this implementation is correct, because what it actually computes looks like just a hash value.

According to the protocol description at https://www.secretflow.org.cn/en/docs/psi/v0.3.0beta/development/psi_protocol_intro#ecdh-oprf-based-psi, it seems this should call oprf_server_->FullEvaluate(batch_items[0]) instead?

[Bug]: Questions about psi-anolis

Describe the bug

I deployed the whole EasyPSI stack via the official docker image, and I have the following questions:
Q1: After unpacking I got secretflow-XXXX.tar, and only after importing the image did I find it is named secretflow/psi-anolis. Can I therefore conclude that the EasyPSI image contains only the PSI project and not the secretflow project?
Q2: Does psi-anolis support the ECDH_OPRF algorithm?
Q3: If psi-anolis supports ECDH_OPRF, where can I get example scripts?
Q4: If psi-anolis supports ECDH_OPRF, how should I modify the front-end React/Java projects so they can invoke ECDH_OPRF?
Q5: In ECDH_OPRF, is the hash computed only over the join-key columns, or over all fields of the whole record?

Steps To Reproduce

Expected behavior

Version

0.2.0.dev240123

Operating system

centos7

Hardware Resources

8c16g

Questions about SealPIR database preprocessing

Some questions about the algorithm implementation:
For SealPIR, when making multiple queries, does the database need to be preprocessed again for every query?
Is the setup folder produced by preprocessing the place where the database's HE plaintext data is stored?
When the data volume is large, how is bucketing done, and after bucketing is each bucket queried once?

Questions about the Labeled PSI interaction flow

Issue Type

Others

Source

source

Secretflow Version

1.5.0.dev240319

OS Platform and Distribution

centos 7.9

Python version

3.10

Bazel version

6.5.0

GCC/Compiler version

11.2.1

What happened and what you expected to happen.

While studying Labeled PSI (APSI) from the official material, I learned that the whole flow goes through four phases: Request Params -> Setup Server DB -> Request OPRF -> Request Query. Combining this with the explanation of keyword PIR in the 星河杯 "blacklist shared query" baseline write-up, I have the following questions:
Q1: What parameters does Request Params fetch, and what role do they play in later phases? Are they felts_per_item, or hash_func_count, table_size, and max_items_per_bin, or something else?
Q2: Is Setup Server DB the phase responsible for splitting key-value pairs into query polynomials and interpolation polynomials? Does this phase require the client's public key?
Q3: Why does Request Query exist? Since the OPRF already produced a homomorphic computation result, the server could simply return it to the client; why does the client need to send another Query?

Reproduction code to reproduce the issue.

SealPIR support for other security strengths (4096 security parameter)

This problem occurred when attempting a hidden query (PIR) at the 4096 security strength; after a week of investigation, the root cause finally surfaced and was resolved.
Problem code:
the std::vector<seal::Ciphertext> SealPirServer::ExpandQuery function in seal_pir.cc:

std::vector<seal::Ciphertext> SealPirServer::ExpandQuery(
    const seal::Ciphertext &encrypted, std::uint32_t m) {
  uint64_t plain_mod = seal_params_->plain_modulus().value();

  seal::GaloisKeys &galkey = galois_key_;

  // Assume that m is a power of 2. If not, round it to the next power of 2.
  uint32_t logm = std::ceil(std::log2(m));

  std::vector<int> galois_elts;
  auto n = seal_params_->poly_modulus_degree();
  YACL_ENFORCE(logm <= std::ceil(std::log2(n)), "m > n is not allowed.");

  galois_elts.reserve(std::ceil(std::log2(n)));
  for (size_t i = 0; i < std::ceil(std::log2(n)); i++) {
    galois_elts.push_back((n + seal::util::exponentiate_uint(2, i)) /
                          seal::util::exponentiate_uint(2, i));
  }

  std::vector<seal::Ciphertext> results(1);
  results[0] = encrypted;
  seal::Plaintext tempPt;
  for (size_t j = 0; j < logm - 1; j++) {
    std::vector<seal::Ciphertext> results2(1 << (j + 1));
    int step = 1 << j;
    seal::Plaintext pt0(n);
    seal::Plaintext pt1(n);

    pt0.set_zero();
    pt0[n - step] = plain_mod - 1;
    std::cout << "plain_mods:" << plain_mod << std::endl;
    int index_raw = (n << 1) - (1 << j);  // -2^j
    int index = (index_raw * galois_elts[j]) % (n << 1);
    pt1.set_zero();
    pt1[index] = 1;
    std::cout << "pt0:" << pt0.to_string() << std::endl;
    std::cout << "pt1:" << pt1.to_string() << std::endl;
    // int nstep = -step;
    yacl::parallel_for(0, step, [&](int64_t begin, int64_t end) {
      for (int k = begin; k < end; k++) {
        seal::Ciphertext c0;
        seal::Ciphertext c1;
        seal::Ciphertext t0;
        seal::Ciphertext t1;

        c0 = results[k];

        // SPDLOG_INFO("apply_galois j:{} k:{}", j, k);
        evaluator_->apply_galois(c0, galois_elts[j], galkey,
                                 t0);          // t0 = Sub(c0,N/(2^i)+1)
        evaluator_->add(c0, t0, results2[k]);  // c0 + Sub(c0,N/(2^i)+1)
        // multiply_power_of_X(c0, c1, index_raw);
        evaluator_->multiply_plain(c0, pt0, c1);  // c1 = c0*(-x)^(-2j)
        evaluator_->multiply_plain(t0, pt1, t1);
        // Sub(c0,N/(2^i)+1) * x^(-2j*(N+2^i)/(2^i))=Sub(c1,N/2^j+1)
        evaluator_->add(c1, t1, results2[k + step]);
      }
    });
    results = results2;
  }

  // Last step of the loop
  std::vector<seal::Ciphertext> results2(results.size() << 1);
  seal::Plaintext two("2");

  seal::Plaintext pt0(n);
  seal::Plaintext pt1(n);

  pt0.set_zero();
  pt0[n - results.size()] = plain_mod - 1;

  int index_raw = (n << 1) - (1 << (logm - 1));
  int index = (index_raw * galois_elts[logm - 1]) % (n << 1);
  pt1.set_zero();
  pt1[index] = 1;

  for (uint32_t k = 0; k < results.size(); k++) {
    if (k >= (m - (1 << (logm - 1)))) {  // corner case.
      evaluator_->multiply_plain(results[k], two,
                                 results2[k]);  // plain multiplication by 2.
    } else {
      seal::Ciphertext c0;
      seal::Ciphertext c1;
      seal::Ciphertext t0;
      seal::Ciphertext t1;

      c0 = results[k];
      evaluator_->apply_galois(c0, galois_elts[logm - 1], galkey, t0);
      evaluator_->add(c0, t0, results2[k]);
      // multiply_power_of_X(c0, c1, index_raw);

      evaluator_->multiply_plain(c0, pt0, c1);
      evaluator_->multiply_plain(t0, pt1, t1);
      evaluator_->add(c1, t1, results2[k + results.size()]);
    }
  }

  auto first = results2.begin();
  auto last = results2.begin() + m;
  std::vector<seal::Ciphertext> new_vec(first, last);
  return new_vec;
}

Suggested change:

std::vector<seal::Ciphertext> SealPirServer::ExpandQuery(
    const seal::Ciphertext &encrypted, std::uint32_t m) {


  seal::GaloisKeys &galkey = galois_key_;

  // Assume that m is a power of 2. If not, round it to the next power of 2.
  uint32_t logm = std::ceil(std::log2(m));

  std::vector<int> galois_elts;
  auto n = seal_params_->poly_modulus_degree();
  YACL_ENFORCE(logm <= std::ceil(std::log2(n)), "m > n is not allowed.");

  galois_elts.reserve(std::ceil(std::log2(n)));
  for (size_t i = 0; i < std::ceil(std::log2(n)); i++) {
    galois_elts.push_back((n + seal::util::exponentiate_uint(2, i)) /
                          seal::util::exponentiate_uint(2, i));
  }

  std::vector<seal::Ciphertext> results(1);
  results[0] = encrypted;
  seal::Plaintext tempPt;
  for (size_t j = 0; j < logm - 1; j++) {
    std::vector<seal::Ciphertext> results2(1 << (j + 1));
    int step = 1 << j;

    int index_raw = (n << 1) - (1 << j); 
    int index = (index_raw * galois_elts[j]) % (n << 1);

    // int nstep = -step;
    yacl::parallel_for(0, step, [&](int64_t begin, int64_t end) {
      for (int k = begin; k < end; k++) {
        seal::Ciphertext c0;
        seal::Ciphertext c1;
        seal::Ciphertext t0;
        seal::Ciphertext t1;

        c0 = results[k];
        // SPDLOG_INFO("apply_galois j:{} k:{}", j, k);
        evaluator_->apply_galois(c0, galois_elts[j], galkey,
                                 t0);          
        evaluator_->add(c0, t0, results2[k]);  
        multiply_power_of_X(c0, c1, index_raw);
        multiply_power_of_X(t0, t1, index);

        evaluator_->add(c1, t1, results2[k + step]);
      }
    });
    results = results2;
  }

  // Last step of the loop
  std::vector<seal::Ciphertext> results2(results.size() << 1);
  seal::Plaintext two("2");

  int index_raw = (n << 1) - (1 << (logm - 1));
  int index = (index_raw * galois_elts[logm - 1]) % (n << 1);


  for (uint32_t k = 0; k < results.size(); k++) {
    if (k >= (m - (1 << (logm - 1)))) {  // corner case.
      evaluator_->multiply_plain(results[k], two,
                                 results2[k]);  // plain multiplication by 2.
    } else {
      seal::Ciphertext c0;
      seal::Ciphertext c1;
      seal::Ciphertext t0;
      seal::Ciphertext t1;

      c0 = results[k];
      evaluator_->apply_galois(c0, galois_elts[logm - 1], galkey, t0);
      evaluator_->add(c0, t0, results2[k]);


      multiply_power_of_X(c0, c1, index_raw);
      multiply_power_of_X(t0, t1, index);
      evaluator_->add(c1, t1, results2[k + results.size()]);
    }
  }

  auto first = results2.begin();
  auto last = results2.begin() + m;
  std::vector<seal::Ciphertext> new_vec(first, last);
  return new_vec;
}

void SealPirServer::multiply_power_of_X(const seal::Ciphertext &encrypted,
                                        seal::Ciphertext &destination,
                                        uint32_t index) {
  auto coeff_mod_count = seal_params_->coeff_modulus().size() - 1;
  auto coeff_count = seal_params_->poly_modulus_degree();
  auto encrypted_count = encrypted.size();

  destination = encrypted;
  for (size_t i = 0; i < encrypted_count; i++) {
    for (size_t j = 0; j < coeff_mod_count; j++) {
      seal::util::negacyclic_shift_poly_coeffmod(
          encrypted.data(i) + (j * coeff_count), coeff_count, index,
          seal_params_->coeff_modulus()[j],
          destination.data(i) + (j * coeff_count));
    }
  }
}

The main reason: multiply_plain consumes a large share of the SEAL ciphertext noise budget, whereas negacyclic_shift_poly_coeffmod does not increase the noise at all, and it is also faster when multiplying by x^n.
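As a rough sanity check, standard BFV noise heuristics (not specific to this repo) already predict this. Multiplying a ciphertext c by a general plaintext p(x) grows the invariant noise roughly with the size of p, while multiplying by a signed monomial is only a negacyclic rotation of coefficients:

$$ \mathrm{noise}(c \cdot p) \;\lesssim\; N \cdot \lVert p \rVert_\infty \cdot \mathrm{noise}(c), \qquad \mathrm{noise}\big(c \cdot (\pm x^k)\big) \;=\; \mathrm{noise}(c) $$

since c(x) · x^k mod (x^N + 1) merely permutes coefficients and flips signs, which is exactly the operation negacyclic_shift_poly_coeffmod performs.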

README PSI Quick Start with v2 API reports an error

The error output is as follows:
(base) root@k8s-node1:/approot1/secretflow-learn/quikstart# docker run -it --rm --network host --mount type=bind,source=/tmp/receiver,target=/root/receiver -w /root --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=NET_ADMIN --privileged=true secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/psi-anolis8:latest bash -c "./main --config receiver/receiver.config"
[2024-01-25 20:15:45.540] [info] [main.cc:44] SecretFlow PSI Library v0.2.0.dev240123 Copyright 2023 Ant Group Co., Ltd.
terminate called after throwing an instance of 'yacl::EnforceNotMet'
what(): [Enforce fail at psi/main.cc:70] status.ok(). Launch config JSON string couldn't be parsed: {
"psi_config": {
"protocol_config": {
"protocol": "PROTOCOL_KKRT",
"role": "ROLE_RECEIVER",
"broadcast_result": true
},
"input_config": {
"type": "IO_TYPE_FILE_CSV",
"path": "/root/receiver/receiver_input.csv"
},
"output_config": {
"type": "IO_TYPE_FILE_CSV",
"path": "/root/receiver/receiver_output.csv"
},
"keys": [
"id0",
"id1"
],
"debug_options": {
"trace_path": "/root/receiver/receiver.trace"
},
"link_config": {
"parties": [
{
"id": "receiver",
"host": "127.0.0.1:5300"
},
{
"id": "sender",
"host": "127.0.0.1:5400"
}
]
}
},
"self_link_party": "receiver"
}
Stacktrace:
#0 __libc_start_main+0x7f503283acf3

[Bug]: In apsi_test, once the receiver's data size reaches a certain threshold the program hangs, neither progressing nor terminating

Describe the bug

On a local 16 GB mac, psi/apsi/apsi_test.cc from the project initially ran fine, but after I changed TestParams{100, 100000, false} to TestParams{10000, 100000, false}, the program hangs. Log:

[2024-07-12 07:46:20.569] [info] [ecdh_oprf_selector.cc:33] use fourq
[2024-07-12 07:46:20.697] [info] [sender_memdb.cc:696] Start inserting 5000 items in SenderMemDB
[2024-07-12 07:46:20.697] [info] [thread_pool.cc:30] Create a fixed thread pool with size 7
[2024-07-12 07:46:20.719] [info] [sender_memdb.cc:730] Found 5000 new items to insert in SenderDB
[2024-07-12 07:46:20.721] [info] [sender_memdb.cc:335] Launching 8 insert-or-assign worker tasks
[2024-07-12 07:46:20.727] [info] [sender_memdb.cc:351] Finished insert-or-assign worker tasks
[2024-07-12 07:46:20.728] [info] [sender_memdb.cc:748] Finished inserting 5000 items in SenderDB
[2024-07-12 07:46:20.729] [info] [sender_memdb.cc:696] Start inserting 5000 items in SenderMemDB
[2024-07-12 07:46:20.750] [info] [sender_memdb.cc:730] Found 5000 new items to insert in SenderDB
[2024-07-12 07:46:20.753] [info] [sender_memdb.cc:335] Launching 8 insert-or-assign worker tasks
[2024-07-12 07:46:20.758] [info] [sender_memdb.cc:351] Finished insert-or-assign worker tasks
[2024-07-12 07:46:20.758] [info] [sender_memdb.cc:748] Finished inserting 5000 items in SenderDB
[2024-07-12 07:46:20.759] [info] [sender_memdb.cc:696] Start inserting 5000 items in SenderMemDB
[2024-07-12 07:46:20.781] [info] [sender_memdb.cc:730] Found 5000 new items to insert in SenderDB
[2024-07-12 07:46:20.784] [info] [sender_memdb.cc:335] Launching 8 insert-or-assign worker tasks
[2024-07-12 07:46:20.788] [info] [sender_memdb.cc:351] Finished insert-or-assign worker tasks
[2024-07-12 07:46:20.789] [info] [sender_memdb.cc:748] Finished inserting 5000 items in SenderDB
[2024-07-12 07:46:20.790] [info] [sender_memdb.cc:696] Start inserting 5000 items in SenderMemDB
[2024-07-12 07:46:20.811] [info] [sender_memdb.cc:730] Found 5000 new items to insert in SenderDB
[2024-07-12 07:46:20.814] [info] [sender_memdb.cc:335] Launching 8 insert-or-assign worker tasks
[2024-07-12 07:46:20.818] [info] [sender_memdb.cc:351] Finished insert-or-assign worker tasks
[2024-07-12 07:46:20.819] [info] [sender_memdb.cc:748] Finished inserting 5000 items in SenderDB
[2024-07-12 07:46:20.820] [info] [sender_memdb.cc:696] Start inserting 5000 items in SenderMemDB
[2024-07-12 07:46:20.842] [info] [sender_memdb.cc:730] Found 5000 new items to insert in SenderDB
[2024-07-12 07:46:20.845] [info] [sender_memdb.cc:335] Launching 8 insert-or-assign worker tasks
[2024-07-12 07:46:20.849] [info] [sender_memdb.cc:351] Finished insert-or-assign worker tasks
[2024-07-12 07:46:20.849] [info] [sender_memdb.cc:748] Finished inserting 5000 items in SenderDB
[2024-07-12 07:46:20.850] [info] [sender_memdb.cc:696] Start inserting 5000 items in SenderMemDB
[2024-07-12 07:46:20.874] [info] [sender_memdb.cc:730] Found 5000 new items to insert in SenderDB
[2024-07-12 07:46:20.877] [info] [sender_memdb.cc:335] Launching 8 insert-or-assign worker tasks
[2024-07-12 07:46:20.882] [info] [sender_memdb.cc:351] Finished insert-or-assign worker tasks
[2024-07-12 07:46:20.883] [info] [sender_memdb.cc:748] Finished inserting 5000 items in SenderDB
[2024-07-12 07:46:20.884] [info] [sender_memdb.cc:696] Start inserting 5000 items in SenderMemDB
[2024-07-12 07:46:20.907] [info] [sender_memdb.cc:730] Found 5000 new items to insert in SenderDB
[2024-07-12 07:46:20.910] [info] [sender_memdb.cc:335] Launching 8 insert-or-assign worker tasks
[2024-07-12 07:46:20.915] [info] [sender_memdb.cc:351] Finished insert-or-assign worker tasks
[2024-07-12 07:46:20.915] [info] [sender_memdb.cc:748] Finished inserting 5000 items in SenderDB
[2024-07-12 07:46:20.916] [info] [sender_memdb.cc:696] Start inserting 5000 items in SenderMemDB
[2024-07-12 07:46:20.942] [info] [sender_memdb.cc:730] Found 5000 new items to insert in SenderDB
[2024-07-12 07:46:20.945] [info] [sender_memdb.cc:335] Launching 8 insert-or-assign worker tasks
[2024-07-12 07:46:20.947] [info] [sender_memdb.cc:351] Finished insert-or-assign worker tasks
[2024-07-12 07:46:20.948] [info] [sender_memdb.cc:748] Finished inserting 5000 items in SenderDB
[2024-07-12 07:46:20.949] [info] [sender_memdb.cc:696] Start inserting 5000 items in SenderMemDB
[2024-07-12 07:46:20.971] [info] [sender_memdb.cc:730] Found 5000 new items to insert in SenderDB
[2024-07-12 07:46:20.974] [info] [sender_memdb.cc:335] Launching 8 insert-or-assign worker tasks
[2024-07-12 07:46:20.978] [info] [sender_memdb.cc:351] Finished insert-or-assign worker tasks
[2024-07-12 07:46:20.978] [info] [sender_memdb.cc:748] Finished inserting 5000 items in SenderDB
[2024-07-12 07:46:20.979] [info] [sender_memdb.cc:696] Start inserting 5000 items in SenderMemDB
[2024-07-12 07:46:21.001] [info] [sender_memdb.cc:730] Found 5000 new items to insert in SenderDB
[2024-07-12 07:46:21.004] [info] [sender_memdb.cc:335] Launching 8 insert-or-assign worker tasks
[2024-07-12 07:46:21.006] [info] [sender_memdb.cc:351] Finished insert-or-assign worker tasks
[2024-07-12 07:46:21.007] [info] [sender_memdb.cc:748] Finished inserting 5000 items in SenderDB
[2024-07-12 07:46:21.008] [info] [sender_memdb.cc:696] Start inserting 5000 items in SenderMemDB
[2024-07-12 07:46:21.029] [info] [sender_memdb.cc:730] Found 5000 new items to insert in SenderDB
[2024-07-12 07:46:21.032] [info] [sender_memdb.cc:335] Launching 8 insert-or-assign worker tasks
[2024-07-12 07:46:21.034] [info] [sender_memdb.cc:351] Finished insert-or-assign worker tasks
[2024-07-12 07:46:21.034] [info] [sender_memdb.cc:748] Finished inserting 5000 items in SenderDB
[2024-07-12 07:46:21.036] [info] [sender_memdb.cc:696] Start inserting 5000 items in SenderMemDB
[2024-07-12 07:46:21.055] [info] [sender_memdb.cc:730] Found 5000 new items to insert in SenderDB
[2024-07-12 07:46:21.058] [info] [sender_memdb.cc:335] Launching 8 insert-or-assign worker tasks
[2024-07-12 07:46:21.060] [info] [sender_memdb.cc:351] Finished insert-or-assign worker tasks
[2024-07-12 07:46:21.061] [info] [sender_memdb.cc:748] Finished inserting 5000 items in SenderDB
[2024-07-12 07:46:21.062] [info] [sender_memdb.cc:696] Start inserting 5000 items in SenderMemDB
[2024-07-12 07:46:21.117] [info] [sender_memdb.cc:730] Found 5000 new items to insert in SenderDB
[2024-07-12 07:46:21.119] [info] [sender_memdb.cc:335] Launching 8 insert-or-assign worker tasks
[2024-07-12 07:46:21.121] [info] [sender_memdb.cc:351] Finished insert-or-assign worker tasks
[2024-07-12 07:46:21.122] [info] [sender_memdb.cc:748] Finished inserting 5000 items in SenderDB
[2024-07-12 07:46:21.123] [info] [sender_memdb.cc:696] Start inserting 5000 items in SenderMemDB
[2024-07-12 07:46:21.144] [info] [sender_memdb.cc:730] Found 5000 new items to insert in SenderDB
[2024-07-12 07:46:21.147] [info] [sender_memdb.cc:335] Launching 8 insert-or-assign worker tasks
[2024-07-12 07:46:21.149] [info] [sender_memdb.cc:351] Finished insert-or-assign worker tasks
[2024-07-12 07:46:21.149] [info] [sender_memdb.cc:748] Finished inserting 5000 items in SenderDB
[2024-07-12 07:46:21.151] [info] [sender_memdb.cc:696] Start inserting 5000 items in SenderMemDB
[2024-07-12 07:46:21.172] [info] [sender_memdb.cc:730] Found 5000 new items to insert in SenderDB
[2024-07-12 07:46:21.174] [info] [sender_memdb.cc:335] Launching 8 insert-or-assign worker tasks
[2024-07-12 07:46:21.178] [info] [sender_memdb.cc:351] Finished insert-or-assign worker tasks
[2024-07-12 07:46:21.178] [info] [sender_memdb.cc:748] Finished inserting 5000 items in SenderDB
[2024-07-12 07:46:21.179] [info] [sender_memdb.cc:696] Start inserting 5000 items in SenderMemDB
[2024-07-12 07:46:21.199] [info] [sender_memdb.cc:730] Found 5000 new items to insert in SenderDB
[2024-07-12 07:46:21.202] [info] [sender_memdb.cc:335] Launching 8 insert-or-assign worker tasks
[2024-07-12 07:46:21.205] [info] [sender_memdb.cc:351] Finished insert-or-assign worker tasks
[2024-07-12 07:46:21.205] [info] [sender_memdb.cc:748] Finished inserting 5000 items in SenderDB
[2024-07-12 07:46:21.207] [info] [sender_memdb.cc:696] Start inserting 5000 items in SenderMemDB
[2024-07-12 07:46:21.230] [info] [sender_memdb.cc:730] Found 5000 new items to insert in SenderDB
[2024-07-12 07:46:21.233] [info] [sender_memdb.cc:335] Launching 8 insert-or-assign worker tasks
[2024-07-12 07:46:21.236] [info] [sender_memdb.cc:351] Finished insert-or-assign worker tasks
[2024-07-12 07:46:21.237] [info] [sender_memdb.cc:748] Finished inserting 5000 items in SenderDB
[2024-07-12 07:46:21.238] [info] [sender_memdb.cc:696] Start inserting 5000 items in SenderMemDB
[2024-07-12 07:46:21.263] [info] [sender_memdb.cc:730] Found 5000 new items to insert in SenderDB
[2024-07-12 07:46:21.266] [info] [sender_memdb.cc:335] Launching 8 insert-or-assign worker tasks
[2024-07-12 07:46:21.268] [info] [sender_memdb.cc:351] Finished insert-or-assign worker tasks
[2024-07-12 07:46:21.269] [info] [sender_memdb.cc:748] Finished inserting 5000 items in SenderDB
[2024-07-12 07:46:21.271] [info] [sender_memdb.cc:696] Start inserting 5000 items in SenderMemDB
[2024-07-12 07:46:21.296] [info] [sender_memdb.cc:730] Found 5000 new items to insert in SenderDB
[2024-07-12 07:46:21.299] [info] [sender_memdb.cc:335] Launching 8 insert-or-assign worker tasks
[2024-07-12 07:46:21.302] [info] [sender_memdb.cc:351] Finished insert-or-assign worker tasks
[2024-07-12 07:46:21.302] [info] [sender_memdb.cc:748] Finished inserting 5000 items in SenderDB
[2024-07-12 07:46:21.304] [info] [sender_memdb.cc:696] Start inserting 5000 items in SenderMemDB
[2024-07-12 07:46:21.331] [info] [sender_memdb.cc:730] Found 5000 new items to insert in SenderDB
[2024-07-12 07:46:21.334] [info] [sender_memdb.cc:335] Launching 8 insert-or-assign worker tasks
[2024-07-12 07:46:21.337] [info] [sender_memdb.cc:351] Finished insert-or-assign worker tasks
[2024-07-12 07:46:21.337] [info] [sender_memdb.cc:748] Finished inserting 5000 items in SenderDB
[2024-07-12 07:46:21.338] [info] [sender_memdb.cc:945] count: 20, item_count:100000
[2024-07-12 07:46:21.338] [info] [sender_memdb.cc:526] Start generating bin bundle caches

It hangs at the last log line above; local memory usage looks normal at that point.

I then switched to a machine with more memory: TestParams{10000, 100000, false} then ran through, but TestParams{100000, 100000, false} still hit the same problem. Moreover, once hung, the process does not consume much memory or CPU.

Steps To Reproduce

Run the command: bazel run -c opt //psi/apsi:apsi_test

Expected behavior

Please fix this; a receiver query set of 10,000 items should not count as large.

Version

v0.4.0.dev240524

Operating system

macbook m1 16G

Hardware Resources

8c16g

Build-as-release error?

Building as release on CentOS fails.
The latest branch tag v0.4.0.dev240401 fails when running bazel build //... -c opt:
[screenshot of the build error]
examples/pir/BUILD.bazel is missing linkopts = ["-ldl"]; after adding it, the build passes.
The second line of the generate_pir_data-2.params file shows -fuse-ld=gold.

However, building as release inside secretflow/ubuntu-base-ci:latest works fine,
and the second line of generate_pir_data-2.params there shows -fuse-ld=lld.
Could this be related to the choice of linker?

Also, the dev docker command seems to be wrong:
CapAdd and privileged are mutually exclusive options

Question about the input data of psi_df

Describe the bug

When two machines run PSI with psi_df, how do I get the other party's PYUObject data over to my side? Or, if the data is not transferred and stays with the other party, how does my program reference it? (psi_csv works by configuring the path to each party's data; I don't understand how psi_df is supposed to transfer or reference the inputs.)

Steps To Reproduce

This was executed locally:


import pandas as pd
import pymysql
import ray
import secretflow as sf

sf.shutdown()
sf.init(['alice','bob','carol'],address='local')

conn = pymysql.connect(host='10.3.0.12',port=3306,user='root',passwd='root',database='prac',charset='utf8',use_unicode=True)
sql_1 = 'select * from alice'
sql_2 = 'select * from bob'
sql_3 = 'select * from carol'

da = pd.read_sql(sql_1,conn).sample(frac=0.9)
db = pd.read_sql(sql_2,conn).sample(frac=0.8)
dc = pd.read_sql(sql_3,conn).sample(frac=0.7)

a_obj_ref = ray.put(da)
b_obj_ref = ray.put(db)
c_obj_ref = ray.put(dc)

alice,bob,carol = sf.PYUObject(sf.PYU('alice'),a_obj_ref),sf.PYUObject(sf.PYU('bob'),b_obj_ref),sf.PYUObject(sf.PYU('carol'),c_obj_ref)


spu_3pc = sf.SPU(sf.utils.testing.cluster_def(['alice','bob','carol']))

psi_3pc = spu_3pc.psi_df(['uid','month'],[alice,bob,carol],'alice',protocol='ECDH_PSI_3PC')
for d in psi_3pc:
    print(d)
    print(d.device)
    print(d.data)
    print(ray.get(d.data))
    print(type(ray.get(d.data)))
#    print(type(d.device))
#    sf.PYU(d.device).dump(obj=d,path='./output.csv')
print(type(psi_3pc[0]))
print(type(psi_3pc[0].device))

Expected behavior

When running locally, the input PYUObjects can be created directly in the program:
alice,bob,carol = sf.PYUObject(sf.PYU('alice'),a_obj_ref),sf.PYUObject(sf.PYU('bob'),b_obj_ref),sf.PYUObject(sf.PYU('carol'),c_obj_ref)
Question: when two machines execute this, how do I get the PYUObject across? (A sketch of the usual pattern follows below.)
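A minimal sketch of the usual production-mode answer (hosts, ports, paths and the load_table helper below are placeholders, not values from this issue): the PYUObject is never transferred between machines. Both parties run the same script with their own self_party; each party loads its own table through its PYU, and psi_df only receives the resulting references:

# Sketch: two-machine psi_df without moving raw data (placeholder values).
import pandas as pd
import secretflow as sf
import spu

cluster_config = {
    'parties': {
        'alice': {'address': 'alice-host:9394'},
        'bob': {'address': 'bob-host:9394'},
    },
    'self_party': 'alice',  # set to 'bob' on the other machine
}
sf.init(address='local', cluster_config=cluster_config)

cluster_def = {
    'nodes': [
        {'party': 'alice', 'address': 'alice-host:11666'},
        {'party': 'bob', 'address': 'bob-host:11666'},
    ],
    'runtime_config': {
        'protocol': spu.spu_pb2.SEMI2K,
        'field': spu.spu_pb2.FM128,
    },
}
spu_device = sf.SPU(cluster_def)

alice, bob = sf.PYU('alice'), sf.PYU('bob')

def load_table(path):
    return pd.read_csv(path)

# Each call executes on the owning party's machine; the return value is a
# PYUObject reference, and the DataFrame itself never leaves its owner.
df_alice = alice(load_table)('/data/alice.csv')
df_bob = bob(load_table)('/data/bob.csv')

results = spu_device.psi_df(
    ['uid', 'month'], [df_alice, df_bob], 'alice', protocol='ECDH_PSI_2PC'
)

The key point is that each party only ever names its own local path; the other side's data participates through a reference, with no raw rows crossing the network outside the PSI protocol itself.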

Version

Secretflow 1.6.1b0

Operating system

centos 7 x64

Hardware Resources

8C80G

build-as-release compilation error

Building as release on CentOS fails.
The latest branch tag v0.4.0.dev240401 fails when running bazel build //... -c opt:
[screenshot of the build error]
examples/pir/BUILD.bazel is missing linkopts = ["-ldl"]; after adding it, the build passes.

However, starting secretflow/ubuntu-base-ci:latest and running bazel build //... -c opt there completes normally.
What key configuration in that image affects the -ldl link option?

Also, the dev docker command seems to be wrong:
CapAdd and privileged are mutually exclusive options

[FNP04] Select and implement a multi-party PSI protocol in SecretFlow

This issue is a sub-task split from task #12 of the second SecretFlow Open Source Contribution Plan (SF OSCP); community developers are welcome to join the effort.
If you are interested in claiming a task but have not yet signed up, please complete the registration first.

Design Proposal

by zhangwfjh

  • scheme: FNP04

Task Description

  • Task name: Select and implement a multi-party PSI protocol in SecretFlow
  • Technical direction: PSI
  • Difficulty: Challenging 🌟🌟🌟

Detailed Requirements

Please select and implement a multi-party PSI protocol for SecretFlow, with the following requirements:

  • Functionality: at least one semi-honest PSI scheme supporting more than 2 parties
  • Security: reveal as little as possible
  • Code style: C++ code must be formatted with the Google C++ style guide (the CI pipeline includes a style-check gate)
  • Submission: reference this issue and submit the code to https://github.com/secretflow/spu/tree/main/libspu/psi/core

Skill Requirements

  • Familiarity with PSI principles
  • Awareness of recent progress in multi-party PSI
  • Familiarity with the SecretFlow PSI interfaces

Instructions

[Bug]: Intermittent errors during libspu tests.

Issue Type

Build/Install

Modules Involved

SPU runtime, PSI

Have you reproduced the bug with SPU HEAD?

Yes

Have you searched existing issues?

Yes

SPU Version

version = "0.9.1.dev$$DATE$$"

OS Platform and Distribution

Linux version 5.4.0-169-generic (buildd@lcy02-amd64-102) (gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.2)) #187-Ubuntu SMP Thu Nov 23 14:52:28 UTC 2023

Python Version

python3.8

Compiler Version

gcc version 11.2.0 (GCC)

Current Behavior?

Problem description

  • Running alice.py and bob.py manually produces the errors below.

Error logs

  • alice
----------test_ecdh_2pc-------------
rank = 0
rank = 1
[2024-07-11 10:58:51.503] [info] [launch.cc:164] LEGACY PSI config: {"psi_type":"ECDH_PSI_2PC","broadcast_result":true,"input_params":{"path":"./data/alice.csv","select_fields":["id","idx"]},"output_params":{"path":"./alice-kkrt.csv","need_sort":true},"curve_type":"CURVE_25519"}
[2024-07-11 10:58:51.503] [info] [bucket_psi.cc:400] bucket size set to 1048576
[2024-07-11 10:58:51.507] [info] [bucket_psi.cc:252] Begin sanity check for input file: ./data/alice.csv, precheck_switch:false
Process Process-1:
Traceback (most recent call last):
  File "/home/evan/miniconda3/lib/python3.11/site-packages/multiprocess/process.py", line 314, in _bootstrap
    self.run()
  File "/home/evan/miniconda3/lib/python3.11/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/evan/src/test/spu/alice.py", line 68, in wrap
    report = psi.bucket_psi(lctx, config)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/evan/src/test/spu/psi.py", line 69, in bucket_psi
    report_str = libpsi.libs.bucket_psi(
                 ^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: what: 
	[external/yacl/yacl/link/transport/channel.cc:427] Get data timeout, key=abc-0:1:ALLGATHER
stacktrace: 
#0 yacl::link::Context::RecvInternal()+0x7fa54addd7a7
#1 yacl::link::AllGatherImpl<>()+0x7fa54add8671
#2 yacl::link::AllGather()+0x7fa54add89b3
#3 psi::SyncWait<>()+0x7fa5499284e8
#4 psi::CheckInput()+0x7fa549a89883
#5 psi::BucketPsi::Run()+0x7fa549a8e34b
#6 psi::RunLegacyPsi()+0x7fa54991ce9d
#7 psi::BindLibs()::{lambda()#2}::operator()()+0x7fa549912a3f
#8 pybind11::cpp_function::initialize<>()::{lambda()#3}::_FUN()+0x7fa549912f3f
#9 pybind11::cpp_function::dispatcher()+0x7fa5498f5724
#10 cfunction_call+0x525d17


2024-07-11 10:59:21.551 [warning] [channel.h:~Channel:163] Channel destructor is called before WaitLinkTaskFinish, try stop send thread
  • bob
----------test_kkrt_2pc-------------
rank = 0
rank = 1
I0711 10:58:47.467434 905368 external/com_github_brpc_brpc/src/brpc/server.cpp:1181] Server[yacl::link::transport::internal::ReceiverServiceImpl] is serving on port=20223.
W0711 10:58:47.467641 905368 external/com_github_brpc_brpc/src/brpc/server.cpp:1187] Builtin services are disabled according to ServerOptions.has_builtin_services
I0711 10:58:47.570077 905377 external/com_github_brpc_brpc/src/brpc/socket.cpp:2506] Checking Socket{id=0 addr=127.0.0.1:20222} (0x278a0c0)
I0711 10:58:50.572697 905379 external/com_github_brpc_brpc/src/brpc/socket.cpp:2566] Revived Socket{id=0 addr=127.0.0.1:20222} (0x278a0c0) (Connectable)
[2024-07-11 10:58:51.506] [info] [launch.cc:164] LEGACY PSI config: {"psi_type":"ECDH_PSI_2PC","broadcast_result":true,"input_params":{"path":"./data/bob.csv","select_fields":["id","idx"]},"output_params":{"path":"./bob-kkrt.csv","need_sort":true},"curve_type":"CURVE_25519"}
[2024-07-11 10:58:51.506] [info] [bucket_psi.cc:400] bucket size set to 1048576
Fatal Python error: Aborted

Current thread 0x00007f4ebf735280 (most recent call first):
  File "/home/evan/src/test/spu/psi.py", line 69 in bucket_psi
  File "/home/evan/src/test/spu/bob.py", line 58 in wrap
  File "/home/evan/miniconda3/lib/python3.11/site-packages/multiprocess/process.py", line 108 in run
  File "/home/evan/miniconda3/lib/python3.11/site-packages/multiprocess/process.py", line 314 in _bootstrap
  File "/home/evan/miniconda3/lib/python3.11/site-packages/multiprocess/popen_fork.py", line 71 in _launch
  File "/home/evan/miniconda3/lib/python3.11/site-packages/multiprocess/popen_fork.py", line 19 in __init__
  File "/home/evan/miniconda3/lib/python3.11/site-packages/multiprocess/context.py", line 281 in _Popen
  File "/home/evan/miniconda3/lib/python3.11/site-packages/multiprocess/context.py", line 224 in _Popen
  File "/home/evan/miniconda3/lib/python3.11/site-packages/multiprocess/process.py", line 121 in start
  File "/home/evan/src/test/spu/bob.py", line 79 in run_streaming_psi
  File "/home/evan/src/test/spu/bob.py", line 112 in test_ecdh_2pc
  File "/home/evan/src/test/spu/bob.py", line 141 in run_psi
  File "/home/evan/miniconda3/lib/python3.11/site-packages/absl/app.py", line 254 in _run_main
  File "/home/evan/miniconda3/lib/python3.11/site-packages/absl/app.py", line 308 in run
  File "/home/evan/src/test/spu/bob.py", line 147 in <module>

Extension modules: google.protobuf.pyext._message, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator (total: 14)

Code

  • alice.py

import time
import unittest

import multiprocess

from absl import app, flags

import libspu.link as link
import libspu.logging as logging
import psi as psi
from utils import get_free_port, wc_count
# from spu.utils.simulation import PropagatingThread


class Test:

    def run_streaming_psi(self, wsize, self_rank, link_id, party_ids, addrs, inputs, outputs, selected_fields, protocol):
        time_stamp = time.time()
        lctx_desc = link.Desc()
        lctx_desc.id = link_id
        lctx_desc.recv_timeout_ms = 30*1000

        log_options = logging.LogOptions()
        log_options.log_level = logging.LogLevel.DEBUG
        # log_options.enable_console_logger = False
        log_options.system_log_path = "./alice.log"
        # log_options.trace_log_path = "./alice_trace.log"
        
        logging.setup_logging(log_options)


        for rank in range(wsize):
            print(f"rank = {rank}")
            lctx_desc.add_party(party_ids[rank], addrs[rank])

        def wrap(rank, selected_fields, input_path, output_path, type):
            lctx = link.create_brpc(lctx_desc, rank)

            config = psi.BucketPsiConfig(
                psi_type=type,
                broadcast_result=True,
                input_params=psi.InputParams(
                    path=input_path, select_fields=selected_fields
                ),
                output_params=psi.OutputParams(path=output_path, need_sort=True),
                curve_type=psi.CurveType.CURVE_25519,
            )

            if type == psi.PsiType.DP_PSI_2PC:
                config.dppsi_params.bob_sub_sampling = 0.9
                config.dppsi_params.epsilon = 3

            report = psi.bucket_psi(lctx, config)

            source_count = wc_count(input_path)
            output_count = wc_count(output_path)
            print(
                f"id:{lctx.id()}, psi_type: {type}, original_count: {report.original_count}, intersection_count: {report.intersection_count}, source_count: {source_count}, output_count: {output_count}"
            )

            lctx.stop_link()

        # launch with multiprocess
        job = multiprocess.Process(
                target=wrap,
                args=(
                    self_rank,
                    selected_fields,
                    inputs[self_rank],
                    outputs[self_rank],
                    protocol,
                ),
            )
        job.start()
        job.join()

    def test_kkrt_2pc(self):
        print("----------test_kkrt_2pc-------------")

        wsize = 2
        self_rank = 0
        link_id = "abc"
        inputs = ["./data/alice.csv", "./data/bob.csv"]
        outputs = ["./alice-kkrt.csv", "./bob-kkrt.csv"]
        selected_fields = ["id", "idx"]

        party_ids = ["9999","10000"]
        addrs = [f"127.0.0.1:{20222}",f"127.0.0.1:{20223}"]

        self.run_streaming_psi(
            wsize, self_rank, link_id,party_ids,addrs, inputs, outputs, selected_fields, psi.PsiType.KKRT_PSI_2PC
        )

    def test_ecdh_2pc(self):
        print("----------test_ecdh_2pc-------------")

        wsize = 2
        self_rank = 0
        link_id = "abc"
        inputs = ["./data/alice.csv", "./data/bob.csv"]
        outputs = ["./alice-kkrt.csv", "./bob-kkrt.csv"]
        selected_fields = ["id", "idx"]

        party_ids = ["9999","10000"]
        addrs = [f"127.0.0.1:{20222}",f"127.0.0.1:{20223}"]

        self.run_streaming_psi(
            wsize, self_rank, link_id,party_ids,addrs, inputs, outputs, selected_fields, psi.PsiType.ECDH_PSI_2PC
        )

    def test_ecdh_3pc(self):
        print("----------test_ecdh_3pc-------------")

        wsize = 3
        self_rank = 0
        link_id = "abc"
        inputs = [
            "./data/alice.csv",
            "./data/bob.csv",
            "./data/carol.csv",
        ]
        outputs = ["./alice-ecdh3pc.csv", "./bob-ecdh3pc.csv", "./carol-ecdh3pc.csv"]
        selected_fields = ["id", "idx"]

        party_ids = ["9999","10000","9998"]
        addrs = [f"127.0.0.1:{20222}",f"127.0.0.1:{20223}",f"127.0.0.1:{20224}"]
        # addrs = [f"127.0.0.1:{30222}",f"127.0.0.1:{30223}",f"127.0.0.1:{30224}"]

        self.run_streaming_psi(
            wsize, self_rank, link_id,party_ids,addrs, inputs, outputs, selected_fields, psi.PsiType.ECDH_PSI_3PC
        )

def run_psi(_):
    t= Test()
    # t.test_dppsi_2pc()
    t.test_ecdh_2pc()
    # t.test_ecdh_3pc()
    # t.test_kkrt_2pc()
    

if __name__ == '__main__':
    app.run(run_psi)
  • bob.py
import time
import unittest

import multiprocess

from absl import app, flags

import libspu.link as link
import psi as psi
from utils import get_free_port, wc_count
# from spu.utils.simulation import PropagatingThread


class Test:

    def run_streaming_psi(self, wsize, self_rank, link_id,party_ids, addrs, inputs, outputs, selected_fields, protocol):
        time_stamp = time.time()
        lctx_desc = link.Desc()
        lctx_desc.id = link_id
        lctx_desc.recv_timeout_ms = 30*1000

        for rank in range(wsize):
            print(f"rank = {rank}")
            lctx_desc.add_party(party_ids[rank], addrs[rank])

        def wrap(rank, selected_fields, input_path, output_path, type):
            lctx = link.create_brpc(lctx_desc, rank)

            config = psi.BucketPsiConfig(
                psi_type=type,
                broadcast_result=True,
                input_params=psi.InputParams(
                    path=input_path, select_fields=selected_fields
                ),
                output_params=psi.OutputParams(path=output_path, need_sort=True),
                curve_type=psi.CurveType.CURVE_25519,
            )

            if type == psi.PsiType.DP_PSI_2PC:
                config.dppsi_params.bob_sub_sampling = 0.9
                config.dppsi_params.epsilon = 3

            report = psi.bucket_psi(lctx, config)

            source_count = wc_count(input_path)
            output_count = wc_count(output_path)
            print(
                f"id:{lctx.id()}, psi_type: {type}, original_count: {report.original_count}, intersection_count: {report.intersection_count}, source_count: {source_count}, output_count: {output_count}"
            )

            lctx.stop_link()

        # launch with multiprocess
        job = multiprocess.Process(
                target=wrap,
                args=(
                    self_rank,
                    selected_fields,
                    inputs[self_rank],
                    outputs[self_rank],
                    protocol,
                ),
            )
        job.start()
        job.join()

    def test_kkrt_2pc(self):
        print("----------test_kkrt_2pc-------------")

        wsize = 2
        self_rank = 1
        link_id = "abc"
        inputs = ["./data/alice.csv", "./data/bob.csv"]
        outputs = ["./alice-kkrt.csv", "./bob-kkrt.csv"]
        selected_fields = ["id", "idx"]

        party_ids = ["9999","10000"]
        addrs = [f"127.0.0.1:{20222}",f"127.0.0.1:{20223}"]

        self.run_streaming_psi(
            wsize,self_rank, link_id, party_ids,addrs, inputs, outputs, selected_fields, psi.PsiType.KKRT_PSI_2PC
        )

    def test_ecdh_2pc(self):
        print("----------test_kkrt_2pc-------------")

        wsize = 2
        self_rank = 1
        link_id = "abc"
        inputs = ["./data/alice.csv", "./data/bob.csv"]
        outputs = ["./alice-kkrt.csv", "./bob-kkrt.csv"]
        selected_fields = ["id", "idx"]

        party_ids = ["9999","10000"]
        addrs = [f"127.0.0.1:{20222}",f"127.0.0.1:{20223}"]

        self.run_streaming_psi(
            wsize,self_rank, link_id, party_ids,addrs, inputs, outputs, selected_fields, psi.PsiType.ECDH_PSI_2PC
        )
    def test_ecdh_3pc(self):
        print("----------test_ecdh_3pc-------------")

        wsize = 3
        self_rank = 1
        link_id = "abc"
        inputs = [
            "./data/alice.csv",
            "./data/bob.csv",
            "./data/carol.csv",
        ]
        outputs = ["./alice-ecdh3pc.csv", "./bob-ecdh3pc.csv", "./carol-ecdh3pc.csv"]
        selected_fields = ["id", "idx"]

        party_ids = ["9999","10000","9998"]
        addrs = [f"127.0.0.1:{20222}",f"127.0.0.1:{20223}",f"127.0.0.1:{20224}"]
        # addrs = [f"127.0.0.1:{30222}",f"127.0.0.1:{30223}",f"127.0.0.1:{30224}"]

        self.run_streaming_psi(
            wsize, self_rank, link_id,party_ids,addrs, inputs, outputs, selected_fields, psi.PsiType.ECDH_PSI_3PC
        )


def run_psi(_):
    t= Test()
    # t.test_dppsi_2pc()
    t.test_ecdh_2pc()
    # t.test_ecdh_3pc()
    # t.test_kkrt_2pc()
    

if __name__ == '__main__':
    app.run(run_psi)

Standalone code to reproduce the issue

See the problem description above.

Relevant log output

See the problem description above.

Noticeable time difference between plain python execution and segmented execution in secretnote

Hello, sorry to bother you.
While benchmarking labeled PSI I found a clear time difference between running the script directly with python and running it segmented in secretnote. I would like to ask what causes this difference.
The code I am currently testing is below.
Server side:

import secretflow as sf
import spu
import time

cluster_config = {
    'parties' : {
        'alice': {
            'address': '127.0.0.1:59179',
            'listen_addr': '0.0.0.0:59179'
        },
        'bob': {
            'address': '127.0.0.1:53341',
            'listen_addr': '0.0.0.0:53341'
        }
    },
    'self_party': 'alice'
}
sf.shutdown()
sf.init(address='local', cluster_config=cluster_config)
cluster_def = {
    "nodes": [
        {
            "party": "alice",
            "address": "127.0.0.1:45413"
        },
        {
            "party": "bob",
            "address": "127.0.0.1:47480"
        },
    ],
    "runtime_config": {
        "protocol": spu.spu_pb2.SEMI2K,
        "field": spu.spu_pb2.FM128
    },
}

spu = sf.SPU(
    cluster_def,
    link_desc={
        "connect_retry_times": 60,
        "connect_retry_interval_ms": 1000,
    }
)

start_time = time.time()
spu.pir_query(
    server="alice",
    client="bob",
    server_setup_path="./alice_exactpsi_setup_1e6_npq20",
    client_key_columns=["name"],
    client_input_path="./bob_exactpsi_1e6_to_1e2.csv",
    client_output_path="./bob_exactpsi_1e6_to_1e2_output_test.csv"
)
end_time = time.time()
elapsed_time = end_time - start_time
print("The function took", elapsed_time, "seconds to run.")

The timing output is:
The function took 12.396417617797852 seconds to run.
Client side:

import secretflow as sf
import spu
import time

cluster_config = {
    'parties' : {
        'alice': {
            'address': '127.0.0.1:59179',
            'listen_addr': '0.0.0.0:59179'
        },
        'bob': {
            'address': '127.0.0.1:53341',
            'listen_addr': '0.0.0.0:53341'
        }
    },
    'self_party': 'bob'
}
sf.shutdown()
sf.init(address='local', cluster_config=cluster_config)
cluster_def = {
    "nodes": [
        {
            "party": "alice",
            "address": "127.0.0.1:45413"
        },
        {
            "party": "bob",
            "address": "127.0.0.1:47480"
        },
    ],
    "runtime_config": {
        "protocol": spu.spu_pb2.SEMI2K,
        "field": spu.spu_pb2.FM128
    },
}

spu = sf.SPU(
    cluster_def,
    link_desc={
        "connect_retry_times": 60,
        "connect_retry_interval_ms": 1000,
    }
)

start_time = time.time()
spu.pir_query(
    server="alice",
    client="bob",
    server_setup_path="./alice_exactpsi_setup_1e6_npq20",
    client_key_columns=["name"],
    client_input_path="./bob_exactpsi_1e6_to_1e2.csv",
    client_output_path="./bob_exactpsi_1e6_to_1e2_output_test.csv"
)
end_time = time.time()
elapsed_time = end_time - start_time
print("The function took", elapsed_time, "seconds to run.")

The timing output is:
The function took 14.504458665847778 seconds to run.

The secretnote run followed the tutorial at https://www.bilibili.com/video/BV13M4m1R7gp, splitting execution into three parts: connect sf, connect spu, and run spu.pir_query. The spu.pir_query part alone takes about 10s. I also tried putting the spu.pir_query step in two code cells run by the two parties separately; it was still about 10s, and both sides took roughly the same time (no 12s-vs-14s gap).
In another attempt, I put the whole script into a single secretnote cell; that behaves the same as running the .py file directly, so the difference should mainly come from segmented execution.
I have two questions about this difference:

  1. Why does segmented execution take less time? Is there a way to make a bare pir_query call cheaper?
  2. The receiver (client) always seems to take longer than the sender (server). What causes that?

Looking forward to your reply; thank you for your help! (A timing-breakdown sketch follows below.)
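One way to test this hypothesis (a minimal sketch reusing the dicts from the scripts above; not an official benchmark harness): time each stage separately, so that the one-off cost of sf.init and SPU construction is not folded into pir_query. Segmented execution in secretnote pays those costs in earlier cells, which could explain most of the gap:

# Sketch: per-stage timing for the scripts above (server side shown).
import time
import secretflow as sf
import spu

cluster_config = {
    'parties': {
        'alice': {'address': '127.0.0.1:59179', 'listen_addr': '0.0.0.0:59179'},
        'bob': {'address': '127.0.0.1:53341', 'listen_addr': '0.0.0.0:53341'},
    },
    'self_party': 'alice',  # 'bob' on the client side
}
cluster_def = {
    'nodes': [
        {'party': 'alice', 'address': '127.0.0.1:45413'},
        {'party': 'bob', 'address': '127.0.0.1:47480'},
    ],
    'runtime_config': {'protocol': spu.spu_pb2.SEMI2K, 'field': spu.spu_pb2.FM128},
}

stages = {}

t = time.time()
sf.shutdown()
sf.init(address='local', cluster_config=cluster_config)
stages['sf.init'] = time.time() - t

t = time.time()
spu_device = sf.SPU(cluster_def, link_desc={
    'connect_retry_times': 60,
    'connect_retry_interval_ms': 1000,
})
stages['SPU construction'] = time.time() - t

t = time.time()
spu_device.pir_query(
    server='alice',
    client='bob',
    server_setup_path='./alice_exactpsi_setup_1e6_npq20',
    client_key_columns=['name'],
    client_input_path='./bob_exactpsi_1e6_to_1e2.csv',
    client_output_path='./bob_exactpsi_1e6_to_1e2_output_test.csv',
)
stages['pir_query'] = time.time() - t

for name, seconds in stages.items():
    print(f'{name}: {seconds:.2f}s')

Comparing the stage timings across the two parties should also show whether the client's extra seconds come from setup or from the query step itself.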

[Bug]: Error when packaging bucket_psi into a .so library on a linux machine

Describe the bug

I want to package bucket_psi into a dynamic library for use from Go. The original bazel target was:

psi_cc_library(
    name = "bucket_psi",
    srcs = ["bucket_psi.cc"],
    hdrs = [
        "bucket_psi.h",
    ],
    deps = [
        ":bucket_ub_psi",
        ":memory_psi",
        "//psi:prelude",
        "//psi/proto:psi_cc_proto",
        "//psi/utils:batch_provider",
        "//psi/utils:csv_checker",
        "//psi/utils:csv_header_analyzer",
        "//psi/utils:ec_point_store",
        "@boost//:uuid",
    ],
)

The library produced by this target seems incomplete; the resulting libbucket_psi.so is only 600 KB. So I changed it to:

psi_cc_binary(
    name = "bucket_psi",
    srcs = ["bucket_psi.cc","bucket_psi.h"],
    deps = [
        ":bucket_ub_psi",
        ":memory_psi",
        "//psi:prelude",
        "//psi/proto:psi_cc_proto",
        "//psi/utils:batch_provider",
        "//psi/utils:csv_checker",
        "//psi/utils:csv_header_analyzer",
        "//psi/utils:ec_point_store",
        "@boost//:uuid",
    ],
    linkshared = True,
)

I first ran bazel build -c opt //psi/legacy:bucket_psi on a mac; the resulting libbucket_psi.dylib is 45 MB and proved usable.
But when I run bazel build -c opt //psi/legacy:bucket_psi on an ubuntu machine, the link step fails:

(base) root@f25287b5fd26:/home/admin/dev# bazel build -c opt //psi/legacy:bucket_psi
WARNING: Download from https://golang.org/dl/?mode=json&include=all failed: class java.io.IOException connect timed out
INFO: Analyzed target //psi/legacy:bucket_psi (1 packages loaded, 33 targets configured).
INFO: Found 1 target...
ERROR: /home/admin/dev/psi/legacy/BUILD.bazel:186:14: Linking psi/legacy/libbucket_psi.so failed: (Exit 1): gcc failed: error executing command (from target //psi/legacy:bucket_psi) /usr/bin/gcc @bazel-out/k8-opt/bin/psi/legacy/libbucket_psi.so-2.params

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol 'scale19'; recompile with -fPIC
>>> defined in bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x.pic.o
>>> referenced by bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x_square.pic.o:(.text+0x12)

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol 'two4x'; recompile with -fPIC
>>> defined in bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x.pic.o
>>> referenced by bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x_square.pic.o:(.text+0x1A)

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol 'alpha22'; recompile with -fPIC
>>> defined in bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x.pic.o
>>> referenced by bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x_square.pic.o:(.text+0x461)

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol 'alpha107'; recompile with -fPIC
>>> defined in bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x.pic.o
>>> referenced by bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x_square.pic.o:(.text+0x47C)

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol 'alpha192'; recompile with -fPIC
>>> defined in bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x.pic.o
>>> referenced by bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x_square.pic.o:(.text+0x497)

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol 'alpha43'; recompile with -fPIC
>>> defined in bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x.pic.o
>>> referenced by bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x_square.pic.o:(.text+0x4B3)

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol 'alpha128'; recompile with -fPIC
>>> defined in bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x.pic.o
>>> referenced by bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x_square.pic.o:(.text+0x4CE)

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol 'alpha213'; recompile with -fPIC
>>> defined in bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x.pic.o
>>> referenced by bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x_square.pic.o:(.text+0x4EA)

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol 'alpha64'; recompile with -fPIC
>>> defined in bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x.pic.o
>>> referenced by bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x_square.pic.o:(.text+0x505)

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol 'alpha149'; recompile with -fPIC
>>> defined in bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x.pic.o
>>> referenced by bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x_square.pic.o:(.text+0x520)

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol 'alpha234'; recompile with -fPIC
>>> defined in bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x.pic.o
>>> referenced by bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x_square.pic.o:(.text+0x53C)

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol 'alpha85'; recompile with -fPIC
>>> defined in bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x.pic.o
>>> referenced by bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x_square.pic.o:(.text+0x558)

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol 'alpha170'; recompile with -fPIC
>>> defined in bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x.pic.o
>>> referenced by bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x_square.pic.o:(.text+0x573)

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol 'alpha255'; recompile with -fPIC
>>> defined in bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x.pic.o
>>> referenced by bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x_square.pic.o:(.text+0x58F)

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol 'alpha22'; recompile with -fPIC
>>> defined in bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x.pic.o
>>> referenced by bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x_square.pic.o:(.text+0x5AD)

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol 'alpha107'; recompile with -fPIC
>>> defined in bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x.pic.o
>>> referenced by bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x_square.pic.o:(.text+0x5C5)

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol 'alpha192'; recompile with -fPIC
>>> defined in bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x.pic.o
>>> referenced by bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x_square.pic.o:(.text+0x5DD)

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol 'scale19'; recompile with -fPIC
>>> defined in bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x.pic.o
>>> referenced by bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x_mul.pic.o:(.text+0x12)

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol 'alpha22'; recompile with -fPIC
>>> defined in bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x.pic.o
>>> referenced by bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x_mul.pic.o:(.text+0x71E)

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol 'alpha107'; recompile with -fPIC
>>> defined in bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x.pic.o
>>> referenced by bazel-out/k8-opt/bin/external/simplest_ot/_objs/simplest_ot_x86_asm/gfe4x_mul.pic.o:(.text+0x739)
ld.lld: error: too many errors emitted, stopping now (use --error-limit=0 to see all errors)
collect2: error: ld returned 1 exit status
Target //psi/legacy:bucket_psi failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 30.277s, Critical Path: 29.46s
INFO: 4 processes: 3 internal, 1 processwrapper-sandbox.
FAILED: Build did NOT complete successfully

Steps To Reproduce

The library built with psi_cc_library is incomplete; after switching to psi_cc_binary, the build fails as shown above.

Expected behavior

The lib built on linux should be usable, just like the one built on mac.

Version

0.4.0

Operating system

ubuntu20.04 x86

Hardware Resources

8c32g

Config error

[screenshot of the error message]
What is the cause of this error?

The config file is as follows:
[screenshot of the config file]

[Bug]: PSI hangs in production mode

Describe the bug

PSI hangs in production mode; the program stops at the line below, neither proceeding nor raising an error:
(SPURuntime(device_id=None, party=bob) pid=876) [2024-06-26 03:17:24.581] [info] [thread_pool.cc:30] Create a fixed thread pool with size 7

Steps To Reproduce

Run in production mode with these psi parameters:

reports = spu.psi_csv(
    key=select_keys,
    input_path=input_path,
    output_path=output_path,
    receiver='alice',
    protocol='KKRT_PSI_2PC',
    precheck_input=False,
    sort=False,
    broadcast_result=False,
)

Run log:
[root@localhost psi]# python bob_testdata.py
2024-06-26 03:17:18,353 INFO worker.py:1540 -- Connecting to existing Ray cluster at address: 192.168.11.130:7751...
2024-06-26 03:17:18,367 INFO worker.py:1724 -- Connected to Ray cluster.
2024-06-26 03:17:18.396 INFO api.py:233 [bob] -- [Anonymous_job] Started rayfed with {'CLUSTER_ADDRESSES': {'alice': '192.168.11.131:10420', 'bob': '0.0.0.0:10430'}, 'CURRENT_PARTY_NAME': 'bob', 'TLS_CONFIG': {}}
2024-06-26 03:17:19.196 INFO barriers.py:284 [bob] -- [Anonymous_job] Succeeded to create receiver proxy actor.
(ReceiverProxyActor pid=780) 2024-06-26 03:17:19.186 INFO grpc_proxy.py:359 [bob] -- [Anonymous_job] ReceiverProxy binding port 10430, options: (('grpc.enable_retries', 1), ('grpc.so_reuseport', 0), ('grpc.max_send_message_length', 524288000), ('grpc.max_receive_message_length', 524288000), ('grpc.service_config', '{"methodConfig": [{"name": [{"service": "GrpcService"}], "retryPolicy": {"maxAttempts": 5, "initialBackoff": "5s", "maxBackoff": "30s", "backoffMultiplier": 2, "retryableStatusCodes": ["UNAVAILABLE"]}}]}'))...
(ReceiverProxyActor pid=780) 2024-06-26 03:17:19.194 INFO grpc_proxy.py:379 [bob] -- [Anonymous_job] Successfully start Grpc service without credentials.
2024-06-26 03:17:19.960 INFO barriers.py:333 [bob] -- [Anonymous_job] SenderProxyActor has successfully created.
2024-06-26 03:17:19.961 INFO barriers.py:520 [bob] -- [Anonymous_job] Try ping ['alice'] at 0 attemp, up to 3600 attemps.
/root/psi/bob_testdata.py:68: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
data = pd.read_sql(sql,conn).sample(frac=sample_param)
psi start_time: 1719371840.0284116
(SPURuntime(device_id=None, party=bob) pid=876) [2024-06-26 03:17:24.552] [info] [launch.cc:164] LEGACY PSI config: {"psi_type":"KKRT_PSI_2PC","receiver_rank":1,"input_params":{"path":"./data/psi_input_bob_test.csv","select_fields":["id"]},"output_params":{"path":"./data/psi_output_test.csv"},"curve_type":"CURVE_25519","bucket_size":1048576}
(SPURuntime(device_id=None, party=bob) pid=876) [2024-06-26 03:17:24.553] [info] [bucket_psi.cc:400] bucket size set to 1048576
(SPURuntime(device_id=None, party=bob) pid=876) [2024-06-26 03:17:24.560] [info] [bucket_psi.cc:252] Begin sanity check for input file: ./data/psi_input_bob_test.csv, precheck_switch:false
(SPURuntime(device_id=None, party=bob) pid=876) [2024-06-26 03:17:24.568] [info] [bucket_psi.cc:265] End sanity check for input file: ./data/psi_input_bob_test.csv, size=2
(SPURuntime(device_id=None, party=bob) pid=876) [2024-06-26 03:17:24.570] [info] [bucket_psi.cc:425] Run psi protocol=2, self_items_count=2
(SPURuntime(device_id=None, party=bob) pid=876) [2024-06-26 03:17:24.572] [info] [bucket_psi.cc:514] psi protocol=2, rank=0 item_size=4
(SPURuntime(device_id=None, party=bob) pid=876) [2024-06-26 03:17:24.572] [info] [bucket_psi.cc:514] psi protocol=2, rank=1 item_size=2
(SPURuntime(device_id=None, party=bob) pid=876) [2024-06-26 03:17:24.572] [info] [bucket_psi.cc:539] psi protocol=2, bucket_count=1
(SPURuntime(device_id=None, party=bob) pid=876) [2024-06-26 03:17:24.575] [info] [arrow_csv_batch_provider.cc:75] Reach the end of csv file ./data/psi_input_bob_test.csv.
(SPURuntime(device_id=None, party=bob) pid=876) [2024-06-26 03:17:24.575] [info] [arrow_csv_batch_provider.cc:75] Reach the end of csv file ./data/psi_input_bob_test.csv.
(SPURuntime(device_id=None, party=bob) pid=876) [2024-06-26 03:17:24.575] [info] [bucket_psi.cc:551] run psi bucket_idx=0, bucket_item_size=2
(SPURuntime(device_id=None, party=bob) pid=876) [2024-06-26 03:17:24.581] [info] [memory_psi.cc:68] psi protocol=2, rank=0, inputs_size=4
(SPURuntime(device_id=None, party=bob) pid=876) [2024-06-26 03:17:24.581] [info] [memory_psi.cc:68] psi protocol=2, rank=1, inputs_size=2
(SPURuntime(device_id=None, party=bob) pid=876) [2024-06-26 03:17:24.581] [info] [thread_pool.cc:30] Create a fixed thread pool with size 7

Expected behavior

I suspected a data problem, but even testing with only a few rows it still hangs at the same spot. Could you advise what the issue might be?

Version

Secretflow 1.6.1b0

Operating system

centos 7 x64

Hardware Resources

8C80G

[BUG] Error during setup on 30 million rows of data

Describe the bug
An error occurs when running setup on 30 million rows of data:

2024-04-23 08:16:21.587 [info] [sender_kvdb.cc:InsertOrAssign:728] OPRF FullEvaluate and EncryptLabel Last batch count: 2, item_count:1000000
2024-04-23 08:16:21.590 [info] [sender_kvdb.cc:DispatchInsertOrAssign:517] Launching 1 insert-or-assign worker tasks
2024-04-23 08:16:35.774 [info] [sender_kvdb.cc:InsertOrAssignWorker:417] Polynomial Interpolate and HE Plaintext Encode, bundle_indx:0, store_idx:0
2024-04-23 08:16:36.406 [info] [sender_kvdb.cc:InsertOrAssignWorker:417] Polynomial Interpolate and HE Plaintext Encode, bundle_indx:0, store_idx:1
2024-04-23 08:16:36.880 [info] [sender_kvdb.cc:InsertOrAssignWorker:417] Polynomial Interpolate and HE Plaintext Encode, bundle_indx:0, store_idx:2
... (identical log lines for store_idx 3 through 56, roughly one every 0.6 s, elided) ...
2024-04-23 08:17:10.106 [info] [sender_kvdb.cc:InsertOrAssignWorker:417] Polynomial Interpolate and HE Plaintext Encode, bundle_indx:0, store_idx:57
2024-04-23 08:17:10.586 [info] [sender_kvdb.cc:InsertOrAssignWorker:433] *** step leveldb put duration:34812.315606
2024-04-23 08:17:10.671 [info] [sender_kvdb.cc:DispatchInsertOrAssign:535] Finished insert-or-assign worker tasks
2024-04-23 08:17:10.676 [info] [sender_kvdb.cc:InsertOrAssign:841] Finished inserting 1000000 items in SenderDB
2024-04-23 08:17:10.972 [info] [pir.cc:LabeledPirSetup:268] finish bucket:28
2024-04-23 08:17:11.088 [info] [pir.cc:LabeledPirSetup:240] bucket:29 bucket_setup_path:/data/3000w_setup_path/bucket_29
2024-04-23 08:17:11.348 [info] [ecdh_oprf_selector.cc:CreateEcdhOprfServer:33] use fourq
2024-04-23 08:17:11.419 [info] [leveldb_kvstore.cc:Get:73] key not found
2024-04-23 08:17:11.419 [info] [sender_kvdb.cc:SenderKvDB:597] key item_count no value
2024-04-23 08:17:11.590 [info] [sender_kvdb.cc:InsertOrAssign:734] OPRF FullEvaluate and EncryptLabel batch_count: 0
2024-04-23 08:17:38.348 [info] [sender_kvdb.cc:InsertOrAssign:734] OPRF FullEvaluate and EncryptLabel batch_count: 1
2024-04-23 08:17:49.386 [info] [sender_kvdb.cc:InsertOrAssign:728] OPRF FullEvaluate and EncryptLabel Last batch count: 2, item_count:1000000
2024-04-23 08:17:49.389 [info] [sender_kvdb.cc:DispatchInsertOrAssign:517] Launching 1 insert-or-assign worker tasks
2024-04-23 08:18:01.863 [info] [sender_kvdb.cc:InsertOrAssignWorker:417] Polynomial Interpolate and HE Plaintext Encode, bundle_indx:0, store_idx:0
... (identical log lines for store_idx 1 through 54 elided) ...
2024-04-23 08:18:34.816 [info] [sender_kvdb.cc:InsertOrAssignWorker:417] Polynomial Interpolate and HE Plaintext Encode, bundle_indx:0, store_idx:55
Traceback (most recent call last):
  File "pir_setup.py", line 61, in <module>
  File "spu/pir.py", line 31, in pir_setup
RuntimeError: tried to interpolate at repeated points
[24825] Failed to execute script 'pir_setup' due to unhandled exception!

Setup on 30 million rows fails at bucket_29.
Specifying bucket_size also has no effect; the bucket size is always fixed at 1,000,000.
Setup on 10 million and 20 million rows completes normally.
The data has already been deduplicated.

The command executed:

python /pir_setup.py \
--setup-data /opt/3000w.csv  \
--key-columns 'id' \
--label-column 'fenshu' \
--pir-setup-path /data/3000w_setup_path \
--pir-oprf-key-path /oprf_key.bin

Versions of relevant packages:
SecretFlow: 1.3.0.dev20231109
SPU: 0.6.0b0
Ray: 2.2.0
Jax: 0.4.12
Jaxlib: 0.4.12
TensorFlow: 2.11.1
PyTorch: 2.0.0+cpu

[Feature]: Question about unlabeled PSI support in PIR

Hello, sorry to bother you.
I'd like to ask whether two features are available in PIR, or whether their overhead can be reduced.

  1. Unlabeled PSI
    I noticed that the APSI library behind the PIR feature supports unlabeled PSI, and that it is said to cost less than labeled PSI. Is this supported, and how do I invoke it? My attempt was to pass label_columns=[] (i.e., no label input), but that does not seem to run (a workaround sketch follows after this list).
spu.pir_setup(
    server="alice",
    input_path="/root/project/psi1/alice_exactpsi_1e6.csv",
    key_columns=['name'],
    label_columns=[],
    oprf_key_path="/root/project/psi1/alice_oprf_key",
    setup_path="/root/project/psi1/alice_exactpsi_setup_npq20",
    num_per_query=20,
    label_max_len=80,
    bucket_size=1000000
)

The result of running this on secretnote is as follows.

Bob's Output:
2024-05-09 14:55:07.545 WARNING api.py:607 [bob] -- [Anonymous_job] Encounter RemoteError happend in other parties, error message: FedRemoteError occurred at alice
RayTaskError(FedRemoteError): ray::ReceiverProxyActor.get_data() (pid=201684, ip=192.168.15.7, actor_id=57475061dc167106e064d65801000000, repr=<fed.proxy.barriers.ReceiverProxyActor object at 0x7f08dc26df60>)
  File "/root/anaconda3/envs/psi/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/root/anaconda3/envs/psi/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/fed/proxy/barriers.py", line 236, in get_data
    raise data
fed.exceptions.FedRemoteError: FedRemoteError occurred at alice
---------------------------------------------------------------------------
RayTaskError(FedRemoteError)              Traceback (most recent call last)
Cell In[3], line 1
----> 1 spu.pir_setup(
      2     server="alice",
      3     input_path="/root/project/psi1/alice_exactpsi_1e6.csv",
      4     key_columns=['name'],
      5     # label_columns=['country', 'location'],
      6     label_columns=[],
      7     oprf_key_path="/root/project/psi1/alice_oprf_key",
      8     setup_path="/root/project/psi1/alice_exactpsi_setup_npq20",
      9     num_per_query=20,
     10     label_max_len=80,
     11     bucket_size=1000000
     12 )

File ~/anaconda3/envs/psi/lib/python3.10/site-packages/secretflow/device/device/spu.py:1990, in SPU.pir_setup(self, server, input_path, key_columns, label_columns, oprf_key_path, setup_path, num_per_query, label_max_len, bucket_size)
   1961 def pir_setup(
   1962     self,
   1963     server: str,
   (...)
   1971     bucket_size: int,
   1972 ):
   1973     """Private information retrival offline setup.
   1974     Args:
   1975         server (str): Which party is pir server.
   (...)
   1988         Dict: PIR report output by SPU.
   1989     """
-> 1990     return dispatch(
   1991         'pir_setup',
   1992         self,
   1993         server,
   1994         input_path,
   1995         key_columns,
   1996         label_columns,
   1997         oprf_key_path,
   1998         setup_path,
   1999         num_per_query,
   2000         label_max_len,
   2001         bucket_size,
   2002     )

File ~/anaconda3/envs/psi/lib/python3.10/site-packages/secretflow/device/device/register.py:111, in dispatch(name, self, *args, **kwargs)
    101 def dispatch(name: str, self, *args, **kwargs):
    102     """Dispatch device kernel.
    103 
    104     Args:
   (...)
    109         Kernel execution result.
    110     """
--> 111     return _registrar.dispatch(self.device_type, name, self, *args, **kwargs)

File ~/anaconda3/envs/psi/lib/python3.10/site-packages/secretflow/device/device/register.py:80, in Registrar.dispatch(self, device_type, name, *args, **kwargs)
     78 if name not in self._ops[device_type]:
     79     raise KeyError(f'device: {device_type}, op: {name} not registered')
---> 80 return self._ops[device_type][name](*args, **kwargs)

File ~/anaconda3/envs/psi/lib/python3.10/site-packages/secretflow/device/kernels/spu.py:521, in pir_setup(device, server, input_path, key_columns, label_columns, oprf_key_path, setup_path, num_per_query, label_max_len, bucket_size)
    506 res.append(
    507     actor.pir_setup.remote(
    508         server,
   (...)
    517     )
    518 )
    520 # wait for all tasks done
--> 521 return sfd.get(res)

File ~/anaconda3/envs/psi/lib/python3.10/site-packages/secretflow/distributed/primitive.py:156, in get(object_refs)
    148 def get(
    149     object_refs: Union[
    150         Union[ray.ObjectRef, List[ray.ObjectRef]],
   (...)
    153     ]
    154 ):
    155     if get_distribution_mode() == DISTRIBUTION_MODE.PRODUCTION:
--> 156         return fed.get(object_refs)
    157     elif get_distribution_mode() == DISTRIBUTION_MODE.SIMULATION:
    158         return ray.get(object_refs)

File ~/anaconda3/envs/psi/lib/python3.10/site-packages/fed/api.py:613, in get(fed_objects)
    611 if get_global_context() is not None:
    612     get_global_context().set_last_recevied_error(e)
--> 613 raise e

File ~/anaconda3/envs/psi/lib/python3.10/site-packages/fed/api.py:602, in get(fed_objects)
    599         ray_refs.append(received_ray_object_ref)
    601 try:
--> 602     values = ray.get(ray_refs)
    603     if is_individual_id:
    604         values = values[0]

File ~/anaconda3/envs/psi/lib/python3.10/site-packages/ray/_private/auto_init_hook.py:24, in wrap_auto_init.<locals>.auto_init_wrapper(*args, **kwargs)
     21 @wraps(fn)
     22 def auto_init_wrapper(*args, **kwargs):
     23     auto_init_ray()
---> 24     return fn(*args, **kwargs)

File ~/anaconda3/envs/psi/lib/python3.10/site-packages/ray/_private/client_mode_hook.py:103, in client_mode_hook.<locals>.wrapper(*args, **kwargs)
    101     if func.__name__ != "init" or is_client_mode_enabled_by_default:
    102         return getattr(ray, func.__name__)(*args, **kwargs)
--> 103 return func(*args, **kwargs)

File ~/anaconda3/envs/psi/lib/python3.10/site-packages/ray/_private/worker.py:2524, in get(object_refs, timeout)
   2522     worker.core_worker.dump_object_store_memory_usage()
   2523 if isinstance(value, RayTaskError):
-> 2524     raise value.as_instanceof_cause()
   2525 else:
   2526     raise value

RayTaskError(FedRemoteError): ray::ReceiverProxyActor.get_data() (pid=201684, ip=192.168.15.7, actor_id=57475061dc167106e064d65801000000, repr=<fed.proxy.barriers.ReceiverProxyActor object at 0x7f08dc26df60>)
  File "/root/anaconda3/envs/psi/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/root/anaconda3/envs/psi/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/root/anaconda3/envs/psi/lib/python3.10/site-packages/fed/proxy/barriers.py", line 236, in get_data
    raise data
fed.exceptions.FedRemoteError: FedRemoteError occurred at alice

Alice's Output:

KeyboardInterrupt
  2. Return only a single match
    When I use PIR today, all matching results are returned. If I only want the first match, would the overhead shrink, and is there support for this? This seems to partially overlap with unlabeled PSI; my use case here is the same as the unlabeled PSI scenario above.
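On the unlabeled case: this pir_setup path appears to be built on Labeled PSI, so one common workaround (a sketch under that assumption, not a confirmed API behavior) is to attach a constant dummy label and treat "a result came back" as membership:

import pandas as pd

# Hypothetical workaround: add a 1-byte constant label so the labeled-PSI
# based setup accepts the table; the label content carries no information.
df = pd.read_csv("/root/project/psi1/alice_exactpsi_1e6.csv")
df["exists"] = "1"
df.to_csv("/root/project/psi1/alice_exactpsi_1e6_labeled.csv", index=False)
# then call spu.pir_setup(..., label_columns=['exists'], ...) as above

This also matches the second question above in spirit: a membership-only answer, with no labels returned.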

[Bug]: OKVS test shows no masking effect

Describe the bug

The encode/decode logic implemented in OKVS should satisfy Key * P = Value. In theory P should look like a random array, but in my tests P clearly contains the value data. Is this by design, or is something wrong with my setup?

Steps To Reproduce

bazel run //psi/psi/core/vole_psi/okvs:baxos_test
// paxos_test has the same problem.
namespace psi::psi::okvs {

class BaxosTest : public testing::TestWithParam<std::size_t> {};

TEST_P(BaxosTest, WORKS) {
  size_t items_num = GetParam();
  size_t bin_size = items_num / 4;
  size_t weight = 3;
  // statistical security parameter
  size_t ssp = 40;

  Baxos baxos;
  yacl::crypto::Prg<uint128_t> prng(yacl::crypto::RandU128());

  uint128_t seed;
  prng.Fill(absl::MakeSpan(&seed, 1));

  SPDLOG_INFO("items_num:{}, bin_size:{}", items_num, bin_size);

  baxos.Init(items_num, bin_size, weight, ssp, PaxosParam::DenseType::GF128,
             seed);

  SPDLOG_INFO("baxos.size(): {}", baxos.size());

  std::vector<uint128_t> items(items_num);
  std::vector<uint128_t> values(items_num);
  std::vector<uint128_t> values2(items_num);
  std::vector<uint128_t> p(baxos.size());

  prng.Fill(absl::MakeSpan(items.data(), items.size()));
  prng.Fill(absl::MakeSpan(values.data(), values.size()));

  auto start = std::chrono::high_resolution_clock::now();
  baxos.Solve(absl::MakeSpan(items), absl::MakeSpan(values), absl::MakeSpan(p));
  auto end = std::chrono::high_resolution_clock::now();
  std::cout << "baxos.Solve size" << p.size();
  std::chrono::duration<double, std::milli> elapsed = end - start;
  std::cout << "baxos.Solve took " << elapsed.count() << " milliseconds.\n";

  for (const auto& elem : p) {
    std::cout << elem << " " << std::endl;
  }

  for (const auto& elem : values) {
    std::cout << elem << " " << std::endl;
  }

  // count how many entries of p equal some encoded value verbatim
  size_t k = 0;
  for (size_t i = 0; i < values.size(); i++) {
    for (size_t j = 0; j < p.size(); j++) {
      if (p[j] == values[i]) {
        k++;
      }
    }
  }
  std::cout << "count: " << k << std::endl;

  start = std::chrono::high_resolution_clock::now();
  baxos.Decode(absl::MakeSpan(items), absl::MakeSpan(values2),
               absl::MakeSpan(p));
  end = std::chrono::high_resolution_clock::now();
  elapsed = end - start;
  std::cout << "baxos.Decode took " << elapsed.count() << " milliseconds.\n";

  if (std::memcmp(values2.data(), values.data(),
                  values.size() * sizeof(uint128_t)) != 0) {
    for (uint64_t i = 0; i < items_num; ++i) {
      EXPECT_EQ(std::memcmp(&values[i], &values2[i], sizeof(uint128_t)), 0);
    }
  }
}

INSTANTIATE_TEST_SUITE_P(Works_Instances, BaxosTest, testing::Values(15));

}  // namespace psi::psi::okvs

Result:
Running main() from gmock_main.cc
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from Works_Instances/BaxosTest
[ RUN ] Works_Instances/BaxosTest.WORKS/0
[2024-07-05 03:27:56.630] [info] [baxos_test.cc:43] items_num:15, bin_size:3
[2024-07-05 03:27:56.631] [info] [baxos_test.cc:48] baxos.size(): 265
baxos.Solve size265baxos.Solve took 0.084167 milliseconds.
(p printed one entry per line: 265 entries, 250 of them 0; the 15 nonzero entries, in order of appearance:)
30249581019136461799913703116248529260
264482226524464365223049826774360018070
297070843746093303118528725100684470190
319914869819946649309955289809349906505
326329356209774252584499154986720639053
94857184957107499938021084602682390890
282432622376067604239800030988424235940
60542127553454401102716789245072047140
75377835026112524672441006935454381434
36933043176152668205754522843720319017
42139514444490055029471643400844337786
328152844493251641623901453218272922896
194629314372862775373800636331631656676
239385172818071565953594850162918447632
38640883360020630412977397758636309358
(values printed one entry per line, 15 entries:)
38640883360020630412977397758636309358
194629314372862775373800636331631656676
57143744327010333286401672076945547235
6884410884816999675291523560055973892
297070843746093303118528725100684470190
30249581019136461799913703116248529260
94857184957107499938021084602682390890
282432622376067604239800030988424235940
60542127553454401102716789245072047140
42139514444490055029471643400844337786
75377835026112524672441006935454381434
88767996193783301854506181464606643968
264482226524464365223049826774360018070
36933043176152668205754522843720319017
239385172818071565953594850162918447632
count: 12
baxos.Decode took 0.059291 milliseconds.
[ OK ] Works_Instances/BaxosTest.WORKS/0 (1 ms)
[----------] 1 test from Works_Instances/BaxosTest (1 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (1 ms total)
[ PASSED ] 1 test.

Expected behavior

The P produced by OKVS should not expose the original values.

Version

0.4.0

Operating system

mac

Hardware Resources

16c16G
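A possible explanation, offered as my reading of the OKVS design rather than a maintainers' statement: an OKVS only promises decoding correctness; it makes no pseudorandomness claim about P on its own. P looks random only when the encoded values are pseudorandom, which holds inside the PSI protocol (the values there are OPRF/VOLE outputs) but not in this test, where raw values are encoded into a zero-initialized P. During peeling/triangulation, a row whose unknowns reduce to a single position j is solved over GF(2^128) as

\langle \mathrm{row}(k_i),\, P \rangle = v_i
\;\Longrightarrow\;
P_j = v_i \;\oplus\; \bigoplus_{j' \neq j} \mathrm{row}(k_i)_{j'}\, P_{j'},

and when the other referenced positions are still zero this yields P_j = v_i verbatim, which matches the dump above (12 of the 15 values appear unchanged in p).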

seal_pir performance testing

secretflow does not seem to expose a seal_pir interface at the moment. How should the performance of seal_pir be benchmarked?

Select and implement a multi-party PSI protocol in SecretFlow

This issue is a task in the third round of the SecretFlow Open Source Contribution Plan (SF OSCP); community developers are welcome to take part.
If you want to claim a task but have not signed up yet, please complete registration first.

Task introduction

  • Task name: select and implement a multi-party PSI protocol in SecretFlow
  • Technical direction: PSI
  • Difficulty: challenging 🌟🌟🌟

Detailed requirements

Please select and implement a multi-party PSI protocol for SecretFlow. Specific requirements (a toy illustrative sketch follows at the end of this section):

  • Functionality: at least one semi-honest PSI scheme for more than 2 participants
  • Security: reveal as little as possible
  • Code style: C++ code must be formatted per the Google C++ style guide (the CI pipeline includes a style check)
  • Submission: reference this issue and submit code to https://github.com/secretflow/spu/tree/main/libspu/psi/core

Skill requirements

  • Familiarity with PSI principles
  • Awareness of recent progress in multi-party PSI
  • Familiarity with the SecretFlow PSI interfaces

Instructions

Claiming the task

This task admits multiple implementations, so multiple developers may claim it. After claiming, please comment your concrete design under this issue.

Design notes: briefly state which semi-honest PSI scheme you plan to use and sketch the implementation approach.

When several developers submit designs for the same task:

  • If the designs are similar, the issue is assigned to the first developer to comment;
  • If the designs differ, the issue is split into sub-issues, one per design, each assigned to the corresponding developer.
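For intuition only, here is a toy, insecure sketch of one candidate direction: ring-based commutative masking, which generalizes two-party ECDH-PSI to n parties. Everything below is an illustration assumption (toy group, local simulation of the message passing), not a submission-ready design.

import hashlib
import secrets

P = 2**127 - 1  # Mersenne prime; Z_P^* as a toy commutative-cipher group

def h(x: str) -> int:
    # stand-in for hash-to-group
    v = int.from_bytes(hashlib.sha256(x.encode()).digest(), "big") % P
    return v or 1

class Party:
    def __init__(self, items):
        # toy key; a real design must pick it coprime to P-1 for injectivity
        self.key = secrets.randbelow(P - 3) + 2
        self.hashed = [h(x) for x in items]

    def mask(self, batch):
        # x -> x^key mod P commutes across parties: (x^a)^b == (x^b)^a
        return [pow(v, self.key, P) for v in batch]

parties = [Party(s) for s in (["a", "b", "c"],
                              ["b", "c", "d"],
                              ["c", "e", "b"])]
n = len(parties)
masked = [p.hashed for p in parties]
# ring pass: after n rounds every set has been masked under every key
for r in range(n):
    masked = [parties[(i + r) % n].mask(batch)
              for i, batch in enumerate(masked)]
# all sets now carry the same combined exponent, so equal items collide
common = set(masked[0]).intersection(*masked[1:])
print(f"intersection size: {len(common)}")  # expect 2 ('b' and 'c')

In a real protocol the batches travel between parties over the link layer, only a designated receiver compares the fully masked sets, and shuffling plus a proper group are needed; the toy above ignores all of that hardening.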

Error running three-party PSI with the ECDH/KKRT-PSI-NPC protocols

The code run on one of the nodes is as follows:

import os
import sys
import time
import logging
import multiprocess
from absl import app
import spu
import secretflow as sf

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)

cluster_config = {
    'parties': {
        'alice': {
            'address': '10.200.3.32:8002',
            'listen_addr': '0.0.0.0:8002'
        },
        'bob': {
            'address': '10.200.3.33:8012',
            'listen_addr': '0.0.0.0:8012'
        },
        'carol': {
            'address': '10.200.3.34:8022',
            'listen_addr': '0.0.0.0:8022'
        },
    },
    'self_party': 'bob'
}

cluster_def = {
    'nodes': [
        {'party': 'alice', 'id': 'local:0', 'address': '10.200.3.32:12945'},
        {'party': 'bob', 'id': 'local:1', 'address': '10.200.3.33:12946'},
        {'party': 'carol', 'id': 'local:2', 'address': '10.200.3.34:12947'},
    ],
    'runtime_config': {
        'protocol': spu.spu_pb2.SEMI2K,
        'field': spu.spu_pb2.FM128,
    },
}

link_desc = {
    'recv_timeout_ms': 3600000,
}

def main(_):
    sf.init(address='10.200.3.33:8011', log_to_driver=True,
            cluster_config=cluster_config,
            omp_num_threads=multiprocess.cpu_count())

    logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)

    alice = sf.PYU('alice')
    bob = sf.PYU('bob')
    carol = sf.PYU('carol')

    key_columns = ['姓名', '证件类型', '证件号码']
    label_columns = ['register_date', 'age']
    spu_device = sf.SPU(cluster_def, link_desc)

    parent_path = '/jdxk/psi'

    input_path = {
        alice: f'{parent_path}/Atest.csv',
        bob: f'{parent_path}/Btest.csv',
        carol: f'{parent_path}/Ctest.csv',
    }

    output_path = {
        alice: f'{parent_path}/psi_output.csv',
        bob: f'{parent_path}/psi_output_bob.csv',
        carol: f'{parent_path}/psi_output_carol.csv',
    }
    select_keys = {
        alice: key_columns,
        bob: key_columns,
        # if run with `ECDH_PSI_3PC`, add carol
        carol: key_columns,
    }

    spu_device = sf.SPU(cluster_def=cluster_def)

    start = time.time()

    reports = spu_device.psi_csv(
        key=select_keys,
        input_path=input_path,
        output_path=output_path,
        receiver='alice',
        protocol='ECDH_PSI_NPC',
        precheck_input=False,  # costs extra time if set to True
        broadcast_result=True,  # costs extra time if set to True
    )
    print(f"psi reports: {reports}")
    logging.info(f"cost time: {time.time() - start}")

    sf.shutdown()

if __name__ == '__main__':
    app.run(main)

The run hangs at the computation stage and never produces a result. The logs of two of the nodes are shown below:
(two log screenshots attached as images; contents not recoverable as text)

PSI Quick Start with v2 API reports an error

The error is as follows:

(base) root@k8s-node1:/approot1/secretflow-learn/quikstart# docker run -it --rm --network host --mount type=bind,source=/tmp/receiver,target=/root/receiver -w /root --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=NET_ADMIN --privileged=true secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/psi-anolis8:latest bash -c "./main --config receiver/receiver.config"
[2024-01-25 16:23:45.540] [info] [main.cc:44] SecretFlow PSI Library v0.2.0.dev240123 Copyright 2023 Ant Group Co., Ltd.
terminate called after throwing an instance of 'yacl::EnforceNotMet'
what(): [Enforce fail at psi/main.cc:70] status.ok(). Launch config JSON string couldn't be parsed: {
"psi_config": {
"protocol_config": {
"protocol": "PROTOCOL_KKRT",
"role": "ROLE_RECEIVER",
"broadcast_result": true
},
"input_config": {
"type": "IO_TYPE_FILE_CSV",
"path": "/root/receiver/receiver_input.csv"
},
"output_config": {
"type": "IO_TYPE_FILE_CSV",
"path": "/root/receiver/receiver_output.csv"
},
"keys": [
"id0",
"id1"
],
"debug_options": {
"trace_path": "/root/receiver/receiver.trace"
},
"link_config": {
"parties": [
{
"id": "receiver",
"host": "127.0.0.1:5300"
},
{
"id": "sender",
"host": "127.0.0.1:5400"
}
]
}
},
"self_link_party": "receiver"
}
Stacktrace:
#0 __libc_start_main+0x7f503283acf3
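A hedged reading of this failure: in the JSON echoed back by the error, link_config is nested inside psi_config, while the launcher appears to expect it at the top level alongside psi_config and self_link_party; moving it up one level should let the config parse. A minimal sketch of the expected top-level layout, with the unchanged inner blocks elided:

{
    "psi_config": { "... protocol_config, input_config, output_config, keys, debug_options as above ..." },
    "self_link_party": "receiver",
    "link_config": { "... the parties block shown above ..." }
}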

[Bug]: spu KKRT task (8M vs 1M) in a test environment: sender and receiver are both stuck forever at [info] [thread_pool.cc:ThreadPool:30] Create a fixed thread pool with size 15

Issue Type

Performance

Modules Involved

PSI

Have you reproduced the bug with SPU HEAD?

No

Have you searched existing issues?

Yes

SPU Version

0.3.3b2

OS Platform and Distribution

CentOS Linux release 7.6.1810 (Core)

Python Version

3.8

Compiler Version

4.8.5

Current Behavior?

We ran an 8M-vs-1M task with spu KKRT in a test environment, but both sender and receiver stay stuck at "[info] [thread_pool.cc:ThreadPool:30] Create a fixed thread pool with size 15". Network traffic shows the two sides are still exchanging data, so it may be stuck at NegotiateBucketNum in the Preprocess stage. Any suggestions for troubleshooting this?
Below is a packet capture of the communication port:
(packet-capture screenshot attached as an image)

Standalone code to reproduce the issue

The task was run with the simple_psi.py script.
Receiver parameters: {"protocol":"KKRT_PSI_2PC","rank":1,"party_ips":"party_ip:1213,0.0.0.0:1213","in_path":"input.csv","field_names":"id","precheck_input":false,"receiver_rank":1,"output_sort":true,"bucket_size":1048576,"ic_mode":false}
Sender parameters:
{"protocol":"KKRT_PSI_2PC","rank":0,"party_ips":"0.0.0.0:1213,party_ip:1213","in_path":"input.csv","field_names":"id","precheck_input":false,"receiver_rank":1,"output_sort":true,"bucket_size":1048576,"ic_mode":false}

Relevant log output

2024-02-02 14:33:40.421 [info] [bucket_psi.cc:Run:97] Begin sanity check for input file: input.csv, precheck_switch:false
2024-02-02 14:33:52.205 [info] [bucket_psi.cc:Run:115] End sanity check for input file: input.csv, size=9001605
2024-02-02 14:33:52.208 [info] [bucket_psi.cc:RunPsi:419] Run psi protocol=2, self_items_count=9001605
2024-02-02 14:33:52.208 [info] [bucket_psi.cc:RunBucketPsi:486] psi protocol=2, rank=0 item_size=999999
2024-02-02 14:33:52.208 [info] [bucket_psi.cc:RunBucketPsi:486] psi protocol=2, rank=1 item_size=9001605
2024-02-02 14:33:52.208 [info] [bucket_psi.cc:RunBucketPsi:496] psi protocol=2, bucket_count=9
2024-02-02 14:34:06.803 [info] [bucket_psi.cc:RunBucketPsi:508] run psi bucket_idx=0, bucket_item_size=1000382
2024-02-02 14:34:06.811 [info] [memory_psi.cc:Run:66] psi protocol=2, rank=0, inputs_size=110936
2024-02-02 14:34:06.811 [info] [memory_psi.cc:Run:66] psi protocol=2, rank=1, inputs_size=1000382
2024-02-02 14:34:06.815 [info] [thread_pool.cc:ThreadPool:30] Create a fixed thread pool with size 15
.....

for a long long time

Does sealPir only support poly_degree 8192?

struct TestParams {
  size_t batch_number;
  size_t element_number;
  size_t element_size = 288;
  size_t poly_degree = 8192;  // now only support 8192
};

The sealPir test code says only 8192 is supported ("now only support 8192"), which makes ciphertexts quite long. I tried changing the parameter and fixing the resulting errors, but then the computed results were wrong. The upstream SealPIR library supports a security strength of 4096. Two questions:
1. Why is only poly_degree 8192 supported?
2. Can parts of the code be modified to support other parameters?

Questions about Seal PIR and Labeled PSI

Issue Type

Others

Source

source

Secretflow Version

1.5.0.dev240319

OS Platform and Distribution

centos 7.9

Python version

3.10

Bazel version

6.5.0

GCC/Compiler version

11.2.1

What happened and what you expected to happen.

Hi, I have a few questions:
Q1 - Is there a link to a Seal PIR test example, and to the Seal PIR wrapper interface?
Q2 - Are the publicly documented pir_setup/pir_query methods all based on Labeled PSI?
Q3 - The two links below are dead; are there updated ones?
https://github.com/secretflow/spu/blob/main/libspu/pir/seal_pir_test.cc
https://github.com/secretflow/spu/blob/main/libspu/psi/core/labeled_psi/README.md

Reproduction code to reproduce the issue.

[Bug]: spu 0.8.0b0 is not compatible with 0.3.3b2

Issue Type

Usability

Modules Involved

PSI

Have you reproduced the bug with SPU HEAD?

Yes

Have you searched existing issues?

Yes

SPU Version

0.8.0b0

OS Platform and Distribution

CentOS Linux release 7.9.2009 (Core)

Python Version

3.9.19

Compiler Version

4.8.5

Current Behavior?

Receiver: spu 0.8.0b0
Sender: spu 0.3.3b2
When the two parties run a PSI task with the KKRT_PSI_2PC protocol, they cannot connect.

Standalone code to reproduce the issue

Receiver: simple_psi.py -rank 0 -party_ips 0.0.0.0:1010,<sender ip>:1010 -protocol KKRT_PSI_2PC -in_path /tmp/data/y.csv -out_path /tmp/y.csv.out -field_names id --precheck_input true
Sender: simple_psi.py -rank 1 -party_ips <receiver ip>:1010,0.0.0.0:1010 -protocol KKRT_PSI_2PC -in_path /tmp/data/y.csv -out_path /tmp/y.csv.out -field_names id --precheck_input true

Relevant log output

Receiver log:
[2024-03-21 07:23:46.156] [info] [launch.cc:164] LEGACY PSI config: {"psi_type":"KKRT_PSI_2PC","broadcast_result":true,"input_params":{"path":"/tmp/data/y.csv","select_fields":["id"],"precheck":true},"output_params":{"path":"/tmp/y.csv.out","need_sort":true},"curve_type":"CURVE_25519","bucket_size":1048576}
[2024-03-21 07:23:46.156] [info] [bucket_psi.cc:400] bucket size set to 1048576
Fatal Python error: Aborted

Current thread 0x00007f6d2d891740 (most recent call first):
  File "/usr/local/bin/python3/lib/python3.9/site-packages/spu/psi.py", line 69 in bucket_psi
  File "/usr/local/bin/spu", line 93 in main
  File "/usr/local/bin/python3/lib/python3.9/site-packages/absl/app.py", line 254 in _run_main
  File "/usr/local/bin/python3/lib/python3.9/site-packages/absl/app.py", line 308 in run
  File "/usr/local/bin/spu", line 102 in <module>
Aborted

Sender log:
2024-03-21 07:23:46.287 [info] [bucket_psi.cc:Init:228] bucket size set to 1048576
2024-03-21 07:23:46.288 [info] [bucket_psi.cc:Run:97] Begin sanity check for input file: /tmp/data/y.csv, precheck_switch:true
2024-03-21 07:23:46.289 [info] [csv_checker.cc:CsvChecker:121] Executing duplicated scripts: LC_ALL=C sort --buffer-size=1G --temporary-directory=/tmp --stable selected-keys.1711005826288592734 | LC_ALL=C uniq -d > duplicate-keys.1711005826288592734
Traceback (most recent call last):
  File "/usr/local/bin/spu", line 119, in <module>
    app.run(main)
  File "/usr/local/bin/python3/lib/python3.8/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/usr/local/bin/python3/lib/python3.8/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/usr/local/bin/spu", line 112, in main
    report = psi.bucket_psi(setup_link(FLAGS.rank), config, FLAGS.ic_mode)
  File "/usr/local/bin/python3/lib/python3.8/site-packages/spu/psi.py", line 48, in bucket_psi
    report_str = libspu.libs.bucket_psi(link, config.SerializeToString(), ic_mode)
RuntimeError: what:
	[external/yacl/yacl/link/transport/channel.cc:117] Get data timeout, key=root:1:ALLGATHER
stacktrace:
#0 yacl::link::Context::RecvInternal()+0x7fe108a47417
#1 yacl::link::AllGatherImpl<>()+0x7fe108a41bfd
#2 yacl::link::AllGather()+0x7fe108a42193
#3 spu::psi::SyncWait<>()+0x7fe10888462a
#4 pybind11::cpp_function::initialize<>()::{lambda()#3}::_FUN()+0x7fe1077838e8
#5 pybind11::cpp_function::dispatcher()+0x7fe10775e6d6
#6 PyCFunction_Call+0x43bcda

[Bug]: Calling the spu PIR script on 2 machines that already host a SecretFlow deployment reports a brpc error

Issue Type

Build/Install

Modules Involved

PIR

Have you reproduced the bug with SPU HEAD?

Yes

Have you searched existing issues?

Yes

SPU Version

0.8.0.dev20240112

OS Platform and Distribution

Linux version 3.10.0-1160.80.1.el7.x86_64

Python Version

3.8

Compiler Version

gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC)

Current Behavior?

brpc server failed start

Standalone code to reproduce the issue

python /home/app/spu/examples/python/pir/pir_server.py --party_ips 10.46.139.251:61307,10.46.139.4:61307 --rank 1 --oprf_key_path /home/app/spu/oprf_key.bin --setup_path setup_path --enable_tls False

Relevant log output

(sfhost) python /home/app/spu/examples/python/pir/pir_server.py --party_ips 10.46.139.251:9092,10.46.139.4:9092 --rank 1 --oprf_key_path /home/app/spu/oprf_key.bin --setup_path setup_path --enable_tls False
id_0 = 10.46.139.251:9092
id_1 = 10.46.139.4:9092
2024-02-01 17:16:57.806 [error] [server.cpp:BRPC:1054] Fail to listen 10.46.139.4:9092
2024-02-01 17:16:57.824 [warning] [channel.h:~Channel:160] Channel destructor is called before WaitLinkTaskFinish, try stop send thread
Traceback (most recent call last):
  File "/home/app/spu/examples/python/pir/pir_server.py", line 94, in <module>
    app.run(main)
  File "/home/app/anaconda3/envs/sfhost/lib/python3.8/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/app/anaconda3/envs/sfhost/lib/python3.8/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/home/app/spu/examples/python/pir/pir_server.py", line 87, in main
    link_ctx = setup_link(FLAGS.rank)
  File "/home/app/spu/examples/python/pir/pir_server.py", line 70, in setup_link
    return link.create_brpc(lctx_desc, rank)
RuntimeError: what:
        [external/yacl/yacl/link/transport/brpc_link.cc:101] brpc server failed start
stacktrace:
#0 yacl::link::FactoryBrpc::CreateContext()+0x2af49bcc5bf4
#1 pybind11::cpp_function::initialize<>()::{lambda()#3}::_FUN()+0x2af49a7d8f84
#2 pybind11::cpp_function::dispatcher()+0x2af49a7af83e
#3 PyCFunction_Call+0x4f5572


(sfhost)
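"brpc server failed start" right after "Fail to listen 10.46.139.4:9092" usually means the process cannot bind that address: either the port is already taken, or 10.46.139.4 is not an address of the local machine (e.g., the host sits behind NAT and only sees a private IP). A minimal, hypothetical local check (address and port taken from the report above):

import socket

# Raises OSError if the IP is not local or the port is busy.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(("10.46.139.4", 9092))
print("bind ok")
s.close()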

The elliptic-curve operator in cryptor does not implement an inverse of mask; I tried to implement one myself but ran into problems

I want to implement the following variant of ECDH-PSI:
1. The receiver encrypts its set and sends it to the sender (aG)
2. The sender re-encrypts what it receives and returns it to the receiver (abG)
3. The receiver strips its own layer from the received data (a^-1 · abG -> bG)
4. The sender encrypts its own set and sends it to the receiver
5. Compare to obtain the intersection

When I experimented based on EccMask in sm2_cryptor.cc, I found that a^-1 · abG does not equal bG. The experiment code (at line 35 of sm2_cryptor.cc) is:

auto mask_functor = [this](const Item& in, Item& out) {
    BnCtxPtr bn_ctx(yacl::CheckNotNull(BN_CTX_new()));

    EcGroupSt ec_group(GetEcGroupId(curve_type_));

    EcPointSt ec_point(ec_group);

    EC_POINT_oct2point(ec_group.get(), ec_point.get(),
                       reinterpret_cast<const unsigned char*>(in.data()),
                       in.size(), bn_ctx.get());

    BigNumSt bn_sk;
    bn_sk.FromBytes(
        absl::string_view(reinterpret_cast<const char*>(&this->private_key_[0]),
                          kEccKeySize),
        ec_group.bn_p);

    Item input_point;
    ec_point.ToBytes(absl::MakeSpan(
        reinterpret_cast<uint8_t*>(input_point.data()), input_point.size()));
    std::cout << "input_point:" << input_point.data() << std::endl;
    // pointmul
    BigNumSt bn_sk_v = bn_sk.Inverse(ec_group.bn_p);  // modular inverse of the private key

    EcPointSt ec_point2 = ec_point.PointMul(ec_group, bn_sk);

    EcPointSt ec_point3 = ec_point2.PointMul(ec_group, bn_sk_v);  // multiply by the inverse

    ec_point3.ToBytes(
        absl::MakeSpan(reinterpret_cast<uint8_t*>(out.data()), out.size()));
    std::cout << "out:" << out.data() << std::endl;
  };

The output (raw point bytes, non-printable):
input_point:ni!���z��
���ۺ;e6^���E��O"�^�
out:�8R��$�׋��A�V)tz�%$�K�9Ǩ�
��

In theory ec_point should equal ec_point3, but in practice it does not. I don't know why; looking forward to an answer.
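A hedged observation (my reading of the snippet above, not a confirmed diagnosis): bn_sk.Inverse(ec_group.bn_p) inverts the key modulo the field prime p, but scalars acting on curve points only cancel modulo the group order n = ord(G). In LaTeX:

a^{-1}\,(a\,b\,G) \;=\; \big((a\,a^{-1}) \bmod n\big)\, b\,G \;=\; b\,G
\qquad\text{iff}\qquad a\,a^{-1} \equiv 1 \pmod{n},\quad n = \operatorname{ord}(G).

An inverse taken mod p generally fails this congruence (for SM2, n != p), which would explain ec_point3 != ec_point; computing the inverse modulo the curve order should fix the round trip.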

How to enable file logging when testing psi through the Python interface

Feature Request Type

Build/Install

Have you searched existing issues?

Yes

Is your feature request related to a problem?

  • In the test case legacy_psi_test.py, how do I add a file log?

  • I added the following code:

def run_streaming_psi(self, wsize, self_rank, link_id, party_ids, addrs,
                      inputs, outputs, selected_fields, protocol):
    log_options = logging.LogOptions()
    log_options.log_level = logging.LogLevel.DEBUG
    log_options.system_log_path = "./alice.log"
    logging.setup_logging(log_options)
However, logs from libpsi are not written to the log file; they still go to the console. How can I configure things so that all logs go to the file?

Describe features you want to add to SPU

  • A unified file log that can be configured from the Python interface.



[Bug]: illegal instruction on import

Issue Type

Support

Modules Involved

SPU runtime

Have you reproduced the bug with SPU HEAD?

Yes

Have you searched existing issues?

Yes

SPU Version

spu 0.8.0b0

OS Platform and Distribution

no matter

Python Version

3.9

Compiler Version

No response

Current Behavior?

When running spu on a Hygon 7185 CPU, import spu / spu.psi crashes with "illegal instruction (core dumped)"; it looks like the CPU lacks an instruction set spu needs. The docs only mention AVX2, and this CPU does support AVX2. Could you provide a more detailed list of the instruction sets spu requires?

Standalone code to reproduce the issue

--

Relevant log output

No response
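A hypothetical probe (Linux) to compare what the CPU advertises against what the wheel was built with; the flag list checked here is a guess, not an official spu requirement list:

# Print which ISA extensions the CPU advertises via /proc/cpuinfo.
with open("/proc/cpuinfo") as f:
    flags = next(line for line in f if line.startswith("flags")).split()
for isa in ("avx", "avx2", "bmi2", "adx", "aes", "pclmulqdq"):
    print(f"{isa}: {'yes' if isa in flags else 'NO'}")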
