

acelyc111 commented on September 27, 2024

@ninsmiracle You can check if this patch could solve the issue: #2044

from incubator-pegasus.

ninsmiracle commented on September 27, 2024

So I would like to know what I should do now to deploy a Pegasus cluster with FQDN, and how to use the tools to control this cluster. Thanks a lot, @acelyc111!

ninsmiracle commented on September 27, 2024

Let me add more details:

  1. Deploy the cluster: this works, and every node is running.

  2. Use pegasus-shell to connect to the cluster.

  3. Send any RPC command, such as nodes -dr or ls -d: it fails with TIME_OUT.

  4. The meta server generates a lot of core dumps.

Backtrace of a core file like core.meta.THREAD_PO...:

Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/home/work/app/pegasus/c3tst-performance1/meta/package/bin/pegasus_server confi'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f3c0c8bc1d7 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f3c0c8bc1d7 in raise () from /lib64/libc.so.6
#1  0x00007f3c0c8bd8c8 in abort () from /lib64/libc.so.6
#2  0x00007f3c10cd8a1e in dsn_coredump () at /home/guoningshen/code/incubator-pegasus/src/runtime/service_api_c.cpp:130
#3  0x00007f3c0dcb4134 in process_fatal_log (log_level=<optimized out>) at /home/guoningshen/code/incubator-pegasus/src/utils/simple_logger.cpp:117
#4  dsn::tools::simple_logger::log (this=0x2e3a200, file=<optimized out>, function=<optimized out>, line=<optimized out>, log_level=<optimized out>, str=<optimized out>)
    at /home/guoningshen/code/incubator-pegasus/src/utils/simple_logger.cpp:284
#5  0x00007f3c10d09ff3 in dsn::host_port::from_address (addr=...) at /home/guoningshen/code/incubator-pegasus/src/runtime/rpc/rpc_host_port.cpp:60
#6  0x00007f3c10d0f0c5 in dsn::message_ex::create_response (this=this@entry=0x327be00) at /home/guoningshen/code/incubator-pegasus/src/runtime/rpc/rpc_message.cpp:358
#7  0x00007f3c10d0638d in dsn::rpc_engine::forward (this=this@entry=0x2c4f180, request=request@entry=0x327be00, address=...) at /home/guoningshen/code/incubator-pegasus/src/runtime/rpc/rpc_engine.cpp:853
#8  0x00007f3c10cd90a3 in dsn_rpc_forward (request=0x327be00, addr=...) at /home/guoningshen/code/incubator-pegasus/src/runtime/service_api_c.cpp:207
#9  0x00007f3c0ffc6196 in forward (addr=..., this=0x7f3bee4e5f20) at /home/guoningshen/code/incubator-pegasus/src/runtime/rpc/rpc_holder.h:224
#10 dsn::replication::meta_service::check_leader<dsn::rpc_holder<dsn::replication::configuration_list_apps_request, dsn::replication::configuration_list_apps_response> > (this=this@entry=0x32ee000, 
    rpc=..., forward_address=<optimized out>) at /home/guoningshen/code/incubator-pegasus/src/meta/meta_service.h:406
#11 0x00007f3c0ffc629a in dsn::replication::meta_service::check_leader_status<dsn::rpc_holder<dsn::replication::configuration_list_apps_request, dsn::replication::configuration_list_apps_response> > (
    this=this@entry=0x32ee000, rpc=..., forward_address=forward_address@entry=0x0) at /home/guoningshen/code/incubator-pegasus/src/meta/meta_service.h:420
#12 0x00007f3c0ff9ef6a in dsn::replication::meta_service::on_list_apps (this=0x32ee000, rpc=...) at /home/guoningshen/code/incubator-pegasus/src/meta/meta_service.cpp:671
#13 0x00007f3c0fff8653 in operator() (request=<optimized out>, __closure=<optimized out>) at /home/guoningshen/code/incubator-pegasus/src/runtime/serverlet.h:201
#14 std::_Function_handler<void (dsn::message_ex*), bool dsn::serverlet<dsn::replication::meta_service>::register_rpc_handler_with_rpc_holder<dsn::rpc_holder<dsn::replication::configuration_list_apps_request, dsn::replication::configuration_list_apps_response> >(dsn::task_code, char const*, void (dsn::replication::meta_service::*)(dsn::rpc_holder<dsn::replication::configuration_list_apps_request, dsn::replication::configuration_list_apps_response>))::{lambda(dsn::message_ex*)#1}>::_M_invoke(std::_Any_data const&, dsn::message_ex*&&) (__functor=..., __args#0=<optimized out>)
    at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:316
#15 0x00007f3c10d123b2 in operator() (__args#0=<optimized out>, this=0x2b310d0) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:706
#16 dsn::rpc_request_task::exec (this=0x2b31000) at /home/guoningshen/code/incubator-pegasus/src/runtime/task/task.h:436
#17 0x00007f3c10d13be1 in dsn::task::exec_internal (this=0x2b31000) at /home/guoningshen/code/incubator-pegasus/src/runtime/task/task.cpp:173
#18 0x00007f3c10d2b257 in dsn::task_worker::loop (this=0x2b19290) at /home/guoningshen/code/incubator-pegasus/src/runtime/task/task_worker.cpp:245
#19 0x00007f3c10d2bdc0 in dsn::task_worker::run_internal (this=0x2b19290) at /home/guoningshen/code/incubator-pegasus/src/runtime/task/task_worker.cpp:225
#20 0x00007f3c0f7a5a3f in execute_native_thread_routine () from /home/work/app/pegasus/c3tst-performance1/meta/package/bin/librocksdb.so.8
#21 0x00007f3c0df3adc5 in start_thread () from /lib64/libpthread.so.0
#22 0x00007f3c0c97e73d in clone () from /lib64/libc.so.6
(gdb) 

Backtrace of a core file like core.pegasus_server...:

#0  0x0000000000000000 in ?? ()
#1  0x00007f693f83b6c0 in (anonymous namespace)::stacktrace_generic_fp::capture<false, false> (result=result@entry=0xaee010, max_depth=31, skip_count=1, initial_frame=initial_frame@entry=0x7ffd328eae80, 
    initial_pc=initial_pc@entry=0x0, sizes=0x0) at src/stacktrace_generic_fp-inl.h:175
#2  0x00007f693f83b74a in GetStackTrace_generic_fp (result=0xaee010, max_depth=<optimized out>, skip_count=<optimized out>) at src/stacktrace_generic_fp-inl.h:332
#3  0x00007f693f83ba52 in GetStackTrace (result=result@entry=0xaee010, max_depth=max_depth@entry=30, skip_count=skip_count@entry=0) at src/stacktrace.cc:346
#4  0x00007f693f82c37e in tcmalloc::PageHeap::HandleUnlock (this=0x7f693fa56720 <tcmalloc::Static::pageheap_>, context=0x7ffd328eaf10) at src/page_heap.cc:155
#5  0x00007f693f82e07a in ~LockingContext (this=0x7ffd328eaf10, __in_chrg=<optimized out>) at src/page_heap.cc:77
#6  tcmalloc::PageHeap::NewWithSizeClass (this=this@entry=0x7f693fa56720 <tcmalloc::Static::pageheap_>, n=n@entry=1, sizeclass=26) at src/page_heap.cc:161
#7  0x00007f693f82beb7 in tcmalloc::CentralFreeList::Populate (this=this@entry=0x7f693fbe1420 <tcmalloc::Static::central_cache_+31616>) at src/central_freelist.cc:314
#8  0x00007f693f82c088 in tcmalloc::CentralFreeList::FetchFromOneSpansSafe (this=0x7f693fbe1420 <tcmalloc::Static::central_cache_+31616>, N=1, start=0x7ffd328eb020, end=0x7ffd328eb028)
    at src/central_freelist.cc:273
#9  0x00007f693f82c120 in tcmalloc::CentralFreeList::RemoveRange (this=0x7f693fbe1420 <tcmalloc::Static::central_cache_+31616>, start=start@entry=0x7ffd328eb020, end=end@entry=0x7ffd328eb028, N=1)
    at src/central_freelist.cc:253
#10 0x00007f693f82fca3 in tcmalloc::ThreadCache::FetchFromCentralCache (this=this@entry=0xb0e000, cl=cl@entry=26, byte_size=byte_size@entry=576, 
    oom_handler=oom_handler@entry=0x7f693f81d240 <(anonymous namespace)::nop_oom_handler(size_t)>) at src/thread_cache.cc:125
#11 0x00007f693f83f15d in Allocate (oom_handler=0x7f693f81d240 <(anonymous namespace)::nop_oom_handler(size_t)>, cl=26, size=576, this=<optimized out>) at src/thread_cache.h:381
#12 do_malloc (size=568) at src/tcmalloc.cc:1414
#13 do_allocate_full<tcmalloc::malloc_oom> (size=568) at src/tcmalloc.cc:1804
#14 tcmalloc::allocate_full_malloc_oom (size=568) at src/tcmalloc.cc:1820
#15 0x00007f693dfa754d in __fopen_internal () from /lib64/libc.so.6
#16 0x00007f693ca60a16 in selinuxfs_exists () from /lib64/libselinux.so.1
#17 0x00007f693ca58ce8 in init_lib () from /lib64/libselinux.so.1
#18 0x00007f6943dfd1e3 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#19 0x00007f6943def21a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#20 0x0000000000000004 in ?? ()
#21 0x00007ffd328ed220 in ?? ()
#22 0x00007ffd328ed26a in ?? ()
#23 0x00007ffd328ed275 in ?? ()
#24 0x00007ffd328ed27f in ?? ()
#25 0x0000000000000000 in ?? ()
(gdb) 
  5. stdout (error log) of the meta server:
W2024-05-11 10:33:36.503 (1715394816503732375 36348) : overwrite default thread pool for task RPC_CM_QUERY_PARTITION_CONFIG_BY_INDEX from THREAD_POOL_META_SERVER to THREAD_POOL_DEFAULT
W2024-05-11 10:33:36.503 (1715394816503775340 36348) : overwrite default thread pool for task RPC_CM_QUERY_PARTITION_CONFIG_BY_INDEX_ACK from THREAD_POOL_META_SERVER to THREAD_POOL_DEFAULT
I2024-05-11 10:33:36.503 (1715394816503863057 36348) : pegasus server starting, pid(36348), version($Version: Pegasus Server 2.6.0-SNAPSHOT (aea1cfe632d455fcddfe4c92ebbd9d4e89037abb) Release, built by gcc 7.3.1, built on 12180ab51819, built at May  7 2024 12:14:31 $)
F2024-05-11 10:36:03.558 (1715394963558260142 36428)   meta.THREAD_POOL_META_SERVER2.02008e370001000c: rpc_host_port.cpp:62:from_address(): assertion expression: [utils::hostname_from_ip(__bswap_32 (addr.ip()), &hp._host)] invalid host_port 172.17.0.1

  6. By the way, all the replica servers were running during that time.

  7. And I cannot connect to the cluster via admin-cli.
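The assertion text in the log above contains __bswap_32 (addr.ip()), which looks like what glibc's htonl() expands to on a little-endian x86-64 build: the IP appears to be held as a host-order integer and converted to network byte order before utils::hostname_from_ip() is called. For 172.17.0.1 the host-order value is 172*2^24 + 17*2^16 + 0*2^8 + 1 = 0xAC110001 = 2886795265. A minimal sketch of that conversion using only POSIX calls (not Pegasus code; both helper names are hypothetical):

```cpp
#include <arpa/inet.h>   // inet_pton, inet_ntop, htonl, ntohl
#include <netinet/in.h>  // in_addr, INET_ADDRSTRLEN
#include <sys/socket.h>  // AF_INET
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: parse a dotted IPv4 string into a host-order integer,
// i.e. the kind of value an address class might keep internally (assumption).
// Returns 0 on parse failure (note that "0.0.0.0" also maps to 0).
uint32_t ipv4_to_host_order(const std::string &dotted) {
    in_addr addr{};
    if (inet_pton(AF_INET, dotted.c_str(), &addr) != 1) {
        return 0;
    }
    // inet_pton() stores the address in network (big-endian) byte order;
    // ntohl() converts it to the host's byte order.
    return ntohl(addr.s_addr);
}

// Hypothetical helper for the reverse direction. htonl() is the call that
// most likely expands to __bswap_32(...) on a little-endian glibc build.
std::string host_order_to_ipv4(uint32_t host_order_ip) {
    in_addr addr{};
    addr.s_addr = htonl(host_order_ip);
    char buf[INET_ADDRSTRLEN];
    if (inet_ntop(AF_INET, &addr, buf, sizeof(buf)) == nullptr) {
        return "";
    }
    return std::string(buf);
}
```

Keeping the integer in host order makes arithmetic and comparisons straightforward, at the cost of an explicit htonl()/ntohl() at every boundary with the socket APIs.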

acelyc111 commented on September 27, 2024

Hi, @ninsmiracle!

Is the Pegasus cluster deployed as a onebox in the Docker container? Do the Pegasus shell tool and admin-cli run in the same Docker container?

ninsmiracle commented on September 27, 2024

When I deployed it as a onebox in my Docker container, the cluster ran normally. However, when I deploy it on real nodes, the cluster runs but cannot accept any RPC.
I think the key point is this assertion: meta.THREAD_POOL_META_SERVER2.02008e370001000c: rpc_host_port.cpp:62:from_address(): assertion expression: [utils::hostname_from_ip(__bswap_32 (addr.ip()), &hp._host)] invalid host_port 172.17.0.1.
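The assertion fires in host_port::from_address() when utils::hostname_from_ip() cannot map the IP to a hostname. 172.17.0.1 is the default address of Docker's docker0 bridge, and such an address typically has no reverse-DNS or /etc/hosts entry, so the lookup fails and the meta server aborts. The sketch below is not the Pegasus implementation (nor the patch in #2044); it is a minimal illustration, assuming POSIX getnameinfo() with NI_NAMEREQD, of a reverse lookup that reports failure to the caller instead of asserting. reverse_lookup is a hypothetical name.

```cpp
#include <arpa/inet.h>   // inet_pton
#include <netdb.h>       // getnameinfo, NI_NAMEREQD, NI_MAXHOST
#include <netinet/in.h>  // sockaddr_in
#include <sys/socket.h>  // sockaddr, AF_INET
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: reverse-resolve an IPv4 address given as a dotted string.
// Returns std::nullopt if the string is not a valid IPv4 address or if no
// hostname is known for it, so the caller can decide what to do.
std::optional<std::string> reverse_lookup(const std::string &ip) {
    sockaddr_in sa{};
    sa.sin_family = AF_INET;
    if (inet_pton(AF_INET, ip.c_str(), &sa.sin_addr) != 1) {
        return std::nullopt;  // not a dotted-quad IPv4 address
    }
    char host[NI_MAXHOST];
    // NI_NAMEREQD makes getnameinfo() return an error instead of silently
    // falling back to the numeric address when no hostname exists.
    int rc = getnameinfo(reinterpret_cast<const sockaddr *>(&sa), sizeof(sa),
                         host, sizeof(host), nullptr, 0, NI_NAMEREQD);
    if (rc != 0) {
        return std::nullopt;  // e.g. no PTR record and no /etc/hosts entry
    }
    return std::string(host);
}
```

Returning std::optional leaves the policy (reject the request, fall back to the numeric IP, or abort) to the caller rather than crashing the whole process inside the conversion.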

acelyc111 commented on September 27, 2024

> I connected to the Pegasus cluster via admin-cli, using a command such as ./admin-cli -n aaa:25101,bbb:25101, but it returned: fatal: failed to list nodes [context deadline exceeded]

It's because after the main FQDN patch was merged, a new Thrift structure (i.e. host_port) was introduced, but admin-cli doesn't know this type. You can check this in admin-cli's shell.log; the error looks like:

time="2024-05-23T00:30:55+08:00" level=info msg="failed to read response from [127.0.0.1:34601(meta)]: *admin.ListNodesResponse error reading struct: *admin.NodeInfo error reading struct: Unknown data type 57"

The fix is to update the go-client that admin-cli depends on. However, we have to resolve #1917 first.
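Background on the message "Unknown data type 57": in Thrift's binary protocol each field on the wire starts with a one-byte type code, and a reader that meets a field ID it does not know skips that field by using the code. That works for every standard type code, but a byte outside the standard set, such as 57, cannot be skipped, so decoding of the whole ListNodesResponse fails. The sketch below lists the standard codes from Apache Thrift's TType enum (UUID = 16 exists only in recent Thrift releases) and checks a byte against them; it is an illustration, not the go-client or admin-cli code, and is_standard_ttype is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Standard Thrift wire type codes (Apache Thrift TType enum).
enum class TType : uint8_t {
    STOP = 0, VOID = 1, BOOL = 2, BYTE = 3, DOUBLE = 4,
    I16 = 6, I32 = 8, I64 = 10, STRING = 11, STRUCT = 12,
    MAP = 13, SET = 14, LIST = 15,
    UUID = 16,  // only in recent Thrift releases
};

// Hypothetical helper: is this byte one of the standard Thrift type codes?
// A generic reader can only skip an unknown field whose type byte passes this check.
bool is_standard_ttype(uint8_t code) {
    switch (static_cast<TType>(code)) {
    case TType::STOP: case TType::VOID: case TType::BOOL: case TType::BYTE:
    case TType::DOUBLE: case TType::I16: case TType::I32: case TType::I64:
    case TType::STRING: case TType::STRUCT: case TType::MAP: case TType::SET:
    case TType::LIST: case TType::UUID:
        return true;
    default:
        return false;  // e.g. 57: the decoder cannot tell how many bytes to skip
    }
}
```

A struct-typed field (code 12) that the client does not know would normally be skipped; it is a byte outside this set that makes the reader give up.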

acelyc111 commented on September 27, 2024

@ninsmiracle If it has been resolved, I'll close the issue.