Code Monkey home page Code Monkey logo

Comments (10)

cjbj avatar cjbj commented on July 27, 2024
  • Describe what the program's doing.
  • Try adding the line disable_oob=on to a sqlnet.ora file on the machine where your app is running.
  • Try remove the DIAG_* lines from the sqlnet.ora file.
  • Try with a recent version of the DB.

from odpi.

quidstone avatar quidstone commented on July 27, 2024
  • Our program is calling a procedure which has an insert command inside. We are using a connection string like this:
host:port/service?CONNECT_TIMEOUT=300ms&RECV_TIMEOUT=200ms&SEND_TIMEOUT=60000ms

We used pooled connection where during startup of the application, it would create some connections with the following pool creation params:

poolCreateParams.maxSessions = maxSessions;
poolCreateParams.minSessions = maxSessions;
poolCreateParams.sessionIncrement = 0;
poolCreateParams.pingTimeout = 1000; //millisec
poolCreateParams.pingInterval = 120;//seconds
poolCreateParams.timeout = 0;

We needed constant amount of connection available at all times. Then we added the pool wait time to operate in a NON-BLOCKING mode . If there is a timeout then it would result in a failure which is okay sometimes. It rarely happens though in our scenario. We have seen we were getting more than 300k request per hour where ~70 reqs were not getting any connection. This is okay at out current deployment.

dpiPool_setGetMode(*pool, DPI_MODE_POOL_GET_TIMEDWAIT); 
dpiPool_setWaitTimeout(*pool, timeout);

But when there is a space issue in DB, then the procedure execution would fail after 2s per connection and give the error unable to extend table. We needed the procedure to finish either successfully or fail within a shorter time period. Usually for successful cases it takes 5ms avg so we tried to add a RECV_TIMEOUT of 200ms for the fail cases to expedite any return from procedure execution. Application behaves mostly the way it should but every time during the issue it generates core dump from a segmentation fault deep inside libclntsh which disrupts the production env and congestion occurs in our network.

  • disable_oob=on will be added
  • We added sqlnet.ora to remove huge files created for diagnostic purposes and added those DIAG_* lines and I was confused that the timeouts in the connection string may be overridden by the sqlnet.ora and added those too to the file. Are diagnostics needed for debugging? We have a sqlnet.log file created where the application is running. It creates mainly the following lines,
***********************************************************************
Fatal NI connect error 12520, connecting to:
 (DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=)(CID=(PROGRAM=)(HOST=)(USER=root)))(CONNECT_TIMEOUT=300ms)(RECV_TIMEOUT=200ms)(SEND_TIMEOUT=60000ms)(ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=))(ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=))(ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=)))

  VERSION INFORMATION:
        TNS for Linux: Version 19.0.0.0.0 - Production
        TCP/IP NT Protocol Adapter for Linux: Version 19.0.0.0.0 - Production
  Version 19.11.0.0.0
  Time: 15-OCT-2021 09:28:29
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12564
    TNS-12564: Message 12564 not found; No message file for product=network, facility=TNS
    ns secondary err code: 12560
    nt main err code: 524
    TNS-00524: Message 524 not found; No message file for product=network, facility=TNS
    nt secondary err code: 115
    nt OS err code: 0


***********************************************************************
Fatal NI connect error 12170.

  VERSION INFORMATION:
        TNS for Linux: Version 19.0.0.0.0 - Production
        TCP/IP NT Protocol Adapter for Linux: Version 19.0.0.0.0 - Production
  Version 19.11.0.0.0
  Time: 15-OCT-2021 14:13:35
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12535
    TNS-12535: Message 12535 not found; No message file for product=network, facility=TNS
    ns secondary err code: 12609
    nt main err code: 0
    nt secondary err code: 0
    nt OS err code: 0
  Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=))
  • Our production is running with 11.2 and we don't have any other options right now.

from odpi.

quidstone avatar quidstone commented on July 27, 2024

now we are getting this,

#0  0x00007f496269b4fb in raise () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.176-5.el7.x86_64 elfutils-libs-0.176-5.el7.x86_64 glibc-2.17-324.el7_9.x86_64 libaio-0.3.109-13.el7.x86_64 libattr-2.4.46-13.el7.x86_64 libcap-2.22-11.el7.x86_64 libgcc-4.8.5-44.el7.x86_64 numactl-libs-2.0.12-5.el7.x86_64 systemd-libs-219-78.el7_9.3.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-19.el7_9.x86_64
(gdb) bt
#0  0x00007f496269b4fb in raise () from /lib64/libpthread.so.0
#1  0x00007f495fe7e1bf in skgesigOSCrash () from /usr/lib/oracle/19.11/client64/lib/libclntsh.so
#2  0x00007f4960572e3d in kpeDbgSignalHandler () from /usr/lib/oracle/19.11/client64/lib/libclntsh.so
#3  0x00007f495fe7e492 in skgesig_sigactionHandler () from /usr/lib/oracle/19.11/client64/lib/libclntsh.so
#4  <signal handler called>
#5  0x00007f495ed62849 in kpucopyuserhndl () from /usr/lib/oracle/19.11/client64/lib/libclntsh.so
#6  0x00007f495ee02142 in kpuspextend () from /usr/lib/oracle/19.11/client64/lib/libclntsh.so
#7  0x00007f495ee045ac in kpuspgetfreesession () from /usr/lib/oracle/19.11/client64/lib/libclntsh.so
#8  0x00007f495ee00f6f in kpuspgetpooledsession () from /usr/lib/oracle/19.11/client64/lib/libclntsh.so
#9  0x00007f495edfd04d in kpuspsessionget () from /usr/lib/oracle/19.11/client64/lib/libclntsh.so
#10 0x00007f495eb3dd01 in OCISessionGet () from /usr/lib/oracle/19.11/client64/lib/libclntsh.so
#11 0x00007f496246c3a4 in dpiOci__sessionGet (envHandle=0x1685d00, handle=handle@entry=0x7f48c400d680, authInfo=authInfo@entry=0x7f4865fd7618,
    connectString=connectString@entry=0x16baa68 "OCI:SP:zpuyRFdHlCPgU6hJZQr8DC", connectStringLength=connectStringLength@entry=29, tag=<optimized out>,
    tagLength=0, outTag=outTag@entry=0x7f493f7e58c8, outTagLength=outTagLength@entry=0x7f493f7e58d0, found=found@entry=0x7f493f7e58d4, mode=mode@entry=1,
    error=error@entry=0x7f493f7e5850) at src/dpiOci.c:3136
#12 0x00007f4962450f1c in dpiConn__getSession (error=0x7f493f7e5850, authInfo=0x7f4865fd7618, params=0x7f493f7e5870, connectStringLength=29,
    connectString=0x16baa68 "OCI:SP:zpuyRFdHlCPgU6hJZQr8DC", mode=1, conn=0x7f48c400d660) at src/dpiConn.c:861
#13 dpiConn__get (error=0x7f493f7e5850, pool=0x1685b60, createParams=0x7f493f7e5870, connectStringLength=29,
    connectString=0x16baa68 "OCI:SP:zpuyRFdHlCPgU6hJZQr8DC", passwordLength=<optimized out>, password=<optimized out>, userNameLength=<optimized out>,
    userName=<optimized out>, conn=0x7f48c400d660) at src/dpiConn.c:636
#14 dpiConn__create (conn=0x7f48c400d660, context=<optimized out>, userName=userName@entry=0x0, userNameLength=userNameLength@entry=0,
    password=password@entry=0x0, passwordLength=passwordLength@entry=0, connectString=0x16baa68 "OCI:SP:zpuyRFdHlCPgU6hJZQr8DC", connectStringLength=29,
    pool=pool@entry=0x1685b60, commonParams=commonParams@entry=0x0, createParams=createParams@entry=0x7f493f7e5870, error=error@entry=0x7f493f7e5850)
    at src/dpiConn.c:414
#15 0x00007f496245ad71 in dpiPool__acquireConnection (pool=pool@entry=0x1685b60, userName=userName@entry=0x0, userNameLength=userNameLength@entry=0,
    password=password@entry=0x0, passwordLength=0, params=params@entry=0x7f493f7e5870, conn=conn@entry=0x7f493f7e5958, error=error@entry=0x7f493f7e5850)
    at src/dpiPool.c:36
#16 0x00007f496245af1d in dpiPool_acquireConnection (pool=pool@entry=0x1685b60, userName=userName@entry=0x0, userNameLength=userNameLength@entry=0,
    password=password@entry=0x0, passwordLength=passwordLength@entry=0, params=0x7f493f7e5870, params@entry=0x0, conn=conn@entry=0x7f493f7e5958)
    at src/dpiPool.c:341
#17 0x0000000000406a5e in dpi__getConnFromPool (pool=0x1685b60) at db_util.c:197

sqlnet.log

SQLNET.RECV_TIMEOUT=200ms
SQLNET.OUTBOUND_CONNECT_TIMEOUT=300ms
DISABLE_OOB=on
ADR_BASE=/data01/oracle_diag

We had exactly this issue before adding DIAG_* params in sqlnet.ora file. Here is a stackoverflow thread which suggested DIAG_*'s addition.

from odpi.

tgulacsi avatar tgulacsi commented on July 27, 2024

Maybe

poolCreateParams.sessionIncrement = 1;
poolCreateParams.timeout = 100;

Would be more real-world (what libclntsh writers awaited for)?

from odpi.

quidstone avatar quidstone commented on July 27, 2024

from doc here

dpiPoolCreateParams.timeout

Specifies the length of time (in seconds) after which idle sessions in the pool are terminated. Note that termination only occurs when the pool is accessed. The default value is 0 which means that no idle sessions are terminated. This value can be set after the pool has been created using the function dpiPool_setTimeout() and acquired using the function dpiPool_getTimeout().

dpiPoolCreateParams.sessionIncrement

Specifies the number of sessions that will be created by the session pool when more sessions are required and the number of sessions is less than the maximum allowed. This value is ignored if the dpiPoolCreateParams.homogeneous member has a value of 0. This value added to the dpiPoolCreateParams.minSessions member value must not exceed the dpiPoolCreateParams.maxSessions member value. The default value is 0.

I need a constant number of connections available at all time. so I added,

poolCreateParams.maxSessions = maxSessions;
poolCreateParams.minSessions = maxSessions;

Both have the same value assigned. I don't want to terminate idle connections ,so there will be no cost of creating one when needed. It can fail if there is none available within the pool timeout time. So I decided to assign session increment to zero. As there is no termination, max and min has same value assigned to the maxSessions param.

from odpi.

quidstone avatar quidstone commented on July 27, 2024

@cjbj, Should we consider updating instant client to 19.12? Where should we go from here? Changing DB version is not an option right now.

I am getting the feeling that these tools work better with relatively new versions of oracle DB. But to be honest, this timeout feature is too basic to fail this way. So, I am not sure why it is not working as it should. Is there any other info needed for you guys to understand it better?

from odpi.

tgulacsi avatar tgulacsi commented on July 27, 2024

I cannot unsee the similarity with godror/godror#192 (comment) - we se SIGSEGVs when the pool is full (timeouts set), and acquiring new connections concurrently and fast enough.

Limiting the number of sessions on the call side (Go) evades this.

from odpi.

cjbj avatar cjbj commented on July 27, 2024

The Go issue could be unrelated, so let's treat the issues separately for now.

I just finished a discussion with the OCI developers. They want to know what's going but, but as expected, a testcase would help diagnose the problem.

We could look at trace. Overall it would be best if you could create an Oracle support request for the problem so "sensitive" files like traces can be shared; I don't know whether you can do this since you have 11.2 DB? It would be channelled through the OCI support team.

from odpi.

cjbj avatar cjbj commented on July 27, 2024

@quidstone if you're unable to createa support request (I wonder what your support status of 11.2 is?), I am happy to have a call with you to gather information. Send me email (see my profile).

from odpi.

quidstone avatar quidstone commented on July 27, 2024

hi @cjbj , I have already created a support request through our db-admin at out company. I admit that things are moving slow. This is more of our problem than anyone else's. We have to provide oracle support team traces from both servers(DB, client) when there is an issue. This is taking more time than it should. Unless I share those traces, I can't really tell what's next. We can talk about it a bit though. I will send you an email. thanks.

from odpi.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.