Code Monkey home page Code Monkey logo

Comments (8)

inancgumus avatar inancgumus commented on June 17, 2024 1

Actually, all of this investigation started when I was stress testing my web API with vegeta. I wrote this small script to test the same errors happen to isolate the problem from my API. Then, I saw that it also happens with this simple code. So, I thought that it could be a mgo issue. Maybe, it's because of Mongo itself. I don't know yet. Thanks for all your help.

from mgo.

domodwyer avatar domodwyer commented on June 17, 2024

Hi @inancgumus,

Have you tuned the mongod connection limits (either in the config or with --maxConns) and the ulimits? Can you paste the output of db.serverStatus().connections and ulimit -a on your test machine?

It seems SetPoolLimit() isn't supposed to be used as a concurrency limiter. I do agree it's not really the job of mgo to limit concurrency, but if that's the case I have no idea why SetPoolLimit() exists - we use a concurrency throttler in our application code to limit the number of simultaneous calls to mongo and avoid SetPoolLimit() entirely, it works well and I suggest you try the same.

from mgo.

inancgumus avatar inancgumus commented on June 17, 2024

Hi @domodwyer ,

I didn't try --maxConns, but I set ulimit to unlimiteds. Yeah, I had read that view on SetPoolLimit and I aggree on that. I'm just exploring what causes this and how to prevent it, firstly, other than restructing my code.

Did you try the above code on your machine? So, maybe you can see the problem directly.


These are my config details:

ulimit -a:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 4864
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 709
virtual memory          (kbytes, -v) unlimited
> db.version()
3.4.4
> db.serverStatus().connections
{ "current" : 1, "available" : 399999, "totalCreated" : 18584 }
> db.runCommand( { getCmdLineOpts: 1 } )
{
	"argv" : [
		"/usr/local/opt/mongodb/bin/mongod",
		"--config",
		"/usr/local/etc/mongod.conf"
	],
	"parsed" : {
		"config" : "/usr/local/etc/mongod.conf",
		"net" : {
			"bindIp" : "127.0.0.1"
		},
		"operationProfiling" : {
			"mode" : "all"
		},
		"storage" : {
			"dbPath" : "/usr/local/var/mongodb"
		},
		"systemLog" : {
			"component" : {
				"query" : {
					"verbosity" : 5
				}
			},
			"destination" : "file",
			"logAppend" : true,
			"path" : "/usr/local/var/log/mongodb/mongo.log",
			"verbosity" : 5
		}
	},
	"ok" : 1
}

/usr/local/etc/mongod.conf:

systemLog:
  destination: file
  path: /usr/local/var/log/mongodb/mongo.log
  logAppend: true
storage:
  dbPath: /usr/local/var/mongodb
net:
  bindIp: 127.0.0.1
operationProfiling:
  mode: all
systemLog:
  verbosity: 5
  component:
    query:
      verbosity: 5

And these are the error messages in mongo log:

4079:2017-08-04T15:39:21.249+0300 D NETWORK  [conn20] SocketException: remote: 127.0.0.1:64246 error: 9001 socket exception [CLOSED] server [127.0.0.1:64246]
4080:2017-08-04T15:39:21.249+0300 D NETWORK  [conn19] SocketException: remote: 127.0.0.1:64243 error: 9001 socket exception [CLOSED] server [127.0.0.1:64243]
4081:2017-08-04T15:39:21.249+0300 D NETWORK  [conn24] SocketException: remote: 127.0.0.1:64245 error: 9001 socket exception [CLOSED] server [127.0.0.1:64245]
4082:2017-08-04T15:39:21.249+0300 D NETWORK  [conn23] SocketException: remote: 127.0.0.1:64244 error: 9001 socket exception [CLOSED] server [127.0.0.1:64244]
4083:2017-08-04T15:39:21.249+0300 D NETWORK  [conn18] SocketException: remote: 127.0.0.1:64242 error: 9001 socket exception [CLOSED] server [127.0.0.1:64242]
4084:2017-08-04T15:39:21.249+0300 D NETWORK  [conn17] SocketException: remote: 127.0.0.1:64241 error: 9001 socket exception [CLOSED] server [127.0.0.1:64241]
4085:2017-08-04T15:39:21.249+0300 D NETWORK  [conn16] SocketException: remote: 127.0.0.1:64240 error: 9001 socket exception [CLOSED] server [127.0.0.1:64240]
4086:2017-08-04T15:39:21.249+0300 D NETWORK  [conn15] SocketException: remote: 127.0.0.1:64239 error: 9001 socket exception [CLOSED] server [127.0.0.1:64239]
4087:2017-08-04T15:39:21.249+0300 D NETWORK  [conn14] SocketException: remote: 127.0.0.1:64238 error: 9001 socket exception [CLOSED] server [127.0.0.1:64238]
4088:2017-08-04T15:39:21.249+0300 D NETWORK  [conn13] SocketException: remote: 127.0.0.1:64237 error: 9001 socket exception [CLOSED] server [127.0.0.1:64237]
4089:2017-08-04T15:39:21.249+0300 D NETWORK  [conn12] SocketException: remote: 127.0.0.1:64236 error: 9001 socket exception [CLOSED] server [127.0.0.1:64236]
4090:2017-08-04T15:39:21.249+0300 D NETWORK  [conn11] SocketException: remote: 127.0.0.1:64235 error: 9001 socket exception [CLOSED] server [127.0.0.1:64235]
4091:2017-08-04T15:39:21.250+0300 D NETWORK  [conn10] SocketException: remote: 127.0.0.1:64234 error: 9001 socket exception [CLOSED] server [127.0.0.1:64234]
4092:2017-08-04T15:39:21.250+0300 D NETWORK  [conn9] SocketException: remote: 127.0.0.1:64233 error: 9001 socket exception [CLOSED] server [127.0.0.1:64233]
4093:2017-08-04T15:39:21.250+0300 D NETWORK  [conn6] SocketException: remote: 127.0.0.1:64229 error: 9001 socket exception [CLOSED] server [127.0.0.1:64229]
4094-2017-08-04T15:39:21.250+0300 I -        [conn444] end connection 127.0.0.1:64961 (480 connections now open)
--
4094-2017-08-04T15:39:21.250+0300 I -        [conn444] end connection 127.0.0.1:64961 (480 connections now open)
4095:2017-08-04T15:39:21.250+0300 D NETWORK  [conn8] SocketException: remote: 127.0.0.1:64231 error: 9001 socket exception [CLOSED] server [127.0.0.1:64231]
4096:2017-08-04T15:39:21.250+0300 D NETWORK  [conn4] SocketException: remote: 127.0.0.1:64227 error: 9001 socket exception [CLOSED] server [127.0.0.1:64227]
4097:2017-08-04T15:39:21.250+0300 D NETWORK  [conn7] SocketException: remote: 127.0.0.1:64230 error: 9001 socket exception [CLOSED] server [127.0.0.1:64230]
4098:2017-08-04T15:39:21.250+0300 D NETWORK  [conn5] SocketException: remote: 127.0.0.1:64228 error: 9001 socket exception [CLOSED] server [127.0.0.1:64228]
4099:2017-08-04T15:39:21.250+0300 D NETWORK  [conn3] SocketException: remote: 127.0.0.1:64226 error: 9001 socket exception [CLOSED] server [127.0.0.1:64226]
4100:2017-08-04T15:39:21.250+0300 D NETWORK  [conn2] SocketException: remote: 127.0.0.1:64225 error: 9001 socket exception [CLOSED] server [127.0.0.1:64225]
4101:2017-08-04T15:39:21.250+0300 D NETWORK  [conn1] SocketException: remote: 127.0.0.1:64224 error: 9001 socket exception [CLOSED] server [127.0.0.1:64224]
4102-2017-08-04T15:39:21.250+0300 I -        [conn388] end connection 127.0.0.1:64960 (480 connections now open)
4103-2017-08-04T15:39:21.250+0300 I -        [conn422] end connection 127.0.0.1:64959 (480 connections now open)
4104-2017-08-04T15:39:21.250+0300 I -        [conn387] end connection 127.0.0.1:64958 (480 connections now open)
4105-2017-08-04T15:39:21.250+0300 I -        [conn443] end connection 127.0.0.1:64957 (480 connections now open)
4106-2017-08-04T15:39:21.250+0300 I -        [conn386] end connection 127.0.0.1:64956 (480 connections now open)
4107-2017-08-04T15:39:21.250+0300 I -        [conn384] end connection 127.0.0.1:64954 (480 connections now open)
4108-2017-08-04T15:39:21.250+0300 I -        [conn385] end connection 127.0.0.1:64955 (480 connections now open)
4109-2017-08-04T15:39:21.250+0300 I -        [conn383] end connection 127.0.0.1:64953 (480 connections now open)
4110-2017-08-04T15:39:21.250+0300 I -        [conn382] end connection 127.0.0.1:64952 (480 connections now open)
4111-2017-08-04T15:39:21.251+0300 I -        [conn442] end connection 127.0.0.1:64951 (480 connections now open)

from mgo.

inancgumus avatar inancgumus commented on June 17, 2024

Now I tried maxConns option and there are still the same errors.

$ mongod --config /usr/local/etc/mongod.conf --maxConns=500000

.
.
.
ERR: read tcp 127.0.0.1:52547->127.0.0.1:27017: read: connection reset by peer
ERR: read tcp 127.0.0.1:52547->127.0.0.1:27017: read: connection reset by peer
.
.
.
ERR: read tcp 127.0.0.1:51524->127.0.0.1:27017: read: operation timed out
ERR: read tcp 127.0.0.1:51524->127.0.0.1:27017: read: operation timed out
ERR: read tcp 127.0.0.1:51525->127.0.0.1:27017: read: operation timed out
ERR: read tcp 127.0.0.1:51526->127.0.0.1:27017: read: operation timed out
ERR: read tcp 127.0.0.1:51521->127.0.0.1:27017: read: operation timed out
ERR: read tcp 127.0.0.1:51643->127.0.0.1:27017: read: operation timed out
.
.
.

from mgo.

inancgumus avatar inancgumus commented on June 17, 2024

And there are also new errors from mgo like this (weirdly, not always happens):

$ mongod --config /usr/local/etc/mongod.conf --maxConns=500000 --slowms=50
goroutine 11389 [IO wait]:
net.runtime_pollWait(0x3258b40, 0x72, 0x760)
	/usr/local/opt/go/libexec/src/runtime/netpoll.go:164 +0x59
net.(*pollDesc).wait(0xc424d1fc68, 0x72, 0x1270060, 0x126e260)
	/usr/local/opt/go/libexec/src/net/fd_poll_runtime.go:75 +0x38
net.(*pollDesc).waitRead(0xc424d1fc68, 0xc420555f50, 0x24)
	/usr/local/opt/go/libexec/src/net/fd_poll_runtime.go:80 +0x34
net.(*netFD).Read(0xc424d1fc00, 0xc420555f50, 0x24, 0x24, 0x0, 0x1270060, 0x126e260)
	/usr/local/opt/go/libexec/src/net/fd_unix.go:250 +0x1b7
net.(*conn).Read(0xc42000f6b0, 0xc420555f50, 0x24, 0x24, 0x0, 0x0, 0x0)
	/usr/local/opt/go/libexec/src/net/net.go:181 +0x70
github.com/globalsign/mgo.fill(0x1272100, 0xc42000f6b0, 0xc420555f50, 0x24, 0x24, 0x4, 0x0)
	/Users/inanc/dev/go/src/github.com/globalsign/mgo/socket.go:567 +0x53
github.com/globalsign/mgo.(*mongoSocket).readLoop(0xc4246f8870)
	/Users/inanc/dev/go/src/github.com/globalsign/mgo/socket.go:583 +0x107
created by github.com/globalsign/mgo.newSocket
	/Users/inanc/dev/go/src/github.com/globalsign/mgo/socket.go:196 +0x259

from mgo.

domodwyer avatar domodwyer commented on June 17, 2024

Hi @inancgumus

I ran your code and I can only reproduce with operationProfiling.mode = all which slows down operations significantly - turning this off will help performance in general, but this suggests your DB ops are slowish, and therefore are holding onto connections while they complete. The mgo driver will keep dialing new connections as it needs them, up till the limit set by SetPoolLimit() which is 4096 by default.

It's not actually possible to have 500,000 connections as you'll exhaust your ephemeral ports, but your ulimits -a above shows:

open files                      (-n) 4864

So you can't open more than 4864 sockets & files combined. When you hit this limit you'll get all sorts of network errors - try checking the number of connections you have open/waiting with netstat -an | grep tcp | wc -l and tweaking resource limits like kern.ipc.somaxconn - google will help here.

This doesn't seem to be a mgo issue - tuning your network stack will help but it really is best to limit concurrent ops in code. Saying that, feel free to reopen if you think it is a mgo issue!

Dom

from mgo.

inancgumus avatar inancgumus commented on June 17, 2024

your DB ops are slowish, and therefore are holding onto connections while they complete

My DB ops as in the code above and they're very simple and the collection only has a few records. So, I couldn't get what's wrong with them. Is there a problem in the code above?

I tried netstat and also lsof -n -i4TCP:27017 | wc -l to watch mongo connections. I also set kern.ipc.somaxconn and ulimit files to 10240 etc. And, then I increased with sess.SetPoolLimit(10000). I run the code again while netstat was showing 80 conns. I also limited the code with 500 ops. Whenever, netstat reaches near 1.000 conns, i/o timeout and conn reset by peer errors happen.

I don't know yet how this is not related to mongo or mgo. I'll investigate the same with other libraries or even directly reaching to mongo.

from mgo.

domodwyer avatar domodwyer commented on June 17, 2024

My DB ops as in the code above and they're very simple and the collection only has a few records. So, I couldn't get what's wrong with them. Is there a problem in the code above?

The code seems fine, but operationProfiling.mode = all will slow down operations considerably - try running without it. DB ops can also be slowed down by maxing the disk I/O, CPU, or any other resource shared between your app, Mongo, and any other program on your machine which is quite a few on your laptop!

Running an aggregation (or in this case, loads of parallel aggregations) over all records in your collection obviously takes proportionally long to the number of records, so each query will take longer with every record. This will definitely be holding onto a connection while it completes, meaning you'll need more and more connections to service requests. There might be a more efficient way to do this depending on your needs.

Best of luck!

from mgo.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.