scylladb / gocql Goto Github PK

View Code? Open in Web Editor NEW

This project forked from gocql/gocql

166.0 166.0 44.0 4.07 MB

Package gocql implements a fast and robust ScyllaDB client for the Go programming language.

Home Page: https://docs.scylladb.com/stable/using-scylla/drivers/cql-drivers/scylla-go-driver.html

License: BSD 3-Clause "New" or "Revised" License

Go 99.89% Shell 0.11%

gocql's People

Contributors

Stargazers

Watchers

Forkers

rkuska buni avikivity rjeczalik martin-sucha kiwicom operasoftware benguang cbass561 guy9 zimnx manmanson piodul jaihind213 dogenkigen amoskong sfroment guangminglion optimyze nuivall codebreaker101 ponewor steve-gray simonfrey bkhamitov vponomaryov linuxdick ketch-com tnozicka wprzytula lorak-mmk avelanarius gor027 inloco dkropachev chandervir-k sylwiaszunejko hikitani atlant1da-404 rohankumardubey moguchev sharechat mykaul roydahan

gocql's Issues

Time type not working properly

See upstream issue gocql#1284

We need to either integrate a fix or fix it here.

Upstream PR: gocql#1283

How should paging be used in rest APIs

While there are some examples here and here on how paging should work in theory, these examples do not clearly show on how you would use the paging state in a Json rest API.
So the theory is that you execute a select statement, save the current paging state, return it to the user in string format and then use it the next time the user requests the next page.

I've tried using the examples you've posted such as:
I have a struct that contains the paging state parameter

type MessagesRestDetails struct {
  PageState string `form:"pagestate" binding:"omitempty"`
}

And when a request is sent I bind it to a struct variable

var messagesRestDetails structs.MessagesRestDetails        
if err := c.BindQuery(&messagesRestDetails); err != nil {
  c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
  return
}

I then execute the query as

var pageState []byte
if messagesRestDetails.PageState != "" {
  ps, err := base64.StdEncoding.DecodeString(messagesRestDetails.PageState)
  if err != nil {
      c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
      return
  }
  pageState = ps
}

iter := session.Query("SELECT * FROM messages.messages WHERE gid = ?", messagesGid.GID).PageSize(3).PageState(pageState).Iter()
nextPageState := iter.PageState()

And then the nextPageState I encode it to string and return it to the client

encodedStr := base64.StdEncoding.EncodeToString([]byte(nextPageState))

So while the first time I request this it works and the first three elements are returned in the API, when using the received page state value and send it in a new request I get"error": "illegal base64 data at input byte 165". That's because the encoded string contains a '\' character.
So the question is how to properly send and receive the paging state in these requests?

Remove confusing old git tags in the repository

Hi,

In the git repository (https://github.com/scylladb/gocql/tags), there are old git tags with higher digit number than current release !
Some dependency tools (like renovate bot) are confused and get release 1.10.0 for example.
Last scylladb/gocql official release is 1.7.3, but we have tags numbered 1.8.0 or 1.10.0 from 2021! This is confusing.

Is it possible to remove theses old tags or do something ? :)

Thank you !

Missing decrement of connection count when removing a sharded connection from the sharded pool

Write shard-aware ports blog post

As a followup to #52 we need to write a blog post introducing the feature.

It should contain:

a description of how it works
proof of usefulness - benchmark where many clients try to open connections so that we can show that before we opened X connections per shard now it's 1 flat

Release a new version

Fetch upstream changes and tag a release.

Update README to inform how to use this fork with go modules

I deleted the template as I think it is not applicable for this issue.

Currently README is missing an information about how to install the package with the recent addition of golang - Go modules.

To replace the original package it is needed to run following command:

go mod edit -replace=github.com/gocql/gocql=github.com/scylladb/[email protected]

I tried it with my fork (as I couldn't manage to load replaced module with delve):

Added a dumb debug print:
rkuska@998fd40
Replaced gocql/gocql:
go mod edit -replace=github.com/gocql/gocql=github.com/rkuska/[email protected]
Run my module and show the debug print test in the output

I am happy to contribute with a patch to update the README to reflect this. Well unless I am wrong :-)

panic: not a sharded connection

Sometime github.com/gocql/gocql panics in our environment from the unknown reason.

What version of Scylla or Cassandra are you using?

ScyllaDB 2018.1.10

What version of Gocql are you using?

/go/pkg/mod/github.com/scylladb/[email protected]

What did you do?

Typically we are scanning structs with many fields. But I'm not sure if it is related to any calls to the database.

What did you expect to see?

No panic.

What did you see instead?

panic: scylla: IP:PORT not a sharded connection

goroutine 791411 [running]:
github.com/gocql/gocql.(*scyllaConnPicker).Remove(0xc000745c80, 0xc0184fc360)
	/go/pkg/mod/github.com/scylladb/[email protected]/scylla.go:110 +0x127
github.com/gocql/gocql.(*hostConnPool).HandleError(0xc0002442a0, 0xc0184fc360, 0x1036240, 0xc00aa2abe0, 0xc0309bce01)
	/go/pkg/mod/github.com/scylladb/[email protected]/connectionpool.go:547 +0xa4
github.com/gocql/gocql.(*Conn).closeWithError(0xc0184fc360, 0x1036240, 0xc00aa2abe0)
	/go/pkg/mod/github.com/scylladb/[email protected]/conn.go:469 +0x289
github.com/gocql/gocql.(*Conn).exec(0xc0184fc360, 0x104eb60, 0xc0087cb5c0, 0x1034e80, 0x1972c48, 0x0, 0x0, 0x0, 0x0, 0x0)
	/go/pkg/mod/github.com/scylladb/[email protected]/conn.go:823 +0x8b4
github.com/gocql/gocql.(*startupCoordinator).write(0xc01a810430, 0x104eb60, 0xc0087cb5c0, 0x1034e80, 0x1972c48, 0x8ea755, 0x101e6e3, 0x33, 0xc000cb8f48)
	/go/pkg/mod/github.com/scylladb/[email protected]/conn.go:340 +0x188
github.com/gocql/gocql.(*startupCoordinator).options(0xc01a810430, 0x104eb60, 0xc0087cb5c0, 0x101e6e3, 0x33)
	/go/pkg/mod/github.com/scylladb/[email protected]/conn.go:349 +0x75
github.com/gocql/gocql.(*startupCoordinator).setupConn.func2(0xc01a810430, 0x104eb60, 0xc0087cb5c0, 0xc00159dce0)
	/go/pkg/mod/github.com/scylladb/[email protected]/conn.go:314 +0x8b
created by github.com/gocql/gocql.(*startupCoordinator).setupConn
	/go/pkg/mod/github.com/scylladb/[email protected]/conn.go:312 +0xdf

Tuples are not supported by gocql

What version of Scylla or Cassandra are you using?

Scylla 4.1.0

What version of Gocql are you using?

v1.4.1-0.20200520132847-57de5e5cdd5c

What did you do?

Wanted to use field with tuple<text, int> on the table and custom type struct field that correspond to it:

type Field struct {
  Text string
  Int int32
}

And then insert and select from this table.

What did you expect to see?

That marshaling/unmarshaling is working automatically given that tuples and structs are supported options.

What did you see instead?

An error:

gocql: expected 10 values send got 9

It seems gocql is expecting insert values to expand their tuple values but when trying to insert with expanded values it complains again.

Multiple session during go tests, resulting in failed tests

Please answer these questions before submitting your issue. Thanks!

Background:
I'm running golang integration tests that connect to one of our staging environments scylladb clusters. When i run go test ./... it actually runs 2 packages in parallel (sub directories from root directory where i called go test ./...). In each of these packages, there is one main _test.go file that runs through a suite of tests, creating their own session but under the same configurations. The close separately once their tests have finished.

When i run each of these packages seperately, the work perfectly. All tests pass, no timeouts, etc. However, when I run them together, I get the failing tests in both (1 failed test for each package), and occasionally the stack trace being printed out with the goroutines as shown in the "### What did you see instead?" section.

So my question is, Is it bad to have multiple session when running go tests? It seems to only have these issues when both of these sessions are open. Any help would be appreciated!

What version of Scylla or Cassandra are you using?

Scylla Enterprise 2021.1.8

What version of Gocql are you using?

github.com/gocql/gocql v0.0.0-20191102131523

What version of Go are you using?

go version go1.17.5 darwin/amd64

What did you do?

Ran integration tests

What did you expect to see?

Passing tests

What did you see instead?

goroutine 85 [select]:
github.com/gocql/gocql.(*writeCoalescer).writeFlusher(0xc000382f00, 0x30d40)
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/conn.go:814 +0x139
created by github.com/gocql/gocql.newWriteCoalescer
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/conn.go:719 +0x125

goroutine 83 [IO wait]:
internal/poll.runtime_pollWait(0x7f96effc9310, 0x72, 0xffffffffffffffff)
/opt/go/src/runtime/netpoll.go:222 +0x55
internal/poll.(*pollDesc).wait(0xc0003a0398, 0x72, 0x1000, 0x1000, 0xffffffffffffffff)
/opt/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
/opt/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc0003a0380, 0xc000346000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/opt/go/src/internal/poll/fd_unix.go:166 +0x1d5
net.(*netFD).Read(0xc0003a0380, 0xc000346000, 0x1000, 0x1000, 0xc0003a7080, 0x3, 0xc0003ade88)
/opt/go/src/net/fd_posix.go:55 +0x4f
net.(*conn).Read(0xc0002c8300, 0xc000346000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/opt/go/src/net/net.go:183 +0x91
bufio.(*Reader).Read(0xc0002819e0, 0xc00033e228, 0x1, 0x9, 0x40, 0x38, 0xce7ea0)
/opt/go/src/bufio/bufio.go:227 +0x222
io.ReadAtLeast(0xe23ec0, 0xc0002819e0, 0xc00033e228, 0x1, 0x9, 0x1, 0xc0003add70, 0x410058, 0x38)
/opt/go/src/io/io.go:328 +0x87
io.ReadFull(...)
/opt/go/src/io/io.go:347
github.com/gocql/gocql.readHeader(0xe23ec0, 0xc0002819e0, 0xc00033e228, 0x9, 0x9, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/frame.go:449 +0x96
github.com/gocql/gocql.(*Conn).recv(0xc00033e1e0, 0x0, 0x0)
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/conn.go:593 +0xfd
github.com/gocql/gocql.(*Conn).serve(0xc00033e1e0)
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/conn.go:510 +0x31
created by github.com/gocql/gocql.(*Session).dialWithoutObserver
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/conn.go:283 +0x6ca

goroutine 86 [IO wait]:
internal/poll.runtime_pollWait(0x7f96effc93f8, 0x72, 0xffffffffffffffff)
/opt/go/src/runtime/netpoll.go:222 +0x55
internal/poll.(*pollDesc).wait(0xc00012a818, 0x72, 0x1000, 0x1000, 0xffffffffffffffff)
/opt/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
/opt/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc00012a800, 0xc000348000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/opt/go/src/internal/poll/fd_unix.go:166 +0x1d5
net.(*netFD).Read(0xc00012a800, 0xc000348000, 0x1000, 0x1000, 0xc0003a7200, 0x3, 0xc000082e88)
/opt/go/src/net/fd_posix.go:55 +0x4f
net.(*conn).Read(0xc0002c8310, 0xc000348000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/opt/go/src/net/net.go:183 +0x91
bufio.(*Reader).Read(0xc000281c20, 0xc00033e318, 0x1, 0x9, 0x40, 0x38, 0xce7ea0)
/opt/go/src/bufio/bufio.go:227 +0x222
io.ReadAtLeast(0xe23ec0, 0xc000281c20, 0xc00033e318, 0x1, 0x9, 0x1, 0xc000082d70, 0x410058, 0x38)
/opt/go/src/io/io.go:328 +0x87
io.ReadFull(...)
/opt/go/src/io/io.go:347
github.com/gocql/gocql.readHeader(0xe23ec0, 0xc000281c20, 0xc00033e318, 0x9, 0x9, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/frame.go:449 +0x96
github.com/gocql/gocql.(*Conn).recv(0xc00033e2d0, 0x0, 0x0)
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/conn.go:593 +0xfd
github.com/gocql/gocql.(*Conn).serve(0xc00033e2d0)
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/conn.go:510 +0x31
created by github.com/gocql/gocql.(*Session).dialWithoutObserver
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/conn.go:283 +0x6ca

goroutine 84 [select]:
github.com/gocql/gocql.(*Conn).heartBeat(0xc00033e1e0)
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/conn.go:550 +0x110
created by github.com/gocql/gocql.(*Session).dialWithoutObserver
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/conn.go:284 +0x6ef

goroutine 87 [select]:
github.com/gocql/gocql.(*Conn).heartBeat(0xc00033e2d0)
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/conn.go:550 +0x110
created by github.com/gocql/gocql.(*Session).dialWithoutObserver
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/conn.go:284 +0x6ef

goroutine 99 [select]:
github.com/gocql/gocql.(*Conn).exec(0xc0004280f0, 0xe35860, 0xc000125140, 0xe24700, 0xc0002ca300, 0x0, 0x0, 0x0, 0x0, 0x0)
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/conn.go:902 +0x3d3
github.com/gocql/gocql.(*Conn).executeQuery(0xc0004280f0, 0xe35860, 0xc000125140, 0xc0001f6240, 0x2a89755125acdb)
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/conn.go:1121 +0x569
github.com/gocql/gocql.(*Query).execute(0xc0001f6240, 0xe35860, 0xc000125140, 0xc0004280f0, 0x6be717)
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/session.go:949 +0x49
github.com/gocql/gocql.(*queryExecutor).attemptQuery(0xc0003802a0, 0xe35860, 0xc000125140, 0xe3e5d0, 0xc0001f6240, 0xc0004280f0, 0xc0003ca000)
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/query_executor.go:29 +0x84
github.com/gocql/gocql.(*queryExecutor).do(0xc0003802a0, 0xe35860, 0xc000125140, 0xe3e5d0, 0xc0001f6240, 0x40f8fb)
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/query_executor.go:112 +0x1c7
github.com/gocql/gocql.(*queryExecutor).executeQuery(0xc0003802a0, 0xe3e5d0, 0xc0001f6240, 0x0, 0x0, 0x0)
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/query_executor.go:60 +0xfb
github.com/gocql/gocql.(*Session).executeQuery(0xc00039ea80, 0xc0001f6240, 0xd36216)
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/session.go:426 +0xb2
github.com/gocql/gocql.(*Query).Iter(0xc0001f6240, 0xc0001f6240)
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/session.go:1130 +0x45
github.com/gocql/gocql.(*Query).Exec(0xc0001f6240, 0xc0001f6240, 0x1e)
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/session.go:1113 +0x2b
medium.com/pkg/cql.(*Client).Exec(0xc000010308, 0xe35860, 0xc000125110, 0xe25820, 0xc00000e978, 0x0, 0x0)
/workspace/go/pkg/cql/client.go:84 +0x1e5
medium.com/cmd/store/scylla.(*Store).DropTables(0xc00022c0e0, 0xe357f0, 0xc00003c0a8, 0xc0002cd790, 0x9, 0x0, 0x0)
/workspace/go/cmd/store/scylla/tables.go:324 +0x17e
medium.com/cmd/store/scylla.testFeatures(0xc000103c80, 0xe357f0, 0xc00003c0a8, 0xc00022c0e0)
/workspace/go/cmd/store/scylla/entity_test.go:118 +0x1665
medium.com/cmd/store/scylla.TestStore.func1.1(0xc000103c80)
/workspace/go/cmd/store/scylla/store_test.go:18 +0x4a
testing.tRunner(0xc000103c80, 0xc000325920)
/opt/go/src/testing/testing.go:1194 +0xef
created by testing.(*T).Run
/opt/go/src/testing/testing.go:1239 +0x2b3

goroutine 11 [select, 2 minutes]:
github.com/gocql/gocql.(*Session).reconnectDownedHosts(0xc00039ea80, 0xdf8475800)
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/session.go:286 +0x1b1
created by github.com/gocql/gocql.(*Session).init
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/session.go:252 +0x6df

goroutine 47 [select]:
github.com/gocql/gocql.(*writeCoalescer).writeFlusher(0xc00034a2a0, 0x30d40)
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/conn.go:814 +0x139
created by github.com/gocql/gocql.newWriteCoalescer
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/conn.go:719 +0x125

goroutine 49 [select]:
github.com/gocql/gocql.(Conn).heartBeat(0xc0004280f0)
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/conn.go:550 +0x110
created by github.com/gocql/gocql.(Session).dialWithoutObserver
/builder/home/go/pkg/mod/github.com/gocql/[email protected]/conn.go:284 +0x6ef

If you are having connectivity related issues please share the following additional information

Describe your Cassandra cluster

please provide the following information

output of nodetool status
output of SELECT peer, rpc_address FROM system.peers
rebuild your application with the gocql_debug tag and post the output

panic from `unmarshalDecimal`

Stack:

Command error: panic: runtime error: slice bounds out of range [4:0]
goroutine 3241 [running]:
github.com/gocql/gocql.unmarshalDecimal(0xceb600, 0xc023cf7740, 0xc14610b0b7, 0x0, 0x2c06f48, 0xbc1040, 0xc175aaa240, 0xc175aaa240, 0x16)
/home/penberg/go/pkg/mod/github.com/scylladb/[email protected]/marshal.go:1094 +0x495
github.com/gocql/gocql.Unmarshal(0xceb600, 0xc023cf7740, 0xc14610b0b7, 0x0, 0x2c06f48, 0xbc1040, 0xc175aaa240, 0x4ad253, 0xc00e81ed30)
/home/penberg/go/pkg/mod/github.com/scylladb/[email protected]/marshal.go:150 +0x3b6
github.com/gocql/gocql.unmarshalNullable(0xceb600, 0xc023cf7740, 0xc14610b0b7, 0x0, 0x2c06f48, 0xc010c493c0, 0xc23421fbf8, 0xc23421fbf8, 0x16)
/home/penberg/go/pkg/mod/github.com/scylladb/[email protected]/marshal.go:204 +0x219
github.com/gocql/gocql.Unmarshal(0xceb600, 0xc023cf7740, 0xc14610b0b7, 0x0, 0x2c06f48, 0xc010c493c0, 0xc23421fbf8, 0xc023cf7780, 0xc14610b0b3)
/home/penberg/go/pkg/mod/github.com/scylladb/[email protected]/marshal.go:127 +0xe5c
github.com/gocql/gocql.unmarshalMap(0xceb5c0, 0xc02b4a0900, 0xc14610b0b7, 0x7ee, 0x2c06f4c, 0xc00c179c80, 0xc2341cb378, 0xc02ec910aa, 0x8005b1)
/home/penberg/go/pkg/mod/github.com/scylladb/[email protected]/marshal.go:1662 +0x6e6
github.com/gocql/gocql.Unmarshal(0xceb5c0, 0xc02b4a0900, 0xc14610a9ff, 0xea6, 0x2c07600, 0xc00c179c80, 0xc2341cb378, 0x0, 0x0)
/home/penberg/go/pkg/mod/github.com/scylladb/[email protected]/marshal.go:158 +0xcbf
github.com/gocql/gocql.scanColumn(0xc14610a9ff, 0xea6, 0x2c07600, 0xc02d0c786c, 0x3, 0xc02d0c7870, 0x6, 0xc02d0c7878, 0x4, 0xceb5c0, ...)
/home/penberg/go/pkg/mod/github.com/scylladb/[email protected]/session.go:1346 +0x274
github.com/gocql/gocql.(*Iter).Scan(0xc04b4487e0, 0xc0ec293ba0, 0xf, 0x1a, 0x1197e40)
/home/penberg/go/pkg/mod/github.com/scylladb/[email protected]/session.go:1446 +0x2e5
github.com/gocql/gocql.(*Iter).MapScan(0xc04b4487e0, 0xc1757f2ab0, 0x1401)

Peace of unsafe code:

func unmarshalDecimal(info TypeInfo, data []byte, value interface{}) error {
	switch v := value.(type) {
	case Unmarshaler:
		return v.UnmarshalCQL(info, data)
	case *inf.Dec:
		scale := decInt(data[0:4])
		unscaled := decBigInt2C(data[4:], nil)
		*v = *inf.NewDecBig(unscaled, inf.Scale(scale))
		return nil
	}
	return unmarshalErrorf("can not unmarshal %s into %T", info, value)
}

support for Statement request size

Java driver has requestSizeInBytes
to give a higher estimate of Statement size for counting the batch size
https://docs.datastax.com/en/drivers/java/3.7/com/datastax/driver/core/Statement.html#requestSizeInBytes-com.datastax.driver.core.ProtocolVersion-com.datastax.driver.core.CodecRegistry-

go has only Batch::Size()

so when client wants to do exact batch splitting, it's hard to count the statement size

Could above be ported over from java to go?

Java logic is in:
https://github.com/scylladb/java-driver/blob/3.7.1-scylla-2/driver-core/src/main/java/com/datastax/driver/core/BatchStatement.java#L203
or
https://github.com/scylladb/java-driver/blob/3.7.1-scylla-2/driver-core/src/main/java/com/datastax/driver/core/RegularStatement.java#L185
or
https://github.com/scylladb/java-driver/blob/3.7.1-scylla-2/driver-core/src/main/java/com/datastax/driver/core/BoundStatement.java#L307
(and similar, just query java driver tree for requestSizeInBytes)

Data Race During --race Test

Probably needs proper mutexing - this one periodically causes flaking in CI for me. The items hitting it are:
github.com/gocql/gocql.(*tokenAwareHostPolicy).Init() vs github.com/gocql/gocql. (*tokenAwareHostPolicy).updateReplicas()

Expand test setup matrix with Scylla

Currently the test suite runs only on Cassandra which for obvious reasons is insufficient.
We should extend the test matrix and add bootstrapping of scylla nodes.

gocql does not re-resolve DNS names (issue 831 from https://github.com/gocql/gocql/issues/831)

DNS remains the same, but IP changes for the control connection node. Need to understand the behavior when a node is replaced:

Tasks

Beta Give feedback

The node is not the one the control connection is with.
The node is the one the control connection is with.
Options

Question on bulk queries

Hello!

(not sure if this is the right place to ask questions - feel free to close if not)
Going over some online resources on Scylla performance, there's a recurring theme, which is:

it may be beneficial to do bulk queries (using SELECT ... IN on reads and BATCH on writes) instead of doing more parallel queries.
when doing bulk queries, it is better to avoid requiring coordination across multiple nodes (= avoid bulk queries that require data from different nodes)

Example resources that touch on this:

On SELECT ... IN queries: https://www.scylladb.com/tech-talk/planning-queries-maximum-performance-scylla-summit-2017/ (a bit old, but I think it is still relevant?)
On BATCH insert queries: https://www.scylladb.com/2019/03/27/best-practices-for-scylla-applications/

Now, the application I'm working on needs to do large bulk reads and writes, so I want to make sure I'm not adding artificial bottlenecks.

With that in mind, I'm a bit puzzled about how to implement the above recommendations with gocql. Ideally the driver would let me split my bulk queries by node, but there does not seem to be a way to do so (?)
It seems the closest I can do using gocql is to have my application split bulk queries by partition key. But if I'm trying to read, let's say, 200K rows with all different partition keys, this basically means bulk queries will all be split into 200K single-row queries - which basically means I can't use bulk queries in practice - which may be detrimental to my application performance.

Given that the driver has access to the information that would be required to split queries intelligently, I was wondering why there is no such facility? I'm not sure whether I am misunderstanding how to best use the driver, or if the driver is currently lacking the feature. It would surely require the driver to expose a rather odd-looking API (as the current gocql API seems pretty abstracted away from such concerns) - but that seems required in order to make bulk queries as efficient as they can be.

I'm completely new to the Scylla ecosystem so apologies if my question seems odd. Advice welcome, and sorry if I missed something obvious :)

Thanks!

SingleHostExecutor() doesn't close the connection properly

Please answer these questions before submitting your issue. Thanks!

What version of Scylla or Cassandra are you using?

Scylla Enterprise 2022.1.0

What version of Gocql are you using?

github.com/scylladb/gocql v1.5.1-0.20210906110332-fb22d64efc33

What version of Go are you using?

go1.19

What did you do?

Added credentials to scylla manager so that the healthcheck service uses SingleHostExecutor() that executes simple CQL query SELECT now() FROM system.local on the node.
Afterwards, manager closes the connection with Close() method.

See https://github.com/scylladb/scylla-manager/blob/e25e51487cae81fc04eb67fa5b957249a4ac6801/pkg/ping/cqlping/cqlping.go#L148-L157

What did you expect to see?

I expect to see that the connection is closed.

What did you see instead?

I see that connection leaks and manager keep increasing number of opened connections to the node.
More details here https://github.com/scylladb/scylla-enterprise/issues/2965

If you are having connectivity related issues please share the following additional information

Describe your Cassandra cluster

please provide the following information

output of nodetool status
output of SELECT peer, rpc_address FROM system.peers
rebuild your application with the gocql_debug tag and post the output

Panic in gocql.marshalVarchar

What version of Scylla or Cassandra are you using?

3.0.7-0.20190528.4c16c1fe1

What version of Gocql are you using?

commit id = "91173a01ffb95ef90b31335d5cfc83eeda502a33"

What did you do?

call gocql get to get item

What did you expect to see?

no panic

What did you see instead?

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x461d2c]
	goroutine 709566504 [running]:
project/vendor/github.com/gocql/gocql.marshalVarchar(0x1b2d820, 0xc65b38a140, 0x149f340, 0xc68f1ada90, 0x0, 0xc09e53ce00, 0xc40ac045b0, 0xc40b95d7f0, 0xc40b95d7f8)
    /home/user/go/src/project/vendor/github.com/gocql/gocql/marshal.go:214 +0x486
project/vendor/github.com/gocql/gocql.Marshal(0x1b2d820, 0xc65b38a140, 0x149f340, 0xc68f1ada90, 0x40e246, 0xc44a273bc0, 0x403b7f, 0xc40b95d940, 0xc40b95d880)
    /home/user/go/src/project/vendor/github.com/gocql/gocql/marshal.go:68 +0x3b0
project/vendor/github.com/gocql/gocql.createRoutingKey(0xc478e904b0, 0xc68f1adaa0, 0x1, 0x1, 0xcc, 0xc478e904b0, 0x0, 0x0, 0x40f930)
    /home/user/go/src/project/vendor/github.com/gocql/gocql/session.go:1732 +0xa7
project/vendor/github.com/gocql/gocql.(*Query).GetRoutingKey(0xc409867680, 0xc40b95d948, 0x4416f3, 0x149c9c0, 0xc06c0, 0x144f6c0)
    /home/user/go/src/project/vendor/github.com/gocql/gocql/session.go:991 +0xf7
project/vendor/github.com/gocql/gocql.(*tokenAwareHostPolicy).Pick(0xc06d59c690, 0x1b42ca0, 0xc409867680, 0xc40b95da90)
    /home/user/go/src/project/vendor/github.com/gocql/gocql/policies.go:591 +0x52
project/vendor/github.com/gocql/gocql.(*queryExecutor).do(0xc25968ea60, 0x1b2c860, 0xc4ab375c80, 0x1b42ca0, 0xc409867680, 0xb)
    /home/user/go/src/project/vendor/github.com/gocql/gocql/query_executor.go:87 +0x63
project/vendor/github.com/gocql/gocql.(*queryExecutor).executeQuery(0xc25968ea60, 0x1b42ca0, 0xc409867680, 0x0, 0x0, 0x0)
    /home/user/go/src/project/vendor/github.com/gocql/gocql/query_executor.go:60 +0x437
project/vendor/github.com/gocql/gocql.(*Session).executeQuery(0xc02c857c00, 0xc409867680, 0x17c3991)
    /home/user/go/src/project/vendor/github.com/gocql/gocql/session.go:407 +0xb7
project/vendor/github.com/gocql/gocql.(*Query).Iter(0xc409867680, 0x28)
    /home/user/go/src/project/vendor/github.com/gocql/gocql/session.go:1110 +0xa5
project/vendor/github.com/scylladb/gocqlx.Iter(...)
    /home/user/go/src/project/vendor/github.com/scylladb/gocqlx/iterx.go:46
project/vendor/github.com/scylladb/gocqlx.(*Queryx).Get(0xc40b95de20, 0x1593d20, 0xc40880a9a0, 0x179b3e0, 0xc02c857c00)
    /home/user/go/src/project/vendor/github.com/scylladb/gocqlx/queryx.go:212 +0x70
project/vendor/github.com/scylladb/gocqlx.(*Queryx).GetRelease(0xc40b95de20, 0x1593d20, 0xc40880a9a0, 0x0, 0x0)
    /home/user/go/src/project/vendor/github.com/scylladb/gocqlx/queryx.go:219 +0x84

If you are having connectivity related issues please share the following additional information

Describe your Cassandra cluster

please provide the following information

output of nodetool status
output of SELECT peer, rpc_address FROM system.peers
rebuild your application with the gocql_debug tag and post the output

Batch too large

What version of Scylla or Cassandra are you using?

Scylla 4.4.6-0.20211028.dd018d4de

What version of Gocql are you using?

v1.5.0

What version of Go are you using?

1.17.1 linux/amd64

What did you do?

I watched a couple of talks about inserting 1 milion metrics a second in cassandra/scylla. I'm trying to replicate the result in go and have hit a wall. In the mentioned talks it is said that with the settings that I have set I would be able have 10000 queries in one batch. After altering demo code to use just one partition the warning goes away. KairosDB, that is mentioned in the talk, is creating batches based on their destination node/shard. To my knowledge this is not available in gocql driver (#75 ) unlike DataStax driver for java.

Did I miss anything? Is it possible to push 1 million request per second with go?

What did you expect to see?

I expected to be able to push 10000 queries in one batch into scylla

What did you see instead?

scylladb-scylla1-1  | WARN  2021-11-16 15:43:49,980 [shard 0] BatchStatement - Batch modifying 602 partitions in test.data_points is of size 153510 bytes, exceeding specified WARN threshold of 152576 by 934.
scylladb-scylla1-1  | ERROR 2021-11-16 15:56:14,120 [shard 0] BatchStatement - Batch modifying 603 partitions in test.data_points is of size 153765 bytes, exceeding specified FAIL threshold of 153600 by 165.

Steps to Replicate

scylla.yaml

num_tokens: 256
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "127.0.0.1"
listen_address: localhost
native_transport_port: 9042
native_shard_aware_transport_port: 19042
read_request_timeout_in_ms: 5000
write_request_timeout_in_ms: 2000
cas_contention_timeout_in_ms: 1000
endpoint_snitch: SimpleSnitch
rpc_address: localhost
rpc_port: 9160
api_port: 10000
api_address: 127.0.0.1
batch_size_warn_threshold_in_kb: 149
batch_size_fail_threshold_in_kb: 150
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
commitlog_total_space_in_mb: -1
murmur3_partitioner_ignore_msb_bits: 12
api_ui_dir: /opt/scylladb/swagger-ui/dist/
api_doc_dir: /opt/scylladb/api/api-doc/
native_transport_max_threads: 2000

docker-compose.yaml

version: "3"

services:
  scylla1:
    image: scylladb/scylla
    command: --seeds=scylla1,scylla2 --smp 1 --memory 2048M --overprovisioned 1 --api-address 0.0.0.0
    ports:
      - 9042:9042
      - 19042:19042
    volumes:
      - "./scylla/scylla.yaml:/etc/scylla/scylla.yaml"
    networks:
      web:

  scylla2:
    image: scylladb/scylla
    command: --seeds=scylla1,scylla2 --smp 1 --memory 2048M --overprovisioned 1 --api-address 0.0.0.0
    ports:
      - 9043:9042
      - 19043:19042
    volumes:
      - "./scylla/scylla.yaml:/etc/scylla/scylla.yaml"
    networks:
      web:

  scylla3:
    image: scylladb/scylla
    command: --seeds=scylla1,scylla2 --smp 1 --memory 2048M --overprovisioned 1 --api-address 0.0.0.0
    ports:
      - 9044:9042
      - 19044:19042
    volumes:
      - "./scylla/scylla.yaml:/etc/scylla/scylla.yaml"
    networks:
      web:

networks:
  web:
    driver: bridge

GO

import (
	"context"
	"fmt"
	"math/rand"
	"testing"
	"time"

	"github.com/gocql/gocql"
	"github.com/gocql/gocql/lz4"
)

/*
// go.mod
replace github.com/gocql/gocql => github.com/scylladb/gocql v1.5.0

// scylla.yaml
batch_size_warn_threshold_in_kb: 149
batch_size_fail_threshold_in_kb: 150
native_transport_max_threads: 2000

// cqlsh
CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE IF NOT EXISTS data_points (
  metric text,
  row_time timestamp,
  data_type text,
  tags frozen<map<text,text>>,
  offset int,
  value blob,
  PRIMARY KEY ((metric, row_time, data_type, tags), offset)
);
*/
const ThreeWeeks = 1814400000 // in ms

type ScyllaConfig struct {
	Keyspace  string
	Hosts     []string
	BatchSize int
}

func TestBatchManual(t *testing.T) {
	config := ScyllaConfig{
		Keyspace:  "test",
		Hosts:     []string{"localhost:9042", "localhost:9043", "localhost:9044"},
		BatchSize: 602,
	}

	clusterConfig := gocql.NewCluster(config.Hosts...)
	clusterConfig.Keyspace = config.Keyspace
	clusterConfig.PoolConfig.HostSelectionPolicy = gocql.TokenAwareHostPolicy(gocql.RoundRobinHostPolicy())
	clusterConfig.Compressor = lz4.LZ4Compressor{}
	session, err := gocql.NewSession(*clusterConfig)
	if err != nil {
		t.Fatal(err)
	}

	batch := session.NewBatch(gocql.UnloggedBatch)
	for i := 0; i < config.BatchSize; i++ {
		time := time.Now().UnixMilli()
		offset := time % ThreeWeeks
		rowTime := time - offset
                // different partitions
		batch.Query(
			"INSERT INTO data_points (metric,row_time,data_type,tags,offset,value) VALUES (?, ?, ?, ?, ?, ?);",
			"metric", rowTime, "int", map[string]string{"name": fmt.Sprintf("%d", i)}, offset, []byte{byte(i / 256), byte(i % 256)})
                // same partitions
                //batch.Query(
		//	"INSERT INTO data_points (metric,row_time,data_type,tags,offset,value) VALUES (?, ?, ?, ?, ?, ?);",
		//	"metric", rowTime, "int", map[string]string{"name": "value"}, offset+1000*int64(i), []byte{byte(i / 256), byte(i % 256)})
	}

	if err := session.ExecuteBatch(batch); err != nil {
		t.Fatal(err)
	}

}

Update the docs with instructions how to use go mod replace

panic in readCollectionSize

Please answer these questions before submitting your issue. Thanks!

What version of Scylla or Cassandra are you using?

ScyllaDB 2018.1.10

What version of Gocql are you using?

/go/pkg/mod/github.com/scylladb/[email protected]

What did you do?

Scanned a struct with many fields, not sure whether any particular field type triggered this.

What did you expect to see?

No panic.

What did you see instead?

panic: runtime error: index out of range

goroutine 78088437 [running]:
github.com/gocql/gocql.readCollectionSize(...)
    /go/pkg/mod/github.com/scylladb/[email protected]/marshal.go:1405
github.com/gocql/gocql.unmarshalList(0xfb2ac0, 0xc003a01900, 0xc008a012d5, 0x2, 0x2f, 0xcf7860, 0xc00632a748, 0x918641, 0x4)
    /go/pkg/mod/github.com/scylladb/[email protected]/marshal.go:1443 +0xd39
github.com/gocql/gocql.Unmarshal(0xfb2ac0, 0xc003a01900, 0xc008a012d5, 0x2, 0x2f, 0xcf7860, 0xc00632a748, 0xc0073f7130, 0x2)
    /go/pkg/mod/github.com/scylladb/[email protected]/marshal.go:152 +0xa3b
github.com/gocql/gocql.scanColumn(0xc008a012d5, 0x2, 0x2f, 0xc001845617, 0x7, 0xc001845630, 0x7, 0xc001845640, 0xc, 0xfb2ac0, ...)
    /go/pkg/mod/github.com/scylladb/[email protected]/session.go:1295 +0x291
github.com/gocql/gocql.(*Iter).Scan(0xc00250a360, 0xc00aa406c0, 0x23, 0x23, 0x23)
    /go/pkg/mod/github.com/scylladb/[email protected]/session.go:1395 +0x2e6
github.com/scylladb/gocqlx.(*Iterx).StructScan(0xc00183f8c0, 0xdf7280, 0xc00632a600, 0xe78060)
    /go/pkg/mod/github.com/scylladb/[email protected]/iterx.go:242 +0x1bb
github.com/scylladb/gocqlx.(*Iterx).scanAny(0xc00183f8c0, 0xdf7280, 0xc00632a600, 0xc00183f800, 0xc00250a360)
    /go/pkg/mod/github.com/scylladb/[email protected]/iterx.go:105 +0x232
github.com/scylladb/gocqlx.(*Iterx).Get(0xc00183f8c0, 0xdf7280, 0xc00632a600, 0xc001714c74, 0xc001714c74)
    /go/pkg/mod/github.com/scylladb/[email protected]/iterx.go:66 +0x48
github.com/scylladb/gocqlx.(*Queryx).Get(0xc00081b5c0, 0xdf7280, 0xc00632a600, 0xc00081b5c0, 0x40750b)
    /go/pkg/mod/github.com/scylladb/[email protected]/queryx.go:212 +0x75

The panic appears to be caused by incalid range checks when using protocol version greater than version 2 as in this case readCollectionSize consumes 4 bytes, but the callers of readCollectionSize always check only for 2 bytes.

Scylla connection picker is racy

What version of Cassandra are you using?

Scylla 3.0.2

What version of Gocql are you using?

latest/master

What did you do?

Run a bunch of queries in parallel

What did you expect to see?

No race conditions.

What did you see instead?

Race conditions caused by the Scylla connection picker (https://github.com/scylladb/gocql/blob/master/scylla.go), mainly the Pick() method but it looks like the whole thing is prone to race conditions.

Name of the driver is not advertised in system.clients

The system.clients table in Scylla shows a list of currently opened connections. If a driver tells the database node abouts its name, then it will be included in the table. However, it appears that gocql isn't doing that.

Querying the system.clients table with running scylla-bench shows this:

cqlsh> select * from system.clients;

 address  | port  | client_type | connection_stage | driver_name | driver_version | hostname | protocol_version | shard_id | ssl_cipher_suite | ssl_enabled | ssl_protocol | username
----------+-------+-------------+------------------+-------------+----------------+----------+------------------+----------+------------------+-------------+--------------+-----------
 10.0.1.7 | 35229 |         cql |            READY |        null |           null |     null |                4 |        0 |             null |        null |         null | anonymous
 10.0.1.7 | 36308 |         cql |            READY |        null |           null |     null |                4 |        0 |             null |        null |         null | anonymous
 10.0.1.7 | 36314 |         cql |            READY |        null |           null |     null |                4 |        0 |             null |        null |         null | anonymous
 10.0.1.7 | 50903 |         cql |            READY |        null |           null |     null |                4 |        0 |             null |        null |         null | anonymous

The driver_name and driver_version fields are empty (which is also the case for the connections established by cqlsh, by the way). Advertising driver's name in the system.clients table can be helpful when debugging issues e.g. when a connection imbalance occurs and allows to narrow down the culprit application/driver better.

Connections setup from clients that use the additional native_trasport_shard_aware_port - are not all to that port

Please answer these questions before submitting your issue. Thanks!

What version of Scylla or Cassandra are you using?

2021.1.5 (scylla-cloud)

What version of Gocql are you using?

1.5.0

What version of Go are you using?

1.16.9

What did you do?

update scylla-bench to use gocql 1.5.0

connected with scylla-bench to a cluster

./scylla-bench -workload sequential -mode write -nodes XXX  -username XXX -password XXX

accessed one of the servers

support@ip-172-31-0-108:~$ netstat  -an | grep 9042
tcp        0      0 0.0.0.0:19042           0.0.0.0:*               LISTEN     
tcp        0      0 0.0.0.0:9042            0.0.0.0:*               LISTEN     
tcp        0     15 ip1:19042      ip2:32810    ESTABLISHED
tcp        0     30 ip1:19042      ip2:32788    ESTABLISHED
tcp        0      0 ip1:19042      ip2:32821    ESTABLISHED
tcp        0      0 ip1:19042      ip2:32818    ESTABLISHED
tcp        0      0 ip1:19042      ip2:32820    ESTABLISHED
tcp        0     15 ip1:19042      ip2:32819    ESTABLISHED
tcp        0      0 ip1:9042       ip2:55714    ESTABLISHED
tcp        0     15 ip1:19042      ip2:32815    ESTABLISHED
tcp        0      0 ip1:19042      ip2:32823    ESTABLISHED
tcp        0      0 ip1:19042      ip2:32813    ESTABLISHED
tcp        0      0 ip1:9042       ip2:55712    ESTABLISHED
tcp        0     15 ip1:19042      ip2:32822    ESTABLISHED
tcp        0      0 ip1:19042      ip2:32811    ESTABLISHED
tcp        0      0 ip1:19042      ip2:32812    ESTABLISHED
tcp        0      0 ip1:19042      ip2:32814    ESTABLISHED

as can be seen there are:

13 connections to 19042
2 connections to 9042

I expected the control connection to be on 9042 and all the rest of the connections (actually sending traffic) to be on 19042

Maybe the first connection is not the control connection ?

I guess there is an explanation for this - yet am not sure what it is.

panic: token map different size to token ring: got 1536 expected 2048

Please answer these questions before submitting your issue. Thanks!

What version of Scylla or Cassandra are you using?

Scylla 4.1

What version of Gocql are you using?

github.com/scylladb/gocql v1.4.0

What did you do?

I enabled TokenAwareHostPolicy like it says in the readme to see if performance would improve. On my test cluster it worked fine, but when I deployed it to production all my servers panicked with this error:

panic: token map different size to token ring: got 1536 expected 2048
goroutine 1 [running]:
github.com/gocql/gocql.(*networkTopology).replicaMap(0xc00051c088, 0xc000598380, 0xc00051c088, 0x0, 0x0)
        /home/wim/Documents/workspace/go/pkg/mod/github.com/scylladb/[email protected]/topology.go:273 +0x186d
github.com/gocql/gocql.(*tokenAwareHostPolicy).updateReplicas(0xc0001c2380, 0xc0003ca3a0, 0xc000023750, 0xa)
        /home/wim/Documents/workspace/go/pkg/mod/github.com/scylladb/[email protected]/policies.go:457 +0x25f
github.com/gocql/gocql.(*tokenAwareHostPolicy).KeyspaceChanged(0xc0001c2380, 0xc000023750, 0xa, 0x0, 0x0)
        /home/wim/Documents/workspace/go/pkg/mod/github.com/scylladb/[email protected]/policies.go:442 +0x99
github.com/gocql/gocql.(*Session).init(0xc000180700, 0xc0001a0380, 0x0)
        /home/wim/Documents/workspace/go/pkg/mod/github.com/scylladb/[email protected]/session.go:280 +0x5ca
github.com/gocql/gocql.NewSession(0xc00024f260, 0x3, 0x3, 0xb01191, 0x5, 0x0, 0x2540be400, 0x12a05f200, 0x23b6, 0xc000023750, ...)
        /home/wim/Documents/workspace/go/pkg/mod/github.com/scylladb/[email protected]/session.go:166 +0x809
github.com/gocql/gocql.(*ClusterConfig).CreateSession(...)
        /home/wim/Documents/workspace/go/pkg/mod/github.com/scylladb/[email protected]/cluster.go:194

This actually brought the site down for a few minutes.

If you are having connectivity related issues please share the following additional information

Describe your Cassandra cluster

please provide the following information

output of nodetool status

Datacenter: eu-de
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns    Host ID                               Rack
UN  159.69.242.183  1.39 GB    256          ?       886551b1-7dbb-4ba3-8fe1-c388bd2fed53  rack1
UN  116.202.25.245  1.21 GB    256          ?       d8632268-2ec5-4f8a-bc33-5a20d8d8379d  rack1
UN  49.12.4.179     1.3 GB     256          ?       ce75f822-c8ec-4b81-9451-c6877b20ca3e  rack1
Datacenter: us-central
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns    Host ID                               Rack
UN  209.126.0.59    1.44 GB    256          ?       b1b386a7-9997-4ba6-a79f-b24c9efa43c9  rack1
UN  207.244.245.11  1.17 GB    256          ?       a9027bea-5f82-4380-bede-892f882d9d17  rack1
Datacenter: us-west
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns    Host ID                               Rack
UN  23.175.1.55     1.53 GB    256          ?       ac84d564-01b3-4e04-9161-83b14cad1cca  rack1
UN  23.175.1.54     1.53 GB    256          ?       7b5b00fe-b7b6-4f0c-a896-5f59fef09bb2  rack1
UN  23.175.1.53     1.57 GB    256          ?       4b4bb1de-bb90-4dc9-b879-2b6f730a565e  rack1

output of SELECT peer, rpc_address FROM system.peers


 peer           | rpc_address
----------------+----------------
    23.175.1.53 |    23.175.1.53
 159.69.242.183 | 159.69.242.183
    23.175.1.55 |    23.175.1.55
    23.175.1.54 |    23.175.1.54
   209.126.0.59 |   209.126.0.59
 207.244.245.11 | 207.244.245.11
 116.202.25.245 | 116.202.25.245

rebuild your application with the gocql_debug tag and post the output
Is this necessary?

Race in executeQuery

What version of Gocql are you using?

scylladb/[email protected]

What did you do?

I'm reading source code and spotted race gocql#1381 (comment)
there is a similar case in this fork on line

gocql/conn.go

Line 1152 in 81a4afe

qry.lwt = info.request.lwt

There is no synchronization around reading/writing qry.lwt. Access from multiple goroutines is possible if speculative query execution is enabled (goroutines spawn in queryExecutor.speculate).

What did you expect to see?

Some synchronization mechanism for the field.

What did you see instead?

Unsynchronized access.

Extend integration tests with scylla specific tests

The idea is to ensure the integrity of the shard awareness since it doesn't just depend on relatively isolated scylla specific code but also how tokens are produced and propagated through out the rest of the code.

Panic in query_executor

Please answer these questions before submitting your issue. Thanks!

What version of Scylla or Cassandra are you using?

Scylla Enterprise 2021.1.4

What version of Gocql are you using?

v0.0.0-20210215130051-390af0fb2915

What version of Go are you using?

1.15

What did you do?

Cluster repair

What did you expect to see?

No panic

What did you see instead?

panic

A cluster repair triggered a panic with the stack trace:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xf6ff26]

goroutine 35151517 [running]:
github.com/gocql/gocql.(*queryMetrics).attempt(0x0, 0x1, 0x60a1fd2, 0xc000313380, 0x7f536f095f00, 0x2573040, 0x60a1fd2)
        /vendor/github.com/gocql/gocql/session.go:808 +0x26
github.com/gocql/gocql.(*Query).attempt(0xc0049c2900, 0xc0000360b6, 0x8, 0xc0556659e3f166f3, 0x7f5375137f20, 0x2573040, 0xc0556659dde74724, 0x7f536f095f4e, 0x2573040, 0xc004a84c60, ...)
        /vendor/github.com/gocql/gocql/session.go:1049 +0xca
github.com/gocql/gocql.(*queryExecutor).attemptQuery(0xc000799b80, 0x1aab860, 0xc0063614c0, 0x1aba0c0, 0xc0049c2900, 0xc000fac900, 0xc0000516f8)
        /vendor/github.com/gocql/gocql/query_executor.go:35 +0x123
github.com/gocql/gocql.(*queryExecutor).do(0xc000799b80, 0x1aab860, 0xc0063614c0, 0x1aba0c0, 0xc0049c2900, 0xc0049a8e80, 0x184b170)
        /vendor/github.com/gocql/gocql/query_executor.go:127 +0x1b3
github.com/gocql/gocql.(*queryExecutor).run(0xc000799b80, 0x1aab860, 0xc0063614c0, 0x1aba0c0, 0xc0049c2900, 0xc0049a8e80, 0xc0032db3e0)
        /vendor/github.com/gocql/gocql/query_executor.go:173 +0x85
created by github.com/gocql/gocql.(*queryExecutor).executeQuery
       /vendor/github.com/gocql/gocql/query_executor.go:85 +0x2ea

Unfortunately since this only happened during a repair, cannot get a reproducible program.

We suspect it may have something to do with either retry policy or speculative execution, our usage is like so:

var (
	speculativeExecPolicy = gocql.SimpleSpeculativeExecution{
		NumAttempts:  2,
		TimeoutDelay: 100 * time.Millisecond,
	}

	retryPolicy = gocql.ExponentialBackoffRetryPolicy{
		NumRetries: 2,
		Min:        10 * time.Millisecond,
	}
)


func queryWithRetry(...) {
	ttl := qb.TTL(...)
	query := dbSession.
		Query(
			table.InsertBuilder().
				TTLNamed("ttl").
				ToCql(),
		).
		Consistency(consistency).
		RetryPolicy(&retryPolicy).
		BindStructMap(obj, qb.M{"ttl": ttl})

	err = query.ExecRelease()
}

func queryWithSpeculativeExecution(...) {
	query := qb.Select("table").
		Where(qb.Eq("id")).
		CountAll().
		Query(dbSession).
		Consistency(gocql.LocalOne).
		SetSpeculativeExecutionPolicy(&speculativeExecPolicy).
		Bind(id)

	var count int
	return query.GetRelease(&count)

}

Performance issues when switched from gocql/gocql to scylladb/gocql

Please answer these questions before submitting your issue. Thanks!

What version of Scylla or Cassandra are you using?

Scylla : Version : 3.2

What version of Gocql are you using?

latest

What did you do?

Migrated from gocql/gocql to scylladb/gocql

What did you expect to see?

Performance improvements. Less non-token aware queries.

What did you see instead?

Non-token aware queries did not reduce to zero instead remained almost same although there was a visible decrement in cross-shard queries.
Total requests served per coordinator had increased.
Total requests(Scylla CQL optimisation) had also increased.
Latency on one of the node was especially very high.

If you are having connectivity related issues please share the following additional information

Describe your Cassandra cluster

please provide the following information

output of nodetool status
output of SELECT peer, rpc_address FROM system.peers
rebuild your application with the gocql_debug tag and post the output

Load balancing is useless unless we allow configuring the conn.maxStreams value

HEAD: c8cd0ba

Description
The hard codded value for "protocol" version greater than 2 is 32768: https://github.com/scylladb/gocql/blob/master/internal/streams/streams.go#L24

This makes #86 pretty much useless with a shard-aware driver because any concurrency above 100-200 per shard is already going to cause issues.

The issue is a follow up for the comment here: #86 (comment)

Two things:

The current default is too high IMO.
There should be a way to control the conn.maxStreams value by the user.

full scan operations time out

Please answer these questions before submitting your issue. Thanks!

What version of Scylla or Cassandra are you using?

3.x

What version of Gocql are you using?

1.3.0-rc.1

What did you do?

I'm trying to scan scylla table in efficient way based on this https://github.com/scylladb/scylla-code-samples/blob/master/efficient_full_table_scan_example_code/efficient_full_table_scan.go#L61
I'm testing a different parameters, but this particular test was made with 1 thread and 10000 ranges per thread. So the token range was divided by 10000.
When I was trying to use CL=QUORUM I got.
Operation timed out for services_prod_us.device_signals - received only 1 responses from 2 CL=QUORUM. with query:SELECT version,device_id,signal_bucket FROM device_signals WHERE token(version,device_id,signal_bucket) >= ? AND token(version,device_id,signal_bucket) <= ?; and range &{-xxxxx -yyyyyyy}.
With CL=LOCAL_ONE
Operation timed out for services_prod_us.device_signals - received only 0 responses from 1 CL=LOCAL_ONE. with query:SELECT version,device_id,signal_bucket FROM device_signals WHERE token(version,device_id,signal_bucket) >= ? AND token(version,device_id,signal_bucket) <= ?; and range &{xxxxx yyyyyyy}
error type is (*gocql.RequestErrReadTimeout)
Scylla config

cluster := gocql.NewCluster(address...)
cluster.CQLVersion = "3.3.1"
cluster.NumConns = 16
cluster.ConnectTimeout = 10 * time.Second
cluster.Consistency = gocql.Quorum
cluster.DisableInitialHostLookup = disableInitHostLookup
cluster.Compressor = &gocql.SnappyCompressor{}
cluster.PoolConfig.HostSelectionPolicy =
gocql.TokenAwareHostPolicy(gocql.RoundRobinHostPolicy())
cluster.Timeout = 1 * time.Minute
cluster.RetryPolicy = &scylladb.MyExponentialBackoffRetryPolicy{NumRetries: 15, Min: 10 * time.Millisecond, Max: time.Minute}

MyExponentialBackoffRetryPolicy is the same as ExponentialBackoffRetryPolicy but I'm logging Attempt function call.

gocql: github.com/scylladb/gocql v1.3.0-rc.1

Why gocql.RequestErrReadTimeout is not triggering RetryPolicy? (Retry policy is working fine when the cluster.Timeout is exceeded and ErrTimeoutNoResponse is triggered.)
How to change timeout for gocql.RequestErrReadTimeout and what it means and what is the difference between RequestErrReadTimeout and ErrTimeoutNoResponse
How it's possible that I got this error with CL=LOCAL_ONE with TokenAwareHostPolicy?

What did you expect to see?

No timeouts and if needed the retry policy should kick in

What did you see instead?

time outs on some queries

If you are having connectivity related issues please share the following additional information

Describe your Cassandra cluster

please provide the following information

output of nodetool status
output of SELECT peer, rpc_address FROM system.peers
rebuild your application with the gocql_debug tag and post the output

load balancing: slow coordinator avoidance

It would be nice to have a real load balancing strategy which chooses coordinators based on actual metrics such as how fast it can process requests.

See scylladb/scylladb#5715 for motivation. I explain in the comments what the issue is (it is a driver-side issue).

The 4.x versions of the Java driver AFAIK do have such strategies - at least one.
From https://docs.datastax.com/en/developer/java-driver/4.11/manual/core/load_balancing/#built-in-policies:

DefaultLoadBalancingPolicy should almost always be used; it requires a local datacenter to be specified either programmatically when creating the session, or via the configuration (see below). It can also use a highly efficient slow replica avoidance mechanism, which is by default enabled.

Doc: Add CDC Connector

Add to docs:

Using CDC with Go

When writing applications, you can now use our Go Library <https://github.com/scylladb/scylla-cdc-java>_ to simplify writing applications that read from Scylla CDC.

Unable to do non-token aware update counter queries

What version of Scylla or Cassandra are you using?

ScyllabDB 3.1.2

What version of Gocql are you using?

e5a4338

What did you do?

CREATE TABLE stats (
id text,
date date,
list counter,
"view" counter,
phone counter,
reply counter,
message counter,
sms counter,
PRIMARY KEY  (id, date)
);

with this configuration

c := gocql.NewCluster("*****")
c.CQLVersion = "3.3.1"
//c.DisableInitialHostLookup = true
c.Keyspace = "counter"
c.MaxPreparedStmts = 1000000000
c.MaxRoutingKeyInfo = 1000000000
c.Timeout = 2 * time.Second
c.NumConns = 2000
c.SocketKeepalive = time.Minute
c.Consistency = gocql.LocalOne
c.PoolConfig.HostSelectionPolicy = gocql.TokenAwareHostPolicy(
    gocql.RoundRobinHostPolicy(),
)
// try compressor
//c.Compressor = &gocql.SnappyCompressor{}
c.RetryPolicy = &gocql.ExponentialBackoffRetryPolicy{NumRetries: 6}

And only queries like that:

ct := "view"
q := "UPDATE stats SET \"" + string(ct) + "\" = \"" + string(ct) + "\" + ? where id = ? and date = ?"
qry := s.store.Query(q, incr, ad, date.RFC3339())
qry.SetConsistency(gocql.One)
qry = qry.WithContext(ctx)
return qry.Exec()

I have 100% of Non-Token aware queries in scylla monitoring.

What did you expect to see?

All queries or close to 0 that are token aware.

What did you see instead?

100% of queries are non-token aware.

If you are having connectivity related issues please share the following additional information

Describe your Cassandra cluster

please provide the following information

output of nodetool status

$ nodetool status

Datacenter: eu-west
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens       Owns    Host ID                               Rack
UN  ip.adr.es.s  6.7 GB     256          ?       fbb5723b-01db-4c1a-95ee-afd63d915a8b  1a
UN  ip.adr.es.s  5.35 GB    256          ?       dd32edfe-a5c1-4933-a7cd-2b53190f2623  1a
UN  ip.adr.es.s  4.71 GB    256          ?       ff7712be-2a5e-425e-b7ce-b57206474aa2  1c
UN  ip.adr.es.s  4.8 GB     256          ?       fad80a53-66fe-42ed-91f4-8836bd94e6b1  1b
UN  ip.adr.es.s  4.95 GB    256          ?       deef4bb7-e334-4f86-8be7-8e392531d7cb  1b
UN  ip.adr.es.s  5.3 GB     256          ?       ab23cac1-ccdf-49ee-9c49-54f538d7cf7c  1c
UN  ip.adr.es.s  5.32 GB    256          ?       bfc4e3e0-6401-4067-970d-50482f5d169f  1a
UN  ip.adr.es.s  4.57 GB    256          ?       274b106a-db94-46ab-a4ea-3e8eb59a2042  1c
UN  ip.adr.es.s  6.76 GB    256          ?       dfa914f2-bcd4-4434-ab6d-77fd4d48f8ce  1a
UN  ip.adr.es.s  6.8 GB     256          ?       470959e8-388b-4862-b610-b0550e7f71cc  1a
UN  ip.adr.es.s  5.25 GB    256          ?       20728350-22c6-457a-9dd6-4d3e047bfbf5  1a
UN  ip.adr.es.s  5.37 GB    256          ?       9e0e8c47-d6c5-4021-8aa4-9968e95a2141  1c
UN  ip.adr.es.s  6.57 GB    256          ?       549465b1-41b7-4889-bded-838c6b4c9ef5  1b
UN  ip.adr.es.s  5.38 GB    256          ?       0d80d750-732f-489b-8375-73cdfdf73d62  1a
UN  ip.adr.es.s  5.74 GB    256          ?       11cca159-8d2b-41af-924e-a8bcfe54cc86  1c
UN  ip.adr.es.s  6.2 GB     256          ?       6dbc3ebe-a2bb-4506-bc3a-0d8bb6e351e9  1b
UN  ip.adr.es.s  4.61 GB    256          ?       c50c5510-179c-494e-a924-0b4e37834380  1c
UN  ip.adr.es.s  6.16 GB    256          ?       a152787e-65d6-47e8-87d1-cb5100fcf861  1b
UN  ip.adr.es.s  4.66 GB    256          ?       a580e1a2-b3ce-404a-a861-b8e05b1a6b7f  1c
UN  ip.adr.es.s  5.22 GB    256          ?       02cef23c-68a7-4b32-ba35-2391502644bf  1b
UN  ip.adr.es.s  5.16 GB    256          ?       49a1d2b3-aac8-41fd-a2f3-b7cdd19154e8  1c
UN  ip.adr.es.s  6.29 GB    256          ?       a7c31c54-cae8-4bd6-ba26-596d8101b07f  1b
UN  ip.adr.es.s  5.34 GB    256          ?       38d5d160-48d8-492b-8882-5d805bbd7e37  1a
UN  ip.adr.es.s  5.16 GB    256          ?       439fc0be-c89b-4f54-b516-42b1c2237928  1b

output of SELECT peer, rpc_address FROM system.peers
rebuild your application with the gocql_debug tag and post the output

A deadlock may occur if closing excess connections encounter an error

A deadlock can occur when closing one of the excess connections fail with error:

goroutine 206 [semacquire]:
sync.runtime_SemacquireMutex(0xc0000c2fb4, 0x900000000, 0x1)
	/home/piodul/.gvm/gos/go1.14.4/src/runtime/sema.go:71 +0x47
sync.(*Mutex).lockSlow(0xc0000c2fb0)
	/home/piodul/.gvm/gos/go1.14.4/src/sync/mutex.go:138 +0x1c1
sync.(*Mutex).Lock(0xc0000c2fb0)
	/home/piodul/.gvm/gos/go1.14.4/src/sync/mutex.go:81 +0x7d
sync.(*RWMutex).Lock(0xc0000c2fb0)
	/home/piodul/.gvm/gos/go1.14.4/src/sync/rwmutex.go:98 +0x4a
github.com/gocql/gocql.(*hostConnPool).HandleError(0xc0000c2f70, 0xc000402f00, 0xb99880, 0xc0004243c0, 0x1)
	/home/piodul/code/scylla-gocql/connectionpool.go:578 +0x58
github.com/gocql/gocql.(*Conn).closeWithError(0xc000402f00, 0x0, 0x0)
	/home/piodul/code/scylla-gocql/conn.go:535 +0x36f
github.com/gocql/gocql.(*Conn).Close(...)
	/home/piodul/code/scylla-gocql/conn.go:544
github.com/gocql/gocql.(*scyllaConnPicker).closeExcessConns(0xc00018e820)
	/home/piodul/code/scylla-gocql/scylla.go:360 +0x95
github.com/gocql/gocql.(*scyllaConnPicker).Put(0xc00018e820, 0xc0004a4d80)
	/home/piodul/code/scylla-gocql/scylla.go:330 +0x3f0
github.com/gocql/gocql.(*hostConnPool).connectToShard(0xc0000c2f70, 0x0, 0x0, 0x0)
	/home/piodul/code/scylla-gocql/connectionpool.go:551 +0x4cb
github.com/gocql/gocql.(*hostConnPool).connectMany.func1(0xc00041c0a0, 0xc00041e120, 0xc0000c2f70, 0xc00041c098, 0xc0000a0100, 0x0)
	/home/piodul/code/scylla-gocql/connectionpool.go:462 +0x10a
created by github.com/gocql/gocql.(*hostConnPool).connectMany
	/home/piodul/code/scylla-gocql/connectionpool.go:458 +0x228

I found this while writing #52 . This deadlock occurred when testing the version from the PR (therefore the line numbers may be off), but it should also happen on master.

The deadlock happens because one goroutine may acquire the same lock twice:

First lock occurs here (in this backtrace the function is called connectToShard),
The second one here.

The description of an error I got from closing the connection is: write tcp 127.0.0.1:32777->127.0.0.1:38147: i/o timeout. It only happened when using client encryption. I suspect this happened because when TLS connection is closed, it attempts to write some final data to the peer, and it might fail because of SetWriteDeadline which was set earlier. I don't know how probable it is to trigger it in real application.

Add support for com.scylladb.auth.TransitionalAuthenticator

https://docs.scylladb.com/operating-scylla/security/runtime-authentication/

This fails with:

unable to create session: unable to discover protocol version: unexpected authenticator "com.scylladb.auth.TransitionalAuthenticator"

Add support for ScyllaDB's per partition rate limiting's new error

The upcoming ScyllaDB 5.1 introduces a feature called per-partition rate limiting. In case the (user defined) per-partition rate limit is exceeded, the database will start returning a new kind of error, or fall back to Configuration_error if the driver does not support it.

Because the new error should be propagated to users and handled differently than other errors, the driver should be prepared to accept the new error.

Reference implementation in the scylla-rust-driver: scylladb/scylla-rust-driver#549

Cannot install the library

Please answer these questions before submitting your issue. Thanks!

What version of Scylla or Cassandra are you using?

N/A

What version of Gocql are you using?

latest

What did you do?

I was trying to install the library by following the README. I feel like the instruction is a bit vague. I already had gocql/gocql installed on my machine. Now I wanted to switch to scylladb/gocql. It is impossible to to run command go get github.com/scylladb/gocql because it will result with an output like this:

go: github.com/scylladb/gocql upgrade => v1.4.3
go get: github.com/scylladb/[email protected]: parsing go.mod:
	module declares its path as: github.com/gocql/gocql
	        but was required as: github.com/scylladb/gocql

I decided to clone the repository and run following command inside:

go mod edit -replace=github.com/gocql/gocql=github.com/scylladb/gocql@v0.0.0-20201029170107-81a4afe636ae01c9826794e83365a0023692303d

The command returned without error but now when I want to use the library I get the error:

go: github.com/gocql/gocql@v0.0.0-20201029170107-81a4afe636ae01c9826794e83365a0023692303d: invalid version: unknown revision 81a4afe636ae01c9826794e83365a0023692303d

What did you expect to see?

I expected the library to have some explanation in README how to use it

What did you see instead?

Fail after following instructions

code in directory .../scylladb/gocql expects import "github.com/gocql/gocql"

Please answer these questions before submitting your issue. Thanks!

go version
go version go1.13.6 linux/amd64

What did you do?

go get github.com/scylladb/gocql
package github.com/scylladb/gocql: code in directory /home/dyasny/go/src/github.com/scylladb/gocql expects import "github.com/gocql/gocql"

What did you expect to see?

a module installed with no errors

When adding a new datacenter with a replication factor of 0, the GoCQL client hard-faults.

What version of Scylla or Cassandra are you using?

2020.1.3-0.20201001.6bab5d934b

What version of Gocql are you using?

1.4.0

What did you do?

Added a new datacenter with a replication factor of 0 to the keyspace via:

ALTER KEYSPACE parler WITH replication = { 'class' : 'NetworkTopologyStrategy', 'us-west-2-ent' : 3, 'us-west-2-ent-b' : 0 } AND durable_writes = true;

What did you expect to see?

No errors occouring

What did you see instead?

panic: invalid replication_factor 0. Is the "parler:dc=us-west-2-ent-b" keyspace configured correctly?

goroutine 1 [running]:
github.com/gocql/gocql.getReplicationFactorFromOpts(0xc0027cb940, 0x19, 0x1779820, 0xc001f7e740, 0x4)
	/go/pkg/mod/github.com/scylladb/[email protected]/topology.go:63 +0x28d
github.com/gocql/gocql.getStrategy(0xc0018c4540, 0x6, 0xc0018c4540)
	/go/pkg/mod/github.com/scylladb/[email protected]/topology.go:82 +0x1e8
github.com/gocql/gocql.(*tokenAwareHostPolicy).updateReplicas(0xc0001d3260, 0xc001f7e6a0, 0x1ad61ce, 0x6)
	/go/pkg/mod/github.com/scylladb/[email protected]/policies.go:454 +0x230
github.com/gocql/gocql.(*tokenAwareHostPolicy).KeyspaceChanged(0xc0001d3260, 0x1ad61ce, 0x6, 0x0, 0x0)
	/go/pkg/mod/github.com/scylladb/[email protected]/policies.go:442 +0x99
github.com/gocql/gocql.(*Session).init(0xc0001d8000, 0xc000de0500, 0x0)
	/go/pkg/mod/github.com/scylladb/[email protected]/session.go:280 +0x5e4
github.com/gocql/gocql.NewSession(0xc000de0480, 0x5, 0x8, 0x1ad3e4f, 0x5, 0x0, 0x12a05f200, 0x23c34600, 0x2352, 0x1ad61ce, ...)
	/go/pkg/mod/github.com/scylladb/[email protected]/session.go:166 +0x809
github.com/gocql/gocql.(*ClusterConfig).CreateSession(...)
	/go/pkg/mod/github.com/scylladb/[email protected]/cluster.go:194
gitlab.parler.com/server/server/support/goLibrary/helpers.initCassandra(0x1ad61ce, 0x6, 0x1ad5b0e)
	/builder/support/goLibrary/helpers/ctx.go:124 +0x585
gitlab.parler.com/server/server/support/goLibrary/helpers.InitCTX(0xc00087ea00, 0x1ea4de0, 0xc000554270, 0xc0005e2de0, 0xc000568780, 0xc000230050, 0xc0000b4140, 0xc000139540, 0x1e95120, 0x2a60db0, ...)
	/builder/support/goLibrary/helpers/ctx.go:212 +0x352
main.main()
	/builder/applications/golang/taskrunner/main.go:200 +0xfee

If you are having connectivity related issues please share the following additional information

Describe your Cassandra cluster

please provide the following information

output of nodetool status

nodetool status
Datacenter: us-west-2-ent
=========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns    Host ID                               Rack
UN  10.3.33.195  3.5 TB     256          ?       31dcee36-27a1-4821-bc44-11a0f8dd3df5  2a
UN  10.3.33.107  4.13 TB    256          ?       c7f8d883-24c4-4070-a370-43e524a75270  2a
UN  10.3.33.138  3.23 TB    256          ?       1101667a-e0cb-4b44-9ba8-fe61592ee219  2a
UN  10.3.33.143  3.77 TB    256          ?       86505452-ca58-4a49-bb5a-9ce6a7adf2a2  2a
UN  10.3.33.140  3.77 TB    256          ?       8b2128d7-4c3f-4f94-89fe-82267a79f212  2a
UN  10.3.33.17   3.46 TB    256          ?       817fcb54-d5bd-4021-bc7d-f2df0f356833  2a
UN  10.3.33.86   3.71 TB    256          ?       c34a8838-3703-4699-9657-1ba20ae7fc26  2a
UN  10.3.33.246  3.79 TB    256          ?       26c0a6c6-6c8f-457f-aff1-1145b62406f9  2a
UN  10.3.33.118  4.58 TB    256          ?       41d9a5d0-0a79-4363-9100-7d3cbc23f2bb  2a
UN  10.3.33.245  1.56 TB    256          ?       0c36858f-e3c0-4f92-a945-a4ae50de46ec  2a
UN  10.3.33.20   3.76 TB    256          ?       0cfa3c08-b779-4e64-9ed9-5a03881f393f  2a
UN  10.3.33.59   1.58 TB    256          ?       90638e06-f4f0-46f4-8448-2ff7cbb5d44b  2a
UN  10.3.33.217  3.5 TB     256          ?       e386098a-be8a-4dbb-93ce-048c7c420c58  2a
UN  10.3.33.158  3.82 TB    256          ?       f2d25c53-c449-4d7f-97c0-8b5165005ed1  2a
UN  10.3.33.94   1.88 TB    256          ?       1add2664-57ab-45bc-a8b6-4022f08ac749  2a
UN  10.3.33.30   3.36 TB    256          ?       d3c2306a-567e-462e-b61e-b446b1c4de5b  2a
UN  10.3.33.189  3.89 TB    256          ?       880f82ac-c7f5-4d0d-a120-12632ef9b0fa  2a
UN  10.3.33.29   3.04 TB    256          ?       5b2dbd33-fae9-4c19-a55b-2101db237879  2a
UN  10.3.33.156  3.87 TB    256          ?       417dc5af-20fd-4e97-b564-9ddf3f11d125  2a
UN  10.3.33.188  2.9 TB     256          ?       8ca80c8f-2c8b-4070-a993-dd9bcaeeea15  2a

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless

output of SELECT peer, rpc_address FROM system.peers

cqlsh> SELECT peer, rpc_address FROM system.peers;

 peer        | rpc_address
-------------+-------------
 10.3.33.189 | 10.3.33.189
  10.3.33.86 |  10.3.33.86
 10.3.33.118 | 10.3.33.118
  10.3.33.29 |  10.3.33.29
 10.3.33.138 | 10.3.33.138
 10.3.33.245 | 10.3.33.245
  10.3.33.17 |  10.3.33.17
 10.3.33.140 | 10.3.33.140
  10.3.33.20 |  10.3.33.20
 10.3.33.156 | 10.3.33.156
  10.3.33.94 |  10.3.33.94
  10.3.33.59 |  10.3.33.59
 10.3.33.188 | 10.3.33.188
 10.3.33.246 | 10.3.33.246
  10.3.33.30 |  10.3.33.30
 10.3.33.195 | 10.3.33.195
 10.3.33.143 | 10.3.33.143
 10.3.33.217 | 10.3.33.217
 10.3.33.158 | 10.3.33.158

rebuild your application with the gocql_debug tag and post the output

Note: This functionally appears to be a bug where a RF of 0 is considered to be a hard-fault in the gocql client.

Generated CQL improvements

Generated CQL has the following minor issues

First line is empty
There is no \n at the end
Indentation is mixed 2 spaces for columns vs 4 spaces for properties


CREATE TABLE backuptest_purge.big_table (
  id int PRIMARY KEY,
  data blob
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys':'ALL','rows_per_partition':'ALL'}
    AND comment = ''
    AND compaction = {'class':'SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression':'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0
    AND speculative_retry = '99.0PERCENTILE';```

Make tracing requests shard aware

Currently, all tracing requests use a single connection, the control connection; this connection might or might not be "the right" connection (in terms of which shard it's connected to) for the given session_id (which is the partition key in system_traces.sessions and system_traces.events).

Relevant code in session.go:

func (t *traceWriter) Trace(traceId []byte) {
	var (
		coordinator string
		duration    int
	)
	iter := t.session.control.query(`SELECT coordinator, duration
			FROM system_traces.sessions
			WHERE session_id = ?`, traceId)

and a couple lines below:

	iter = t.session.control.query(`SELECT event_id, activity, source, source_elapsed
			FROM system_traces.events
			WHERE session_id = ?`, traceId)

This could be improved to make the requests token/shard-aware -- pick the right connection from the available pool of connections (as we do for user requests) based on the session_id's token.

Currently tracing queries put unnecessary strain on the cluster as they require cross-shard requests.

It is impossible to read with LWT

Consistency type does not contain SERIAL and LOCAL_SERIAL values so it is impossible to set consistency to be serial on read.

Await schema agreement returns wrong error

Schema is in sync no schema changes events, it detects no conflicts yet still error is returned.

15:40:56.285    INFO    backup.await_schema     Awaiting schema agreement...
15:42:56.427    INFO    backup.await_schema     Schema agreement error  {"error": "gocql: cluster schema versions not consistent: []"}
15:42:56.428    INFO    backup.await_schema     Schema agreement not reached, retrying...       {"error": "gocql: cluster schema versions not consistent: []", "wait": "16.538893946s"}
15:45:13.126    INFO    backup.await_schema     Schema agreement error  {"error": "gocql: cluster schema versions not consistent: []"}
15:45:13.126    INFO    backup.await_schema     Schema agreement not reached, retrying...       {"error": "gocql: cluster schema versions not consistent: []", "wait": "29.401847101s"}
15:47:42.684    INFO    backup.await_schema     Schema agreement error  {"error": "gocql: cluster schema versions not consistent: []"}
15:47:42.684    INFO    backup.await_schema     Schema agreement not reached, retrying...       {"error": "gocql: cluster schema versions not consistent: []", "wait": "56.135457321s"}

Index out of bounds when getting db conn

What version of Scylla or Cassandra are you using?

3.0.2

What version of Gocql are you using?

1.3.1

What did you do?

Server was running for several days (10+) and a query was executed

What did you expect to see?

The query to execute correctly

What did you see instead?

A panic when the internal gocql client attempted to establish a connection to the db. The exact error was:

github.com/gocql/gocql.(*Session).executeQuery(0xc002f13500, 0xc005b12a20, 0x12eef5e) goroutine 100090784 [running]: github.com/gocql/gocql.(*scyllaConnPicker).randomConn(0xc0001026e0, 0xc003f95538) /go/pkg/mod/github.com/scylladb/[email protected]/scylla.go:236 +0x83 github.com/gocql/gocql.(*scyllaConnPicker).Pick(0xc0001026e0, 0x0, 0x0, 0xc000699740) /go/pkg/mod/github.com/scylladb/[email protected]/scylla.go:152 +0x100 github.com/gocql/gocql.(*hostConnPool).Pick(0xc0006691f0, 0x0, 0x0, 0x0) /go/pkg/mod/github.com/scylladb/[email protected]/connectionpool.go:313 +0xd9 github.com/gocql/gocql.(*queryExecutor).do(0xc0056757a0, 0x14c34c0, 0xc00003e0e0, 0x14d3560, 0xc005b12a20, 0x20300f) /go/pkg/mod/github.com/scylladb/[email protected]/query_executor.go:106 +0x18c github.com/gocql/gocql.(*queryExecutor).executeQuery(0xc0056757a0, 0x14d3560, 0xc005b12a20, 0x0, 0x0, 0x0) /go/pkg/mod/github.com/scylladb/[email protected]/query_executor.go:60 +0x437 panic: runtime error: index out of range [-32] /go/pkg/mod/github.com/scylladb/[email protected]/session.go:431 +0xb7 github.com/gocql/gocql.(*Query).Iter(0xc005b12a20, 0xc005b12a20) /go/pkg/mod/github.com/scylladb/[email protected]/session.go:1135 +0xa5 github.com/gocql/gocql.(*Query).Exec(0xc005b12a20, 0xc00ad2d000, 0x7b) /go/pkg/mod/github.com/scylladb/[email protected]/session.go:1118 +0x2b

I believe the issue lies in the scyllaConnPicker.randomConn() method.

func (p *scyllaConnPicker) randomConn() *Conn { idx := int(atomic.AddInt32(&p.pos, 1)) for i := 0; i < len(p.conns); i++ { if conn := p.conns[(idx+i)%len(p.conns)]; conn != nil { return conn } } return nil }
For long running processes, idx will eventually overflow and cause a panic when attempting to select a connection. It should be changed (I can change it too) to circle back to zero when number overflows.

"received only 1 responses from 2 CL=QUORUM" query error having RF=3 and only 1 down node

What version of Scylla or Cassandra are you using?

4.6.3 , 2022.1.rc8

What version of Gocql are you using?

1.6.0

What version of Go are you using?

1.17.4

What did you do?

Created 5-node Scylla cluster and ran scylla-bench tool that uses this gocql driver. Commands are following:

stress_cmd:                                                                                            
- scylla-bench -workload=sequential -mode=write -replication-factor=3 -partition-count=25 -clustering-row-count=10000 -partition-offset=401 -clustering-row-size=uniform:10..1024 -concurrency=10 -connection-count=10 -consistency-level=quorum -rows-per-request=10 -timeout=30s -retry-number=30 -retry-interval=80ms,1s -iterations 10
- scylla-bench -workload=sequential -mode=write -replication-factor=3 -partition-count=25 -clustering-row-count=100 -partition-offset=426 -clustering-row-size=uniform:2048..5120 -concurrency=10 -connection-count=10 -consistency-level=quorum -rows-per-request=10 -timeout=30s -retry-number=30 -retry-interval=80ms,1s -iterations 10
- scylla-bench -workload=sequential -mode=write -replication-factor=3 -partition-count=25 -clustering-row-count=10 -partition-offset=451 -clustering-row-size=uniform:5120..8192 -concurrency=10 -connection-count=10 -consistency-level=quorum -rows-per-request=10 -timeout=30s -retry-number=30 -retry-interval=80ms,1s -iterations 10
- scylla-bench -workload=sequential -mode=write -replication-factor=3 -partition-count=25 -clustering-row-count=1 -partition-offset=476 -clustering-row-size=uniform:8192..10240 -concurrency=10 -connection-count=10 -consistency-level=quorum -rows-per-request=10 -timeout=30s -retry-number=30 -retry-interval=80ms,1s -iterations 10
- scylla-bench -workload=sequential -mode=read  -replication-factor=3 -partition-count=25 -clustering-row-count=5555 -clustering-row-size=uniform:1024..2048 -concurrency=100 -connection-count=100 -consistency-level=quorum -rows-per-request=10 -timeout=30s -retry-number=30 -retry-interval=80ms,1s -iterations 0 -duration=170m -validate-data 
- scylla-bench -workload=sequential -mode=read  -replication-factor=3 -partition-count=25 -clustering-row-count=5555 -partition-offset=26 -clustering-row-size=uniform:1024..2048 -concurrency=100 -connection-count=100 -consistency-level=quorum -rows-per-request=10 -timeout=30s -retry-number=30 -retry-interval=80ms,1s -iterations 0 -duration=170m -validate-data
- scylla-bench -workload=sequential -mode=read  -replication-factor=3 -partition-count=25 -clustering-row-count=5555 -partition-offset=51 -clustering-row-size=uniform:1024..2048 -concurrency=100 -connection-count=100 -consistency-level=quorum -rows-per-request=10 -timeout=30s -retry-number=30 -retry-interval=80ms,1s -iterations 0 -duration=170m -validate-data
- scylla-bench -workload=sequential -mode=read  -replication-factor=3 -partition-count=25 -clustering-row-count=5555 -partition-offset=76 -clustering-row-size=uniform:1024..2048 -concurrency=100 -connection-count=100 -consistency-level=quorum -rows-per-request=10 -timeout=30s -retry-number=30 -retry-interval=80ms,1s -iterations 0 -duration=170m -validate-data

As it can be seen. We use RF=3 and CL=quorum.
Then, as part of a test run we do various disruptions to Scylla cluster. ANd when we run terminate and replace node disruption/nemesis we get following error with pretty high probability:

Operation timed out for scylla_bench.test - received only 1 responses from 2 CL=QUORUM.

Host selection policy which we use is token-aware with fallback to round-robin. But, host-pool with automatic detection of down nodes behaves the same way.

Note that this problem is not rare it gets caught a lot of times. It is hard to not catch it making a Scylla node go down.

What did you expect to see?

No errors because there must be 2 alive Scylla nodes holding replicas and only 1 down (expected).

What did you see instead?

The mentioned error which assumes that one of the Scylla nodes that holds one of replicas is not contacted to.

Logs

CI job: https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/valerii/job/longevity-large-partition-asymmetric-cluster-3h/13/console
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/700cb0a5-d24d-43a1-b191-4cf8e954cffe/20220628_125019/db-cluster-700cb0a5.tar.gz
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/700cb0a5-d24d-43a1-b191-4cf8e954cffe/20220628_125019/monitor-set-700cb0a5.tar.gz
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/700cb0a5-d24d-43a1-b191-4cf8e954cffe/20220628_125019/loader-set-700cb0a5.tar.gz
sct-runner - https://cloudius-jenkins-test.s3.amazonaws.com/700cb0a5-d24d-43a1-b191-4cf8e954cffe/20220628_125019/sct-runner-700cb0a5.tar.gz

If you are having connectivity related issues please share the following additional information

Describe your Cassandra cluster

please provide the following information

output of nodetool status
output of SELECT peer, rpc_address FROM system.peers
rebuild your application with the gocql_debug tag and post the output

A warning message when a connection to a shard-aware port times out is bogus

What version of Scylla or Cassandra are you using?

2022.2.6

What version of Gocql are you using?

HEAD: e38b2bc

What version of Go are you using?

Irrelevant

What did you do?

Tried to create a new connection to the cluster using a shard-aware port.

What did you expect to see?

An error message that would not confuse me.

What did you see instead?

A very confusing message that made me check a totally irrelevant direction and waste more than 3 working days of multiple people till we were finally able to figure out what the problem was.

If you are having connectivity related issues please share the following additional information

Describe your Cassandra cluster

"Cassandra cluster"?! You really want to fix your GH templates ;)

please provide the following information

output of nodetool status

Can't do! Production system!
Single DC, 36 nodes, 3 racks.
Each rack has 12 nodes.

output of SELECT peer, rpc_address FROM system.peers
rebuild your application with the gocql_debug tag and post the output

Both the above are unfeasible.

Description
The error message in question is this:

xxxx/xx/xx xx:xx:xx scylla: a.b.c.d:19042 connection to shard-aware address a.b.c.d:19042 resulted in wrong shard being assigned; please check that you are not behind a NAT or AddressTranslater which changes source ports; falling back to non-shard-aware port for 5m0s

But the thing is that NAT or an AddressTranslator is not the only possibility here.
Given gocql#1701 it's very easy to hit a ConnectTimeout which defaults to 600ms (!!!).

As a result if one of the shards (shard A) is overloaded and a TCP connection to 19042 times out due to that the driver is going to fall back to a "storm" connection policy trying to connect to a non-shard-aware port (9042): https://github.com/scylladb/gocql/blob/master/scylla.go#L422

And then it gets interesting (which also took us some time to figure after we realized that NAT has nothing to do with this): because the driver creates most of TCP connections asynchronously: https://github.com/scylladb/gocql/blob/master/connectionpool.go#L484
the following race may happen:

A connection to Shard A using 19042 is sent.
A connection to Shard B using 19042 is sent.
(2) times out and send a connection to 9042.
(3) lands on shard A and succeeds.
(1) completes but hits (https://github.com/scylladb/gocql/blob/master/scylla.go#L408) and prints the aforementioned message blaming NAT.

So, either fix the message or fix the race.

I'm going to file a separate GH issue about this fallback for a "storm" connection policy in general.

Document NumConns has no effect when using Scylla fork

This should be added to NumConns comment and README.

Prefer primary replica for the token for LWT queries

The way Paxos protocol works is that if two queries attempt to update the same key from different coordinators, they will start two independent Paxos rounds. Each round will be assigned a timestamp, and the coordinator who has the highest timestamp will win.
The issue is, the loser only queues up after the winner (using a semaphore) if both rounds are coordinated by the same node. If rounds are started at different nodes, the only option for the user is to sleep increasingly random interval and retry (this is what our implementation does).

If the key is contended and the driver is neither shard nor token aware, this leads to a lot of retries to make an update than necessary. If the driver is token or shard aware, it will send the query to one of the replicas for the partition, but will use round-robin.
This still means there will be at least 50% of loser queries, which will retry before they can commit.

This is why for LWT queries the driver should choose replicas in a pre-defined order, so that in case of contention they will queue up at the replica, rather than compete: choose the primary replica first, then, if the primary is known to be down, the first secondary, then the second secondary, and so on.
This will reduce contention over hot keys and thus increase LWT performance.

Gocql driver does not recover after a node failover

Please answer these questions before submitting your issue. Thanks!

What version of Scylla or Cassandra are you using?

Scylla Enterprise 2020.1.7

What version of Gocql are you using?

replace github.com/gocql/gocql => github.com/scylladb/gocql v0.0.0-20210425135552-909f2a77f46e

What version of Go are you using?

1.13

What did you do?

Inserted some data to the Scylla.
Run a workload that reads a random partition key from the database with gocql client.
Restart 1 scylla node - the coordinator metrics starts to look odd - client still ok.
Restarted a second scylla node - the coordinator metrics are really bad, client still ok.
Restarted the third scylla node - the client returns error no hosts available.

What did you expect to see?

That the gocql client realize that the restarted node(s) are back online, and direct traffic to them.

What did you see instead?

After restarting scylla nodes, the driver does not send the requests, only after restarting the gocql application.
When 3 nodes have been restarted, the client gets no hosts available errors.

After a node failover, the client does not direct traffic to the restarted node, but instead it directs the traffic to the other 2 nodes.
Following that, we restarted another node, which caused the client to direct all the traffic to a single coordinator node, even that all of the nodes are online.
Meanwhile In the cluster metrics in Grafana - the other nodes does accept requests, but only from a single coordinator node (the node that didn't restarted).
When we restarted the last node, all of the client requests failed with no hosts available errors.
To fix this situation we restarted the client.

I attached a trace of the gocql client before the restart, and after the restart, and the go client code (+go.mod) that run this this workload here in this: B0004DAC-92E9-46EB-A366-F8F044DC9E12

Thanks!