Driver version
I've tested this behavior with Redshift ODBC 2.0.0.8 & 2.0.0.9
Redshift version
PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.56899
Client Operating System
Edition: Windows 10 Pro
Version: 22H2
OS build: 19045.3448
Experience: Windows Feature Experience Pack 1000.19044.1000.0
Table schema
CREATE TABLE public.nasty_cat (
generated_on timestamp without time zone ENCODE az64,
sparta_route_code character varying(256) ENCODE bytedict,
destination_spec character varying(256) ENCODE bytedict,
load_region character varying(256) ENCODE bytedict,
load_port character varying(256) ENCODE bytedict,
discharge_region character varying(256) ENCODE bytedict,
discharge_port character varying(256) ENCODE bytedict,
vessel_type character varying(256) ENCODE bytedict,
load_date character varying(256) ENCODE lzo,
delivery_date character varying(256) ENCODE lzo,
load_month_year character varying(256) ENCODE bytedict,
load_window character varying(256) ENCODE bytedict,
delivery_month_year character varying(256) ENCODE bytedict,
delivery_window character varying(256) ENCODE bytedict,
granularity character varying(256) ENCODE bytedict,
delivered_pricing_basis_1_name character varying(256) ENCODE bytedict,
delivered_pricing_basis_1_price double precision ENCODE raw,
delivered_pricing_basis_1_margin double precision ENCODE raw
) DISTSTYLE AUTO;
Problem description
The Problem occurs when attempting to load lots of data through ODBC when Redshift has a too small max_query_execution_time
.
After the given time amount, the Redshift ODBC driver receives a failure signal and aborts the SQLFetch call. However during this, there is a heap corruption du to freeing the result and then accessing it again.
In the example we're loading 28 million records for 18 columns but the max_query_execution_time is set to 400 seconds. In the minimum reproducible example I'm loading the same data set, but with a max query timeout of 20 seconds.
- Expected behaviour:
Due to max_query_execution_time
the query is aborted, fails and indicates to the C++ ODBC caller (= me) that an error occurred by returning an SQL return code of -1 (or 100 SQL_NO_DATA if necessary).
Subsequently, SQLFetch returns an error indication so that the ODBC caller can abort any further SQLFetch calls.
-
Actual behaviour:
The Redshift SQLFetch implementation recognizes the error, clears up the PGresult*
of RS_RESULT_INFO*
and afterwards attempts to retrieve the number of tuples using PQntuples(const PGresult *res)
. This is a read access violation with gflags. Without gflags this is simply reading out garbage. In a Release build of our application this has led to a number of crashes with a customer using Redshift.
-
Error message/stack trace:
When activating gflags /p /enable Test.exe /full
for our test application we receive:
Unhandled exception thrown: read access violation.
res was 0x236A4BC9F10.
with this stack trace:
> rsodbc.dll!PQntuples(const pg_result * res) Line 2882 C
rsodbc.dll!RS_STMT_INFO::RS_SQLFetchScroll(void * phstmt, short hFetchOrientation, __int64 iFetchOffset) Line 1342 C++
rsodbc.dll!SQLFetch(void * phstmt) Line 405 C++
[External Code]
Test.exe!main() Line 101 C++
- Any other details that can be helpful:
Things I've found out:
_pqGetResultLoopOnThread
actually catches that an error occurs and one can see the following error message:
ERROR: Query (6501378[child_sequence:1]) cancelled by WLM abort action of Query Monitoring Rule "max_query_execution_time".
DETAIL:
-----------------------------------------------
error: Query (6501378[child_sequence:1]) cancelled by WLM abort action of Query Monitoring Rule "max_query_execution_time".
code: 1078
context: Query (6501378[child_sequence:1]) cancelled by WLM abort action of Query Monitoring Rule "max_query_execution_time".
query: 0
location: wlm_query_action.cpp:179
process: wlm [pid=1073782836]
-----------------------------------------------
PQclear(PGresult *res)
has been called before calling PQntuples, thus freeing the result.
ODBC trace logs
Reproduction code
Please note that this example code is simply a copy'n'paste of the SQLBindCol exampel code from Microsoft which can be found here: https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqlbindcol-function?view=sql-server-ver16
The only changes are:
- I've connected to our Redshift database using SQLDriverConnectA
- I've bound 3 of the 18 columns to verify (through debug breakpoints) that data is loaded properly. Which it is.
Other than that it's basically the example code. I hope this can help.
HWND dhwnd = nullptr;
SQLCHAR outstr[1024];
SQLSMALLINT outstrlen;
SQLHENV henv;
SQLHDBC hdbc;
SQLHSTMT hstmt = 0;
SQLRETURN retcode;
unsigned int nBufferSize = 4096;
auto m_pBuffer = new char[nBufferSize * 4096]();
auto m_pIndicators = new SQLLEN[nBufferSize]{ 0 };
auto m_pBuffer2 = new wchar_t[nBufferSize * 4096]();
auto m_pIndicators2 = new SQLLEN[nBufferSize]{ 0 };
auto m_pBuffer3 = new float[nBufferSize]();
auto m_pIndicators3 = new SQLLEN[nBufferSize]{ 0 };
// Allocate environment handle
retcode = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &henv);
// Set the ODBC version environment attribute
if (retcode == SQL_SUCCESS || retcode == SQL_SUCCESS_WITH_INFO) {
retcode = SQLSetEnvAttr(henv, SQL_ATTR_ODBC_VERSION, (SQLPOINTER*)SQL_OV_ODBC3, 0);
// Allocate connection handle
if (retcode == SQL_SUCCESS || retcode == SQL_SUCCESS_WITH_INFO) {
retcode = SQLAllocHandle(SQL_HANDLE_DBC, henv, &hdbc);
// Set login timeout to 5 seconds
if (retcode == SQL_SUCCESS || retcode == SQL_SUCCESS_WITH_INFO) {
// Connect to data source
retcode = SQLDriverConnectA(hdbc,
dhwnd,
(SQLCHAR*)"DSN=AWS Redshift", // DSN we've setup
SQL_NTS,
outstr,
sizeof(outstr),
&outstrlen,
SQL_DRIVER_NOPROMPT);
SQLSetConnectAttr(hdbc, SQL_LOGIN_TIMEOUT, (SQLPOINTER)5, 0);
// Allocate statement handle
if (retcode == SQL_SUCCESS || retcode == SQL_SUCCESS_WITH_INFO) {
retcode = SQLAllocHandle(SQL_HANDLE_STMT, hdbc, &hstmt);
retcode = SQLExecDirect(hstmt, (SQLWCHAR*)L"SELECT * FROM \"dev\".\"public\".\"nasty_cat\"", SQL_NTS);
if (retcode == SQL_SUCCESS || retcode == SQL_SUCCESS_WITH_INFO) {
// Bind columns 1, 2, and 3
retcode = SQLBindCol(hstmt, 1, SQL_C_CHAR, m_pBuffer, 1028, m_pIndicators);
retcode = SQLBindCol(hstmt, 2, SQL_C_WCHAR, m_pBuffer2, 1028, m_pIndicators2);
retcode = SQLBindCol(hstmt, 3, SQL_C_FLOAT, m_pBuffer3, 1028, m_pIndicators3);
// Fetch and print each row of data. On an error, display a message and exit.
for (int i = 0; ; i++) {
retcode = SQLFetch(hstmt);
if (retcode == SQL_ERROR || retcode == SQL_SUCCESS_WITH_INFO)
printf("error\n");
if (retcode == SQL_SUCCESS || retcode == SQL_SUCCESS_WITH_INFO)
{
// do something meaningful
}
else
break;
}
}
// Process data
if (retcode == SQL_SUCCESS || retcode == SQL_SUCCESS_WITH_INFO) {
SQLCancel(hstmt);
SQLFreeHandle(SQL_HANDLE_STMT, hstmt);
}
SQLDisconnect(hdbc);
}
SQLFreeHandle(SQL_HANDLE_DBC, hdbc);
}
}
SQLFreeHandle(SQL_HANDLE_ENV, henv);
}
If there's any other things I can assist you with, I'm happy to help :)