Comments (5)

jmklix commented on August 16, 2024

This looks like it might be a problem with CurveFS and trying to make sdk calls after the bucket gets deleted. Please open an issue with that product if you would like help with that. If you can reproduce this error and provide sdk logs then we could try to debug it, but currently there isn't enough information to do that. Please let me know if you have any aws-sdk-cpp specific questions that I can help you with.

from aws-sdk-cpp.

sglnd commented on August 16, 2024

> This looks like it might be a problem with CurveFS and trying to make sdk calls after the bucket gets deleted. Please open an issue with that product if you would like help with that. If you can reproduce this error and provide sdk logs then we could try to debug it, but currently there isn't enough information to do that. Please let me know if you have any aws-sdk-cpp specific questions that I can help you with.

aws_del_bef.zip
aws_del_after.zip

I've attached logs from both before and after the bucket was deleted. To restate the background: while the bucket was in the Forbidden state, nothing changed in how the upper layer accessed CurveFS. Only once the bucket was finally deleted did a large number of CurlHttp requests appear. Because the timing coincided, we believe the different response returned after the bucket was deleted caused a problem in how the SDK client evaluates the result.

jmklix commented on August 16, 2024

Can you please also provide a code sample that reproduces this error with only using the aws-sdk-cpp and s3? We don't guarantee that this sdk will work with curveFS

sglnd commented on August 16, 2024

> Can you please also provide a code sample that reproduces this error with only using the aws-sdk-cpp and s3? We don't guarantee that this sdk will work with curveFS

We've continued our analysis. Before the bucket was deleted, every access resulted in an HTTP 403 error. After the deletion, HTTP 500 errors increased significantly, accompanied by a large number of Curl HTTP requests. We've also reviewed the relevant code in CurveFS and aws-sdk-cpp.

CurveFS code:

void S3Adapter::PutObjectAsync(std::shared_ptr<PutObjectAsyncContext> context) {
    Aws::S3::Model::PutObjectRequest request;
    request.SetBucket(bucketName_);
    request.SetKey(Aws::String{context->key.c_str(), context->key.size()});

    request.SetBody(Aws::MakeShared<PreallocatedIOStream>(
        AWS_ALLOCATE_TAG, context->buffer, context->bufferSize));

    auto originCallback = context->cb;
    auto wrapperCallback =
        [this,
         originCallback](const std::shared_ptr<PutObjectAsyncContext>& ctx) {
            inflightBytesThrottle_->OnComplete(ctx->bufferSize);
            ctx->cb = originCallback;
            ctx->cb(ctx);
        };

    Aws::S3::PutObjectResponseReceivedHandler handler =
        [context](
            const Aws::S3::S3Client * /*client*/,
            const Aws::S3::Model::PutObjectRequest & /*request*/,
            const Aws::S3::Model::PutObjectOutcome &response,
            const std::shared_ptr<const Aws::Client::AsyncCallerContext>
                &awsCtx) {
            std::shared_ptr<PutObjectAsyncContext> ctx =
                std::const_pointer_cast<PutObjectAsyncContext>(
                    std::dynamic_pointer_cast<const PutObjectAsyncContext>(
                        awsCtx));

            LOG_IF(ERROR, !response.IsSuccess())
                << "PutObjectAsync error: "
                << response.GetError().GetExceptionName()
                << "message: " << response.GetError().GetMessage()
                << "resend: " << ctx->key;

            ctx->retCode = (response.IsSuccess() ? 0 : -1);
            ctx->timer.stop();
            ctx->cb(ctx);
        };

    if (throttle_) {
        throttle_->Add(false, context->bufferSize);
    }

    inflightBytesThrottle_->OnStart(context->bufferSize);
    context->cb = std::move(wrapperCallback);
    s3Client_->PutObjectAsync(request, handler, context);
}

aws-sdk-cpp code:

HttpResponseOutcome AWSClient::AttemptExhaustively(const Aws::Http::URI& uri,
    Aws::Http::HttpMethod method,
    const char* signerName,
    const char* requestName,
    const char* signerRegionOverride,
    const char* signerServiceNameOverride) const
{
    if (!Aws::Utils::IsValidHost(uri.GetAuthority()))
    {
        return HttpResponseOutcome(AWSError<CoreErrors>(CoreErrors::VALIDATION, "", "Invalid DNS Label found in URI host", false/*retryable*/));
    }

    std::shared_ptr<HttpRequest> httpRequest(CreateHttpRequest(uri, method, Aws::Utils::Stream::DefaultResponseStreamFactoryMethod));
    HttpResponseOutcome outcome;
    AWSError<CoreErrors> lastError;
    Aws::Monitoring::CoreMetricsCollection coreMetrics;
    auto contexts = Aws::Monitoring::OnRequestStarted(this->GetServiceClientName(), requestName, httpRequest);
    const char* signerRegion = signerRegionOverride;
    Aws::String regionFromResponse;

    Aws::String invocationId = Aws::Utils::UUID::PseudoRandomUUID();
    RequestInfo requestInfo;
    requestInfo.attempt = 1;
    requestInfo.maxAttempts = 0;
    httpRequest->SetHeaderValue(Http::SDK_INVOCATION_ID_HEADER, invocationId);
    httpRequest->SetHeaderValue(Http::SDK_REQUEST_HEADER, requestInfo);
    AppendRecursionDetectionHeader(httpRequest);

    for (long retries = 0;; retries++)
    {
        if(!m_retryStrategy->HasSendToken())
        {
            return HttpResponseOutcome(AWSError<CoreErrors>(CoreErrors::SLOW_DOWN,
                                                            "",
                                                            "Unable to acquire enough send tokens to execute request.",
                                                            false/*retryable*/));

        };
        outcome = AttemptOneRequest(httpRequest, signerName, requestName, signerRegion, signerServiceNameOverride);
        outcome.SetRetryCount(retries);
        if (retries == 0)
        {
            m_retryStrategy->RequestBookkeeping(outcome);
        }
        else
        {
            m_retryStrategy->RequestBookkeeping(outcome, lastError);
        }
        coreMetrics.httpClientMetrics = httpRequest->GetRequestMetrics();
        TracingUtils::EmitCoreHttpMetrics(httpRequest->GetRequestMetrics(),
            *m_telemetryProvider->getMeter(this->GetServiceClientName(), {}),
            {{TracingUtils::SMITHY_METHOD_DIMENSION, requestName},{TracingUtils::SMITHY_SERVICE_DIMENSION, this->GetServiceClientName()}});
        if (outcome.IsSuccess())
        {
            Aws::Monitoring::OnRequestSucceeded(this->GetServiceClientName(), requestName, httpRequest, outcome, coreMetrics, contexts);
            AWS_LOGSTREAM_TRACE(AWS_CLIENT_LOG_TAG, "Request successful returning.");
            break;
        }
        lastError = outcome.GetError();

        DateTime serverTime = GetServerTimeFromError(outcome.GetError());
        auto clockSkew = DateTime::Diff(serverTime, DateTime::Now());

        Aws::Monitoring::OnRequestFailed(this->GetServiceClientName(), requestName, httpRequest, outcome, coreMetrics, contexts);

        if (!m_httpClient->IsRequestProcessingEnabled())
        {
            AWS_LOGSTREAM_TRACE(AWS_CLIENT_LOG_TAG, "Request was cancelled externally.");
            break;
        }

        // Adjust region
        bool retryWithCorrectRegion = false;
        HttpResponseCode httpResponseCode = outcome.GetError().GetResponseCode();
        if (httpResponseCode == HttpResponseCode::MOVED_PERMANENTLY ||  // 301
            httpResponseCode == HttpResponseCode::TEMPORARY_REDIRECT || // 307
            httpResponseCode == HttpResponseCode::BAD_REQUEST ||        // 400
            httpResponseCode == HttpResponseCode::FORBIDDEN)            // 403
        {
            regionFromResponse = GetErrorMarshaller()->ExtractRegion(outcome.GetError());
            if (m_region == Aws::Region::AWS_GLOBAL && !regionFromResponse.empty() && regionFromResponse != signerRegion)
            {
                signerRegion = regionFromResponse.c_str();
                retryWithCorrectRegion = true;
            }
        }

        long sleepMillis = TracingUtils::MakeCallWithTiming<long>(
            [&]() -> long {
                return m_retryStrategy->CalculateDelayBeforeNextRetry(outcome.GetError(), retries);
            },
            TracingUtils::SMITHY_CLIENT_SERVICE_BACKOFF_DELAY_METRIC,
            *m_telemetryProvider->getMeter(this->GetServiceClientName(), {}),
            {{TracingUtils::SMITHY_METHOD_DIMENSION, requestName},{TracingUtils::SMITHY_SERVICE_DIMENSION, this->GetServiceClientName()}});
        //AdjustClockSkew returns true means clock skew was the problem and skew was adjusted, false otherwise.
        //sleep if clock skew and region was NOT the problem. AdjustClockSkew may update error inside outcome.
        bool shouldSleep = !AdjustClockSkew(outcome, signerName) && !retryWithCorrectRegion;

        if (!retryWithCorrectRegion && !m_retryStrategy->ShouldRetry(outcome.GetError(), retries))
        {
            break;
        }

        AWS_LOGSTREAM_WARN(AWS_CLIENT_LOG_TAG, "Request failed, now waiting " << sleepMillis << " ms before attempting again.");

        if (shouldSleep)
        {
            m_httpClient->RetryRequestSleep(std::chrono::milliseconds(sleepMillis));
        }

        Aws::Http::URI newUri = uri;
        Aws::String newEndpoint = GetErrorMarshaller()->ExtractEndpoint(outcome.GetError());
        if (!newEndpoint.empty())
        {
            newUri.SetAuthority(newEndpoint);
        }
        httpRequest = CreateHttpRequest(newUri, method, Aws::Utils::Stream::DefaultResponseStreamFactoryMethod);

        httpRequest->SetHeaderValue(Http::SDK_INVOCATION_ID_HEADER, invocationId);
        if (serverTime.WasParseSuccessful() && serverTime != DateTime())
        {
            requestInfo.ttl = DateTime::Now() + clockSkew + std::chrono::milliseconds(m_requestTimeoutMs);
        }
        requestInfo.attempt ++;
        requestInfo.maxAttempts = m_retryStrategy->GetMaxAttempts();
        httpRequest->SetHeaderValue(Http::SDK_REQUEST_HEADER, requestInfo);
        Aws::Monitoring::OnRequestRetry(this->GetServiceClientName(), requestName, httpRequest, contexts);
    }
    auto meter = m_telemetryProvider->getMeter(this->GetServiceClientName(), {});
    auto counter = meter->CreateCounter(TracingUtils::SMITHY_CLIENT_SERVICE_ATTEMPTS_METRIC, TracingUtils::COUNT_METRIC_TYPE, "");
    counter->add(requestInfo.attempt, {{TracingUtils::SMITHY_METHOD_DIMENSION, requestName},{TracingUtils::SMITHY_SERVICE_DIMENSION, this->GetServiceClientName()}});
    Aws::Monitoring::OnFinish(this->GetServiceClientName(), requestName, httpRequest, contexts);
    return outcome;
}

Summary:
In the AWS SDK for C++ retry logic, the apparent absence of a sleep (backoff) step when handling HTTP 500 errors may lead to HTTP requests being re-created immediately, potentially triggering DNS requests. There are still two questions we haven't fully understood:

Once the bucket is disabled, which object storage requests return HTTP 500 error responses, before and after the deletion?
Why does a sudden surge of Curl requests occur, rather than requests related to object storage operations?
