
Comments (7)

jmklix commented on August 16, 2024

Can you provide a minimal code sample that reproduces this? What partSize are you using?

from aws-sdk-cpp.

thierryba commented on August 16, 2024

My minimal example:

#include <iostream>
#include <fstream>
#include <aws/core/Aws.h>
#include <aws/core/auth/AWSCredentials.h>
#include <aws/s3-crt/S3CrtClient.h>
#include <aws/s3-crt/model/PutObjectRequest.h>


int main()
{
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    Aws::S3Crt::ClientConfiguration conf;
    conf.partSize = 50 * 1024 * 1024; // that means using 1GB of RAM...
    Aws::Auth::AWSCredentials creds;
    creds.SetAWSAccessKeyId(Aws::String("your_Access_key"));
    creds.SetAWSSecretKey(Aws::String("your_secret_key"));
    const std::string fileName = "big file so that it takes a bit of time to upload";

    Aws::S3Crt::S3CrtClient client(creds, conf);

    Aws::S3Crt::Model::PutObjectRequest request;

    request.SetBucket("bucket name");
    request.SetKey("my_big_file_on_s3");

    std::shared_ptr<Aws::IOStream> inputData =
        Aws::MakeShared<Aws::FStream>("SampleAllocationTag",
                                      fileName.c_str(),
                                      std::ios_base::in | std::ios_base::binary);

    if (!*inputData) {
        std::cerr << "Error: unable to read file " << fileName << std::endl;
        Aws::ShutdownAPI(options); // shut the API down on the error path too
        return 1;
    }

    request.SetBody(inputData);

    request.SetDataSentEventHandler([](const Aws::Http::HttpRequest*, long long) {
        std::cout << "callback" << std::endl;
    });

    Aws::S3Crt::Model::PutObjectOutcome outcome = client.PutObject(request);
    if (!outcome.IsSuccess()) {
        std::cerr << "Error: PutObject: " <<
            outcome.GetError().GetMessage() << std::endl;
    } else {
        std::cout << "DONE" << std::endl;
    }


    Aws::ShutdownAPI(options);
}

Note that the callback is also never called, but that is reported as a separate issue...


DmitriyMusatkin commented on August 16, 2024

The CRT S3 client will automatically split big uploads into multiple parts and upload them in parallel, so during an upload the CRT holds several part-sized buffers in memory, depending on the overall parallelism settings. Depending on how big the file is and how many parts you are uploading at the same time, 1 GB might be a reasonable number.

On top of that, the CRT pools buffers to avoid reallocating them over and over again, so you might see the CRT holding on to a larger chunk of memory than you would expect. Buffer pools are cleared after some period of inactivity.


thierryba commented on August 16, 2024

Well, to be frank, 1 GB to upload a file, whatever its size, is a huge price to pay. In a restricted cloud environment that is a ridiculous amount of RAM, not to mention that we could have multiple uploads running simultaneously.
On top of this, the fact that it is not controllable makes S3CrtClient completely useless for us. I have no idea what the size of the upload will be, and I am not sure how the total size is computed. It seems to be something like 20 * the part size... But how do I know for sure? What does it depend on?


DmitriyMusatkin commented on August 16, 2024

S3 has fairly low per-connection throughput, so to reach decent overall throughput the CRT needs to run several connections in parallel and buffer a considerable portion of the data being uploaded. The amount of parallelism used by the CRT can be controlled by the target throughput setting (https://github.com/aws/aws-sdk-cpp/blob/main/generated/src/aws-cpp-sdk-s3-crt/include/aws/s3-crt/ClientConfiguration.h#L58). Unfortunately, that setting already defaults to the lowest possible value in the C++ SDK, so setting it lower will not have an impact on memory usage.
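For reference, the two knobs discussed in this thread can be set on the same `Aws::S3Crt::ClientConfiguration` used in the reproduction above. This is a sketch only; the specific values are illustrative, and per the comment above lowering `throughputTargetGbps` below its default will not reduce memory.

```cpp
#include <aws/s3-crt/S3CrtClient.h>

// Sketch: the configuration fields referenced in this thread.
Aws::S3Crt::ClientConfiguration conf;
conf.partSize = 8 * 1024 * 1024;  // smaller parts shrink each pooled buffer
conf.throughputTargetGbps = 1.0;  // drives how many connections run in parallel
```

Since peak buffer memory is roughly (part size) x (parts in flight), shrinking the part size is the lever that actually moves the upper bound here.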

Note that the overall maximum memory usage for the client has an upper bound derived from the part size and the number of connections (which in turn is derived from the max throughput). So memory usage does not scale directly with the number of S3 requests queued up on the client, and once that upper bound is reached, memory usage will stay there.

We've made several improvements to the underlying C CRT libraries with regard to memory usage in the past couple of months that haven't made their way into the C++ SDK yet, so I would be interested in learning about your use cases. What kind of instances are you running this code on? What is the overall RAM on the system and the NIC bandwidth? What are the typical file sizes you are trying to upload?


thierryba commented on August 16, 2024

Hi @DmitriyMusatkin, and thank you for the reply. I was actually wondering if setting the throughput to a lower value would help... too bad for me. I suppose that if the changes to memory usage do not directly affect those buffers, they will not help me much. In essence, we are a SaaS provider and there are times when we need to push data, most likely in files of a few GB, but it can go to tens of GB (there is no actual limit), hence my questions.
The environments we run this in are actually pretty diverse: it could be a SaaS instance on EC2, or on-prem.
In any case, we are trying to be careful with resources, and 1 GB is ridiculously high just to upload a file.

That being said, for now we have switched to using TransferManager, which allows better control over memory management.
TransferManager also lets you track the current upload progress, which S3CrtClient fails to do (the callbacks are never called...).
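The TransferManager switch described above can be sketched as follows. This assumes the aws-cpp-sdk-transfer component is linked in; the allocation tags, thread count, and buffer sizes are illustrative placeholders, not recommendations from the thread.

```cpp
#include <iostream>
#include <aws/core/Aws.h>
#include <aws/core/utils/threading/Executor.h>
#include <aws/s3/S3Client.h>
#include <aws/transfer/TransferManager.h>

// Sketch: configure TransferManager with an explicit buffer-memory cap
// and a progress callback (which, per the thread, does fire here).
void configureTransferManager()
{
    auto executor = Aws::MakeShared<Aws::Utils::Threading::PooledThreadExecutor>("tm-exec", 4);
    Aws::Transfer::TransferManagerConfiguration tmConfig(executor.get());
    tmConfig.s3Client = Aws::MakeShared<Aws::S3::S3Client>("tm-s3"); // plain S3 client, not S3CrtClient
    tmConfig.bufferSize = 5 * 1024 * 1024;                 // size of each part buffer
    tmConfig.transferBufferMaxHeapSize = 50 * 1024 * 1024; // cap on total buffer memory
    tmConfig.uploadProgressCallback =
        [](const Aws::Transfer::TransferManager*,
           const std::shared_ptr<const Aws::Transfer::TransferHandle>& handle) {
            std::cout << handle->GetBytesTransferred() << " bytes sent" << std::endl;
        };
    auto transferManager = Aws::Transfer::TransferManager::Create(tmConfig);
    // transferManager->UploadFile(...) then stays within the configured heap cap.
}
```

The explicit `transferBufferMaxHeapSize` cap is the control the reporter found missing from S3CrtClient.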


jmklix commented on August 16, 2024

Thanks for bringing your use case to our attention. I'm sorry that S3CrtClient doesn't currently fit your needs. I'm changing this issue to a feature request: adding additional options for configuring S3CrtClient. If you have any ideas for which settings you would like to configure, please let us know, but I can't guarantee that we will be able to implement them.

