Comments (7)
Can you provide a minimal code sample that reproduces this? What partSize are you using?
from aws-sdk-cpp.
My minimal example:

#include <iostream>
#include <fstream>
#include <aws/core/Aws.h>
#include <aws/core/auth/AWSCredentials.h>
#include <aws/s3-crt/S3CrtClient.h>
#include <aws/s3-crt/model/PutObjectRequest.h>

int main()
{
    Aws::SDKOptions options;
    Aws::InitAPI(options);

    Aws::S3Crt::ClientConfiguration conf;
    conf.partSize = 50 * 1024 * 1024; // 50 MiB parts; this ends up using ~1 GB of RAM...

    Aws::Auth::AWSCredentials creds;
    creds.SetAWSAccessKeyId(Aws::String("your_Access_key"));
    creds.SetAWSSecretKey(Aws::String("your_secret_key"));

    const std::string fileName = "big file so that it takes a bit of time to upload";

    Aws::S3Crt::S3CrtClient client(creds, conf);
    Aws::S3Crt::Model::PutObjectRequest request;
    request.SetBucket("bucket name");
    request.SetKey("my_big_file_on_s3");

    std::shared_ptr<Aws::IOStream> inputData =
        Aws::MakeShared<Aws::FStream>("SampleAllocationTag",
                                      fileName.c_str(),
                                      std::ios_base::in | std::ios_base::binary);
    if (!*inputData) {
        std::cerr << "Error: unable to read file " << fileName << std::endl;
        Aws::ShutdownAPI(options);
        return 1;
    }
    request.SetBody(inputData);
    request.SetDataSentEventHandler([](const Aws::Http::HttpRequest*, long long) {
        std::cout << "callback" << std::endl;
    });

    Aws::S3Crt::Model::PutObjectOutcome outcome = client.PutObject(request);
    if (!outcome.IsSuccess()) {
        std::cerr << "Error: PutObject: " << outcome.GetError().GetMessage() << std::endl;
    } else {
        std::cout << "DONE" << std::endl;
    }

    Aws::ShutdownAPI(options);
}
Note that the callback is also never called, but that is reported as a separate issue...
from aws-sdk-cpp.
The CRT S3 client automatically splits big uploads into multiple parts and uploads them in parallel. During an upload, the CRT will therefore hold several part-sized buffers in memory, depending on the overall parallelism settings. So depending on how big the file is and how many parts you are uploading at the same time, 1 GB might be a reasonable number.
On top of that, the CRT pools buffers to avoid reallocating them over and over again, so you might see it holding on to a larger chunk of memory than you would expect. Buffer pools are cleared after some period of inactivity.
from aws-sdk-cpp.
Well, to be frank, 1 GB to upload a file, whatever its size, is a huge price to pay. In a restricted cloud environment that is a ridiculous amount of RAM, not to mention that we could have multiple uploads running simultaneously.
On top of this, the fact that it is not controllable makes S3CrtClient completely useless for us. I have no idea what the memory usage of an upload will be, and I am not sure how the total is computed. It seems to be something like 20 * the part size... But how do I know for sure? What does it depend on?
from aws-sdk-cpp.
S3 has a fairly low per-connection throughput, so to reach a decent overall throughput, the CRT needs to run several connections in parallel and buffer a considerable portion of the data being uploaded. The amount of parallelism used by the CRT can be controlled by the target throughput setting (https://github.com/aws/aws-sdk-cpp/blob/main/generated/src/aws-cpp-sdk-s3-crt/include/aws/s3-crt/ClientConfiguration.h#L58). Unfortunately, that setting already defaults to the lowest possible value in the C++ SDK, and setting it lower will not reduce memory usage.
Note that the overall max memory usage for the client has an upper bound derived from the part size and the number of connections (which in turn is derived from the target throughput). So memory usage does not scale directly with the number of S3 requests queued up on the client, and once that upper bound is reached, memory usage will stay there.
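The two knobs discussed here are part size and target throughput. A minimal configuration sketch, assuming the field names from the linked `ClientConfiguration.h` header (`partSize`, `throughputTargetGbps`); the values are illustrative only, and building this requires the AWS SDK for C++:

```cpp
#include <aws/core/Aws.h>
#include <aws/s3-crt/S3CrtClient.h>

int main() {
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    {
        Aws::S3Crt::ClientConfiguration conf;
        // Smaller parts mean smaller per-connection buffers; 8 MiB is illustrative.
        conf.partSize = 8 * 1024 * 1024;
        // Target throughput drives the number of parallel connections, and thus
        // how many part-sized buffers are held at once. Per the maintainer's
        // comment above, it already defaults to the minimum, so lowering it
        // further does not reduce memory usage.
        conf.throughputTargetGbps = 1.0;
        Aws::S3Crt::S3CrtClient client(conf);
        // ... issue PutObject requests as in the sample above ...
    }
    Aws::ShutdownAPI(options);
}
```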
We've made several improvements to the underlying C CRT libs with regard to memory usage in the past couple of months that haven't made their way to the C++ SDK yet, so I would be interested in learning about your use cases. What kind of instances are you running the code on? What is the overall RAM on the system and the NIC bandwidth? What are the typical file sizes you are trying to upload?
from aws-sdk-cpp.
Hi @DmitriyMusatkin and thank you for the reply. I was actually wondering if setting the target throughput to a lower value would help... too bad for me. I suppose that if the changes to memory usage do not directly affect those buffers, they will not help me much. In essence, we are a SaaS provider and there are times when we need to push data, most likely in files of a few GB, but it can go to tens of GB (there is no actual limit), hence my questions.
The environments where this runs are actually pretty diverse, because it could be a SaaS instance on EC2 or on-prem.
In any case, we are trying to be careful with resources, and 1 GB is ridiculously high just to upload a file.
That being said, for now we have switched to using TransferManager, which gives better control over memory usage.
TransferManager also lets you track the current upload progress, which S3CrtClient fails to do (the callbacks are never called...).
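A sketch of the TransferManager approach mentioned above, assuming the `TransferManagerConfiguration` fields `transferBufferMaxHeapSize`, `bufferSize`, and `uploadProgressCallback`; the bucket, key, and file names are placeholders, and this requires the AWS SDK for C++ plus valid credentials to actually run:

```cpp
#include <iostream>
#include <memory>
#include <aws/core/Aws.h>
#include <aws/core/utils/threading/PooledThreadExecutor.h>
#include <aws/s3/S3Client.h>
#include <aws/transfer/TransferManager.h>

int main() {
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    {
        auto executor = Aws::MakeShared<Aws::Utils::Threading::PooledThreadExecutor>("tm", 4);
        Aws::Transfer::TransferManagerConfiguration tmConfig(executor.get());
        tmConfig.s3Client = Aws::MakeShared<Aws::S3::S3Client>("tm");
        // Cap total buffer memory; 50 MiB here is illustrative, not a recommendation.
        tmConfig.transferBufferMaxHeapSize = 50 * 1024 * 1024;
        tmConfig.bufferSize = 5 * 1024 * 1024; // one 5 MiB buffer per part
        // Progress callback, fired as bytes are transferred.
        tmConfig.uploadProgressCallback =
            [](const Aws::Transfer::TransferManager*,
               const std::shared_ptr<const Aws::Transfer::TransferHandle>& handle) {
                std::cout << handle->GetBytesTransferred() << " / "
                          << handle->GetBytesTotalSize() << " bytes\n";
            };

        auto transferManager = Aws::Transfer::TransferManager::Create(tmConfig);
        auto handle = transferManager->UploadFile("local_big_file", "bucket name",
                                                  "my_big_file_on_s3",
                                                  "application/octet-stream",
                                                  Aws::Map<Aws::String, Aws::String>());
        handle->WaitUntilFinished();
    }
    Aws::ShutdownAPI(options);
}
```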
from aws-sdk-cpp.
Thanks for bringing your use case to our attention. I'm sorry that S3CrtClient doesn't currently fit your needs. I'm changing this issue to a feature request: adding additional options for configuring the S3CrtClient. If you have any ideas for which settings you would like to configure, please let us know, but I can't guarantee that we will be able to implement them.
from aws-sdk-cpp.