Comments (4)
The argument here is that if we upload 10 parts in parallel (say parts 1 to 10) and only part 5 fails, part 5 ends up missing, and when we complete the upload the final file will be corrupted.
Now it makes sense, yes. Thanks for the clarification. Keeping track of the sequence of successfully uploaded parts could help. This should also happen in tusd. To be honest, I have never seen issues with uploading parts to S3, but that doesn't mean we shouldn't be prepared.
from tus-node-server.
The parallel nature of this class can cause file corruption when the upload of any part fails (except the part that is last at that moment).
If the upload of any part fails after less than 5MB was transferred to S3, the upload will fail when finalizing the multipart upload.
This is a valid point. I believe tusd suffers from the same issue, see:
- UploadParts https://github.com/tus/tusd/blob/main/pkg/s3store/s3store.go#L501-L547
- ListAllParts https://github.com/tus/tusd/blob/main/pkg/s3store/s3store.go#L1028-L1059
CC: @Acconut
We could solve this by acknowledging each part once it has been uploaded and treating only acknowledged parts as valid. When calculating the current offset (on a HEAD request), we need to consider only the first run of consecutive valid parts.
This could be nice! yes.
CC: @Acconut
No back-pressure support. When the tus server is running on a local network, uploads are cached to disk without regard to the actual upload speed to S3.
This has been implemented in #561.
There is no need to constantly split into smaller chunks. The server should cache only the first 5MB of data to determine whether it is safe to upload as a proper "part" and, if it is, stream directly to S3, but only up to 5GB for each part.
I think splitting into smaller chunks is required because S3 has a limit of 10k parts per upload and a maximum file size of 5TB. To be able to upload a single 5TB file, parts would need to be roughly 500MB each (5TB / 10,000 parts).
If we start having parts of different sizes because the request was cancelled or resumed, we might run out of available parts before a 5TB upload is complete.
Having the server split the file into smaller chunks ensures that part sizes are consistent, which matters especially when the client uploads in chunks.
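To make the part-size arithmetic above concrete, here is a minimal sketch. The helper name `minUniformPartSize` is hypothetical, and the limits (10,000 parts, 5MB minimum and 5GB maximum part size, 5TB object size) are the ones mentioned in this thread, interpreted here as binary units, which is an assumption.

```typescript
// Limits as quoted in the discussion; binary units are an assumption.
const MiB = 1024 * 1024;
const MAX_PARTS = 10_000;
const MIN_PART_SIZE = 5 * MiB; // 5MB minimum for every part except the last
const MAX_PART_SIZE = 5 * 1024 * MiB; // 5GB maximum per part

// Hypothetical helper: the smallest uniform part size that lets a file of
// `fileSize` bytes fit into at most MAX_PARTS parts.
function minUniformPartSize(fileSize: number): number {
  const needed = Math.ceil(fileSize / MAX_PARTS);
  const size = Math.max(needed, MIN_PART_SIZE);
  if (size > MAX_PART_SIZE) {
    throw new RangeError("file is too large for a single multipart upload");
  }
  return size;
}

const fiveTiB = 5 * 1024 * 1024 * MiB;

// Roughly 524 MiB per part is needed to fit 5TB into 10,000 parts.
console.log((minUniformPartSize(fiveTiB) / MiB).toFixed(1), "MiB");

// With 50 MiB parts, 10,000 parts cover only about 488 GiB.
console.log(((50 * MiB * MAX_PARTS) / (1024 * MiB)).toFixed(1), "GiB");
```

If interrupted requests leave behind parts smaller than this size, the 10,000-part budget can run out before the whole file is stored, which is the concern raised above.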
from tus-node-server.
The parallel nature of this class can cause file corruption when the upload of any part fails (except the part that is last at that moment).
If the upload of any part fails after less than 5MB was transferred to S3, the upload will fail when finalizing the multipart upload.
This is a valid point. I believe tusd suffers from the same issue, see
I don't think that argument is true. AFAIK, UploadPart (just like PutObject) will not keep any data if the request fails before the entire data has been transferred. You cannot have unfinished objects from interrupted UploadPart or PutObject calls and there won't be any corrupt parts. Otherwise we would simply use that for resuming our uploads :)
from tus-node-server.
I don't think that argument is true. AFAIK, UploadPart (just like PutObject) will not keep any data if the request fails before the entire data has been transferred. You cannot have unfinished objects from interrupted UploadPart or PutObject calls and there won't be any corrupt parts. Otherwise we would simply use that for resuming our uploads :)
@Acconut I think the argument here is that if we upload 10 parts in parallel (say parts 1 to 10) and only part 5 fails, part 5 ends up missing, and when we complete the upload the final file will be corrupted.
The proposal is that, when listing parts, we only account for consecutive parts: in this case we detect that part 5 is missing, so we list only parts 1-4 and the upload can resume from there (as far as I understand).
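A minimal sketch of that idea, not the actual store implementation: the `UploadedPart` shape and the function names `consecutiveParts` and `resumableOffset` are hypothetical, and the parts are assumed to come from a ListParts-style call with 1-based part numbers.

```typescript
// Hypothetical shape of a part as returned by a ListParts-style call.
interface UploadedPart {
  partNumber: number; // 1-based, as in S3
  size: number; // bytes
}

// Keep only the leading run of consecutive parts (1, 2, 3, ...) and stop at
// the first gap. Parts after a gap are ignored when resuming.
function consecutiveParts(parts: UploadedPart[]): UploadedPart[] {
  const sorted = [...parts].sort((a, b) => a.partNumber - b.partNumber);
  const result: UploadedPart[] = [];
  for (const part of sorted) {
    if (part.partNumber !== result.length + 1) break;
    result.push(part);
  }
  return result;
}

// The offset reported on a HEAD request counts only the consecutive parts.
function resumableOffset(parts: UploadedPart[]): number {
  return consecutiveParts(parts).reduce((sum, p) => sum + p.size, 0);
}

// Example from the discussion: parts 1 to 10 uploaded in parallel, part 5 failed.
const PART_SIZE = 8 * 1024 * 1024;
const listed: UploadedPart[] = [1, 2, 3, 4, 6, 7, 8, 9, 10].map((n) => ({
  partNumber: n,
  size: PART_SIZE,
}));

console.log(consecutiveParts(listed).map((p) => p.partNumber)); // parts 1 to 4
console.log(resumableOffset(listed) === 4 * PART_SIZE); // true
```

With this approach the client resumes from the end of part 4, so parts 6-10 are uploaded again rather than being stitched around the gap.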
from tus-node-server.