Initially I set Asset Duplicates to "Not allow, check by file content".
After updating atrocore/core (1.2.5 => 1.2.30),
I could no longer upload files over 2 MB.
I got "Error 400: Such asset already exists." on this endpoint: /api/v1/Attachment/action/CreateByChunks
That would mean the MD5 hash of the file matches an existing one, but it happened for every file I tried that was larger than 2 MB and smaller than 4 MB.
Long story short, something is going wrong around here:
$md5 = '';

file_put_contents($filePath, '');

foreach ($files as $file) {
    $md5 = md5($md5 . $file);
    file_put_contents($filePath, file_get_contents($dirPath . $file), FILE_APPEND);
}

$attachment->md5 = $md5;
These are two different files that I uploaded, and they both have the same hash as another file already in the database.
The hash is the same because it is computed only from the two chunk filenames (starting from the initial $md5 = '').
Here is how I debugged it (like a caveman ^^):
$GLOBALS['log']->debug('$filePat: ' . $filePath);
$md5 = '';
file_put_contents($filePath, '');
foreach ($files as $file) {
    $md5 = md5($md5 . $file);
    $GLOBALS['log']->debug('$file: ' . $file);
    $GLOBALS['log']->debug('$md5: ' . $md5);
    file_put_contents($filePath, file_get_contents($dirPath . $file), FILE_APPEND);
}
$attachment->md5 = $md5;
$attachment->storageFilePath = $destPath;
$attachment->storageThumbPath = $this->getRepository()->getDestPath(FilePathBuilder::UPLOAD);
$GLOBALS['log']->debug('$attachment: ', (array) $attachment);
Here is the resulting log:
[2021-05-25 21:24:53] Log.DEBUG: $destPath: 06lsx/xjzn3/8tonu/3itw6/zwvcd/4q5sp [] []
[2021-05-25 21:24:53] Log.DEBUG: $filePat: upload/files/06lsx/xjzn3/8tonu/3itw6/zwvcd/4q5sp/rot.png [] []
[2021-05-25 21:24:53] Log.DEBUG: $file: 0 [] []
[2021-05-25 21:24:53] Log.DEBUG: $md5: cfcd208495d565ef66e7dff9f98764da [] []
[2021-05-25 21:24:53] Log.DEBUG: $file: 2097152 [] []
[2021-05-25 21:24:53] Log.DEBUG: $md5: c78f7d2f1137d0cdcd2865b9804a26a6 [] []
[2021-05-25 21:24:53] Log.DEBUG: $attachment: {"chunkId":"ea9907089e1afa36b7cba91345aaae73",...} - Such asset already exists. [] []
[2021-05-25 21:24:53] Log.ERROR: Display Error: Such asset already exists., Code: 400 URL: /api/v1/Attachment/action/CreateByChunks [] []
[2021-05-25 21:26:22] Log.DEBUG: $destPath: 06lsx/xjzn3/8tonu/3itw6/zwvcd/apt8a [] []
[2021-05-25 21:26:22] Log.DEBUG: $filePat: upload/files/06lsx/xjzn3/8tonu/3itw6/zwvcd/apt8a/blau.png [] []
[2021-05-25 21:26:22] Log.DEBUG: $file: 0 [] []
[2021-05-25 21:26:22] Log.DEBUG: $md5: cfcd208495d565ef66e7dff9f98764da [] []
[2021-05-25 21:26:22] Log.DEBUG: $file: 2097152 [] []
[2021-05-25 21:26:22] Log.DEBUG: $md5: c78f7d2f1137d0cdcd2865b9804a26a6 [] []
[2021-05-25 21:26:22] Log.DEBUG: $attachment: {"chunkId":"065dc8c83dd368069b73341c7c1f9b11",...} - Such asset already exists. [] []
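The logged digests can be reproduced outside PHP, which confirms the bug: the loop hashes only the chunk *filenames* ('0' and '2097152'), so any two files split into identically named chunks collide. A quick Python sketch of the same chaining (an illustration of the logic, not the AtroCore code):

```python
import hashlib

def chunk_name_hash(chunk_names):
    """Mimics the PHP loop: $md5 = md5($md5 . $file), where $file
    is a chunk *filename*, never the chunk's contents."""
    md5 = ''
    for name in chunk_names:
        md5 = hashlib.md5((md5 + name).encode()).hexdigest()
    return md5

# Both rot.png and blau.png were split into chunks named "0" and "2097152",
# so the loop yields the same digest regardless of the files' contents.
print(chunk_name_hash(['0']))             # cfcd208495d565ef66e7dff9f98764da
print(chunk_name_hash(['0', '2097152']))  # c78f7d2f1137d0cdcd2865b9804a26a6
```

These are exactly the two $md5 values in the log above, for both uploads.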
I didn't test with 3 chunks per file (file > 4 MB && < 6 MB), and I didn't test with notAllowByContentAndName.
I suppose the foreach loop should be something like this:
$md5 = '';
foreach ($files as $file) {
    $md5 .= md5($md5 . md5_file($dirPath . $file)); // hash the md5 of each chunk
    // or maybe cleaner:
    $md5Arr[] = md5_file($dirPath . $file);
    file_put_contents($filePath, file_get_contents($dirPath . $file), FILE_APPEND);
}
$attachment->md5 = $md5;
// or:
$attachment->md5 = md5(implode('', $md5Arr));
// or:
// hashing the whole file, but with a 50+ MB file this will be too slow
$attachment->md5 = md5_file($filePath);
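There is also a way to get the true whole-file MD5 without a second pass over a large assembled file: feed each chunk into one incremental hash context while appending it. In PHP this could use hash_init('md5'), hash_update_file(), and hash_final(); here is the idea sketched in Python (file and chunk names are made up for the demo):

```python
import hashlib
import os
import tempfile

def assemble_and_hash(dir_path, chunk_names, out_path):
    """Append each chunk to the assembled file while updating one
    incremental MD5 context, so the digest covers the file *contents*
    and the large result file is never re-read."""
    ctx = hashlib.md5()
    with open(out_path, 'wb') as out:
        for name in chunk_names:
            # Chunks are small (~2 MB here), so reading one whole chunk is fine.
            with open(os.path.join(dir_path, name), 'rb') as chunk:
                data = chunk.read()
            ctx.update(data)
            out.write(data)
    return ctx.hexdigest()

# Usage sketch with two fake chunks:
d = tempfile.mkdtemp()
with open(os.path.join(d, '0'), 'wb') as f:
    f.write(b'hello ')
with open(os.path.join(d, '2097152'), 'wb') as f:
    f.write(b'world')
digest = assemble_and_hash(d, ['0', '2097152'], os.path.join(d, 'out.bin'))
print(digest == hashlib.md5(b'hello world').hexdigest())  # True
```

This gives the same digest as md5_file() on the assembled file, but does the hashing during the copy it has to do anyway.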
I guess md5-ing the chunks seems to be the best way (see also: https://www.php.net/manual/en/function.md5-file.php#94494).
But I might be wrong here.
What do you think?
Edit: added the . in $md5 .= md5(...