Comments (18)
Yes, the problem is likely in the cast
function.
The query has 2 filters: the "where parent in" and the "not exists". In theory, the "where parent in" filter should be applied first, so the "not exists" filter have casts over the previews, which are expected to have numbers as names. However, if the "not exists" filter is applied first for whatever reason, we could have casts over regular filenames such as the "welcome.txt" file.
from core.
I think that I have seen this sometimes in CI, but then the tests pass on rerun, which is really strange (specially for a unit test)
from core.
I was able to reproduce the invalid input syntax for..
exception on PostgreSQL 15.4 + oC 10.13.2. However, the exception is not reproducible 100% of the time when following steps below (maybe this explains why the tests pass on rerun):
- Scenario 1:
- user "admin" creates "file.txt" with some text in it
- user "admin" deletes "file.txt", also from trashbin
- user "admin" creates "files2.txt" with some text in it and shares it with user "user1"
- user "admin" removes user "user1"
- user "admin" deletes "file2.txt", also from trashbin
- now running "occ previews:cleanup" will trigger an exception (at first run only)
- Scenario 2:
- user "admin" creates "file.txt" with some text in it
- user "admin" deletes "file.txt", also from trashbin
- user "admin" creates "files2.txt" with some text in it and shares it with user "user1"
- user "admin" deletes "file2.txt", also from trashbin
- now running "occ previews:cleanup" will trigger an exception (at first run only)
- Scenario 3:
- user "admin" creates "file.txt" with some text in it
- user "admin" deletes "file.txt", also from trashbin
- user "admin" creates "files2.txt" with some text in it and deletes it (NOTE: file is still in trashbin)
- run "occ previews:cleanup" —> no exception
- user "admin" now deletes "files2.txt" from trashbin
- now running "occ previews:cleanup2" will trigger an exception (at first run only)
In PDOStatement.php line 117:
[PDOException (22P02)]
SQLSTATE[22P02]: Invalid text representation: 7 ERROR: invalid input syntax for type bigint: "files_trashbin"
Exception trace:
at /var/www/html/owncloud/lib/composer/doctrine/dbal/lib/Doctrine/DBAL/Driver/PDOStatement.php:117
PDOStatement->execute() at /var/www/html/owncloud/lib/composer/doctrine/dbal/lib/Doctrine/DBAL/Driver/PDOStatement.php:117
Doctrine\DBAL\Driver\PDOStatement->execute() at /var/www/html/owncloud/lib/composer/doctrine/dbal/lib/Doctrine/DBAL/Connection.php:1304
Doctrine\DBAL\Connection->executeQuery() at /var/www/html/owncloud/lib/private/DB/Connection.php:191
OC\DB\Connection->executeQuery() at /var/www/html/owncloud/lib/private/PreviewCleanup.php:145
OC\PreviewCleanup->queryPreviewsToDelete() at /var/www/html/owncloud/lib/private/PreviewCleanup.php:53
OC\PreviewCleanup->process() at /var/www/html/owncloud/core/Command/Previews/Cleanup.php:72
OC\Core\Command\Previews\Cleanup->execute() at /var/www/html/owncloud/lib/composer/symfony/console/Command/Command.php:298
Symfony\Component\Console\Command\Command->run() at /var/www/html/owncloud/core/Command/Base.php:159
OC\Core\Command\Base->run() at /var/www/html/owncloud/lib/composer/symfony/console/Application.php:1040
Symfony\Component\Console\Application->doRunCommand() at /var/www/html/owncloud/lib/composer/symfony/console/Application.php:301
Symfony\Component\Console\Application->doRun() at /var/www/html/owncloud/lib/composer/symfony/console/Application.php:171
Symfony\Component\Console\Application->run() at /var/www/html/owncloud/lib/private/Console/Application.php:165
OC\Console\Application->run() at /var/www/html/owncloud/console.php:94
require_once() at /var/www/html/owncloud/occ:11
previews:cleanup [--output [OUTPUT]] [--all] [--] [<chunk_size>]
from core.
Maybe related to the cast done around https://github.com/owncloud/core/blob/master/lib/private/PreviewCleanup.php#L105-L111
from core.
@jvillafanez following seems to solve the issue for me: https://gist.github.com/pako81/bbd7277130947becbe0bf0739ae3a31d#file-previewcleanup-php-L137-L159. But maybe there is a more efficient way to achieve this without the need to rewrite the query for PostgreSQL only.
from core.
There is not really an issue in having multiple dedicated queries for different dbms - at least in this highly specific case where the query has some complexity.
This is also why I originally did not use the query builder as this adds a hell of complexity to maintain it.
Having just two or three explicit strings holding the query is better for maintenance.
from core.
We could check the performance of the following query:
SELECT * FROM (
SELECT `fileid`, `name`, `storage`
FROM `oc_filecache` `thumb`
WHERE `parent` IN (
SELECT `fileid`
FROM `oc_filecache`
WHERE `storage` IN (
SELECT `numeric_id`
FROM `oc_storages`
WHERE `id` LIKE 'home::%' OR `id` LIKE 'object::user:%'
)
AND `path_hash` = '3b8779ba05b8f0aed49650f3ff8beb4b'
)
) `t`
WHERE NOT EXISTS (
SELECT 1
FROM `oc_filecache`
WHERE `fileid` = CAST(`t`.`name` AS SIGNED)
)
AND `fileid` > 0
ORDER BY `storage`;
the performance should be fine (checked with mysql)
+----+-------------+--------------+------------+--------+--------------------------------------------------------------------------------------+----------------------+---------+-------------------------------+------+----------+-----------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+--------+--------------------------------------------------------------------------------------+----------------------+---------+-------------------------------+------+----------+-----------------------------------------------------------+
| 1 | SIMPLE | oc_filecache | NULL | index | PRIMARY,fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size | fs_storage_path_hash | 134 | NULL | 10 | 10.00 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | oc_storages | NULL | eq_ref | PRIMARY,storages_id_index | PRIMARY | 4 | owncloud.oc_filecache.storage | 1 | 100.00 | Using where |
| 1 | SIMPLE | thumb | NULL | ref | PRIMARY,fs_parent_name_hash,fs_parent_storage_size | fs_parent_name_hash | 8 | owncloud.oc_filecache.fileid | 1 | 100.00 | Using index condition |
| 1 | SIMPLE | oc_filecache | NULL | eq_ref | PRIMARY | PRIMARY | 8 | func | 1 | 100.00 | Using where; Not exists; Using index |
+----+-------------+--------------+------------+--------+--------------------------------------------------------------------------------------+----------------------+---------+-------------------------------+------+----------+-----------------------------------------------------------+
4 rows in set, 2 warnings (0.01 sec)
That should also work for postgresql I think
from core.
#41047 should fix the issue. I haven't tested with postgresql yet, but the CI passes
from core.
Unfortunately, still reproducible with #41047 in place.
from core.
Yes, it seems postgresql still changes the query plan even for the new query....
owncloud=# explain SELECT * FROM (
SELECT "fileid", "name", "storage"
FROM "oc_filecache" "thumb"
WHERE "parent" IN (
SELECT "fileid"
FROM "oc_filecache"
WHERE "storage" IN (
SELECT "numeric_id"
FROM "oc_storages"
WHERE "id" LIKE 'home::%' OR "id" LIKE 'object::user:%'
)
AND "path_hash" = '3b8779ba05b8f0aed49650f3ff8beb4b'
)
) "t"
WHERE NOT EXISTS (
SELECT 1
FROM "oc_filecache"
WHERE "fileid" = CAST("t"."name" AS BIGINT)
)
AND "fileid" > 0
ORDER BY "storage";
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Sort (cost=38.13..38.14 rows=1 width=528)
Sort Key: thumb.storage
-> Nested Loop Anti Join (cost=26.98..38.12 rows=1 width=528)
-> Nested Loop (cost=26.84..34.88 rows=1 width=528)
-> HashAggregate (cost=26.70..26.71 rows=1 width=8)
Group Key: oc_filecache_1.fileid
-> Nested Loop Semi Join (cost=0.15..26.69 rows=1 width=8)
-> Seq Scan on oc_filecache oc_filecache_1 (cost=0.00..10.50 rows=1 width=12)
Filter: ((path_hash)::text = '3b8779ba05b8f0aed49650f3ff8beb4b'::text)
-> Index Scan using oc_storages_pkey on oc_storages (cost=0.15..8.17 rows=1 width=4)
Index Cond: (numeric_id = oc_filecache_1.storage)
Filter: (((id)::text ~~ 'home::%'::text) OR ((id)::text ~~ 'object::user:%'::text))
-> Index Scan using fs_parent_storage_size on oc_filecache thumb (cost=0.14..8.16 rows=1 width=536)
Index Cond: (parent = oc_filecache_1.fileid)
Filter: (fileid > 0)
-> Index Only Scan using oc_filecache_pkey on oc_filecache (cost=0.14..3.24 rows=1 width=8)
Index Cond: (fileid = (thumb.name)::bigint)
(17 rows)
owncloud=# explain SELECT * FROM (
SELECT "fileid", "name", "storage"
FROM "oc_filecache" "thumb"
WHERE "parent" IN (
SELECT "fileid"
FROM "oc_filecache"
WHERE "storage" IN (
SELECT "numeric_id"
FROM "oc_storages"
WHERE "id" LIKE 'home::%' OR "id" LIKE 'object::user:%'
)
AND "path_hash" = '3b8779ba05b8f0aed49650f3ff8beb4b'
)
) "t"
WHERE NOT EXISTS (
SELECT 1
FROM "oc_filecache"
WHERE "fileid" = CAST("t"."name" AS BIGINT)
)
AND "fileid" > 0
ORDER BY "storage";
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Sort (cost=12.18..12.18 rows=1 width=20)
Sort Key: thumb.storage
-> Nested Loop Semi Join (cost=1.55..12.17 rows=1 width=20)
Join Filter: (thumb.parent = oc_filecache_1.fileid)
-> Hash Anti Join (cost=1.41..2.74 rows=1 width=28)
Hash Cond: ((thumb.name)::bigint = oc_filecache.fileid)
-> Seq Scan on oc_filecache thumb (cost=0.00..1.23 rows=18 width=28)
Filter: (fileid > 0)
-> Hash (cost=1.18..1.18 rows=18 width=8)
-> Seq Scan on oc_filecache (cost=0.00..1.18 rows=18 width=8)
-> Materialize (cost=0.15..9.41 rows=1 width=8)
-> Nested Loop Semi Join (cost=0.15..9.41 rows=1 width=8)
-> Seq Scan on oc_filecache oc_filecache_1 (cost=0.00..1.23 rows=1 width=12)
Filter: ((path_hash)::text = '3b8779ba05b8f0aed49650f3ff8beb4b'::text)
-> Index Scan using oc_storages_pkey on oc_storages (cost=0.15..8.17 rows=1 width=4)
Index Cond: (numeric_id = oc_filecache_1.storage)
Filter: (((id)::text ~~ 'home::%'::text) OR ((id)::text ~~ 'object::user:%'::text))
(17 rows)
I'll close the PR.
I'm checking "common table expresions" as an alternative
from core.
Let's see if #41048 is more explicit and works on all the DBs
from core.
It seems even with #41048 issue is still there. At least manual tests seem to indicate so.
from core.
It seems even with #41048 issue is still there. At least manual tests seem to indicate so.
Works for me, and the tests pass (except for mysql 5, which doesn't have the with
clause). Is it reproducible?
I'm out of ideas. I guess we'll have to write a query specifically for postgresql.
The weird thing is that it could also happen with the rest of the DBs. I mean, they could optimize the query the same way as postgresql; whether it triggers an error or does another thing, I don't know.
Maybe the best option is #41047 + custom query for postgresql. Or we could keep the current code + the custom query for postgresql, assuming the DB won't do anything weird with the query.
from core.
Works for me, and the tests pass (except for mysql 5, which doesn't have the with clause). Is it reproducible?
I was able to reproduce it by following the steps from Scenario 1 at #41040 (comment). Note: it may not be reproducible at first shot.
from core.
Yes, confirmed. You might need to have multiple users / storages so that the DB chooses to use the "not exists" condition first because it might cost less. I'll close that PR too.
I'll check support for regexp in all DBs
from core.
We can try with something like (we'll need to adjust the type of the cast for each DB)
SELECT "fileid", "name", "storage"
FROM "oc_filecache" "thumb"
WHERE "parent" IN (
SELECT "fileid"
FROM "oc_filecache"
WHERE "storage" IN (
SELECT "numeric_id"
FROM "oc_storages"
WHERE "id" LIKE 'home::%' OR "id" LIKE 'object::user:%'
)
AND "path_hash" = '3b8779ba05b8f0aed49650f3ff8beb4b'
)
AND NOT EXISTS (
SELECT 1
FROM "oc_filecache"
WHERE "fileid" = CAST(REGEXP_REPLACE("thumb"."name", '(^.*[^[:digit:]]+.*$|^$)', CAST("thumb"."fileid" as VARCHAR(250))) as BIGINT)
)
AND "fileid" > 0
ORDER BY "storage"
I think the regexp_replace
function is available in mysql, mariadb, postgresql and oracle. For sqlite3 we'll need to load the regexp extension from https://github.com/nalgeon/sqlean . We'll have to add some basic support to load sqlite3 extensions (this will require additional documentation)
As far as I know, the execution order of the filters depend on the query planner of each DB, and I don't think we have any control over it. This means that what is happening with postgresql could happen with any other DB. A defensive approach like the above should prevent problems
from core.
Let's see if the third time is the charm...
#41051 works for me, and tests pass.
Taking into account that casting a random string to an integer has an undefined / undocumented behavior for most of the DBs (at least it's difficult to find the info), we'll go through the "regexp" route by default to ensure we can cast the value. For mysql 5 and sqlite, which don't have the regexp_replace
function, we'll use a different approach taking advantage of the behavior they have when casting a random string.
Explanation for the tricks used is in the code, and it should be clear enough
from core.
confirmed to work: exceptions are not triggered anymore 👍
from core.
Related Issues (20)
- Trying to access array offset on value of type bool at apps/files_external/lib/Lib/Storage/SFTP.php
- Web Updater cURL Error 35: SSL Routines Error during ownCloud Update from 10.13.2.3 to 10.13.4 on DietPi HOT 4
- Docker upgrade 10.0.9 to 10.13.4 falied HOT 12
- Can't login HOT 10
- Not clear how to set/get config reliably without example HOT 4
- Transactional file locking should be configured to use memory-based locking HOT 11
- Sync is broken after ownCloud update HOT 5
- [QA] app dependencies are not checked during `occ upgrade` HOT 4
- Cannot add csp domain HOT 1
- [QA] skip pdf preview API test scenario for stable server versions (10.13.4) and below
- Cannot set the download repository for owncloud 2.10 on Ubuntu 22.04.3 LTS HOT 1
- The thumbnail of the images does not display. HOT 7
- [QA] docker images have no support for PDF or SVG preview providers HOT 8
- [QA] richdocuments app fails in 10.14.0 HOT 6
- External storage support with windows server 2022 showing not all files / folders
- Update Translation on ownCloud Instance HOT 3
- [QA] typical Photos from a smartphone are no longer rendered in the builtin image viewer HOT 9
- owncloud/server:10.13.4 - possible Table Prefix Chaos? HOT 1
- subfolder pane sorting by last modified function not working HOT 2
- PDF Preview not working HOT 10
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from core.