Code Monkey home page Code Monkey logo

Comments (18)

jvillafanez avatar jvillafanez commented on June 3, 2024 1

Yes, the problem is likely in the cast function.

The query has 2 filters: the "where parent in" and the "not exists". In theory, the "where parent in" filter should be applied first, so the "not exists" filter have casts over the previews, which are expected to have numbers as names. However, if the "not exists" filter is applied first for whatever reason, we could have casts over regular filenames such as the "welcome.txt" file.

from core.

phil-davis avatar phil-davis commented on June 3, 2024

I think that I have seen this sometimes in CI, but then the tests pass on rerun, which is really strange (specially for a unit test)

from core.

pako81 avatar pako81 commented on June 3, 2024

I was able to reproduce the invalid input syntax for.. exception on PostgreSQL 15.4 + oC 10.13.2. However, the exception is not reproducible 100% of the time when following steps below (maybe this explains why the tests pass on rerun):

  • Scenario 1:
- user "admin" creates "file.txt" with some text in it
- user "admin" deletes "file.txt", also from trashbin
- user "admin" creates "files2.txt" with some text in it and shares it with user "user1"
- user "admin" removes user "user1"
- user "admin" deletes "file2.txt", also from trashbin 
- now running "occ previews:cleanup" will trigger an exception (at first run only)
  • Scenario 2:
- user "admin" creates "file.txt" with some text in it
- user "admin" deletes "file.txt", also from trashbin
- user "admin" creates "files2.txt" with some text in it and shares it with user "user1"
- user "admin" deletes "file2.txt", also from trashbin
- now running "occ previews:cleanup" will trigger an exception (at first run only)
  • Scenario 3:
- user "admin" creates "file.txt" with some text in it
- user "admin" deletes "file.txt", also from trashbin
- user "admin" creates "files2.txt" with some text in it and deletes it (NOTE: file is still in trashbin)
- run "occ previews:cleanup" —> no exception
- user "admin" now deletes "files2.txt" from trashbin
- now running "occ previews:cleanup2" will trigger an exception (at first run only)
In PDOStatement.php line 117:
                                                                                                                  
  [PDOException (22P02)]                                                                                          
  SQLSTATE[22P02]: Invalid text representation: 7 ERROR:  invalid input syntax for type bigint: "files_trashbin"  
                                                                                                                  

Exception trace:
  at /var/www/html/owncloud/lib/composer/doctrine/dbal/lib/Doctrine/DBAL/Driver/PDOStatement.php:117
 PDOStatement->execute() at /var/www/html/owncloud/lib/composer/doctrine/dbal/lib/Doctrine/DBAL/Driver/PDOStatement.php:117
 Doctrine\DBAL\Driver\PDOStatement->execute() at /var/www/html/owncloud/lib/composer/doctrine/dbal/lib/Doctrine/DBAL/Connection.php:1304
 Doctrine\DBAL\Connection->executeQuery() at /var/www/html/owncloud/lib/private/DB/Connection.php:191
 OC\DB\Connection->executeQuery() at /var/www/html/owncloud/lib/private/PreviewCleanup.php:145
 OC\PreviewCleanup->queryPreviewsToDelete() at /var/www/html/owncloud/lib/private/PreviewCleanup.php:53
 OC\PreviewCleanup->process() at /var/www/html/owncloud/core/Command/Previews/Cleanup.php:72
 OC\Core\Command\Previews\Cleanup->execute() at /var/www/html/owncloud/lib/composer/symfony/console/Command/Command.php:298
 Symfony\Component\Console\Command\Command->run() at /var/www/html/owncloud/core/Command/Base.php:159
 OC\Core\Command\Base->run() at /var/www/html/owncloud/lib/composer/symfony/console/Application.php:1040
 Symfony\Component\Console\Application->doRunCommand() at /var/www/html/owncloud/lib/composer/symfony/console/Application.php:301
 Symfony\Component\Console\Application->doRun() at /var/www/html/owncloud/lib/composer/symfony/console/Application.php:171
 Symfony\Component\Console\Application->run() at /var/www/html/owncloud/lib/private/Console/Application.php:165
 OC\Console\Application->run() at /var/www/html/owncloud/console.php:94
 require_once() at /var/www/html/owncloud/occ:11

previews:cleanup [--output [OUTPUT]] [--all] [--] [<chunk_size>]

from core.

pako81 avatar pako81 commented on June 3, 2024

Maybe related to the cast done around https://github.com/owncloud/core/blob/master/lib/private/PreviewCleanup.php#L105-L111

from core.

pako81 avatar pako81 commented on June 3, 2024

@jvillafanez following seems to solve the issue for me: https://gist.github.com/pako81/bbd7277130947becbe0bf0739ae3a31d#file-previewcleanup-php-L137-L159. But maybe there is a more efficient way to achieve this without the need to rewrite the query for PostgreSQL only.

from core.

DeepDiver1975 avatar DeepDiver1975 commented on June 3, 2024

There is not really an issue in having multiple dedicated queries for different dbms - at least in this highly specific case where the query has some complexity.

This is also why I originally did not use the query builder as this adds a hell of complexity to maintain it.

Having just two or three explicit strings holding the query is better for maintenance.

from core.

jvillafanez avatar jvillafanez commented on June 3, 2024

We could check the performance of the following query:

SELECT * FROM (
  SELECT `fileid`, `name`, `storage`
  FROM `oc_filecache` `thumb`
  WHERE `parent` IN (
    SELECT `fileid`
    FROM `oc_filecache`
    WHERE `storage` IN (
      SELECT `numeric_id`
      FROM `oc_storages`
      WHERE `id` LIKE 'home::%' OR `id` LIKE 'object::user:%'
    )
    AND `path_hash` = '3b8779ba05b8f0aed49650f3ff8beb4b'
  )
) `t`
WHERE NOT EXISTS (
  SELECT 1
  FROM `oc_filecache`
  WHERE `fileid` = CAST(`t`.`name` AS SIGNED)
)
AND `fileid` > 0
ORDER BY `storage`;

the performance should be fine (checked with mysql)

+----+-------------+--------------+------------+--------+--------------------------------------------------------------------------------------+----------------------+---------+-------------------------------+------+----------+-----------------------------------------------------------+
| id | select_type | table        | partitions | type   | possible_keys                                                                        | key                  | key_len | ref                           | rows | filtered | Extra                                                     |
+----+-------------+--------------+------------+--------+--------------------------------------------------------------------------------------+----------------------+---------+-------------------------------+------+----------+-----------------------------------------------------------+
|  1 | SIMPLE      | oc_filecache | NULL       | index  | PRIMARY,fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size | fs_storage_path_hash | 134     | NULL                          |   10 |    10.00 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | oc_storages  | NULL       | eq_ref | PRIMARY,storages_id_index                                                            | PRIMARY              | 4       | owncloud.oc_filecache.storage |    1 |   100.00 | Using where                                               |
|  1 | SIMPLE      | thumb        | NULL       | ref    | PRIMARY,fs_parent_name_hash,fs_parent_storage_size                                   | fs_parent_name_hash  | 8       | owncloud.oc_filecache.fileid  |    1 |   100.00 | Using index condition                                     |
|  1 | SIMPLE      | oc_filecache | NULL       | eq_ref | PRIMARY                                                                              | PRIMARY              | 8       | func                          |    1 |   100.00 | Using where; Not exists; Using index                      |
+----+-------------+--------------+------------+--------+--------------------------------------------------------------------------------------+----------------------+---------+-------------------------------+------+----------+-----------------------------------------------------------+
4 rows in set, 2 warnings (0.01 sec)

That should also work for postgresql I think

from core.

jvillafanez avatar jvillafanez commented on June 3, 2024

#41047 should fix the issue. I haven't tested with postgresql yet, but the CI passes

from core.

pako81 avatar pako81 commented on June 3, 2024

Unfortunately, still reproducible with #41047 in place.

from core.

jvillafanez avatar jvillafanez commented on June 3, 2024

Yes, it seems postgresql still changes the query plan even for the new query....

owncloud=# explain SELECT * FROM (
  SELECT "fileid", "name", "storage"
  FROM "oc_filecache" "thumb"
  WHERE "parent" IN (
    SELECT "fileid"
    FROM "oc_filecache"
    WHERE "storage" IN (
      SELECT "numeric_id"
      FROM "oc_storages"
      WHERE "id" LIKE 'home::%' OR "id" LIKE 'object::user:%'
    )
    AND "path_hash" = '3b8779ba05b8f0aed49650f3ff8beb4b'
  )
) "t"
WHERE NOT EXISTS (
  SELECT 1
  FROM "oc_filecache"
  WHERE "fileid" = CAST("t"."name" AS BIGINT)
)
AND "fileid" > 0
ORDER BY "storage";
                                                     QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Sort  (cost=38.13..38.14 rows=1 width=528)
   Sort Key: thumb.storage
   ->  Nested Loop Anti Join  (cost=26.98..38.12 rows=1 width=528)
         ->  Nested Loop  (cost=26.84..34.88 rows=1 width=528)
               ->  HashAggregate  (cost=26.70..26.71 rows=1 width=8)
                     Group Key: oc_filecache_1.fileid
                     ->  Nested Loop Semi Join  (cost=0.15..26.69 rows=1 width=8)
                           ->  Seq Scan on oc_filecache oc_filecache_1  (cost=0.00..10.50 rows=1 width=12)
                                 Filter: ((path_hash)::text = '3b8779ba05b8f0aed49650f3ff8beb4b'::text)
                           ->  Index Scan using oc_storages_pkey on oc_storages  (cost=0.15..8.17 rows=1 width=4)
                                 Index Cond: (numeric_id = oc_filecache_1.storage)
                                 Filter: (((id)::text ~~ 'home::%'::text) OR ((id)::text ~~ 'object::user:%'::text))
               ->  Index Scan using fs_parent_storage_size on oc_filecache thumb  (cost=0.14..8.16 rows=1 width=536)
                     Index Cond: (parent = oc_filecache_1.fileid)
                     Filter: (fileid > 0)
         ->  Index Only Scan using oc_filecache_pkey on oc_filecache  (cost=0.14..3.24 rows=1 width=8)
               Index Cond: (fileid = (thumb.name)::bigint)
(17 rows)

owncloud=# explain SELECT * FROM (
  SELECT "fileid", "name", "storage"
  FROM "oc_filecache" "thumb"
  WHERE "parent" IN (
    SELECT "fileid"
    FROM "oc_filecache"
    WHERE "storage" IN (
      SELECT "numeric_id"
      FROM "oc_storages"
      WHERE "id" LIKE 'home::%' OR "id" LIKE 'object::user:%'
    )                                                 
    AND "path_hash" = '3b8779ba05b8f0aed49650f3ff8beb4b'
  )             
) "t"
WHERE NOT EXISTS (
  SELECT 1                                       
  FROM "oc_filecache"
  WHERE "fileid" = CAST("t"."name" AS BIGINT)
)                  
AND "fileid" > 0
ORDER BY "storage";
                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Sort  (cost=12.18..12.18 rows=1 width=20)
   Sort Key: thumb.storage
   ->  Nested Loop Semi Join  (cost=1.55..12.17 rows=1 width=20)
         Join Filter: (thumb.parent = oc_filecache_1.fileid)
         ->  Hash Anti Join  (cost=1.41..2.74 rows=1 width=28)
               Hash Cond: ((thumb.name)::bigint = oc_filecache.fileid)
               ->  Seq Scan on oc_filecache thumb  (cost=0.00..1.23 rows=18 width=28)
                     Filter: (fileid > 0)
               ->  Hash  (cost=1.18..1.18 rows=18 width=8)
                     ->  Seq Scan on oc_filecache  (cost=0.00..1.18 rows=18 width=8)
         ->  Materialize  (cost=0.15..9.41 rows=1 width=8)
               ->  Nested Loop Semi Join  (cost=0.15..9.41 rows=1 width=8)
                     ->  Seq Scan on oc_filecache oc_filecache_1  (cost=0.00..1.23 rows=1 width=12)
                           Filter: ((path_hash)::text = '3b8779ba05b8f0aed49650f3ff8beb4b'::text)
                     ->  Index Scan using oc_storages_pkey on oc_storages  (cost=0.15..8.17 rows=1 width=4)
                           Index Cond: (numeric_id = oc_filecache_1.storage)
                           Filter: (((id)::text ~~ 'home::%'::text) OR ((id)::text ~~ 'object::user:%'::text))
(17 rows)

I'll close the PR.

I'm checking "common table expresions" as an alternative

from core.

jvillafanez avatar jvillafanez commented on June 3, 2024

Let's see if #41048 is more explicit and works on all the DBs

from core.

pako81 avatar pako81 commented on June 3, 2024

It seems even with #41048 issue is still there. At least manual tests seem to indicate so.

from core.

jvillafanez avatar jvillafanez commented on June 3, 2024

It seems even with #41048 issue is still there. At least manual tests seem to indicate so.

Works for me, and the tests pass (except for mysql 5, which doesn't have the with clause). Is it reproducible?
I'm out of ideas. I guess we'll have to write a query specifically for postgresql.

The weird thing is that it could also happen with the rest of the DBs. I mean, they could optimize the query the same way as postgresql; whether it triggers an error or does another thing, I don't know.

Maybe the best option is #41047 + custom query for postgresql. Or we could keep the current code + the custom query for postgresql, assuming the DB won't do anything weird with the query.

from core.

pako81 avatar pako81 commented on June 3, 2024

Works for me, and the tests pass (except for mysql 5, which doesn't have the with clause). Is it reproducible?

I was able to reproduce it by following the steps from Scenario 1 at #41040 (comment). Note: it may not be reproducible at first shot.

from core.

jvillafanez avatar jvillafanez commented on June 3, 2024

Yes, confirmed. You might need to have multiple users / storages so that the DB chooses to use the "not exists" condition first because it might cost less. I'll close that PR too.

I'll check support for regexp in all DBs

from core.

jvillafanez avatar jvillafanez commented on June 3, 2024

We can try with something like (we'll need to adjust the type of the cast for each DB)

SELECT "fileid", "name", "storage"
FROM "oc_filecache" "thumb" 
WHERE "parent" IN (
  SELECT "fileid"
  FROM "oc_filecache"
  WHERE "storage" IN (
    SELECT "numeric_id"
    FROM "oc_storages"
    WHERE "id" LIKE 'home::%' OR "id" LIKE 'object::user:%'
  )
  AND "path_hash" = '3b8779ba05b8f0aed49650f3ff8beb4b'
)
AND NOT EXISTS (
  SELECT 1
  FROM "oc_filecache"
  WHERE "fileid" = CAST(REGEXP_REPLACE("thumb"."name", '(^.*[^[:digit:]]+.*$|^$)', CAST("thumb"."fileid" as VARCHAR(250))) as BIGINT)
)
AND "fileid" > 0
ORDER BY "storage"

I think the regexp_replace function is available in mysql, mariadb, postgresql and oracle. For sqlite3 we'll need to load the regexp extension from https://github.com/nalgeon/sqlean . We'll have to add some basic support to load sqlite3 extensions (this will require additional documentation)

As far as I know, the execution order of the filters depend on the query planner of each DB, and I don't think we have any control over it. This means that what is happening with postgresql could happen with any other DB. A defensive approach like the above should prevent problems

from core.

jvillafanez avatar jvillafanez commented on June 3, 2024

Let's see if the third time is the charm...

#41051 works for me, and tests pass.

Taking into account that casting a random string to an integer has an undefined / undocumented behavior for most of the DBs (at least it's difficult to find the info), we'll go through the "regexp" route by default to ensure we can cast the value. For mysql 5 and sqlite, which don't have the regexp_replace function, we'll use a different approach taking advantage of the behavior they have when casting a random string.

Explanation for the tricks used is in the code, and it should be clear enough

from core.

pako81 avatar pako81 commented on June 3, 2024

confirmed to work: exceptions are not triggered anymore 👍

from core.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.