Comments (14)
There is documentation at http://mongodb.github.io/mongo-php-library/api/class-MongoDB.Collection.html#_count, although that's not particularly useful or extensive.
The original driver's count() method, does in fact, not do something with a cursor. It constructs the original query, and then runs that with a count command. In fact, this is what happens under the hood (old driver);
$filter = [ 'name' => 'Derick' ];
$cursor = $collect->find( $filter ); // nothing happens right now
foreach ( $cursor as $record ) { }; // upon foreach, the query gets executed
echo $cursor->count();
/* original query is wrapped in command, which then is executed and its result returned. Under the hood, it would run: */
$collection->db->command( [ 'count' => $collection->db->getName(), 'query' => $filter ] );
In the new driver, this behaviour is different:
$filter = [ 'name' => 'Derick' ];
$q = new \MongoDB\Driver\Query( $filter );
$r = $manager->executeQuery( "{$dbname}.{$collectionname}", $q ); // query is already executed here
/* To count, you'd simply construct the command yourself: */
$cmd = new \MongoDB\Driver\Command( [ 'count' => $collectionname, 'query' => $filter ] );
$r = $manager->executeCommand( $dbname, $cmd );
I also wrote about this in an article: https://derickrethans.nl/new-drivers-part3-cursor.html
from mongo-php-driver.
This was previously asked in #178. In the new driver, a Cursor object now represents the actual results of an executed command or query, whereas cursors in the legacy driver had a dual-nature (pre- or post-executed). The legacy MongoCursor::count()
method actually invoked a count command with the same criteria; however, this meant it was entirely possible for the count command's calculation to differ from the actual query results, since they were not necessarily operating on the same result data.
The userland library implements a count()
method on the collection, which also takes filter criteria, and we'd rather encourage that so users understand that the operations are separate. If we do end up making Cursor countable in the userland library, we would use iterator_count()
to provide an exact count on the results; however, this will also require us to cache the results so that users can rewind the cursor and iterate again (this was most recently discussed in PHPLIB-81). While the legacy driver would re-execute the query on a rewind, that's also not an option given the nature of a Cursor in the new driver.
I realize this may come across as an inconvenience, but I believe the explicit API is a ultimately a step forward from the legacy driver, where many users might have been oblivious to what was actually happening under the hood. Feel free to follow up if you have additional questions.
from mongo-php-driver.
To query by index ranges, would I have to add an extra indexed, auto-incrementing id field with predictable page boundaries, or can I just use _id with limit?
Since the first four bytes of the 12-byte ObjectID is a timestamp, values are generated in increasing order and can be used for sorting. There should be no need for an extra field or index. See this Stack Overflow thread for more info.
from mongo-php-driver.
Closing this out as I believe it was answered, but feel free to follow up if you have additional questions.
from mongo-php-driver.
The reasoning for taking out count(), sort(), and limit() I get. However, those are a ton of breaking changes for moving to the new driver here on a large application. Lots of rewriting unfortunately.
My question, is where's an example of the new best practice for easily counting filtered results on http://mongodb.github.io/mongo-php-library/classes/collection/?
It states: "MongoDB\Collection is perhaps the most useful class in this library. It provides methods for common operations on a collection, such as inserting documents, querying, updating, counting, etc.". However, there is no example of counting.
Will you please provide the updated example using the code below of using MongoDB\Collection for counting?
$cursor = $collection->find($filter);
if ($cursor->count() > 0) {
}
It would be great on http://mongodb.github.io/mongo-php-library/ to see a complete list of breaking changes from the legacy mongo driver and new best practices code examples of their equivalents using the new mongodb driver and library.
from mongo-php-driver.
Ah, I didn't see https://derickrethans.nl/new-drivers-part3-cursor.html. That would have saved a lot of time. Having a breaking changes section on mongo-php-library would be helpful. My two cents there.
Would $count = $collection->count($filter); be the one liner of your example above? Seems like it works. The part I'm still confused about is $cursor->count() was removed because it could possibly be inaccurate, but we're still running the query twice, so what have we gained by not simply being able to call $cursor->count() other than pointing out the fact that it is not accurate?
from mongo-php-driver.
The part I'm still confused about is $cursor->count() was removed because it could possibly be inaccurate, but we're still running the query twice, so what have we gained by not simply being able to call $cursor->count() other than pointing out the fact that it is not accurate?
By removing the method from the Cursor class, we're forcing users to acknowledge that it is not a true count of the cursor's results. This goes hand in hand with eliminating the dual nature of the cursor (as MongoCursor had in the legacy extension) and its pre- and post-executed states -- along with the fact that rewind()
would re-execute the query. I'm convinced that avoiding misunderstandings of how the cursor operators is worth the extra line of code or two to invoke the count command through Collection.
The only accurate solution would be to use iterator_count()
; however, that requires traversing the iterator to compute the count. While you could do that, it would leave you unable to rewind and traverse the actual result data, which I doubt is very useful. This is because the Cursor is merely a handle on results from an executed query and can only move forward. Pending PHPLIB-81, we should have a mechanism in place to compose a Cursor object and allow its results to be cached so that users can rewind, count, etc. (and accurately on the very same data!).
from mongo-php-driver.
Does this mean the only way to iterate over a cursor, short of using toArray
, is to use a foreach
loop? I've got some code that was using a while ($result->count() > 0)
loop, which I cannot find a way to replicate. I thought iterator_count was the answer, but it throws
MongoDB\Driver\Exception\LogicException with message 'Cursors cannot yield multiple iterators'
.
from mongo-php-driver.
@hackel There are two ways to retrieve the count. Either run the same filter through a count query
$count = $collection->count($filter);
or increment a counter while iterating over the cursor
$cursor = $collection->find($filter);
$count = 0;
foreach ($cursor as $document) {
$count++;
}
I can't recall exactly how the old count()
method worked, but did that while
loop really work? I'm not so sure that it actually returned the number of documents remaining, which is how it must have worked for that loop to work as it should. But to answer your question, there are more ways of iterating over a cursor than to use a foreach
loop, but it's by far the simplest. Can I ask why you would want to do it in another way?
from mongo-php-driver.
I can't recall exactly how the old count() method worked, but did that while loop really work? I'm not so sure that it actually returned the number of documents remaining, which is how it must have worked for that loop to work as it should.
In the legacy driver, MongoCursor::count()
executed a count command with each invocation. It had no direct relation to the cursor's result set. I also don't understand how such a while
loop would have worked in practice, but I'd have to reserve judgement until seeing a full paste of code for it.
I cannot find a way to replicate. I thought iterator_count was the answer, but it throws MongoDB\Driver\Exception\LogicException with message 'Cursors cannot yield multiple iterators'
iterator_count()
traverses an iterator and and counts its elements, which means that you cannot use it with a MongoDB\Driver\Cursor
since those can only be iterated once and never rewound. The "iterator" referred to in the exception message is an internal PHP iterator (not to be confused with PHP's Iterator class), which Traversable objects expose for iterating via language constructs and internal functions.
We do have PHPLIB-81 open to track the idea of creating an Iterator class that could decorate a MongoDB\Driver\Cursor
and cache its results in memory. That would allow rewinding, counting (of the actual elements to be iterated), and movement in either direction across the result set.
from mongo-php-driver.
@lindelius It's from Laravel's chunk
method used to repeat a query over a large result set by paging through it. I was trying to replicate this outside of the jenssegers/laravel-mongodb library because it uses a ton of memory. It actually calls iterator_to_array
on the cursor first, which is why the loop works, but is not particularly efficient.
I ended up with this hacky solution, based on a stackoverflow post:
$cursor = $collection->find(...);
$it = new \IteratorIterator($cursor);
$it->rewind();
while ($it->valid()) {
if (call_user_func($callback, $it) === false) {
return false;
}
$page++;
$cursor = $collection->find(next $page);
$it = new \IteratorIterator($cursor);
$it->rewind();
}
I guess this is very similar to PHPLIB-81, but using the built-in IteratorIterator
class to decorate the cursor. I have no idea how the valid
method works, but so far it's getting the job done. It seems like CachingIterator::hasNext
would be more appropriate.
The only other way I can think of to do this would be to count the iterations in a foreach loop as you described, then exit the outer loop if the resulting count is less than the page size after finishing the inner loop. Either that or running one extra query at the start to get the document count.
I thought it might be possible to accomplish this with Mongo's batchSize
option, but as far as I can tell, this wouldn't actually work as it is internal to the driver, is that correct?
from mongo-php-driver.
Well, the first problem is that Laravel's Eloquent ORM is designed for SQL databases and the jenssegers/laravel-mongodb library is just a MongoDB wrapper for it. I refuse to use it for my projects (and I use Laravel and Lumen quite a lot) as it doesn't reflect how developers are supposed to work with MongoDB at all. I've created my own library (which wraps the official MongoDB PHP library) to give me the added functionality that I need for my projects. I'm actively using and updating it as I find more things that I need to add or change, so feel free to check it out and use it if it satisfies your needs (sorry for the plug).
When it comes to the chunking, I'm not sure what exactly you are using it for, but the results from the database are already chunked up into batches, which means that you aren't loading the entire result set in from start. Also, if you are intending to chunk the results by requesting, say 20 documents at a time, using new calls to find()
you can do that with a limit and skip instead of what you are doing now (not entirely sure what you are doing now, tbh). If you are expecting to work with big result sets you should also consider writing a custom chunk method for that collection and use something that can be found via an index, e.g. using ObjectID
ranges for chunking the result set, since the skip + limit method isn't ideal for huge collections.
from mongo-php-driver.
I guess this is very similar to PHPLIB-81, but using the built-in IteratorIterator class to decorate the cursor. I have no idea how the valid method works, but so far it's getting the job done. It seems like CachingIterator::hasNext would be more appropriate.
We already use IteratorIterator in the library to decorate a Cursor object (see: CollectionInfoCommandIterator). IteratorIterator is useful for decorating any Traversable and exposing methods to the underlying handlers (e.g. get current element/key, rewind, move forward). In this case, IteratorIterator::valid()
will directly call php_phongo_cursor_iterator_valid()
and return whether there is a current element. For a new Cursor, this requires us to attempt populating the first element, which is exactly what you achieve by calling IteratorIterator::rewind()
(i.e. php_phongo_cursor_iterator_rewind()
).
The downside of your approach is that you're executing one query for each page in sequence, which means more roundtrips and cursors/queries on the server side.
I thought it might be possible to accomplish this with Mongo's batchSize option, but as far as I can tell, this wouldn't actually work as it is internal to the driver, is that correct?
batchSize
is used to control how many documents are provided to the driver at a time over the wire protocol (see: find
command docs). For instance, a small batch size would entail more roundtrips to the database to iterate across a result set. This is largely transparent to the application.
If you are expecting to work with big result sets you should also consider writing a custom chunk method for that collection and use something that can be found via an index, e.g. using ObjectID ranges for chunking the result set, since the skip + limit method isn't ideal for huge collections.
Pagination via index ranges is absolutely more performant than using skip
to advance to later "pages" in the result set. You should certainly look into this approach.
from mongo-php-driver.
Thanks for the feedback, @lindelius and @jmikola. Sorry for hijacking this issue into a support request, that was not my intention!
@lindelius I completely agree, I had to learn the hard way about laravel-mongodb, and now it's too ingrained in this application to remove. Thanks for the plug, though! Your library looks intriguing. I'll definitely keep an eye on that for future projects.
What I'm doing now is using limit/skip, actually. The collection is around 120k now, so not exactly "huge," but may grow a bit. To query by index ranges, would I have to add an extra indexed, auto-incrementing id field with predictable page boundaries, or can I just use _id with limit? Sort by _id and query each page with a limit and {_id: {$gt: $last_id}}`? I'm currently not experiencing any performance issues from MongoDB with skip/limit—it's all on the PHP side, but I'd definitely like to know the best way to do it.
from mongo-php-driver.
Related Issues (20)
- l
- apache stop when I added extension=mongodb.so to php.ini HOT 3
- `selectServer` takes too long on official php base image (connections opened sequentially) HOT 13
- libbson Bug Report : variable ret value type error HOT 8
- No suitable servers found (`serverSelectionTryOnce` set) when rs.initiate with localhost HOT 5
- Extract mongodb-1.6.2.tgz error HOT 5
- Sporadic "Authentication failed" errors HOT 8
- MongoDB PHP Driver 1.15+ increases PHP memory consumption HOT 7
- Error while compiling on macOS HOT 7
- Major slowdown since version 1.17.0 HOT 8
- PHP Warning: PHP Startup: Unable to load dynamic library 'mongodb.so' HOT 2
- MongoDB reporting "AuthenticationAbandoned: Overridden by new authentication session" for a PHP client HOT 14
- Windows build after 1.13.0 HOT 4
- linux general protection fault , sorry for my poor english HOT 1
- PHP refusing to load MongoDB PHP driver on Windows (on one machine but not another) HOT 1
- Can't download desired extension HOT 2
- Unable to install pecl mongodb-1.15.1 in php 8.2 HOT 1
- Bug & Question: How to compile mongodb statically into PHP? HOT 9
- PHPStorm Stub for Manager class has outdated signature HOT 1
- Cannot install with laravel 11 HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from mongo-php-driver.