flowpack / flowpack.elasticsearch.contentrepositoryqueueindexer
Neos CMS ElasticSearch indexer based on the Flowpack JobQueue (to handle big indexing tags, +50'000 nodes)
License: MIT License
Hello,
we are using your supervisord config and getting a lot of errors like this:
Cannot delete job 88: NOT_FOUND
Type: Pheanstalk\Exception\ServerException
File: Packages/Libraries/pda/pheanstalk/src/Command/DeleteCommand.php
Line: 44
It seems the daemons (job:work) are all working on the same job rather than on 12 different jobs. Is this true? Is there any way to solve this? As it stands, it doesn't scale with multiple jobs as it should.
If something fails while indexing, e.g. due to a failed flowQuery, the complete batch fails. At the end, the index is switched regardless of whether, or how many, batches failed.
We need to avoid switching the index if more than a threshold of batches failed.
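A minimal sketch of such a threshold check, assuming the numbers of failed and total batches are known once indexing has finished. The function name and the percentage parameter are hypothetical and not part of the package; this only illustrates the decision, not how the index is switched.

```shell
# Hypothetical helper: succeeds (exit status 0) if the index may be switched.
# $1 = failed batches, $2 = total batches, $3 = allowed failure percentage
should_switch_index() {
  local failed=$1 total=$2 max_failed_percent=$3
  # Nothing was indexed at all, so never switch.
  if [ "$total" -le 0 ]; then
    return 1
  fi
  # Integer-only comparison of failed/total <= max_failed_percent/100.
  [ $((failed * 100)) -le $((max_failed_percent * total)) ]
}

# Example: 1 failed batch out of 100 with a 5 percent tolerance.
if should_switch_index 1 100 5; then
  echo "switch index"
else
  echo "keep old index"
fi
```

A percentage rather than an absolute count keeps the rule independent of the batch size, but either would work.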
Unfortunately there is a bug: getResponse() returns a PSR ResponseInterface, which is passed to json_decode instead of the string that json_decode expects.
Commit 464e2ab adds the NodeTypeMappingBuilderInterface, which is meant for an upcoming 5.0 release of the CR-Adaptor. But it has already been released in 3.1.0, which is compatible with CR-Adaptor 4.0, where the interface does not exist. So this breaks with:
The object "Flowpack\ElasticSearch\ContentRepositoryAdaptor\Driver\NodeTypeMappingBuilderInterface" which was specified as a property in the object configuration of object "Flowpack\ElasticSearch\ContentRepositoryQueueIndexer\Command\NodeIndexQueueCommandController" (automatically registered class) does not exist.
As pointed out in passing in #15, the batch size of 500 breaks (initial) indexing for me. The job payload is simply so big that it causes "argument list too long" when the jobs are passed to a command run as a shell command (due to executeIsolated being true, for good reasons).
For me it works with 150, but that could be different depending on the shell in use and other factors.
I'd suggest using a rather "conservative" (low) default and recommending that users raise it if needed.
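To see how much room a shell command line offers on a given system, something like the following can help. It is only a sketch: payload_fits is a hypothetical helper, and the 128 KiB per-argument cap is an assumption that holds for Linux with 4 KiB pages; other systems differ.

```shell
# Hypothetical helper: succeeds if a payload of $1 bytes stays below a limit of $2 bytes.
payload_fits() {
  local payload_bytes=$1 limit_bytes=$2
  [ "$payload_bytes" -lt "$limit_bytes" ]
}

# Total limit for arguments plus environment, as reported by the system.
arg_max=$(getconf ARG_MAX)

# Assumption: Linux with 4 KiB pages, where a single argument is capped at 128 KiB.
single_arg_limit=131072

# Use the smaller of the two limits as a conservative bound.
if [ "$arg_max" -lt "$single_arg_limit" ]; then
  limit=$arg_max
else
  limit=$single_arg_limit
fi

echo "conservative payload limit: $limit bytes"
```

The serialized message appears base64-encoded in the traces in this thread, which adds roughly a third to its size, so the usable payload is smaller than the raw limit suggests.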
Hi!
I was trying to use this package, but there was an error.
It seems that TYPO3\Jobqueue was renamed to Flowpack\JobQueue?
I had to change some lines and after that it worked.
We are using a custom NodeIndexer which extends Flowpack\ElasticSearch\ContentRepositoryAdaptor\Indexer\NodeIndexer and which was kindly provided by @dfeyer.
This NodeIndexer uses an Objects.yaml definition like the Flowpack.ElasticSearch.ContentRepositoryQueueIndexer does.
If I understand the code correctly, there is currently no way to use the ContentRepositoryQueueIndexer if a custom NodeIndexer is provided, because the AbstractIndexingJob contains use Flowpack\ElasticSearch\ContentRepositoryAdaptor\Indexer\NodeIndexer;, right?
So I would request a change to allow a custom NodeIndexer.
I would like to provide the code changes myself, but I'm not yet familiar enough with Neos/Flow to see a good, simple solution.
Maybe someone could provide basic ideas and what must be changed to support this use case or give me a hint how I need to modify our own project to inject the custom indexer if this feature might work already.
The FakeNodeDataFactory creates "fake nodes", which then get automagically persisted to the NodeData table due to calls to setProperty().
When a change is saved to a user workspace, the live indexing queue is filled. When the created job is being worked on, an exception is raised:
Exception in line 403 of /var/www/cms/Packages/Libraries/doctrine/orm/lib/Doctrine/ORM/EntityManager.php: The identifier name is missing for a query of Neos\ContentRepository\Domain\Model\Workspace - See also: 20170518132608fbe4ff.txt
This is the trace:
Exception in line 403 of /var/www/cms/Packages/Libraries/doctrine/orm/lib/Doctrine/ORM/EntityManager.php: The identifier name is missing for a query of Neos\ContentRepository\Domain\Model\Workspace
38 Doctrine\ORM\ORMException::missingIdentifierField("Neos\ContentRepository\Domain\Model\Workspace", "name")
37 Doctrine\ORM\EntityManager::find("Neos\ContentRepository\Domain\Model\Workspace", array|1|)
36 Neos\Flow\Persistence\Doctrine\PersistenceManager_Original::getObjectByIdentifier(NULL, "Neos\ContentRepository\Domain\Model\Workspace")
35 Neos\Flow\Persistence\Repository::findByIdentifier(NULL)
34 call_user_func_array(array|2|, array|1|)
33 Neos\Flow\ObjectManagement\DependencyInjection\DependencyProxy::__call("findByIdentifier", array|1|)
32 Neos\ContentRepository\Domain\Service\Context_Original::getWorkspace()
31 Neos\ContentRepository\Domain\Service\Context_Original::getNodeByIdentifier("fa08cdd4-1ee4-3b63-0916-ff08239c5d89")
30 Flowpack\ElasticSearch\ContentRepositoryAdaptor\Indexer\NodeIndexer_Original::Flowpack\ElasticSearch\ContentRepositoryAdaptor\Indexer\{closure}(Neos\ContentRepository\Domain\Model\Node, Neos\ContentRepository\Domain\Service\Context)
29 Flowpack\ElasticSearch\ContentRepositoryAdaptor\Indexer\NodeIndexer_Original::indexNode(Neos\ContentRepository\Domain\Model\Node)
28 Flowpack\ElasticSearch\ContentRepositoryQueueIndexer\IndexingJob_Original::Flowpack\ElasticSearch\ContentRepositoryQueueIndexer\{closure}()
27 Closure::__invoke()
26 Flowpack\ElasticSearch\ContentRepositoryAdaptor\Indexer\NodeIndexer_Original::withBulkProcessing(Closure)
25 call_user_func_array(array|2|, array|1|)
24 Neos\Flow\ObjectManagement\DependencyInjection\DependencyProxy::__call("withBulkProcessing", array|1|)
23 Flowpack\ElasticSearch\ContentRepositoryQueueIndexer\IndexingJob_Original::execute(Flowpack\JobQueue\Redis\Queue\RedisQueue, Flowpack\JobQueue\Common\Queue\Message)
22 Flowpack\JobQueue\Common\Job\JobManager_Original::executeJobForMessage(Flowpack\JobQueue\Redis\Queue\RedisQueue, Flowpack\JobQueue\Common\Queue\Message)
21 call_user_func_array(array|2|, array|2|)
20 Neos\Flow\ObjectManagement\DependencyInjection\DependencyProxy::__call("executeJobForMessage", array|2|)
19 Flowpack\JobQueue\Common\Command\JobCommandController_Original::executeCommand(Flowpack\JobQueue\Redis\Queue\RedisQueue, "TzozODoiRmxvd3BhY2tcSm9iUXVldWVcQ29tbW9uXFF1ZXVlXE…19IjtzOjE5OiIAKgBudW1iZXJPZlJlbGVhc2VzIjtpOjA7fQ==")
18 call_user_func_array(array|2|, array|2|)
17 Neos\Flow\Cli\CommandController_Original::callCommandMethod()
16 Neos\Flow\Cli\CommandController_Original::processRequest(Neos\Flow\Cli\Request, Neos\Flow\Cli\Response)
15 Neos\Flow\Mvc\Dispatcher_Original::initiateDispatchLoop(Neos\Flow\Cli\Request, Neos\Flow\Cli\Response)
14 Neos\Flow\Mvc\Dispatcher_Original::dispatch(Neos\Flow\Cli\Request, Neos\Flow\Cli\Response)
13 Neos\Flow\Cli\CommandRequestHandler::Neos\Flow\Cli\{closure}()
12 Closure::__invoke()
11 Neos\Flow\Security\Context_Original::withoutAuthorizationChecks(Closure)
10 Neos\Flow\Security\Context::withoutAuthorizationChecks(Closure)
9 call_user_func_array(array|2|, array|1|)
8 Neos\Flow\Security\Context::Flow_Aop_Proxy_invokeJoinPoint(Neos\Flow\Aop\JoinPoint)
7 Neos\Flow\Aop\Advice\AdviceChain::proceed(Neos\Flow\Aop\JoinPoint)
6 Neos\Flow\Session\Aspect\LazyLoadingAspect_Original::callMethodOnOriginalSessionObject(Neos\Flow\Aop\JoinPoint)
5 Neos\Flow\Aop\Advice\AroundAdvice::invoke(Neos\Flow\Aop\JoinPoint)
4 Neos\Flow\Aop\Advice\AdviceChain::proceed(Neos\Flow\Aop\JoinPoint)
3 Neos\Flow\Security\Context::withoutAuthorizationChecks(Closure)
2 Neos\Flow\Cli\CommandRequestHandler::handleRequest()
1 Neos\Flow\Core\Bootstrap::run()
My guess: the target workspace can be null and is passed as is to the job in https://github.com/ttreeagency/Flowpack.ElasticSearch.ContentRepositoryQueueIndexer/blob/3.0/Classes/Flowpack/ElasticSearch/ContentRepositoryQueueIndexer/Indexer/NodeIndexer.php#L45 which later leads to the problem.
When using the presets option of Flowpack.JobQueue.Common to define queue settings, the className has to be given, because of the way precedence works (see getQueueSettings() in QueueManager).
If the ContentRepositoryQueueIndexer used presets directly, this would no longer be needed.
When using the CLI command to index a node:
./flow flowpack.elasticsearch.contentrepositoryadaptor:nodeindex:indexnode --workspace live --identifier da1b02a9-cafd-487c-c78f-5c714996df8c
everything is logged as successful, but no actual index update happens. The currentBulkRequest is always empty, even though it is filled before, as can be seen from the log messages.
This happens only with enableLiveAsyncIndexing set to false; in the default state (async indexing enabled) it works as expected.
REASON
The Flowpack\ElasticSearch\ContentRepositoryAdaptor\Indexer\NodeIndexer is a singleton, but the Flowpack\ElasticSearch\ContentRepositoryQueueIndexer\Indexer\NodeIndexer is not.
Hi,
we just upgraded a site from Neos 4.x to Flow/Neos 7.x and were indexing nodes using the DoctrineQueue with:
./flow job:work Flowpack.ElasticSearch.ContentRepositoryQueueIndexer.Live --verbose
We see that indexing is done when we edit nodes in the backend, and indexing is also triggered when we later publish the same node.
The problem: only the first change is stored in ES, all further changes to the node (i.e. the "publish") are not being reflected into the ES index. Just try it yourself, change the title of a node from "node1" to "node2", publish that, and then change it to "node3" etc and see what is really written in the ES.
Could you reproduce that, is it a bug? It worked before with the same setup.
It looks like there is some caching going on, and the problem only happens in the endless PHP runner (job:work without a --limit). Our workaround is thus currently:
while true; do ./flow job:work Flowpack.ElasticSearch.ContentRepositoryQueueIndexer.Live --limit 1 ; done
so that every indexing starts a fresh new PHP process.
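One drawback of a bare while true loop is that it spins without pause if the worker exits with an error right away. A slightly more defensive variant could look like this. It is a sketch only: the function name and the iteration cap are hypothetical, and the cap exists just so the example terminates.

```shell
# Run a command repeatedly, starting a fresh process for every iteration.
# $1 = number of iterations (hypothetical cap), remaining arguments = command to run
run_fresh_each_time() {
  local iterations=$1
  shift
  local i=0
  while [ "$i" -lt "$iterations" ]; do
    # Pause briefly if the command fails, to avoid a tight error loop.
    "$@" || sleep 1
    i=$((i + 1))
  done
}

# Usage with the command from this thread (commented out, needs a Flow installation):
# run_fresh_each_time 1000 ./flow job:work Flowpack.ElasticSearch.ContentRepositoryQueueIndexer.Live --limit 1
```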
I think it would be nice if the queuename was configurable.
I am using this package with the Doctrine backend, and Doctrine did not like the default queue name Flowpack.ElasticSearch.ContentRepositoryQueueIndexer. The dots mixed up the SQL.
When one or more nodes are deleted and live indexing runs, an exception is raised:
Exception: Argument 1 passed to Neos\ContentRepository\Domain\Factory\NodeFactory_Original::createFromNodeData() must be an instance of Neos\ContentRepository\Domain\Model\NodeData, null given
I suspect "$nodeData" (https://github.com/ttreeagency/Flowpack.ElasticSearch.ContentRepositoryQueueIndexer/blob/3.0/Classes/Flowpack/ElasticSearch/ContentRepositoryQueueIndexer/IndexingJob.php#L102) is null because the deleted node is no longer found in the repository.
Trace:
Exception: Argument 1 passed to Neos\ContentRepository\Domain\Factory\NodeFactory_Original::createFromNodeData() must be an instance of Neos\ContentRepository\Domain\Model\NodeData, null given
31 Neos\ContentRepository\Domain\Factory\NodeFactory_Original::createFromNodeData(NULL, Neos\Neos\Domain\Service\ContentContext)
30 call_user_func_array(array|2|, array|2|)
29 Neos\Flow\ObjectManagement\DependencyInjection\DependencyProxy::__call("createFromNodeData", array|2|)
28 Flowpack\ElasticSearch\ContentRepositoryQueueIndexer\IndexingJob_Original::Flowpack\ElasticSearch\ContentRepositoryQueueIndexer\{closure}()
27 Closure::__invoke()
26 Flowpack\ElasticSearch\ContentRepositoryAdaptor\Indexer\NodeIndexer_Original::withBulkProcessing(Closure)
25 call_user_func_array(array|2|, array|1|)
24 Neos\Flow\ObjectManagement\DependencyInjection\DependencyProxy::__call("withBulkProcessing", array|1|)
23 Flowpack\ElasticSearch\ContentRepositoryQueueIndexer\IndexingJob_Original::execute(Flowpack\JobQueue\Redis\Queue\RedisQueue, Flowpack\JobQueue\Common\Queue\Message)
22 Flowpack\JobQueue\Common\Job\JobManager_Original::executeJobForMessage(Flowpack\JobQueue\Redis\Queue\RedisQueue, Flowpack\JobQueue\Common\Queue\Message)
21 call_user_func_array(array|2|, array|2|)
20 Neos\Flow\ObjectManagement\DependencyInjection\DependencyProxy::__call("executeJobForMessage", array|2|)
19 Flowpack\JobQueue\Common\Command\JobCommandController_Original::executeCommand(Flowpack\JobQueue\Redis\Queue\RedisQueue, "TzozODoiRmxvd3BhY2tcSm9iUXVldWVcQ29tbW9uXFF1ZXVlXE…19IjtzOjE5OiIAKgBudW1iZXJPZlJlbGVhc2VzIjtpOjA7fQ==")
18 call_user_func_array(array|2|, array|2|)
17 Neos\Flow\Cli\CommandController_Original::callCommandMethod()
16 Neos\Flow\Cli\CommandController_Original::processRequest(Neos\Flow\Cli\Request, Neos\Flow\Cli\Response)
15 Neos\Flow\Mvc\Dispatcher_Original::initiateDispatchLoop(Neos\Flow\Cli\Request, Neos\Flow\Cli\Response)
14 Neos\Flow\Mvc\Dispatcher_Original::dispatch(Neos\Flow\Cli\Request, Neos\Flow\Cli\Response)
13 Neos\Flow\Cli\CommandRequestHandler::Neos\Flow\Cli\{closure}()
12 Closure::__invoke()
11 Neos\Flow\Security\Context_Original::withoutAuthorizationChecks(Closure)
10 Neos\Flow\Security\Context::withoutAuthorizationChecks(Closure)
9 call_user_func_array(array|2|, array|1|)
8 Neos\Flow\Security\Context::Flow_Aop_Proxy_invokeJoinPoint(Neos\Flow\Aop\JoinPoint)
7 Neos\Flow\Aop\Advice\AdviceChain::proceed(Neos\Flow\Aop\JoinPoint)
6 Neos\Flow\Session\Aspect\LazyLoadingAspect_Original::callMethodOnOriginalSessionObject(Neos\Flow\Aop\JoinPoint)
5 Neos\Flow\Aop\Advice\AroundAdvice::invoke(Neos\Flow\Aop\JoinPoint)
4 Neos\Flow\Aop\Advice\AdviceChain::proceed(Neos\Flow\Aop\JoinPoint)
3 Neos\Flow\Security\Context::withoutAuthorizationChecks(Closure)
2 Neos\Flow\Cli\CommandRequestHandler::handleRequest()
1 Neos\Flow\Core\Bootstrap::run()