
pimcore-lucene-search's Introduction

Pimcore Lucene Search


Note

The Pimcore Lucene Search Bundle will be marked as abandoned as soon as the Dynamic Search Bundle reaches a stable state. After that, bug fixes will only be provided in some cases. However, PRs are always welcome.

Requirements

  • Pimcore >= 5.8
  • Pimcore >= 6.0

Pimcore 4

Get the Pimcore4 Version here.

Installation

  1. Add the code below to your composer.json:

"require" : {
    "dachcom-digital/lucene-search" : "~2.3.0"
}

  2. Activate & install it through the ExtensionManager

Configuration

To enable LuceneSearch, add these lines to your AppBundle/Resources/config/pimcore/config.yml:

lucene_search:
    enabled: true

A complete setup could look like this:

lucene_search:
    enabled: true
    fuzzy_search_results: false
    search_suggestion: true
    seeds:
        - 'http://your-domain.dev'
    filter:
        valid_links:
            - '@^http://your-domain.dev.*@i'
    view:
        max_per_page: 10
    crawler:
        content_max_size: 4
        content_start_indicator: '<!-- main-content -->'
        content_end_indicator: '<!-- /main-content -->'

To override a default value, add the corresponding config parameter to your config.yml. Run these commands to get information about all config elements of LuceneSearch:

# configuration about all config parameters
$ bin/console config:dump-reference LuceneSearchBundle

# configuration info about the "fuzzy_search_results" parameter
$ bin/console config:dump-reference LuceneSearchBundle fuzzy_search_results

We also added detailed documentation about all possible config values.

Features

  • Maintenance driven indexing
  • Auto Complete
  • Restricted Documents & Usergroups (member plugin recommended but not required)

Usage

Default
By default, the crawler engine starts automatically every night. Please check that the Pimcore default maintenance script is properly installed.

Command Line Command
If you want to start the crawler manually, use this command:

$ php bin/console lucenesearch:crawl -f -v
command | short command | type | description
force | -f | force crawler start | Sometimes the crawler gets stuck because of a critical error, mostly triggered by a wrong configuration. Use this option to force a restart.
verbose | -v | show some logs | Good for debugging. You'll get some additional information about filtered and forbidden links while crawling.

Logs

You'll find some logs from the last crawl in your backend (at the bottom on the LuceneSearch settings page). Of course you'll also find some logs in your var/logs folder. Note: please enable the debug mode in pimcore settings to get all types of logs.

Further Information

Copyright and license

Copyright: DACHCOM.DIGITAL
For licensing details please visit LICENSE.md

Upgrade Info

Before updating, please check our upgrade notes!

pimcore-lucene-search's People

Contributors

aarongerig, alasdair-shields, cruiser13, dpfaffenbauer, ktallafus, solverat

pimcore-lucene-search's Issues

Empty result list

Hi

We configured Lucene Search according to the documentation. The crawler finished successfully and the search template was added with the following code:

<?= $this->actions()->render(new \Symfony\Component\HttpKernel\Controller\ControllerReference('LuceneSearchBundle:List:getResult')); ?>

But when we add /search?p=test nothing is shown in the list. We only get empty divs generated by the lucene search layout.

Did we forget something?

Our config is:

lucene_search:
    enabled: true
    sitemap:
        render: false
    fuzzy_search_results: false
    search_suggestion: true
    seeds:
        - 'http://domain-xy.com'
    filter:
        valid_links:
            - '@^http://domain-xy.com.*@i'
    view:
        max_per_page: 10
    crawler:
        content_max_size: 4
    locale:
        ignore_language: true

country based PDF indexing

If country restriction has been activated, no PDF will show up in the results, because it is not possible to detect which country a PDF belongs to.

Task I

Add assigned_country as asset property, to get associated country

Task II

Allow user to set asset free from country restriction: Update countryQuery and exclude assets from country restriction queries.

fresh install - didn't find config-file

This is the message:
Fatal error: Class 'LuceneSearch\Model\Configuration' not found in /var/www/plugins/LuceneSearch/controllers/FrontendController.php on line 48

Andreas

Sorry - I installed it via composer and thought that this also installs the plugin. But now I see that it is also necessary to install it via "activate extension" in the Pimcore backend.

[LuceneSearch2] Implement Category Based Search

  • respect categories meta tag, if available.
  • allow multiple categories
  • update readme.md and provide frontend category selection sample

Configuration

lucene_search:
    categories:
        - {id: 1, label: 'Category 1'}
        - {id: 2, label: 'Category 2'}

View

{% if lucene_search_crawler_active() %}
      <meta name="lucene-search:categories" content="1,2">
{% endif %}

New Install - Pimcore 5

I installed the bundle for Pimcore 5 using composer and activated it using the extension manager,
but I can't get the crawler to start.

I keep getting this error:
no valid task iterators defined!

I start the crawler with :
php bin/console lucenesearch:crawl -v

Any idea what could be going wrong ?

Thanks!

Use FILE_APPEND flag instead of re-writing the complete logfile

Is there a specific reason why you read the complete log file in order to add a line to it? This seems to be a lot of overhead once the log file grows bigger. Why not just something like:

$current = date('d.m.Y H:i') . '|' . $this->getRealLevel($level) . '|' . $message . "\n";
file_put_contents($file, $current, FILE_APPEND);

instead of rewriting the whole file with:

file_put_contents($file, $current);
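The append-only approach suggested above can be sketched as a small standalone function. This is only an illustration, not the bundle's logger: the function name `appendLogLine` and the `LOCK_EX` flag are assumptions added here, while `FILE_APPEND` and the line format come from the issue.

```php
<?php
// Hypothetical helper (not part of LuceneSearch): appends one log line per call
// without reading the existing file first.
function appendLogLine(string $file, string $level, string $message): void
{
    $line = date('d.m.Y H:i') . '|' . $level . '|' . $message . "\n";

    // FILE_APPEND writes at the end of the file; LOCK_EX (an addition made here)
    // takes an exclusive lock so concurrent writers do not interleave lines.
    if (file_put_contents($file, $line, FILE_APPEND | LOCK_EX) === false) {
        throw new RuntimeException('Could not write to log file: ' . $file);
    }
}

// Usage: two calls leave two lines in the file.
$file = tempnam(sys_get_temp_dir(), 'lucene_log_');
appendLogLine($file, 'debug', 'crawler started');
appendLogLine($file, 'error', 'crawler stopped');
echo count(file($file)), " lines written\n"; // 2 lines written
unlink($file);
```

The cost of each call then depends on the length of the new line rather than on the size of the existing log.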

allow custom meta data

Allow some custom meta data which should be indexed only in crawler mode.

if( \LuceneSearch\Tool\Request::isLuceneSearchCrawler() )
{
       $this->view->headMeta()->setName( 'lucene-search-meta', 'awesome special data' );
}

[LuceneSearch1] multiple start urls for crawling

With multiple start URLs for crawling, only the last one ends up in the index. For example:

b2b.domain.com
www.domain.com
test.domain.com

LuceneSearch crawls these domains, but only indexes the last one.

Cannot start crawl without existing state.cnf

If there is no state.cnf, LuceneSearch fails with the following error:

invalid state config slot "finished"

This is due to a missing state.cnf. But there is currently no state.cnf, because LuceneSearch support has just been deployed.

Ajax AutoComplete example

I followed the guide to implement the search page with Ajax autocomplete.
When I search, I get the following error:

Symfony\Component\Debug\Exception\ContextErrorException
in vendor\dachcom-digital\lucene-search\src\LuceneSearchBundle\Controller\AutoCompleteController.php (line 26)
foreach ($terms as $term) { $t = $term->text;

Any help ?

PDF Indexing Configuration

  • Add "enable/disable PDF indexing" to backend configuration
  • Add "PDF max file size for indexing" to backend configuration

[LuceneSearch2] refactor config style

Use a more generic config style:

instead of

restriction:ignore: true
restriction:class: ~
restriction:method: ~

use

restriction:
    ignore: true
    class: ~
    method: ~

No database connection for the categories class

I need categories from the Pimcore objects. Unfortunately I don't have access to the class category.

AppBundle\LuceneSearch\Services\Categories

    public function getCategories() : array
    {
        $categoryArray = [];

        array_push($categoryArray, ['id' => 1, 'label' => 'Blog']);
        array_push($categoryArray, ['id' => 2, 'label' => 'Support']);
        array_push($categoryArray, ['id' => 3, 'label' => 'Project']);

        $catList = new DataObject\....\Listing(); // -> ERROR (Fatal error: Uncaught Symfony\Component\Debug\Exception\FatalThrowableError: Call to a member function get() on null in /var/www/html/pimcore/lib/Pimcore/Db.php on line 51) 
        foreach ($catList as $cat) {
            array_push($categoryArray, ['id' => $cat->getId(), 'label' => $cat->getName()]);
        }

        return $categoryArray;
    }

Allow Exclude Content Indicators

Allow setting exclude tags. For example:

main-content: Keep Data in between!
main-content-exclude: Remove Data in between!

<!-- main-content-->
<span>Good Content! Vivamus magna justo, lacinia eget consectetur sed, convallis at tellus.</span>
<!-- main-content-exclude -->
<span class="boring-text">Boring! This will be excluded in search!</span>
<!-- /main-content-exclude -->
<span>Good Content! Vivamus magna justo, lacinia eget consectetur sed, convallis at tellus</span>
<!-- /main-content-->
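How such exclude indicators could be applied is sketched below. This is a rough illustration of the proposal, not the crawler's actual parser: the function name `extractIndexableContent` and the regular expressions are assumptions, and only the marker names are taken from the example above.

```php
<?php
// Hypothetical helper: keeps the text between the main-content markers and
// drops everything wrapped in main-content-exclude markers.
function extractIndexableContent(string $html): string
{
    // Capture the content between the start and end indicator (non-greedy, across lines).
    $mainPattern = '@<!--\s*main-content\s*-->(.*?)<!--\s*/main-content\s*-->@s';
    if (preg_match($mainPattern, $html, $match) !== 1) {
        return '';
    }

    // Remove each excluded block together with its markers.
    $excludePattern = '@<!--\s*main-content-exclude\s*-->.*?<!--\s*/main-content-exclude\s*-->@s';
    $kept = preg_replace($excludePattern, '', $match[1]);

    return trim((string) $kept);
}

$html = <<<'HTML'
<header>Navigation</header>
<!-- main-content-->
<span>Good Content!</span>
<!-- main-content-exclude -->
<span class="boring-text">Boring!</span>
<!-- /main-content-exclude -->
<span>More Good Content!</span>
<!-- /main-content-->
HTML;

echo extractIndexableContent($html), "\n";
```

With this input only the two "Good Content" spans remain. Neither the navigation outside the main-content markers nor the boring-text span is returned.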

Improvement: language and subsite separation.

Hello,

I was wondering whether the plugin can handle subsite and language separation (for search, autocomplete and suggestion), so that it only returns results for the current language or subsite.
Is there any way to configure it like this?

If not, maybe it could be improved to handle the multi-site and multi-language case.

Thanks again for this wonderful and very complete plugin (I really like the auto suggestion, just like Google :))

[LuceneSearch2] replace zf1/zend-search-lucene

Tried already and had some conflicts with composer since pimcore requires
"zendframework/zend-servicemanager": "^3.2" which relies on the "zendframework/zend-stdlib": "^3.1" package.

Since the ZendSearch package requires "zendframework/zend-stdlib": "2.*", we're going to run into some conflicts.

Crawl Error on fresh Lucene Install - Pimcore 5

Hi, first of all, sorry for re-opening, but in my case it needs to be :( I'm running Pimcore 5, everything works well, and I installed Lucene using composer, activated it in the extension manager and so on. I placed the Lucene config from the example above at src/Appbundle/Resources/config/pimcore/config.yml as suggested, but if I start a manual crawl on the console I still get the error "no valid task iterators defined!"

Any ideas? Thanks in advance! Kind regards

[LuceneSearch2] generic restrictions

  • parser: check asset restriction with eventdispatcher
  • frontend: check query restriction with eventdispatcher and remove class/method config
  • config: if restriction.enabled = false:
    • do not add any restriction terms to lucene
    • skip event dispatching

demo regex wrong

the regex @^http://www\\.pimcore\\.org*@i is wrong, should be @^http://www\.pimcore\.org*@i
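The difference between the two patterns can be checked with a short script. This is a sketch under one assumption: that the doubled backslashes reach the regex engine unchanged (as they would from a single-quoted YAML string, for example). The test URL is made up for illustration.

```php
<?php
// PHP single-quoted strings turn "\\" into one backslash, so four backslashes
// are written here to hand the regex engine the doubled form "\\.".
// To the engine, "\\." means "a literal backslash followed by any character".
$doubled = '@^http://www\\\\.pimcore\\\\.org*@i';

// Single backslash: "\." means "a literal dot".
$single = '@^http://www\.pimcore\.org*@i';

$url = 'http://www.pimcore.org/en/resources'; // hypothetical URL for the check

var_dump(preg_match($doubled, $url) === 1); // bool(false)
var_dump(preg_match($single, $url) === 1);  // bool(true)
```

Note that `org*` matches `or` followed by any number of `g` characters. If "anything after the domain" is meant, `org.*` may be closer to the intent.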

Having some issues with search form

The instructions state that I should setup a view script containing:
$this->action('find', 'frontend', 'LuceneSearch', array('viewScript' => 'frontend/find.php'));
This yields the following error:

script 'frontend/find.php' not found in path (/Users/patrick/Sites/externt/lucenetest/plugins/LuceneSearch/views/scripts/:/Users/patrick/Sites/externt/lucenetest/website/views/layouts/:/Users/patrick/Sites/externt/lucenetest/website/views/scripts/:./views/layouts/:./views/scripts/)

Just copying the contents of plugins/LuceneSearch/views/scripts/search/find.php won't do, since it's useless without its controller.

What am I missing?

add more debug infos

Sometimes it's very hard to debug Lucene Search. Therefore it would be nice to have more debug information, like:

  • I am currently crawling this site: $url
  • I am denying this site: $url because of this regex: $url
  • I added this site $site to the index with this $language/$country (or whatever we can print here)

Installing the bundle throws an exception in dev-Environment/debug mode

In order to install the plugin, it must be enabled first. However, once enabled, the class DependencyInjection/LuceneSearchExtension.php expects the config file var/bundles/LuceneSearchBundle/config.yml to already exist. This yields a PHP warning, which in turn throws an exception in the dev environment or when debug mode is enabled.

Steps to reproduce: (PIMCORE_ENVIRONMENT is "dev")

vagrant@develop:/workspace/test$ bin/console pimcore:bundle:enable LuceneSearchBundle
[OK] Bundle "LuceneSearchBundle\LuceneSearchBundle" was successfully enabled

vagrant@develop:/workspace/test$ bin/console pimcore:bundle:install LuceneSearchBundle
PHP Fatal error: Uncaught Symfony\Component\Debug\Exception\ContextErrorException: Warning: file_get_contents(/workspace/test/var/bundles/LuceneSearchBundle/config.yml): failed to open stream: No such file or directory in src/LuceneSearchBundle/DependencyInjection/LuceneSearchExtension.php:26

Proposed fix: Just suppress the warning using the @ in front of file_get_contents:
$bundleConfig = Yaml::parse(@file_get_contents(BundleConfiguration::SYSTEM_CONFIG_FILE_PATH));
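As an alternative to suppressing the warning, the file could be checked before it is read. The sketch below is not the bundle's code: `readConfigFileOrEmpty` is a hypothetical helper, and how the extension should treat an empty config once the result is passed on to `Yaml::parse` is left open here.

```php
<?php
// Hypothetical helper: returns the file's contents, or an empty string when the
// file does not exist yet (for example before the bundle has been installed).
function readConfigFileOrEmpty(string $path): string
{
    // Checking first avoids the "failed to open stream" warning entirely,
    // so nothing is thrown in dev or debug mode.
    if (!is_file($path) || !is_readable($path)) {
        return '';
    }

    $contents = file_get_contents($path);

    return $contents === false ? '' : $contents;
}

// A missing file yields an empty string and no warning.
var_dump(readConfigFileOrEmpty(sys_get_temp_dir() . '/does-not-exist/config.yml')); // string(0) ""
```

Compared with the @ operator, this keeps other warnings from the same call visible, such as a permission problem on a file that does exist.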
