fts's Introduction

CUBA Full Text Search Add-on

Full Text Search Add-on provides unstructured search within the values of entity attributes and content of uploaded files.

For more information see github.com/cuba-platform/cuba.

Build and install

To build the add-on from source, you first need to build and install the CUBA Gradle plugin and CUBA itself, as described below.

Let's assume that you have cloned sources into the following directories:

work/
    cuba/
    cuba-gradle-plugin/
    fts/

Open a terminal in the work directory and run the following commands to build the plugin and install it into your local Maven repository (~/.m2):

cd cuba-gradle-plugin
gradlew install

After that, go to the cuba directory and build and install it with the same command:

cd ../cuba
gradlew install

Finally, go to the fts directory and build and install it with the same command:

cd ../fts
gradlew install

fts's People

Contributors

alexbudarov, comru, daxzel, gavrilov-ivan, genapavlov, gglcrash, gorbunkov, jreznot, knstvk, sergey-sw, shakhovv, soraksh, tinhol

Forkers

sergeev-ms

fts's Issues

NPE when the reindexEntity method is invoked with an empty entity name string

  1. Create the new project
  2. Enable FTS in the project
  3. Start the application server
  4. Go to Administration - JMX console
  5. Open fts bean
  6. Try to invoke java.lang.String asyncReindexEntity() or java.lang.String reindexEntity() with an empty entity name
    AR: NPE
	at org.apache.lucene.util.BytesRef.<init>(BytesRef.java:87)
	at org.apache.lucene.index.Term.<init>(Term.java:65)
	at com.haulmont.fts.core.sys.LuceneIndexerBean.deleteDocumentsForEntity(LuceneIndexerBean.java:348)
	at com.haulmont.fts.core.app.FtsManager.deleteIndexForEntity(FtsManager.java:306)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
	at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213)
	at com.sun.proxy.$Proxy257.deleteIndexForEntity(Unknown Source)
	at com.haulmont.fts.core.jmx.FtsManager.reindexEntity(FtsManager.java:77)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
	at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:85)
	at com.haulmont.cuba.security.sys.AuthenticationInterceptor.aroundInvoke(AuthenticationInterceptor.java:41)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:629)
	at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:618)
	at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:70)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
	at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:85)
	at com.haulmont.cuba.core.sys.MBeanInterceptor.aroundInvoke(MBeanInterceptor.java:39)
	at sun.reflect.GeneratedMethodAccessor104.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:629)
	at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:618)
	at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:70)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
	at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213)
	at com.sun.proxy.$Proxy260.reindexEntity(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
	at sun.reflect.GeneratedMethodAccessor60.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
	at javax.management.modelmbean.RequiredModelMBean$4.run(RequiredModelMBean.java:1252)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:80)
	at javax.management.modelmbean.RequiredModelMBean.invokeMethod(RequiredModelMBean.java:1246)
	at javax.management.modelmbean.RequiredModelMBean.invoke(RequiredModelMBean.java:1085)
	at org.springframework.jmx.export.SpringModelMBean.invoke(SpringModelMBean.java:90)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
	at com.haulmont.cuba.web.jmx.JmxControlBean.lambda$invokeOperation$7(JmxControlBean.java:374)
	at com.haulmont.cuba.web.jmx.JmxConnectionHelper.withConnection(JmxConnectionHelper.java:106)
	at com.haulmont.cuba.web.jmx.JmxControlBean.invokeOperation(JmxControlBean.java:363)
	at com.haulmont.cuba.web.app.ui.jmxcontrol.inspect.MbeanInspectWindow$2.run(MbeanInspectWindow.java:259)
	at com.haulmont.cuba.gui.backgroundwork.LocalizedTaskWrapper.run(LocalizedTaskWrapper.java:54)
	at com.haulmont.cuba.web.gui.executors.impl.WebBackgroundWorker$WebTaskExecutor.call(WebBackgroundWorker.java:202)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at com.haulmont.cuba.web.gui.executors.impl.WebBackgroundWorker$WebTaskExecutor.lambda$startExecution$1(WebBackgroundWorker.java:359)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
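
A guard at the entry point would turn this failure into a clear error message. The following is a minimal sketch, not the add-on's actual code: the class and method names (`EntityNameGuard`, `requireEntityName`) are hypothetical, and it assumes that the JMX console passes either null or a blank string when the parameter field is left empty.

```java
public class EntityNameGuard {

    /**
     * Returns the trimmed entity name, or throws IllegalArgumentException
     * if the name is null or contains only whitespace.
     * Hypothetical helper; not part of the FTS add-on.
     */
    public static String requireEntityName(String entityName) {
        if (entityName == null || entityName.trim().isEmpty()) {
            throw new IllegalArgumentException("Entity name must not be empty");
        }
        return entityName.trim();
    }

    public static void main(String[] args) {
        System.out.println(requireEntityName(" sec$User "));
        try {
            requireEntityName("");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Calling such a check before the index is touched means the caller sees a validation message instead of a stack trace from inside Lucene.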

---
Original issue: https://youtrack.haulmont.com/issue/PL-10590

NPE while indexing entity which contains collection with NULL element

Environment

  • Platform version: 6.8.9

Description of the bug or enhancement

  • Minimal reproducible example
    fts_metaproperty.zip
    Create entity with transient collection metaproperty. Collection must contain NULL values. Try to index entity.
    Error occurs:
Caused by: java.lang.NullPointerException: null
    at com.haulmont.fts.core.sys.LuceneIndexer.addLinkedPropertyEx(LuceneIndexer.java:319)
    at com.haulmont.fts.core.sys.LuceneIndexer.createLinksFieldContent(LuceneIndexer.java:291)
    at com.haulmont.fts.core.sys.LuceneIndexer.indexEntity(LuceneIndexer.java:153)
    at com.haulmont.fts.core.app.FtsManager.initIndexer(FtsManager.java:291)
    at com.haulmont.fts.core.app.FtsManager.processQueue(FtsManager.java:231)
    at sun.reflect.GeneratedMethodAccessor1097.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213)
    at com.sun.proxy.$Proxy176.processQueue(Unknown Source)
    ... 10 common frames omitted
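
One way to make such a traversal tolerant of NULL elements is to skip them before they are dereferenced. This is only a sketch under assumptions: `NullSafeLinkCollector` and `collectLinks` are hypothetical names, and the real `addLinkedPropertyEx` logic is not shown in this report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class NullSafeLinkCollector {

    /**
     * Collects the string form of every non-null element of the collection.
     * Null elements, and a null collection, are skipped instead of dereferenced.
     * Hypothetical helper; not part of the FTS add-on.
     */
    public static List<String> collectLinks(Collection<?> linkedValues) {
        List<String> links = new ArrayList<>();
        if (linkedValues == null) {
            return links;
        }
        for (Object value : linkedValues) {
            if (value != null) {
                links.add(value.toString());
            }
        }
        return links;
    }

    public static void main(String[] args) {
        System.out.println(collectLinks(Arrays.asList("a", null, "b")));
    }
}
```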

Optimize FTS search performance

FtsServiceBean.makeSearchResult uses the DataManager to reload each entity of the Lucene search result. This is done to check security constraints. A new query in a new transaction is executed for each entity. Instead, the query could be executed once per batch of entities rather than once per entity.
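
The batching idea can be sketched as follows. This is an illustration only: `BatchReloadSketch`, `loadInBatches` and the `batchLoader` callback are hypothetical, and the callback stands in for whatever single query would reload a whole batch of entities by ID.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class BatchReloadSketch {

    /**
     * Splits the IDs into consecutive batches of at most batchSize elements
     * and calls batchLoader once per batch, so the number of loader calls is
     * ceil(ids.size() / batchSize) instead of ids.size().
     */
    public static <I, E> List<E> loadInBatches(List<I> ids, int batchSize,
                                               Function<List<I>, List<E>> batchLoader) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<E> result = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += batchSize) {
            int to = Math.min(from + batchSize, ids.size());
            List<I> batch = new ArrayList<>(ids.subList(from, to));
            result.addAll(batchLoader.apply(batch));
        }
        return result;
    }

    public static void main(String[] args) {
        // The lambda stands in for one database query per batch.
        List<String> loaded = loadInBatches(Arrays.asList(1, 2, 3, 4, 5), 2, batch -> {
            List<String> entities = new ArrayList<>();
            for (Integer id : batch) {
                entities.add("entity-" + id);
            }
            return entities;
        });
        System.out.println(loaded);
    }
}
```

The batch size is a trade-off: larger batches mean fewer round trips but longer ID lists in each query.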


Original issue: https://youtrack.haulmont.com/issue/PL-10748

Postgres hangs during FtsManager - asyncReindexAll for a large number of entities

In short
In Sherlock we have more than 500,000 entities (an address database) for FTS indexing.
When I try to reindex the whole queue with FtsManager.asyncReindexAll on a local machine, it does not work. One of the iterations hangs near the end of the reindexing.
The problem is the same in CUBA 5.6.

How to reproduce

  1. Open refapp master

  2. Enable scheduling:

cuba.schedulingActive = true

  3. Create sample data
insert into sec_user (id, version, create_ts, created_by, update_ts, login, login_lc, name, email, group_id)
select newid(), 1, now(), 'admin', now(), 
'us' ||  t, 'us' ||  t, 
'Name ' || t, 'user' || t || '@example.com', '0fa2b1a5-1d68-4d69-9fbd-dff348347f93'
from generate_series(1,500000) t
  4. Create scheduled task. cuba_FtsManager - reindexNextBatch. Period = 5, Timeout = 3600. Activate it.

  5. Jmx Console -> FtsManager -> app-core.fts:type=FtsManager -> asyncReindexAll

  6. Activate scheduled task.
    Go to Scheduled tasks -> our task -> Executions
    and wait.

  7. One of iterations hangs when number of elements in the fts queue reaches 420000.
    See screenshot.

Check current queries:

refapp_6=# select state, (now() - query_start) as "running for", query from pg_stat_activity where state <> 'idle';
 active | 00:21:10.972806 | SELECT t0.ID AS a1 FROM SEC_USER t0 WHERE (t0.ID NOT IN (SELECT t1.ENTITY_ID FROM SYS_FTS_QUEUE t1 WHERE (t1.ENTITY_NAME = $1)) AND (t0.DELETE_TS IS NULL)) LIMIT $2 OFFSET $3

Look at query execution plan.

refapp_6=# explain
refapp_6-# SELECT t0.ID AS a1 FROM SEC_USER t0 WHERE (t0.ID NOT IN (SELECT t1.ENTITY_ID FROM SYS_FTS_QUEUE t1 
refapp_6(# WHERE (t1.ENTITY_NAME = 'sec$User')) AND (t0.DELETE_TS IS NULL)) LIMIT 5000 OFFSET 0
refapp_6-# ;
                                          QUERY PLAN                                          
----------------------------------------------------------------------------------------------
 Limit  (cost=0.00..84585317.12 rows=5000 width=16)
   ->  Seq Scan on sec_user t0  (cost=0.00..4228453836.80 rows=249952 width=16)
         Filter: ((delete_ts IS NULL) AND (NOT (SubPlan 1)))
         SubPlan 1
           ->  Materialize  (cost=0.00..15867.00 rows=420000 width=16)
                 ->  Seq Scan on sys_fts_queue t1  (cost=0.00..11716.00 rows=420000 width=16)
                       Filter: ((entity_name)::text = 'sec$User'::text)
(7 rows)

The cost is... terrible. Looks like it is a Postgres query planner bug?

Just for curiosity, try to rewrite SQL query to use a left join:

refapp_6=# explain analyze
refapp_6-# SELECT t0.ID AS a1 FROM SEC_USER t0
refapp_6-# left join sys_fts_queue t1 on (t1.ENTITY_NAME = 'sec$User' and t0.id = t1.entity_id)
refapp_6-# where t1.id is null and t0.delete_ts is null
refapp_6-# limit 5000
refapp_6-# ;
                                                                 QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=19838.00..48269.68 rows=1 width=16) (actual time=462.856..467.726 rows=5000 loops=1)
   ->  Hash Left Join  (cost=19838.00..48269.68 rows=1 width=16) (actual time=462.854..467.376 rows=5000 loops=1)
         Hash Cond: (t0.id = t1.entity_id)
         Filter: (t1.id IS NULL)
         Rows Removed by Filter: 209373
         ->  Seq Scan on sec_user t0  (cost=0.00..14603.04 rows=499904 width=16) (actual time=0.007..80.960 rows=430064 loops=1)
               Filter: (delete_ts IS NULL)
         ->  Hash  (cost=11716.00..11716.00 rows=420000 width=32) (actual time=218.475..218.475 rows=420000 loops=1)
               Buckets: 262144  Batches: 2  Memory Usage: 15134kB
               ->  Seq Scan on sys_fts_queue t1  (cost=0.00..11716.00 rows=420000 width=32) (actual time=0.012..107.442 rows=420000 loops=1)
                     Filter: ((entity_name)::text = 'sec$User'::text)
 Planning time: 0.207 ms
 Execution time: 471.037 ms
(13 rows)

The plan is much better, and the query executes quickly.

DB parameters

budarov@budarov:~$ dpkg -l | grep postgresql
ii  postgresql-9.5                              9.5.7-1.pgdg16.04+1                           amd64        object-relational SQL database, version 9.5 server

postgres config:

shared_buffers = 1200MB			# min 128kB
temp_buffers = 48MB			# min 800kB
work_mem = 16MB				# min 64kB
maintenance_work_mem = 512MB		# min 1MB
#effective_cache_size = 4GB


Original issue: https://youtrack.haulmont.com/issue/PL-9480

JVM crashes with MMapDirectory FTS mapping on a network ftsindex folder, and the index stays locked with FSDirectory

Conditions:

  1. OS Windows
  2. The connection to the ftsindex folder must be unstable.

Actions:

  1. Move the ftsindex directory to a network folder and configure the server to use this folder.
  2. Add entities to the system.
  3. Run a reindex or search for a file.
    If the FTS engine has started searching in the folder but has not finished yet when the connection is lost, the JVM terminates.

Error is in attachment.

If MMapDirectory is replaced with NIOFSDirectory or FSDirectory, the JVM does not terminate, but the index stays locked because the write.lock file has not been removed.


Original issue: https://youtrack.haulmont.com/issue/PL-9921

FTS does not work in projects migrated from 6.5 or earlier

See: https://www.cuba-platform.com/discuss/t/demo-not-woking-fts/6163

Environment

  • Platform version: 6.9.5

Description of the bug or enhancement

  • Minimal reproducible example
  1. Create a project on the platform version 6.5
  2. Enable FTS
  3. ReindexAll and ProcessQueue
  4. Migrate the project to 6.9, update Db
  5. Launch Reindex all or try to search something
  • Actual behavior: the index cannot be deleted, and FTS does not work
org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported (resource BufferedChecksumIndexInput(MMapIndexInput(path="D:\studioSN\2018.08.16\fts65\deploy\tomcat\work\app-core\ftsindex\segments_3"))): this index is too old (version: 5.3.0). This version of Lucene only supports indexes created with release 6.0 and later.

Alternative script:

  1. Open https://demo.cuba-platform.com/app/#!
  2. Try to search something

Service info in the search results for xlsx files

  1. Create a project
  2. Create an entity
  3. Create association attribute to FileDescriptor entity with Many-To-One cardinality
  4. Create entity instance and attach xlsx file to it
  5. Try to find this file by the content in it
  6. In the results of the search, some strange service info is presented

Add bouncycastle dependencies for PDF parsing

Environment

  • Platform version: 6.9.0

Description of the bug or enhancement

Add the Bouncy Castle dependencies. Bouncy Castle is used for parsing PDF files with a digital signature.
Otherwise the following exception occurs:

Caused by: java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider
    at org.apache.pdfbox.pdmodel.encryption.PDEncryption.<init>(PDEncryption.java:96)
    at org.apache.pdfbox.pdfparser.PDFParser.prepareDecryption(PDFParser.java:310)
    at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:225)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:276)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1132)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1066)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:141)
    at com.haulmont.fts.core.sys.LuceneIndexer.appendFileContent(LuceneIndexer.java:228)
    at com.haulmont.fts.core.sys.LuceneIndexer.createAllFieldContent(LuceneIndexer.java:205)
    at com.haulmont.fts.core.sys.LuceneIndexer.indexEntity(LuceneIndexer.java:137)
    at com.haulmont.fts.core.app.FtsManager.initIndexer(FtsManager.java:291)
    at com.haulmont.fts.core.app.FtsManager.processQueue(FtsManager.java:231)
    at sun.reflect.GeneratedMethodAccessor1160.invoke(Unknown Source)

FTS indexing spends most of its time fetching entities one by one from the database

Environment

  • Platform version: 6.10 snapshot

Description of the bug or enhancement

FTS indexing is much slower than it could be. It spends most of its time not writing the Lucene index, but loading entities and lazy-loading the entity graph from the database.

  • Open refapp 6.10
  • Create 100,000 User instances by running this script:
insert into sec_user (id, version, create_ts, created_by, update_ts, login, login_lc, name, email, group_id, position_)
select newid(), 1, now(), 'admin', now(), 
'us' ||  t, 'us' ||  t, 
'Name ' || t, 'user' || t || '@example.com', 
'0fa2b1a5-1d68-4d69-9fbd-dff348347f93', 'Manager'
from generate_series(1,100000) t
  • Launch JVisualVM, start CPU sampling.
  • Invoke JMX FtsManager -> asyncReindexEntity "sec$User"
  • Wait until indexing is finished and analyze CPU snapshot.

CPU time distribution:

  • FtsManager.processQueue() - 14160 ms
  • lazy-loading User.group field from LuceneIndexerBean.addLinkedPropertyEx - 3136 ms
  • lazy-loading User collections (roles, substitutions) from LuceneIndexerBean.addLinkedPropertyEx - 2602 ms
  • loading every entity to be indexed one by one by using em.find() - 4351 ms

So the share of time spent loading the indexed entities from the database is:
(3136 + 2602 + 4351) / 14160 * 100% = 71.25%

Thus the FtsManager indexing speed is limited by round trips to the database. On my machine it is about 1000 entities per second.
For 1 million entities the indexing time is about 20 minutes, which means significant index downtime for systems that rely on FTS search.

FTS does not work for identity entities

Environment

  • Platform version: 6.9.0

FTS does not work for identity entities

  • Create identity entity, create an editor for entity
  • Enable FTS for entity
    Try to create identity entity
    Error occurs:
java.lang.IllegalStateException: Cannot get primary key value: entity is null
        at com.haulmont.cuba.core.entity.IdProxy.getNN(IdProxy.java:135) ~[cuba-global-6.9.0.jar:6.9.0]
        at com.haulmont.cuba.core.entity.FtsQueue.setObjectEntityId(FtsQueue.java:143) ~[cuba-global-6.9.0.jar:6.9.0]
        at com.haulmont.fts.core.app.FtsSenderBean.persistQueueItem(FtsSenderBean.java:128) ~[fts-core-6.9.0.jar:6.9.0]
        at com.haulmont.fts.core.app.FtsSenderBean.enqueue(FtsSenderBean.java:106) ~[fts-core-6.9.0.jar:6.9.0]
        at com.haulmont.fts.core.app.FtsSenderBean.lambda$enqueue$0(FtsSenderBean.java:85) ~[fts-core-6.9.0.jar:6.9.0]
        at com.haulmont.cuba.core.sys.PersistenceImpl$EntityManagerContextSynchronization.afterCompletion(PersistenceImpl.java:329) ~[cuba-core-6.9.0.jar:6.9.0]
        at org.springframework.transaction.support.TransactionSynchronizationUtils.invokeAfterCompletion(TransactionSynchronizationUtils.java:168) ~[spring-tx-4.3.14.RELEASE.jar:4.3.14.RELEASE]
        at org.springframework.transaction.support.AbstractPlatformTransactionManager.invokeAfterCompletion(AbstractPlatformTransactionManager.java:1002) [spring-tx-4.3.14.RELEASE.jar:4.3.14.RELEASE]
        at org.springframework.transaction.support.AbstractPlatformTransactionManager.triggerAfterCompletion(AbstractPlatformTransactionManager.java:977) [spring-tx-4.3.14.RELEASE.jar:4.3.14.RELEASE]
        at org.springframework.transaction.support.AbstractPlatformTransactionManager.processCommit(AbstractPlatformTransactionManager.java:806) [spring-tx-4.3.14.RELEASE.jar:4.3.14.RELEASE]
        at org.springframework.transaction.support.AbstractPlatformTransactionManager.commit(AbstractPlatformTransactionManager.java:730) [spring-tx-4.3.14.RELEASE.jar:4.3.14.RELEASE]

Fts search window doesn't show last empty page

Environment

  • Platform version: 6.10

Description of the bug or enhancement

AR: When the user clicks Next page and that page is empty, the system shows an empty page.
ER: The system must show the notification "Nothing to found" and must not show the empty page.

Support analyzer settings and custom analyzers in FTS

Apache Lucene has built-in Analyzers (https://lucene.apache.org/core/6_4_0/analyzers-common/overview-summary.html) that support different languages and their accented letters in search; for example, searching for foo would also find fôo, föo, and fòo.
The user should be able to select the required analyzer in the FTS properties and to make FTS either ignore accents or, vice versa, match accents exactly.
See also:
https://www.cuba-platform.com/discuss/t/diacritics-in-full-text-search-match/3329
https://community.alfresco.com/thread/194124-lucene-search-with-accented-characters
https://lucene.apache.org/core/7_1_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html#foldToASCII-char:A-int-
https://stackoverflow.com/questions/24825662/cant-return-results-for-words-with-accents-on-lucenetika
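
To illustrate what "ignoring accents" means in practice, here is a small stand-alone sketch. It does not use Lucene: it relies on the JDK's `java.text.Normalizer` as a stand-in for the folding that an analyzer built on a filter such as ASCIIFoldingFilter would perform, and the names `AccentFolding` and `foldAccents` are hypothetical.

```java
import java.text.Normalizer;

public class AccentFolding {

    /**
     * Decomposes accented characters into a base letter plus combining marks
     * (Unicode NFD) and then removes the combining marks.
     * Characters without a decomposition are left unchanged.
     */
    public static String foldAccents(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // \u00f4 is o with circumflex, \u00f6 is o with diaeresis, \u00f2 is o with grave
        System.out.println(foldAccents("f\u00f4o"));
        System.out.println(foldAccents("f\u00f6o"));
        System.out.println(foldAccents("f\u00f2o"));
    }
}
```

If the same folding is applied both at indexing time and to the query, all three spellings match a search for foo; skipping the folding on both sides gives exact accent matching.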


Original issue: https://youtrack.haulmont.com/issue/PL-10468

Expose LRUQueryCache statistics to JMX

Description of the bug or enhancement

Lucene has global cache LRUQueryCache stored in org.apache.lucene.search.IndexSearcher#DEFAULT_QUERY_CACHE field.

From LRUQueryCache javadocs:

This cache exposes some global statistics (hit count, miss count, number of cache entries, total number of DocIdSets that have ever been cached, number of evicted entries).

I think this can be useful for monitoring, if we expose it through jmx attributes.
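
A rough sketch of how such counters could be published with plain JMX is shown below. Everything here is an assumption made for illustration: the class `QueryCacheJmxSketch`, the attribute names, and the object name are hypothetical, and the counters are supplied through `LongSupplier` callbacks so the example does not depend on Lucene. In a real integration the suppliers would presumably delegate to the statistics getters of the cache instance.

```java
import java.lang.management.ManagementFactory;
import java.util.function.LongSupplier;
import javax.management.JMException;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class QueryCacheJmxSketch {

    /** Read-only attributes to publish; the names are illustrative. */
    public interface QueryCacheStatsMBean {
        long getHitCount();
        long getMissCount();
    }

    /** Delegates each attribute to a supplier, e.g. a getter of the real cache. */
    public static class QueryCacheStats implements QueryCacheStatsMBean {
        private final LongSupplier hits;
        private final LongSupplier misses;

        public QueryCacheStats(LongSupplier hits, LongSupplier misses) {
            this.hits = hits;
            this.misses = misses;
        }

        @Override
        public long getHitCount() {
            return hits.getAsLong();
        }

        @Override
        public long getMissCount() {
            return misses.getAsLong();
        }
    }

    /** Registers the bean on the platform MBean server under the given name. */
    public static ObjectName register(QueryCacheStatsMBean stats, String name) {
        try {
            ObjectName objectName = new ObjectName(name);
            StandardMBean bean = new StandardMBean(stats, QueryCacheStatsMBean.class);
            ManagementFactory.getPlatformMBeanServer().registerMBean(bean, objectName);
            return objectName;
        } catch (JMException e) {
            throw new IllegalStateException("Cannot register " + name, e);
        }
    }

    /** Reads a long attribute back, as a monitoring tool would. */
    public static long readLong(ObjectName name, String attribute) {
        try {
            return (Long) ManagementFactory.getPlatformMBeanServer().getAttribute(name, attribute);
        } catch (JMException e) {
            throw new IllegalStateException("Cannot read " + attribute, e);
        }
    }
}
```

Because the attributes are computed on each read, a monitoring system polling the bean always sees the current counter values.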

FTS improvements

@haulmont-git commented on Mon Nov 30 2015

The main thing that takes a long time in full-text search is looking up hitInfo. This is especially noticeable when searching by FileDescriptor.
Thesis has an implementation of lazy hitInfo loading:

  1. for search in lists, hitInfo is looked up only when the mouse hovers over a row (see LazyHitInfoMap)
  2. for global search, hitInfo is looked up only for the displayed data (see ThesisFtsServiceBean#makeSearchResult)

Also, in global search it would be good to see the results sorted by creation date in reverse order. Currently the creation date is not stored in the index, but maybe it should be?


Original issue: https://youtrack.haulmont.com/issue/PL-6375

Denormalized index in FTS

At present, if some related entity is indexed with the main entity, then FTS search performs two queries and works this way:

  1. Find all entities of a given type where the search term is found in the local attributes.
  2. Find all entities of given type that have a related entity where the search term is found in the local attributes of the related entity.

This approach does not allow us to execute queries like "Term1 AND Term2" where "Term1" occurs only in the main entity and "Term2" only in the related one.
It seems that the current approach also does not allow us to use the Lucene query parser syntax (https://lucene.apache.org/core/2_9_4/queryparsersyntax.html), e.g. "+apple -orange".
A possible solution is to use a denormalized index where the content of the main entity and of all related entities is stored in one document. In this case, when some entity is modified, we will have to find and reindex all related entities.
Most likely the new mechanism (using a denormalized index) should be optional and the old approach should remain.

Original issue: https://youtrack.haulmont.com/issue/PL-9360

Search field for fts to be supported not only for the main screen

Enable the FTS search field for all screens. The search field should be able to restrict the search to a specific entity graph and set of attributes (e.g. on the customer browser it should search only by the customer, attached documents and the customer's employees). The idea is to create multiple FTS search configs (possibly using the existing fts.xml), each describing a graph for search, so that a config can be specified in the component and the search will consider only the entities and fields from this graph.

  • support of this visual component in the Studio

Relates to: https://youtrack.haulmont.com/issue/PL-10099


Original issue: https://youtrack.haulmont.com/issue/PL-8732

Return metadata in entities list response

The result of the entities list method is a JSON array of entity objects. The count is returned in a response header. Consider returning an object that has a field "entities" with a JSON array and all required metadata in other fields. Returning a JSON object instead of an array would probably also be good practice for other controllers (e.g. queries). This will simplify the code on the client side and also make it possible to extend the response with other information in the future.


Original issue: https://youtrack.haulmont.com/issue/PL-9964

Improvement of the FTS components and API for showing search results

@devmix commented on Tue Jul 03 2018

Environment

  • Platform version: 6.9.1
  • Client type: Web
  • Component: FTS

Description of the enhancement

Minimal reproducible example

Any browser screen with FTS search.

Actual behavior

The found hits are shown in tooltips for each row of the table. In this case I can only view the found entities plus some information in tooltips, which is not always comfortable for users. For example, when I want to see not only the found entities but also which entity field or attached file (FileDescriptor) contains the search word, there is only a small tooltip and no way to open that file.

Expected behavior

First, it would be good to be able to work not only with a table and tooltips but also to have some mechanism for showing hits in custom components, e.g. a DataGrid with a custom DetailsGenerator that is opened for each row, or a new configurable component for showing FTS results. All of this applies to standard browser screens.

Second, some information is missing from the hits, e.g. I can't create a link to a related entity (for example FileDescriptor) because the hit does not contain such information (only the file name).

Ability to change or reset FTS index directory without restarting the server

Currently the field com.haulmont.fts.core.app.FtsManager#directory is assigned only once, the first time it is used, and it cannot be cleared or changed later without restarting the server.

Use cases when we may need to reinitialize index directory:

  • We want to change index location
  • Index files were corrupted, we need to delete all indexes manually and reindex from scratch.
  • Index files are obsolete and can't be upgraded (e.g. you can't upgrade indexes when upgrading from cuba 5.6 to cuba 6.8).

Original issue: https://youtrack.haulmont.com/issue/PL-10585

Expose FTS index statistics through jmx

As far as I know, there is currently no out-of-the-box way to determine the number of indexed entities.

In Sherlock we have created a JMX bean factory for monitoring purposes. It can be useful for any CUBA application which uses FTS.

FtsEntityTypeStatistics bean factory works on server start. It creates a jmx bean for every indexed entity. Such templated jmx beans can be used e.g. by Zabbix "discovery rules" to dynamically discover attributes by MBean name template.

The JMX bean has two parameters: NumExisting and NumIndexed.

NumExisting can be simply determined as count(*) from entity_table.
For entities with "searchableIf" it will be incorrect, so for accurate estimation here an extension point would be useful.

NumIndexed can easily be determined by this code:

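     // Count the documents indexed for one entity type; try-with-resources closes the reader.
     Directory directory = ftsManager.getDirectory();
     long result;
     try (DirectoryReader reader = DirectoryReader.open(directory)) {
         Term term = new Term(Lucene.FLD_ENTITY, entityName);
         result = reader.docFreq(term);
     }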


Original issue: https://youtrack.haulmont.com/issue/PL-10609

Instances of different entities with the same IDs are not found

Environment

  • Platform version: 6.9.4

Description of the bug or enhancement

  • Minimal reproducible example
  1. Create three IntId/IntIdentity entities (the same applies to Long*Id and StringId)

  2. Create standard screens for them

  3. Enable FTS in the project

  4. Create several instances of the entities. Save the same values in some fields of these entities

  5. Reindex all
    All the entities are saved to queue

  6. Process queue

  7. FTS search by the value

  • Actual behavior
    Entities with the same IDs are excluded from search results

FTS add-on localization doesn't work

Environment

  • Platform version: 6.9.5

Description of the bug or enhancement

  1. Create a new project
  2. Add the fts component
  3. Add a new language (Spanish, for example) to the project
  4. Add the component localization, as it is described in the translations addon
  5. Run the app, reindex some entities
  6. Try to find something

A.R. Notifications and captions are not localized
Workaround: add the path to the localisation: com.haulmont.fts.web to the cuba.mainMessagePack property in the web-app.properties file.

Relates to cuba-platform/translations#40

Search window doesn't show results for file

Environment

  • Platform version: 6.10.0

Description of the bug or enhancement

  • Minimal reproducible example
    • Create model: entity with reference to FileDescriptor
    • Start application. Create entity with file.
    • Index entity
    • Find entity in the search window by file content
  • Expected behavior
    Entity should be found
  • Actual behavior
    Entity isn't found

Help with FTS index backup / copying

FTS indexes are one of the three data storage components of a standard CUBA application (the other two are the database and the file storage).
When you copy server data to another server, or back up server data, you may want to copy the FTS indexes as well.

However, you can't simply copy the contents of the tomcat/work/app-core/ftsindex folder, because indexing may be in progress while the files are being copied. Some support from FTS is therefore necessary.

A quick search suggests using Lucene's SnapshotDeletionPolicy: https://stackoverflow.com/questions/5897784/lucene-index-backup
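
As a rough sketch of the copy step only: in Lucene, a `SnapshotDeletionPolicy` set on the `IndexWriterConfig` can pin a commit via `snapshot()`, the returned `IndexCommit` lists its files through `getFileNames()`, and the snapshot is released with `release()` once the files are copied. How the FTS add-on would expose this is not covered by the issue, so the code below stays in plain Java and takes the file names as input. `IndexSnapshotCopier` and `copySnapshotFiles` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Collection;

// Hypothetical helper, not part of Lucene or the FTS add-on.
final class IndexSnapshotCopier {

    /**
     * Copies only the named files from indexDir into backupDir and returns
     * how many files were copied. The names are expected to come from a
     * pinned index snapshot, so files written after the snapshot are skipped.
     */
    static int copySnapshotFiles(Path indexDir, Collection<String> fileNames, Path backupDir) {
        try {
            Files.createDirectories(backupDir);
            int copied = 0;
            for (String name : fileNames) {
                Files.copy(indexDir.resolve(name), backupDir.resolve(name),
                        StandardCopyOption.REPLACE_EXISTING);
                copied++;
            }
            return copied;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Copying by an explicit file list, rather than copying the whole folder, is what keeps the backup consistent: files that belong to a later, possibly unfinished commit are never touched.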


Original issue: https://youtrack.haulmont.com/issue/PL-10586

NPE in the search window for LONG/INT identity id entities

https://www.cuba-platform.com/discuss/t/nullpointerexception-in-fts-add-on-with-integer-based-identity/7155

Environment

  • Platform version: 6.10.0

Description of the bug or enhancement

  • Minimal reproducible example
    • Create a long/int identity entity in the model using Studio
    • Enable FTS for the project
    • Start the application
    • Create several instances of the identity entity. Reindex the entity.
    • Try to search for this entity
  • Actual behavior
    Exception:
Caused by: java.lang.NullPointerException: null
	at com.haulmont.fts.core.sys.DatabaseDataLoader.mergeSearchData(DatabaseDataLoader.java:76) ~[na:na]
	at com.haulmont.fts.core.app.FtsServiceBean.searchByTerm(FtsServiceBean.java:212) ~[na:na]
	at com.haulmont.fts.core.app.FtsServiceBean.search(FtsServiceBean.java:57) ~[na:na]
	at com.haulmont.fts.app.FtsService.search(FtsService.java:28) ~[fts-global-6.10-SNAPSHOT.jar:6.10-SNAPSHOT]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_144]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_144]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_144]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_144]
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333) ~[spring-aop-4.3.18.RELEASE.jar:4.3.18.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190) ~[spring-aop-4.3.18.RELEASE.jar:4.3.18.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) ~[spring-aop-4.3.18.RELEASE.jar:4.3.18.RELEASE]
	at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92) ~[spring-aop-4.3.18.RELEASE.jar:4.3.18.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.3.18.RELEASE.jar:4.3.18.RELEASE]
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213) ~[spring-aop-4.3.18.RELEASE.jar:4.3.18.RELEASE]
	at com.sun.proxy.$Proxy258.search(Unknown Source) ~[na:na]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_144]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_144]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_144]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_144]
	at com.haulmont.cuba.core.sys.remoting.LocalServiceInvokerImpl.invoke(LocalServiceInvokerImpl.java:94) ~[na:na]
	at com.haulmont.cuba.web.sys.remoting.LocalServiceProxy$LocalServiceInvocationHandler.invoke(LocalServiceProxy.java:154) ~[cuba-web-6.10-SNAPSHOT.jar:6.10-SNAPSHOT]
	at com.sun.proxy.$Proxy60.search(Unknown Source) ~[na:na]
	at com.haulmont.fts.web.ui.results.SearchLauncher.call(SearchLauncher.java:45) ~[fts-web-6.10-SNAPSHOT.jar:6.10-SNAPSHOT]
	at com.haulmont.fts.web.ui.results.SearchLauncher.call(SearchLauncher.java:23) ~[fts-web-6.10-SNAPSHOT.jar:6.10-SNAPSHOT]
	at com.haulmont.cuba.gui.WindowManager.createWindowByScreenClass(WindowManager.java:721) ~[cuba-gui-6.10-SNAPSHOT.jar:6.10-SNAPSHOT]
