Comments (9)
Thank you for bringing this to our attention. To help us understand the issue better, could you please try implementing a diagnostic step in your code?
You can use the dd() function within your custom tokenizer during a search operation. This way we'll see whether the search hits the custom tokenizer or the default one:
    public function tokenize($text, $stopwords = []) {
        $return = preg_split($this->getPattern(), strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        // dd() dumps the tokens and halts, so any output proves this tokenizer ran
        dd($return);
    }
from tntsearch.
And while testing this, please set the fuzziness to false:
TNTSEARCH_FUZZINESS=false
from tntsearch.
@nticaric Thanks for the quick reply! So, I tried this and interestingly enough, there is no debug output. This search is being done via an API call, but when I go to the API server and view the request for the search in Debugbar, there is no dd output. I am not sure what to do given this revelation, but any suggestions would be welcome!
from tntsearch.
Can you query the info table of the index? It could be that the original index was built with the default tokenizer.
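A minimal sketch of such a query, assuming the index is a plain SQLite file that PDO can open. The helper name readIndexInfo and the command-line path argument are hypothetical, and whether the info table records the tokenizer at all may depend on the TNTSearch version:

```php
<?php

// Hypothetical helper, not part of TNTSearch: returns every row of the
// index's `info` table as associative arrays.
function readIndexInfo(PDO $index): array
{
    // SELECT * avoids assuming specific column names.
    return $index->query('SELECT * FROM info')->fetchAll(PDO::FETCH_ASSOC);
}

// Usage: php info.php /path/to/your.index  (the path is an assumption)
$path = $argv[1] ?? null;
if ($path !== null && is_file($path)) {
    $index = new PDO('sqlite:' . $path);
    $index->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    print_r(readIndexInfo($index));
}
```

If the output names a tokenizer other than the custom class, the index would need to be rebuilt with the custom tokenizer before search results can be compared.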
from tntsearch.
@nticaric, here you go:
from tntsearch.
Ok, I had forgotten to set the return as a var, so it was returning before the dd was called. Now it appears to be breaking the request, which tells me it is using the custom tokenizer.
If it helps, it almost looks as though it is struggling with numerical content at the beginning of the keyword. For example, sx-70 seems to return relevant results, but 70-200 returns items with either 70 or 200 in the name and a dash in the name outside of that, but prioritizes 200.
And I remember that the import seems to have indexed just simply "-" as a keyword, and I wonder if that is the issue. I am going to try to delete that keyword from the keywords and see what that does.
from tntsearch.
Are you sure fuzziness is turned off?
from tntsearch.
Are you sure fuzziness is turned off?
Yep, confirmed. I realized my regexp is allowing spaces and dashes with spaces around them to be indexed as keywords. I am slow with Regexp, so I am trying to remedy that now.
from tntsearch.
Ok @nticaric here is the regex I am currently using:
static protected $pattern = '/[^\p{L}\p{N}\.\+-](?!\s-\s)+/u';
I am still seeing the behavior where 70-200mm does not return relevant results (items with "200" and "70" show up, but nothing with "70-200"), but sx-70 returns relevant results. Any thoughts or guidance is welcome. Again, I suck at regex, so perhaps there is a better way to say:
Do not allow the following characters to be treated as stop words:
- a-z
- A-Z
- 0-9
- plus sign
- hyphen
- .
Again indexing seems to allow phrases like 70-200mm into the wordlist table, but searching for them does not yield expected results.
Thanks again for your help.
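One possible way to express the rules listed above, offered as a sketch rather than a confirmed fix. Because the hyphen is in the allowed set, text like " - " is split on the spaces and leaves a lone "-" token. The sketch splits on everything outside the allowed set, then trims leading and trailing hyphens and dots and drops tokens that end up empty. The function name tokenizeProductText is hypothetical, and this does not by itself explain the search-time behaviour for 70-200mm:

```php
<?php

// Hypothetical helper, not part of TNTSearch: keeps letters, digits,
// '.', '+' and '-' inside tokens, but never emits a punctuation-only token.
function tokenizeProductText(string $text): array
{
    // Split on every run of characters outside the allowed set.
    $parts = preg_split('/[^\p{L}\p{N}.+\-]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    if ($parts === false) {
        return []; // preg_split failed, e.g. on invalid UTF-8 input
    }

    $tokens = [];
    foreach ($parts as $part) {
        // Strip leading/trailing hyphens and dots so " - " never becomes a keyword.
        $token = trim($part, '-.');
        if ($token !== '') {
            $tokens[] = $token;
        }
    }

    return $tokens;
}

print_r(tokenizeProductText('Canon 70-200mm f/2.8 - Polaroid SX-70'));
// tokens: canon, 70-200mm, f, 2.8, polaroid, sx-70
```

The same logic would have to run both when indexing and when searching, so an index built with a different tokenizer would need to be rebuilt before comparing results.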
from tntsearch.