clouway / cuse
clouWay universal search engine
Currently we have to write it this way:
for (Address address : addresses) {
searchEngine.register(new AddressIndex(address));
}
It would be a lot easier if we could just call:
searchEngine.registerAll(addressIndexes);
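A minimal sketch of how such a bulk method could look. The `IndexRegistry` interface is a hypothetical stand-in for the search engine, and `registerAll` is only the proposal from this issue, not an existing cuse method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the search engine; only register(...) mirrors the issue.
interface IndexRegistry {
  void register(Object index);

  // Proposed convenience method: registers every index in iteration order.
  default void registerAll(Iterable<?> indexes) {
    for (Object index : indexes) {
      register(index);
    }
  }
}

final class RegisterAllSketch {
  // Collects the registered indexes in memory so the behaviour can be observed.
  static List<Object> registerAllInto(List<?> indexes) {
    List<Object> registered = new ArrayList<>();
    IndexRegistry registry = registered::add;
    registry.registerAll(indexes);
    return registered;
  }
}
```

Keeping `registerAll` as a default method means existing implementations of the single-item `register` would not need to change.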
Provide a way to execute search queries by Date.
It should be possible to execute searches by Date using the following comparison operators (>, >=, <, <=, =).
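As a rough sketch, a date filter could be rendered into a query string like this. The `Operator` enum and `dateFilter` helper are hypothetical names, and the `field operator yyyy-MM-dd` layout is an assumption about how the underlying search backend expects date comparisons to be written.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class DateFilterSketch {
  // The five comparison operators requested in the issue.
  enum Operator {
    GREATER_THAN(">"), GREATER_OR_EQUAL(">="), LESS_THAN("<"), LESS_OR_EQUAL("<="), EQUAL("=");

    final String symbol;

    Operator(String symbol) {
      this.symbol = symbol;
    }
  }

  // Renders e.g. "birthDate >= 2014-01-03" (assumed query syntax).
  static String dateFilter(String field, Operator operator, LocalDate date) {
    return field + " " + operator.symbol + " " + date.format(DateTimeFormatter.ISO_LOCAL_DATE);
  }
}
```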
When someone wants to use the API, it is very hard to find all the details needed for the integration.
The integration steps and examples of the API usage must be added.
cuse could also provide composite queries with "or" and "and".
// Query: "(deletedOn >= :nextDay or deleted:true)"
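One way such composition could be sketched is with two small helpers that wrap clauses in parentheses. `CompositeQuerySketch`, `or` and `and` are hypothetical names, and the upper-case `OR`/`AND` keywords are an assumption about the backend's query syntax.

```java
final class CompositeQuerySketch {
  // Joins the clauses with OR and wraps them in parentheses so they can be nested.
  static String or(String... clauses) {
    return "(" + String.join(" OR ", clauses) + ")";
  }

  // Joins the clauses with AND and wraps them in parentheses so they can be nested.
  static String and(String... clauses) {
    return "(" + String.join(" AND ", clauses) + ")";
  }
}
```

Because each helper returns a parenthesised string, the result of one call can be passed as a clause to another, for example `and("closed:false", or("deletedOn >= 2014-01-04", "deleted:true"))`.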
While developing the search for client devices by partial serial number we came up with TextIndex,
which produces a set of index entries: all substrings of each word in a given set of words.
This is a simple comparison between the two approaches (TextIndex and IndexWriter).
You can use it if you find it useful.
Test word : mississippi
public class TextIndex {
private final Set<String> words;
public TextIndex(String... words) {
this.words = newLinkedHashSet(newArrayList(words));
}
public List<String> generate() {
Set<String> index = newLinkedHashSet();
for (String word : words) {
word = word.toUpperCase();
for (int i = 0; i < word.length(); i++) {
for (int j = i + 1; j < word.length(); j++) {
index.add(word.substring(i, j));
}
index.add(word.substring(i));
}
}
return newArrayList(index);
}
}
Generated Indexes: 53
M
MI
MIS
MISS
MISSI
MISSIS
MISSISS
MISSISSI
MISSISSIP
MISSISSIPP
MISSISSIPPI
I
IS
ISS
ISSI
ISSIS
ISSISS
ISSISSI
ISSISSIP
ISSISSIPP
ISSISSIPPI
S
SS
SSI
SSIS
SSISS
SSISSI
SSISSIP
SSISSIPP
SSISSIPPI
SI
SIS
SISS
SISSI
SISSIP
SISSIPP
SISSIPPI
ISSIP
ISSIPP
ISSIPPI
SSIP
SSIPP
SSIPPI
SIP
SIPP
SIPPI
IP
IPP
IPPI
P
PP
PPI
PI
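The count above can be reproduced without Guava. The sketch below enumerates the same substrings with plain JDK collections; `SubstringCountSketch` is a hypothetical name, and `Locale.ROOT` is added so that upper-casing does not depend on the default locale.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

final class SubstringCountSketch {
  // Returns every distinct non-empty substring of the word, upper-cased, in first-seen order.
  static Set<String> allSubstrings(String word) {
    String upper = word.toUpperCase(Locale.ROOT);
    Set<String> index = new LinkedHashSet<>();
    for (int start = 0; start < upper.length(); start++) {
      for (int end = start + 1; end <= upper.length(); end++) {
        index.add(upper.substring(start, end));
      }
    }
    return index;
  }
}
```

For an 11-letter word there are 66 (start, end) pairs, so the set removes 13 repeated substrings to arrive at the 53 entries listed.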
IndexWriter
Generated Indexes: 25
ssissippi
issippi
sissippi
mis
ssissip
iss
sippi
mississ
m
ippi
ississipp
mississipp
mississippi
i
ppi
missi
mississip
missis
mississi
ssippi
sissi
ississippi
pi
mi
miss
example strategy
public class SearchIndexStrategy implements IndexingStrategy<SearchIndex> {
@Override
public String getIndexName() {
return SearchIndex.class.getSimpleName();
}
@Override
public String getId(SearchIndex index) {
return String.valueOf(index.getId());
}
@Override
public IndexingSchema getIndexingSchema() {
return IndexingSchema.aNewIndexingSchema()
.fields("id", "state", "creationDate", "tags", "operationZones")
.fullTextFields("city", "street", "customer")
.build();
}
}
can be changed like this:
@SearchIndex("SearchIndex")
public class SearchIndex {
@SearchIndexId
private Long id;
@FullWordSearch
private String city;
@FullTextSearch
private String street;
@FullWordSearch
private String customer;
private String state;
@SearchableDate
private Date creationDate;
private List<String> tags = Lists.newArrayList();
private List<Long> operationZones = Lists.newArrayList();
public SearchIndex(Object object) {
id = object.getId();
operationZones = Lists.newArrayList(object.getOperationZoneIds());
if (object.getState() != null) {
state = object.getState().getState();
}
if (object.getCreationInfo() != null) {
creationDate = object.getCreationInfo().getCreationDate();
}
if (object.getTags() != null) {
tags = Lists.newArrayList(object.getTags());
}
//the post code should be separated property and not full text search , for better search
Address serviceAddress = object.getServiceAddress();
if (serviceAddress != null) {
city = serviceAddress.getCityLine();
street = serviceAddress.getAddressLine();
}
if (object.getClient() != null) {
String customerLine = ObjectLineBuilder.line()
.wordSuf(object.getClient().getNameLine(), " ")
.wordSuf(object.getClient().getEmail(), " ")
.wordSuf(object.getClient().getTelephone(), " ")
.build();
customer = customerLine;
}
}
//not sure that we will need this when we have @SearchIndexId annotation
public Long getId() {
return id;
}
}
Suggestion: proper exceptions should be thrown, so that when something important is missing it is easy to fix.
Take a look at this issue too: register type converters #13
In the past we had problems in the local environment when executing search queries that sort the returned results by date (asc/desc order) with an offset.
We ran some tests once again to verify this behaviour, and they returned correct data. We will accept that the API is working fine for now.
Here are some of the tests:
@Test
public void sortConsecutiveRecordsByDateAndPassedOffsetAsc() {
store(aNewEmployee().id(1l).birthDate(aNewDate(2014, 1, 3)).build());
store(aNewEmployee().id(2l).birthDate(aNewDate(2014, 1, 5)).build());
store(aNewEmployee().id(3l).birthDate(aNewDate(2014, 1, 8)).build());
store(aNewEmployee().id(4l).birthDate(aNewDate(2014, 1, 13)).build());
store(aNewEmployee().id(5l).birthDate(aNewDate(2014, 1, 21)).build());
List<Employee> result = searchEngine.search(Employee.class)
.sortBy("birthDate", SortOrder.ASCENDING, SortType.NUMERIC)
.offset(2)
.returnAll()
.now();
assertThat(result.size(), is(3));
assertThat(result.get(0).id, is(3l));
assertThat(result.get(1).id, is(4l));
assertThat(result.get(2).id, is(5l));
}
@Test
public void sortNonConsecutiveRecordsByDateAndPassedOffsetAsc() {
store(aNewEmployee().id(5l).birthDate(aNewDate(2014, 1, 21)).build());
store(aNewEmployee().id(2l).birthDate(aNewDate(2014, 1, 5)).build());
store(aNewEmployee().id(4l).birthDate(aNewDate(2014, 1, 13)).build());
store(aNewEmployee().id(1l).birthDate(aNewDate(2014, 1, 3)).build());
store(aNewEmployee().id(3l).birthDate(aNewDate(2014, 1, 8)).build());
List<Employee> result = searchEngine.search(Employee.class)
.sortBy("birthDate", SortOrder.ASCENDING, SortType.NUMERIC)
.offset(4)
.returnAll()
.now();
assertThat(result.size(), is(1));
assertThat(result.get(0).id, is(5l));
}
@Test
public void sortConsecutiveRecordsByDateAndPassedOffsetDesc() {
store(aNewEmployee().id(1l).birthDate(aNewDate(2014, 1, 5)).build());
store(aNewEmployee().id(2l).birthDate(aNewDate(2014, 1, 10)).build());
store(aNewEmployee().id(3l).birthDate(aNewDate(2014, 1, 15)).build());
store(aNewEmployee().id(4l).birthDate(aNewDate(2014, 1, 4)).build());
List<Employee> result = searchEngine.search(Employee.class)
.sortBy("birthDate", SortOrder.DESCENDING, SortType.NUMERIC)
.offset(1)
.fetchMaximum(10)
.now();
assertThat(result.size(), is(3));
assertThat(result.get(0).id, is(2l));
assertThat(result.get(1).id, is(1l));
}
@Test
public void sortNonConsecutiveRecordsByDateAndPassedOffsetDesc() {
store(aNewEmployee().id(2l).birthDate(aNewDate(2014, 1, 10)).build());
store(aNewEmployee().id(4l).birthDate(aNewDate(2014, 1, 4)).build());
store(aNewEmployee().id(1l).birthDate(aNewDate(2014, 1, 5)).build());
store(aNewEmployee().id(3l).birthDate(aNewDate(2014, 1, 15)).build());
List<Employee> result = searchEngine.search(Employee.class)
.sortBy("birthDate", SortOrder.DESCENDING, SortType.NUMERIC)
.offset(2)
.fetchMaximum(10)
.now();
assertThat(result.size(), is(2));
assertThat(result.get(0).id, is(1l));
assertThat(result.get(1).id, is(4l));
}
searchEngineProvider.get().search(Employee.class)
.inIndex(EmployeeIndex.class)
.where("technicalCenter", SearchFilters.isAnyOf(technicalCenterIds))
.whereNotRequired("active", SearchFilters.is(true))
.returnAll().now();
The GAE Search API documentation says that underscores do not break up words.
We wrote some tests to verify that behaviour, and some of them did not return the expected results.
Here is an example test.
@Test
public void searchForFieldThatContainsUnderscore() {
store(aNewEmployee().id(1l).firstName("John Adam").build());
store(aNewEmployee().id(2l).firstName("John_Adam").build());
List<Employee> result = searchEngine.search(Employee.class).where("firstName", SearchFilters.is("John")).returnAll().now();
assertThat(result.size(), is(1));
}
Running this test locally fails: instead of returning only one result, it returns both. We deployed the code to the test application to try it out, and there we received only one matching result (which was the expected result). It turns out that the local environment of the Search API behaves differently from production and breaks up words at underscores.
Because of this, when we need to index fields in the future, it is better to avoid values containing underscores.
Caused by: com.google.appengine.api.search.SearchQueryException: Unable to parse query: куче/088888888 closed:false locationId:(7057283 OR 7133203 OR 7173275 OR 7229040 OR 7237249 OR 7237264 OR 7241130 OR 7245148 OR 7247156 OR 7251005 OR 7251008 OR 7251009 OR 7251010 OR 7259004 OR 7268195 OR 7271007 OR 7271008 OR 7275002 OR 7277005 OR 7286002 OR 7286003 OR 7289002 OR 7290002 OR 7292002 OR 8804086 OR 6384259640066048 OR 5229884100050944) departmentId:(7295047 OR 7296057 OR 7297060 OR 7299058 OR 7301067 OR 7302062 OR 7302063 OR 7304046 OR 7305071 OR 7306056 OR 7308031)
at com.google.appengine.api.search.checkers.QueryChecker.checkQueryParses(QueryChecker.java:44)
at com.google.appengine.api.search.checkers.QueryChecker.checkQuery(QueryChecker.java:28)
at com.google.appengine.api.search.Query$Builder.setQueryString(Query.java:91)
at com.google.appengine.api.search.Query$Builder.build(Query.java:107)
at com.clouway.cuse.gae.GaeSearchApiMatchedIdObjectFinder.buildQuery(GaeSearchApiMatchedIdObjectFinder.java:53)
at com.clouway.cuse.gae.GaeSearchApiMatchedIdObjectFinder.find(GaeSearchApiMatchedIdObjectFinder.java:26)
at com.clouway.cuse.spi.Search.now(Search.java:128)
When we store a field whose value contains many words, we should store each word in the index. This way we can later execute searches by passing any of the stored words.
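A minimal sketch of the splitting step, assuming that words are separated by whitespace only; `WordSplitSketch` is a hypothetical helper and says nothing about how cuse actually tokenizes values.

```java
import java.util.ArrayList;
import java.util.List;

final class WordSplitSketch {
  // Splits a field value on runs of whitespace and drops empty tokens.
  static List<String> words(String value) {
    List<String> words = new ArrayList<>();
    for (String word : value.trim().split("\\s+")) {
      if (!word.isEmpty()) {
        words.add(word);
      }
    }
    return words;
  }
}
```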
Provided that we have a map of field names and values, generated from a user-filled nomenclature for example, a search should be able to be made by a specific field.
Example case:
@SearchIndex(name = "DeviceIndex")
public class DeviceIndex {
@SearchId
private Long id;
private String type;
...
@DynamicFields // example new annotation for this purpose
private Map<String, String> fields; //{"fieldName1": "value1", "fieldName2": "value2", ...}
...
}
That way, in the search index the values of fields should be broken down for FullTextSearch, but the keys (the names of the dynamic fields) should not be broken down.
Potential problem: where to store the field names in the index. If they are stored in fields, then what sort of delimiter will be used to separate the key from the value, and how is the key going to be specified when making the search?
We should be able to execute search queries on fields containing many values by passing a list of values.
For example if we have the following indexed field
tags: 1, 2, answered
then executing the following query should return results:
searchEngine.search(Index.class).where("tags", SearchFilters.is(Arrays.asList("1", "answered"))).returnAll().now();
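A rough sketch of how a list of values could be turned into a single field clause. It assumes "match any of the values" semantics and reuses the `field:(a OR b)` layout visible in the stack trace earlier on this page; `ListFilterSketch` and `anyOf` are hypothetical names.

```java
import java.util.List;

final class ListFilterSketch {
  // Renders e.g. tags:(1 OR answered); assumes the values need no escaping.
  static String anyOf(String field, List<String> values) {
    return field + ":(" + String.join(" OR ", values) + ")";
  }
}
```

If "match all values" is what the issue intends, the same helper could join with `AND` instead.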
We could improve the search API by adding a backing cache for query results.
The eviction policy should be considered carefully, because we have to ensure that indexes are consistent between the cache and the API calls.
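One possible shape for such a cache, sketched with a size-bounded LRU map plus explicit invalidation per index. Everything here is hypothetical (class name, key layout, caching lists of matched ids), and the sketch is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical LRU cache for query results; not part of cuse.
final class QueryResultCacheSketch {
  private static final String SEPARATOR = "\n";
  private final Map<String, List<Long>> cache;

  QueryResultCacheSketch(final int maxEntries) {
    // accessOrder = true keeps entries ordered from least to most recently used.
    this.cache = new LinkedHashMap<String, List<Long>>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, List<Long>> eldest) {
        return size() > maxEntries;
      }
    };
  }

  // Returns the cached ids for the query, calling the loader only on a miss.
  List<Long> get(String indexName, String query, Supplier<List<Long>> loader) {
    String key = indexName + SEPARATOR + query;
    List<Long> ids = cache.get(key);
    if (ids == null) {
      ids = loader.get();
      cache.put(key, ids);
    }
    return ids;
  }

  // Drops every cached result of one index; meant to be called when that index changes.
  void invalidate(String indexName) {
    String prefix = indexName + SEPARATOR;
    cache.keySet().removeIf(key -> key.startsWith(prefix));
  }
}
```

Calling `invalidate` from every register and delete path is the simplest way to keep cached results from outliving the index entries they were built from, at the cost of dropping more entries than strictly necessary.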
When the search query is like "12:34:c4", the string is interpreted as three separate words and many matches are found.
When a search query contains words with special characters, each such word should be interpreted as one word, so after escaping the search query will look like this: ""12:34:c4"".
Example:
search query: "Tarnovo 12:34:c4"
should be escaped like this: "Tarnovo "12:34:c4""
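The escaping described above could be sketched as follows, assuming that any word containing a character other than a letter or digit counts as "special" and that the input is free text without field prefixes. `QueryEscapeSketch` is a hypothetical helper; it does not handle words that already contain double quotes.

```java
final class QueryEscapeSketch {
  // Wraps every whitespace-separated word that contains a special character in double quotes.
  static String escape(String query) {
    StringBuilder escaped = new StringBuilder();
    for (String word : query.trim().split("\\s+")) {
      if (escaped.length() > 0) {
        escaped.append(' ');
      }
      escaped.append(isPlain(word) ? word : "\"" + word + "\"");
    }
    return escaped.toString();
  }

  // A word is plain when it consists only of letters and digits (any alphabet).
  private static boolean isPlain(String word) {
    return word.chars().allMatch(Character::isLetterOrDigit);
  }
}
```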
Common usage of 'Ignore' annotations is like this:
@Ignore
@SearchId
private String id;
but the field "id" is still added to the search index when it has another annotation.
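The expected behaviour can be sketched with reflection: a field carrying `@Ignore` is skipped no matter which other annotations it has. The annotations and `SampleIndex` below are local stand-ins declared only for this example, not the cuse annotations themselves.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

final class IgnoreSketch {
  // Local stand-ins for the real annotations (hypothetical definitions).
  @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD) @interface Ignore {}
  @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD) @interface SearchId {}

  static class SampleIndex {
    @Ignore
    @SearchId
    private String id;
    private String name;
  }

  // Collects the names of fields that should be indexed; @Ignore always wins.
  static List<String> indexedFieldNames(Class<?> type) {
    List<String> names = new ArrayList<>();
    for (Field field : type.getDeclaredFields()) {
      if (field.isSynthetic() || field.isAnnotationPresent(Ignore.class)) {
        continue;
      }
      names.add(field.getName());
    }
    return names;
  }
}
```

Checking `@Ignore` before any other annotation is inspected is what keeps a second annotation from pulling the field back into the index.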
Make the deployment automatically add the latest jar to the Maven repository.
It is unclear what the index name will be.
searchEngine.get().delete("indexName", addressIds);
The index name is held in the IndexingStrategy object, which has a method:
...
String getIndexName();
...
So it would be better to pass the index class (maybe an interface is needed), from which the strategy and the index name can be found, or to pass the index strategy directly.
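A small sketch of the lookup this would need: a registry that maps an index class to its strategy and fails loudly when nothing is registered. `NamedStrategy` and `StrategyRegistrySketch` are hypothetical; only the `getIndexName()` method name comes from the strategy shown on this page.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical reduction of an indexing strategy to the one method needed here.
interface NamedStrategy {
  String getIndexName();
}

final class StrategyRegistrySketch {
  private final Map<Class<?>, NamedStrategy> strategies = new HashMap<>();

  void register(Class<?> indexClass, NamedStrategy strategy) {
    strategies.put(indexClass, strategy);
  }

  // Resolves the index name so callers can pass a class instead of a raw string.
  String indexNameFor(Class<?> indexClass) {
    NamedStrategy strategy = strategies.get(indexClass);
    if (strategy == null) {
      throw new IllegalStateException("No indexing strategy registered for " + indexClass.getName());
    }
    return strategy.getIndexName();
  }
}
```

A delete method that takes the index class could then call `indexNameFor` internally, so the caller never has to know the name.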
In some projects users use custom data types for "Date & Time" objects, so we should provide a mechanism for registering type converters for these types.
class MyIndex {
private DateTime creationTime;
}
...
GaeSearchApiCuseModule module = new GaeSearchApiCuseModule(TwigEntityLoader.class);
module.registerTypeConverter(new Converter<DateTime,Date>() {
public Date convert(DateTime dateTime) {
if (dateTime == null) {
return null;
}
return dateTime.getDate();
}
});
With such converters we would be able to remove duplication in our index classes, such as:
public TeamIndex(Team team){
this.id = team.getId();
this.locationIds = team.getLocationIds();
this.departmentIds = team.getDepartmentIds();
if(team.getDeletedOn() != null) {
this.deletedOn = team.getDeletedOn().getDate();
}
}
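A minimal sketch of a converter registry along these lines. Unlike the `registerTypeConverter` call proposed above, this hypothetical `register` method takes the source class explicitly to keep the lookup simple, and values are matched by their exact runtime class only.

```java
import java.util.HashMap;
import java.util.Map;

// Same shape as the Converter used in the proposal; declared here so the sketch is self-contained.
interface Converter<S, T> {
  T convert(S source);
}

final class ConverterRegistrySketch {
  private final Map<Class<?>, Converter<?, ?>> converters = new HashMap<>();

  <S, T> void register(Class<S> sourceType, Converter<S, T> converter) {
    converters.put(sourceType, converter);
  }

  // Converts the value if a converter exists for its exact class; otherwise returns it unchanged.
  @SuppressWarnings("unchecked")
  Object convert(Object value) {
    if (value == null) {
      return null;
    }
    Converter<Object, Object> converter =
        (Converter<Object, Object>) converters.get(value.getClass());
    return converter == null ? value : converter.convert(value);
  }
}
```

With the null check living in the registry, an index constructor could assign the raw value and leave the conversion to the indexing step.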