deepstreamio / deepstream.io-provider-search-rethinkdb
A data-provider that makes every table searchable via rethinkdb
License: Other
This code:
client.login({ username: 'username', clientUid, password: 'asd' }, (success, data) => {
  const query = JSON.stringify({
    table: 'table',
    query: [
      [ 'createdBy', 'eq', 'username' ]
    ]
  })
  const result = client.record.getList( 'search?' + query )
  result.whenReady(async r => {
    const ids = r.getEntries()
    let projects = await Promise.all(ids.map(async (id) => this.get(id)))
    result.delete()
  })
})
throws this error:
21:06:56:691 | received subscription for search?{"table":"table","query":[["createdBy","eq","username"]]}
LOCAL_LISTEN | nothing to stop for R:search?{"table":"table","query":[["createdBy","eq","username"]]}
RECORD_DELETION | search?{"table":"table","query":[["createdBy","eq","username"]]}
21:06:56:708 | discard subscription for search?{"table":"table","query":[["createdBy","eq","username"]]}
21:06:56:709 | Removing search search?{"table":"table","query":[["createdBy","eq","username"]]}
search?{"table":"table","query":[["createdBy","eq","username"]]}
/Users/.../node_modules/bluebird/js/main/async.js:43
fn = function () { throw arg; };
^
TypeError: Cannot read property 'name' of null
at Search._processInitialValues (/Users/.../node_modules/deepstream.io-provider-search-rethinkdb/src/search.js:144:66)
at Search._readChange (/Users/.../node_modules/deepstream.io-provider-search-rethinkdb/src/search.js:103:12)
at /Users/.../node_modules/rethinkdb/cursor.js:261:20
at tryCatcher (/Users/.../node_modules/bluebird/js/main/util.js:26:23)
at Promise.successAdapter (/Users/.../node_modules/bluebird/js/main/nodeify.js:23:30)
at Promise._settlePromiseAt (/Users/.../node_modules/bluebird/js/main/promise.js:582:21)
at Promise._settlePromiseAtPostResolution (/Users/.../node_modules/bluebird/js/main/promise.js:248:10)
at Async._drainQueue (/Users/.../node_modules/bluebird/js/main/async.js:128:12)
at Async._drainQueues (/Users/.../node_modules/bluebird/js/main/async.js:133:10)
at Immediate.Async.drainQueues [as _onImmediate] (/Users/.../node_modules/bluebird/js/main/async.js:15:14)
at runCallback (timers.js:785:20)
at tryOnImmediate (timers.js:747:5)
at processImmediate [as _immediateCallback] (timers.js:718:5)
This error completely crashes the deepstream.io server.
In our system we make extensive use of generated lists, such as those produced by the rethink search provider. Unfortunately, we sometimes need to distinguish between a default falsy value and an actual generated falsy value, for example to show a "loading" message to the user, or because we generate a lot of data iff it is missing. Right now we can't tell whether we've received a reply of false or are only looking at the default value. (Our actual current need for the generated-data case: find the user's last week record; if there is none, fall back to the user's hire date. Then show a list of all weeks from that last week record until now.)
When I asked on Slack, a user suggested emitting a deepstream event when the calculation is done. While I admit that this is possible, I am uncomfortable treating it as a general solution, much less as the official mechanism for the rethink search provider (which we use heavily). There appears to be no reason to change the deepstream protocol to support this: the client merely needs to set a flag (or emit an event, or similar) when it receives the first value from the deepstream server, to distinguish it from the whenReady event (unless the server is preemptively generating that default value for the client, in which case the rabbit hole goes deeper).
If there is an easy solution, it might be worth documenting it - otherwise, some pointers would be very helpful.
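The flag-based approach described above can be sketched as a small client-side wrapper. This is a minimal illustration, assuming the wrapped object exposes a subscribe(callback) method that is only invoked with values actually received from the server; LoadTracker, NOT_LOADED and makeFakeSource are hypothetical names, not deepstream APIs.

```javascript
// Sentinel meaning "no value received yet"; it can never collide with a real
// payload such as false, 0, '' or [].
const NOT_LOADED = Symbol('NOT_LOADED');

// Hypothetical wrapper: `source.subscribe(fn)` is an assumed interface.
class LoadTracker {
  constructor(source) {
    this.value = NOT_LOADED;
    this.pending = [];
    source.subscribe(value => {
      this.value = value;
      // Fire one-shot callbacks that were waiting for the first real value
      const callbacks = this.pending;
      this.pending = [];
      callbacks.forEach(fn => fn(value));
    });
  }

  get isLoaded() {
    return this.value !== NOT_LOADED;
  }

  // Run `fn` once with the first received value (immediately if already loaded)
  onLoad(fn) {
    if (this.isLoaded) fn(this.value);
    else this.pending.push(fn);
  }
}

// Hypothetical stand-in for a server-backed source, used for illustration
function makeFakeSource() {
  let callback = null;
  return {
    subscribe(fn) { callback = fn; },
    emit(value) { callback(value); }
  };
}
```

With this shape, a UI can show "loading" while isLoaded is false and still treat a received false as a real answer.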
Is it possible to query on nested, path-style properties? For example, given:
{
"name":"John",
"phone": { "areaCode":"571", "state" : "va" }
}
with a query like:
var queryString = JSON.stringify({
table: 'users',
query: [
[ 'phone.areaCode', 'eq', '571' ],
[ 'phone.state', 'eq', "va" ]
]
});
ds.record.getList( 'search?' + queryString );
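To make the requested semantics concrete, here is a hypothetical in-memory sketch of how a dotted field name such as 'phone.areaCode' could be resolved against a nested document. It illustrates the proposal only and is not the provider's current behaviour; getPath and matchesAll are illustrative names. In ReQL itself, a nested field is reached by chaining field access, e.g. r.row('phone')('areaCode'), so a provider could split the path on '.' in the same way.

```javascript
// Walk a dotted path through nested objects; a missing segment yields undefined
function getPath(doc, path) {
  return path.split('.').reduce(
    (value, key) => (value !== null && value !== undefined ? value[key] : undefined),
    doc
  );
}

// Only 'eq' is handled here, to keep the example focused on path resolution;
// all conditions are AND-ed, matching the current search semantics.
function matchesAll(doc, conditions) {
  return conditions.every(
    ([field, op, expected]) => op === 'eq' && getPath(doc, field) === expected
  );
}
```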
https://deepstream.io/tutorials/core/searching-and-querying/ states:
It's a good idea to delete your queries when you're finished with them. Do this with:
colorPriceString.delete()
This is not a good idea: it requires writing client code that is aware that its own queries might be deleted at any moment (should another client delete their search) and that resubscribes accordingly. Furthermore, the code is written such that the record is deleted anyway on the last unsubscription.
The bug here is as documented at https://deepstream.io/tutorials/core/datasync-records/:
The listen callback with isSubscribed=false is only triggered once the last subscriber has disconnected or discarded the record. If your active data provider is subscribed to the record as well in order to write to it, it counts as a subscriber and the callback won't be invoked. This is a known limitation and will be addressed in future releases that will also introduce load-balancing for listeners, only-one rules etc
emphasis mine.
Until this update is released, it would be best to only hold an instance of the _list record while actively updating it, and to discard it immediately afterward.
var list = deepstream.record.getList( 'search?' + queryString);
list.subscribe(function( entries ){
  // ...
});
list.delete();
The first time it's OK, but the second time it doesn't return anything.
The log shows:
MULTIPLE_SUBSCRIPTIONS | repeat supscription to "search?{"table"...
With deepstream.io 2.0.1, it works fine.
The business case is client-side joins, in a sense: getting all records that match some condition, then getting all records that reference those records in a separate query. I recognize we could use RPC providers (presumably we could use them instead of the search provider entirely, though it would be messy), but we're trying to keep query code clean and go through a single interface.
Have code, will merge and create a pull request referencing this issue.
Example data model:
A project can have many collaborators, so userIds are stored as an array in the collaborators field.
A user needs to query for projects that the user owns, as well as projects that the user is a collaborator on.
Ideally, there should be a contains operator for a query such as:
{
"table": "projects",
"query": [
[ "collaborators", "contains", "<userId>" ]
]
}
Any way this can be included?
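A hypothetical in-memory sketch of the requested 'contains' operator, showing the intended semantics for array fields. In ReQL the corresponding term is contains(), e.g. r.row('collaborators').contains(userId); how the provider would map the operator name onto it is an assumption, and operators/matches are illustrative names.

```javascript
// Operator table; 'contains' is the proposed addition for array fields
const operators = {
  eq: (actual, expected) => actual === expected,
  contains: (actual, expected) => Array.isArray(actual) && actual.includes(expected)
};

// AND together all conditions, rejecting unknown operator names
function matches(doc, conditions) {
  return conditions.every(([field, op, expected]) => {
    // hasOwnProperty avoids picking up inherited keys like 'constructor'
    if (!Object.prototype.hasOwnProperty.call(operators, op)) {
      throw new Error(`Unknown operator: ${op}`);
    }
    return operators[op](doc[field], expected);
  });
}
```

Owned and collaborated projects would still need two searches (or an OR operator), since conditions are AND-ed.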
I strongly recommend stating in the documentation that users MUST protect the search endpoint if the database contains ANY confidential data in its tables. The search provider has direct access to the database and bypasses all server permission checks.
The bad news is that this is not trivial to do. The record name contains stringified JSON, so it is hard to split permissions by table (or other parameters) with Valve. A custom permission handler is required.
A simple attack vector:
const checkAmount = (min, max) => {
  if (max - min < 1) {
    return console.log(`Amount is ${Math.round(min)}`);
  }
  const mid = (min + max) / 2;
  const query = JSON.stringify({
    table: 'accounts',
    query: [
      ['name', 'match', 'alice'],
      ['amount', 'gt', mid]
    ]
  });
  const list = client.record.getList(`search?${query}`);
  list.subscribe(entries => {
    list.discard();
    if (entries.length > 0)
      checkAmount(mid, max);
    else
      checkAmount(min, mid);
  });
};
checkAmount(0, 1000000);
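A hypothetical sketch of the check a custom permission handler could run on a search record name: parse the JSON after the 'search?' prefix and allow only whitelisted tables and fields. A table allowlist alone would not stop the attack above if 'accounts' were allowed, so fields are restricted as well. isSearchAllowed and the rules shape are illustrative; wiring this into deepstream's permission handler is not shown and depends on the server version.

```javascript
const SEARCH_PREFIX = 'search?';

// rules: { tableName: [allowedField, ...] } (hypothetical shape)
function isSearchAllowed(recordName, rules) {
  // Non-search records are left to the regular Valve rules
  if (!recordName.startsWith(SEARCH_PREFIX)) return true;
  let parsed;
  try {
    parsed = JSON.parse(recordName.slice(SEARCH_PREFIX.length));
  } catch (err) {
    return false; // malformed query string: reject
  }
  if (parsed === null || typeof parsed !== 'object') return false;
  const fields = rules[parsed.table];
  // Array.isArray also rejects inherited keys such as 'constructor'
  if (!Array.isArray(fields) || !Array.isArray(parsed.query)) return false;
  // Every condition must filter on an allowed field
  return parsed.query.every(cond => Array.isArray(cond) && fields.includes(cond[0]));
}
```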
I'm trying to use the search provider from NPM (version 0.1.2) with deepstream (version 0.6.0) and the deepstream client (version 0.3.5), and I keep getting an empty array when calling .getEntries():
4:48:21 PM:924 | Found 1 initial matches for search?{"table":"user","query":[["username","eq","[email protected]"]]}
Error: entries must be an array of record names
at List.setEntries (/home/johan/Development/mail2/briteback/repos/briteback-setting/node_modules/deepstream.io-client-js/src/record/list.js:85:10)
at Search._populateList (/home/johan/Development/mail2/briteback/repos/briteback-setting/node_modules/deepstream.io-provider-search-rethinkdb/src/search.js:123:13)
at Search._processInitialValues (/home/johan/Development/mail2/briteback/repos/briteback-setting/node_modules/deepstream.io-provider-search-rethinkdb/src/search.js:101:8)
at tryCatcher (/home/johan/Development/mail2/briteback/repos/briteback-setting/node_modules/bluebird/js/main/util.js:26:23)
at Promise.successAdapter (/home/johan/Development/mail2/briteback/repos/briteback-setting/node_modules/bluebird/js/main/nodeify.js:23:30)
at Promise._settlePromiseAt (/home/johan/Development/mail2/briteback/repos/briteback-setting/node_modules/bluebird/js/main/promise.js:579:21)
at Promise._settlePromiseAtPostResolution (/home/johan/Development/mail2/briteback/repos/briteback-setting/node_modules/bluebird/js/main/promise.js:245:10)
at Async._drainQueue (/home/johan/Development/mail2/briteback/repos/briteback-setting/node_modules/bluebird/js/main/async.js:128:12)
at Async._drainQueues (/home/johan/Development/mail2/briteback/repos/briteback-setting/node_modules/bluebird/js/main/async.js:133:10)
at Immediate.Async.drainQueues [as _onImmediate] (/home/johan/Development/mail2/briteback/repos/briteback-setting/node_modules/bluebird/js/main/async.js:15:14)
at processImmediate [as _immediateCallback] (timers.js:383:17)
This is my search provider configuration:
var searchProvider = new SearchProvider({
  logLevel: 3,
  deepstreamClient: DSClientFrontend,
  rethinkdbConnectionParams: {
    host: config.db.host,
    port: config.db.port,
    db: config.db.name
  }
});
(DSClientFrontend is a started and signed-in deepstream client.)
This is how I'm using it:
var queryString = JSON.stringify({
  table: 'user',
  query: [
    ['username', 'eq', '[email protected]']
  ]
});
var test = deepstreamClient.record.getList('search?' + queryString);
test.whenReady(() => {
  console.log(test.getEntries());
});
client.record.getList('search?' + JSON.stringify({ table: 'users', query: [] }));
will always return [], regardless of whether there are records or not. While one could use the predicate ['ds_id', 'ne', '0'] in virtually all conceivable situations, getting all records in a table should be much less awkward.
Is there something I've missed? Normally deepstream.io wouldn't have a dynamic list of all records in a table, and I don't see another way to build such a list from rethink without using the search? list builder.
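One way to see why an empty query could naturally mean "match everything": if conditions are combined with Array.prototype.every, an empty condition list matches every document, because every() on an empty array returns true. The evaluator below is a hypothetical in-memory illustration, not the provider's code; on the RethinkDB side, the equivalent would be to skip the filter step entirely when the condition list is empty.

```javascript
// Hypothetical evaluator: conditions are AND-ed with every()
function matchesConditions(doc, conditions) {
  return conditions.every(([field, op, value]) => {
    switch (op) {
      case 'eq': return doc[field] === value;
      case 'ne': return doc[field] !== value;
      default: throw new Error(`Unsupported operator: ${op}`);
    }
  });
}

const users = [{ ds_id: 'a' }, { ds_id: 'b' }];
// An empty query keeps every row, no workaround predicate needed
const allIds = users.filter(u => matchesConditions(u, [])).map(u => u.ds_id);
```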
--------------------------------------------------------------------------------
5:49:47 PM:303 | Initialising RethinkDb Connection
5:49:47 PM:324 | Initialising Deepstream connection
_ _ _
__| | ___ ___ _ __ ___| |_ _ __ ___ __ _ _ __ ___ (_) ___
/ _` |/ _ \/ _ \ '_ \/ __| __| '__/ _ \/ _` | '_ ` _ \ | |/ _ \
| (_| | __/ __/ |_) \__ \ |_| | | __/ (_| | | | | | |_| | (_) |
\__,_|\___|\___| .__/|___/\__|_| \___|\__,_|_| |_| |_(_)_|\___/
|_|
========================= starting ==========================
Can't connect! Deepstream server unreachable on ws://localhost:6020/deepstream
5:49:47 PM:523 | Connection to deepstream established
5:49:47 PM:523 | listening for search[\?].*
5:49:47 PM:524 | rethinkdb search provider ready
app/node_modules/bluebird/js/main/async.js:43
fn = function () { throw arg; };
^
TypeError: Cannot read property 'name' of null
at Search._populateList (app/node_modules/deepstream.io-provider-search-rethinkdb/src/search.js:130:91)
at Search._readChange (app/node_modules/deepstream.io-provider-search-rethinkdb/src/search.js:98:12)
at app/node_modules/rethinkdb/cursor.js:261:20
at tryCatcher (app/node_modules/bluebird/js/main/util.js:26:23)
at Promise.successAdapter (app/node_modules/bluebird/js/main/nodeify.js:23:30)
at Promise._settlePromiseAt (app/node_modules/bluebird/js/main/promise.js:582:21)
at Promise._settlePromises (app/node_modules/bluebird/js/main/promise.js:700:14)
at Async._drainQueue (app/node_modules/bluebird/js/main/async.js:123:16)
at Async._drainQueues (app/node_modules/bluebird/js/main/async.js:133:10)
at Immediate.Async.drainQueues (app/node_modules/bluebird/js/main/async.js:15:14)
at runCallback (timers.js:637:20)
at tryOnImmediate (timers.js:610:5)
at processImmediate [as _immediateCallback] (timers.js:582:5)
For a user to call ds.record.getList('search?' + queryString), that user needs create record permission. Why?
My initial expectation was:
Is this not the case?
Since 'or' is a pain to implement, while 'ge' and 'le' are available and simple (and 'not' would also be a bit of a bother), we should add these two to the conditions array.
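A hypothetical operator table that includes 'ge' and 'le'. ReQL has eq, ne, gt, ge, lt and le terms, so each name could plausibly map one-to-one onto a driver method; here they are evaluated in memory to show the intended semantics. comparators and compare are illustrative names.

```javascript
// Each search operator name maps to a plain comparison
const comparators = {
  eq: (a, b) => a === b,
  ne: (a, b) => a !== b,
  gt: (a, b) => a > b,
  ge: (a, b) => a >= b, // proposed addition
  lt: (a, b) => a < b,
  le: (a, b) => a <= b  // proposed addition
};

function compare(actual, op, expected) {
  // Reject unknown names, including inherited keys such as 'constructor'
  if (!Object.prototype.hasOwnProperty.call(comparators, op)) {
    throw new Error(`Unknown operator: ${op}`);
  }
  return comparators[op](actual, expected);
}
```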
While trying to figure out the code, I wrote my own version to analyse it. Since we can't use it anymore, I might as well share it in case you have any use for it. Remember that the number of bugs is a function of the number of lines/tokens.
https://gist.github.com/mclark-newvistas/74a0a960e3e6d4b35455580b2acfc0fb
A primaryKey can be specified for the storage provider, but the search provider uses a hardcoded value (ds_id). It would be nice to make primaryKey configurable.
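A hypothetical sketch of the requested option: user options are merged over defaults so the key falls back to the currently hardcoded 'ds_id'. The option name primaryKey mirrors the storage connector setting mentioned above; it is not an existing search provider option, and resolveOptions/toRecordNames are illustrative names.

```javascript
// Default keeps today's behaviour
const DEFAULT_OPTIONS = { primaryKey: 'ds_id' };

function resolveOptions(options = {}) {
  return Object.assign({}, DEFAULT_OPTIONS, options);
}

// Turn result rows into list entries using the configured key
function toRecordNames(rows, options) {
  const { primaryKey } = resolveOptions(options);
  return rows.map(row => row[primaryKey]);
}
```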
Hi Deepstream Team,
I've been integrating this provider and came across an issue in regards to query length. Once I started building slightly more complex queries (more than 5 conditions) I began to see:
storage:ReqlQueryLogicError: Primary key too long (max 127 characters)
It appears that a list's primary id is the stringified query itself, so any longer query runs into the 127-character maximum allowed by RethinkDB.
Do you have any ideas on how best to handle this? Perhaps, rather than using the stringified query as the list's primary id, it should be a SHA hash of it?
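The SHA idea can be sketched with Node's built-in crypto module: a SHA-256 hex digest is always 64 characters, so the id stays well under the 127-character limit however long the query is. listIdForQuery and the 'search_' prefix are hypothetical; the provider would presumably still need to keep the original query (for example in another field) to map the stored id back to the client-facing record name.

```javascript
const crypto = require('crypto');

// Fixed-length, deterministic id derived from the full query string
function listIdForQuery(queryString) {
  const digest = crypto.createHash('sha256').update(queryString).digest('hex');
  return 'search_' + digest; // 7 + 64 = 71 characters
}
```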
How can I specify sort order (orderBy) in the query? Not sure how the list is currently sorted if at all.
Because we are JSON.parse-ing the input, we get ISO strings instead of Dates. In order to search on date values, we should support converting those ISO strings back into Date objects to pass to the rethink driver.
Have code, will merge.
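A hypothetical sketch of the conversion using a JSON.parse reviver. The regular expression only accepts the exact format produced by Date.prototype.toISOString(), which limits accidental conversion, though any string field that happens to hold such a timestamp would also become a Date. reviveDates and parseQuery are illustrative names, not the code referred to above.

```javascript
// Exact toISOString() shape, e.g. 2017-01-02T03:04:05.000Z
const ISO_DATE = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z$/;

function reviveDates(key, value) {
  if (typeof value === 'string' && ISO_DATE.test(value)) {
    const date = new Date(value);
    // Guard against pattern matches that are not real dates
    if (!Number.isNaN(date.getTime())) return date;
  }
  return value;
}

function parseQuery(queryString) {
  return JSON.parse(queryString, reviveDates);
}
```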
I'd like to limit my results. Implementing this should be possible without breaking anything, right?
Subscribing to a search, then deleting it when done crashes the server. This is the best practice suggested at https://deepstream.io/tutorials/core/searching-and-querying/.
var users = client.record.getList('search?' + JSON.stringify({ table: 'users', query: [] }));
users.on('error', e => console.log("fail", e));
users.whenReady((foo) => console.log(foo.getEntries()));
users.delete(); // after `whenReady` has returned
Note that I'm running the search provider and the deepstream.io server both from the same process, using the nodejs api for the latter.
Log output:
_ _ _
__| | ___ ___ _ __ ___| |_ _ __ ___ __ _ _ __ ___ (_) ___
/ _` |/ _ \/ _ \ '_ \/ __| __| '__/ _ \/ _` | '_ ` _ \ | |/ _ \
| (_| | __/ __/ |_) \__ \ |_| | | __/ (_| | | | | | |_| | (_) |
\__,_|\___|\___| .__/|___/\__|_| \___|\__,_|_| |_| |_(_)_|\___/
|_|
========================= starting ==========================
2:36:36 PM:168 | Initialising RethinkDb Connection
2:36:36 PM:178 | RethinkDb connection established
2:36:36 PM:178 | Initialising Deepstream connection
2:36:36 PM:185 | Connection to deepstream established
2:36:36 PM:186 | listening for search[\?].*
2:36:36 PM:187 | rethinkdb search provider ready
2:36:53 PM:553 | received subscription for search?{"table":"sub","query":[["random","gt",0.5]]}
undefined
2:36:59 PM:555 | received subscription for search?{"table":"users","query":[]}
2:37:03 PM:681 | Removing search search?{"table":"users","query":[]}
2:37:03 PM:684 | discard subscription for search?{"table":"users","query":[]}
/var/www/vhosts/modules.payzoom.com/node_modules/deepstream.io-provider-search-rethinkdb/src/search.js:54
this._provider.log( 'Removing search ' + this._list.name )
^
TypeError: Cannot read property 'name' of null
at Search.destroy (/var/www/test.example.com/node_modules/deepstream.io-provider-search-rethinkdb/src/search.js:54:54)
at Provider._onSubscription (/var/www/test.example.com/node_modules/deepstream.io-provider-search-rethinkdb/src/provider.js:220:28)
at Listener._$onMessage (/var/www/test.example.com/node_modules/deepstream.io-provider-search-rethinkdb/node_modules/deepstream.io-client-js/src/utils/listener.js:30:14)
at RecordHandler._$handle (/var/www/test.example.com/node_modules/deepstream.io-provider-search-rethinkdb/node_modules/deepstream.io-client-js/src/record/record-handler.js:243:26)
at Client._$onMessage (/var/www/test.example.com/node_modules/deepstream.io-provider-search-rethinkdb/node_modules/deepstream.io-client-js/src/client.js:128:42)
at Connection._onMessage (/var/www/test.example.com/node_modules/deepstream.io-provider-search-rethinkdb/node_modules/deepstream.io-client-js/src/message/connection.js:322:17)
at emitOne (events.js:96:13)
at emit (events.js:188:7)
at TcpConnection._onData (/var/www/test.example.com/node_modules/deepstream.io-provider-search-rethinkdb/node_modules/deepstream.io-client-js/src/tcp/tcp-connection.js:180:7)
at emitOne (events.js:96:13)
at Socket.emit (events.js:188:7)
at readableAddChunk (_stream_readable.js:172:18)
at Socket.Readable.push (_stream_readable.js:130:10)
at TCP.onread (net.js:542:20)
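Both stack traces in these reports end in "Cannot read property 'name' of null", i.e. code reads this._list after it has already been cleared. A hypothetical sketch of an idempotent teardown: check the field first, capture what is needed, then clear it, so a second destroy() becomes a no-op. The class and field names only mirror those visible in the traces; this is not the provider's source, and the same guard would presumably be needed in the change handlers.

```javascript
class Search {
  constructor(list, log) {
    this._list = list; // assumed to expose a `name` property
    this._log = log;
  }

  // Returns true if teardown ran, false if it had already happened
  destroy() {
    if (this._list === null) return false; // already destroyed
    const name = this._list.name;
    this._list = null;
    this._log('Removing search ' + name);
    return true;
  }
}
```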
A query can accept many operands, as in the following example:
var queryString = JSON.stringify({
table: 'book',
query: [
[ 'title', 'match', '^Harry Potter.*' ],
[ 'price', 'lt', 15.30 ]
]
});
ds.record.getList( 'search?' + queryString );
But the operands are evaluated as if they were separated by an «AND» operator.
In SQL, this would result in the following WHERE clause: title LIKE 'Harry Potter%' AND price < 15.30.
I would like to be able to combine two operands with an «OR» operator. Something like ['or', ['title', 'eq', 'A title'], ['price', 'lt', 15.30]] would mean: give me the list of all books that have the title 'A title' or whose price is less than 15.30.
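A hypothetical in-memory sketch of the proposed syntax: top-level conditions stay AND-ed, and a condition whose first element is 'or' matches if any of its sub-conditions match. This treats 'or' as a reserved first element, so a field literally named 'or' could not be queried this way. ReQL has an or term that a provider could presumably map this onto; ops, evaluate and matchesQuery are illustrative names.

```javascript
const ops = {
  eq: (a, b) => a === b,
  lt: (a, b) => a < b,
  gt: (a, b) => a > b,
  match: (a, b) => typeof a === 'string' && new RegExp(b).test(a)
};

// Evaluate one condition, recursing into ['or', ...] groups
function evaluate(doc, condition) {
  if (condition[0] === 'or') {
    return condition.slice(1).some(sub => evaluate(doc, sub));
  }
  const [field, op, expected] = condition;
  if (!Object.prototype.hasOwnProperty.call(ops, op)) {
    throw new Error(`Unknown operator: ${op}`);
  }
  return ops[op](doc[field], expected);
}

// Top level keeps today's AND semantics
function matchesQuery(doc, query) {
  return query.every(condition => evaluate(doc, condition));
}
```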
When querying for initial results to a search condition, processing the results, then subscribing to changes, there is a small gap during which a change can slip in undetected. Fortunately, rethinkdb offers a convenient method to subscribe to both the changes and the initial dataset at the same time. We should use it since they've made it so simple to use.
Code will be supplied, per usual.
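A hypothetical sketch of how a single handler could cover both phases. With RethinkDB's changes({ includeInitial: true }), initial rows arrive as documents with new_val and no old_val, followed by live { old_val, new_val } updates (new_val is null when a row leaves the result set); with includeStates: true, { state: ... } documents are emitted as well and are skipped here. applyChange is an illustrative name, and the 'ds_id' key follows the provider's current default.

```javascript
// Fold one changefeed document into the set of matching record ids
function applyChange(ids, change, key = 'ds_id') {
  if (change.state) return ids; // status notifications carry no row
  const next = new Set(ids);
  if (change.old_val) next.delete(change.old_val[key]); // row changed or left
  if (change.new_val) next.add(change.new_val[key]);    // row added or updated
  return next;
}
```

Because initial values and later changes flow through the same feed, there is no window between the initial read and the subscription for a change to slip through.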
I installed deepstream on Ubuntu with apt-get, using RethinkDB for storage.
How can I use this plugin?
Please help.