ldbc_snb_docs's People

Contributors

alexaverbuch, arnauprat, imbur, jackwaudby, jmarton, marci543, mkaufmann, szarnyasg, wangzk

ldbc_snb_docs's Issues

modify query 11 description

Suggest modifying the description to make the sort order clearer.
At present it is not clear whether the third sort criterion, Organization name, is (1) ascending or (2) descending.

Suggest descending (no particular reason).

current version:

Given a start Person, find that Person's friends and friends of friends (excluding start Person) who started Working in some Company in a given Country, before a given date (year).
Return top 10 Persons, the Company they worked at, and the year they started working at that Company.
Sort results ascending by the start date, then ascending by Person identifier, and lastly by Organization name

suggested version:

Given a start Person, find that Person's friends and friends of friends (excluding start Person) who started Working in some Company in a given Country, before a given date (year).
Return top 10 Persons, the Company they worked at, and the year they started working at that Company.
Sort results ascending by the start date, then ascending by Person identifier, and lastly by Organization name descending
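
To make the suggested ordering concrete, here is a minimal Python sketch. The row layout and the helper name sort_q11 are made up for illustration and are not part of the spec or the driver.

```python
from operator import itemgetter

# Hypothetical result rows: (work_from_year, person_id, organization_name)
rows = [
    (2005, 42, "Acme"),
    (2005, 42, "Zenith"),
    (2003, 99, "Globex"),
    (2005, 17, "Initech"),
]

def sort_q11(rows, limit=10):
    # Python's sort is stable, so apply the keys from least to most significant.
    ordered = sorted(rows, key=itemgetter(2), reverse=True)  # Organization name, descending
    ordered = sorted(ordered, key=itemgetter(0, 1))          # start year, then Person id, ascending
    return ordered[:limit]

result = sort_q11(rows)
```

With these rows, the two entries for Person 42 in 2005 come out as Zenith before Acme, which is the only place the third criterion matters.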

LDBC SNB Query 6 description modification

The current description is ambiguous. Suggest changing from:

Given a start Person and some Tag, find the other Tags that occur together with this Tag on Posts that were created by start Person's friends and friends of friends (excluding start Person).
Return top 10 Tags, and the count of Posts that were created by these Persons, which contain this Tag.
Sort results descending by count, and then ascending by Tag name.

to:

Given a start Person and some Tag, find the other Tags that occur together with this Tag on Posts that were created by start Person's friends and friends of friends (excluding start Person).
Return top 10 Tags, and the count of Posts that were created by these Persons, which contain both this Tag and the given Tag.
Sort results descending by count, and then ascending by Tag name.
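
As an illustration of the intended counting, here is a small Python sketch. It assumes the Posts have already been restricted to those created by friends and friends of friends, and that each Post is represented simply as a set of Tag names; the function name co_occurring_tags is hypothetical.

```python
from collections import Counter

def co_occurring_tags(posts, given_tag, limit=10):
    """posts: iterable of Tag-name sets, one per Post by friends / friends of friends."""
    counts = Counter()
    for tags in posts:
        if given_tag in tags:
            # Count the Post once for every other Tag it carries.
            counts.update(t for t in tags if t != given_tag)
    # Descending by count, then ascending by Tag name.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]

posts = [
    {"Mozart", "Bach"},
    {"Mozart", "Bach", "Chopin"},
    {"Bach", "Chopin"},    # lacks the given Tag, so it is not counted
    {"Mozart", "Chopin"},
]
result = co_occurring_tags(posts, "Mozart")
```

The third Post contains Bach and Chopin but not the given Tag, so under the suggested wording it does not contribute to either count.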

Data schema in documentation doesn't match datagen results - missing url field

Hi,
The data generator results have a url field for Organisations and Places. This information is missing from both Figure 2.1 and Tables 2.4 and 2.6.

Also, you might want to mention that inheritance is implemented in the data generator by adding a "type" field for places and organizations. I only found this out after writing a considerable number of queries against a schema constructed according to the documentation. The data doesn't match, so now I have to choose between rewriting the queries or splitting the files into different entity types with some very ugly code.

Cheers,
Tomer

LDBC SNB Interactive Query 5 description needs updating

--- PROBLEM ---
The current description is ambiguous. It reads as follows (with _EMPHASIS_ on the ambiguous section):

"Given a start Person, find the Forums which that Person's friends and friends of friends (excluding start Person) became Members of after a given date.
Return top 20 Forums, and the number of Posts in each Forum that was Created by _ANY OF THESE PERSONS_.
Sort results descending by the count of Posts, and then ascending by Forum name."

Ambiguity is about: exactly which persons are covered by "ANY OF THESE PERSONS"?

Assume the following:

  • PERSON_A joined FORUM_A after given date
  • PERSON_A joined FORUM_B before given date
  • PERSON_B joined FORUM_A before given date
  • PERSON_B joined FORUM_B after given date

When counting posts made in FORUM_B, should we include those made by PERSON_A?

Which of the following should we count:
(1) posts by PERSON_A and PERSON_B for FORUM_A, posts by PERSON_A and PERSON_B for FORUM_B
(2) posts by PERSON_A for FORUM_A, posts by PERSON_B for FORUM_B

Note, this has no effect on which forums are returned, only on the post counts associated with each returned forum.

--- SOLUTION ---

Peter & Andrey both preferred option (2).
Given that, I propose the following description:

"Given a start Person, find the Forums which that Person's friends and friends of friends (excluding start Person) became Members of after a given date.
Return top 20 Forums, and the number of Posts in each Forum that was Created by any of these Persons - for each Forum consider only those Persons which joined that particular Forum after the given date.
Sort results descending by the count of Posts, and then ascending by Forum name"
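
The PERSON_A / PERSON_B scenario above can be worked through with a short Python sketch of option (2). It assumes the membership and post rows are already limited to friends and friends of friends, and it uses the Forum key as the Forum name; the function name and the dates are made up for illustration.

```python
from collections import Counter
from datetime import date

def forum_post_counts(memberships, posts, min_date, limit=20):
    """memberships: (person, forum, join_date) rows; posts: (creator, forum) rows."""
    # Option (2): a Person counts for a Forum only if they joined that Forum after min_date.
    qualifying = {(p, f) for p, f, joined in memberships if joined > min_date}
    counts = Counter({f: 0 for _, f in qualifying})
    for creator, forum in posts:
        if (creator, forum) in qualifying:
            counts[forum] += 1
    # Descending by Post count, then ascending by Forum name.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]

given = date(2012, 1, 1)
memberships = [
    ("PERSON_A", "FORUM_A", date(2012, 2, 1)),   # after the given date
    ("PERSON_A", "FORUM_B", date(2011, 12, 1)),  # before
    ("PERSON_B", "FORUM_A", date(2011, 12, 1)),  # before
    ("PERSON_B", "FORUM_B", date(2012, 2, 1)),   # after
]
posts = [
    ("PERSON_A", "FORUM_A"), ("PERSON_A", "FORUM_A"),
    ("PERSON_A", "FORUM_B"), ("PERSON_A", "FORUM_B"), ("PERSON_A", "FORUM_B"),
    ("PERSON_B", "FORUM_A"),
    ("PERSON_B", "FORUM_B"),
]
result = forum_post_counts(memberships, posts, given)
```

Both Forums are still returned, but PERSON_A's three Posts in FORUM_B and PERSON_B's Post in FORUM_A are left out of the counts.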

Output spec in Query 10 field order doesn't fit driver method signature

Hi,
In Q10 the spec reads:

Results:
Person.id ID
Person.firstName String
Person.lastName String
Person.gender String
Person-isLocatedIn->Place.name String
similarity 32-bit Integer

While the method signature for LdbcQuery10Result is:

 public LdbcQuery10Result(long personId, String personFirstName, String personLastName, int commonInterestScore, String personGender, String personCityName)

Thanks
Tomer
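
Until the spec and the driver agree, one way to bridge the two orderings is to map fields by name rather than by position. The sketch below is only an illustration in Python: the two name lists are copied from the spec excerpt and the constructor signature above, while the helper to_ctor_args and the sample row are hypothetical.

```python
# Field order as listed in the Q10 result spec (named after the constructor parameters).
SPEC_ORDER = ["personId", "personFirstName", "personLastName",
              "personGender", "personCityName", "commonInterestScore"]
# Parameter order of the LdbcQuery10Result constructor quoted above.
CTOR_ORDER = ["personId", "personFirstName", "personLastName",
              "commonInterestScore", "personGender", "personCityName"]

def to_ctor_args(spec_row):
    # Pair each value with its field name, then emit values in constructor order.
    by_name = dict(zip(SPEC_ORDER, spec_row))
    return [by_name[name] for name in CTOR_ORDER]

# Made-up sample row in spec order.
args = to_ctor_args([1, "Ada", "Lovelace", "female", "London", 3])
```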

DB schema denormalized and for no apparent reason

Hi,
Person and Message entities both have non-1NF properties, namely email, speaks and content.
This has no apparent justification.
IMHO it would be more prudent to create entities from these items and connect to them via edges. The data generator supports this, so it's just a matter of semantics, but the current schema breaks conventions and makes things non-standard when accessing these properties.
Also, the *content property of *Message is not really a collection of texts per se, at least not in the version created by the data generator. Consider making it a simple string in the class diagram (Figure 2.1) rather than Text[0..1].

Please consider.
Tomer

Clarification of query 12 needed regarding what tags to output

Hi @alexaverbuch and @tomersagi

According to query 12's description, the list of tags of posts to output in the result set seems to not be restricted to only those present in the input tagclass or any subclass. However, in the validation dataset provided at ldbc-dev, it seems that only those tags belonging to that subset are actually output, while the rest are omitted. Should this be clarified in the query's description or is the validation set incorrect? What do you think?

Thanks

Arnau

Update query behavior upon missing vertices is not well defined

Hi,
When I try to implement Update Query 1, I need to add edges between the newly created person and a city, universities, companies, etc.

  1. Whether or not these can be assumed to exist is not specified.
  2. Behavior if they do not exist is not specified.

Thanks,
Tomer

Minor mistakes in v 0.1.1

Great work on the spec document.
There are some minor mistakes.
On page 10:
"These systems will can potentially implement the three workloads, though Interactive and Business Intelligence workloads are where they will presumably be more competitive."
should be:
"These systems can potentially implement the three workloads, though Interactive and Business Intelligence workloads are presumably more competitive."
On page 11, first point: "clsuter" should be "cluster".
On page 23, section 2.2.4: "For example, SF 100 consists of the activity of a social network of 182K users" should say SF 30.
On page 30 there is a missing table reference under "Load definition".

Looking forward to the next versions.
Tomer

add D333_4 and D223 to this repository?

I think both of those documents provide valuable information about the workload that others would find useful. It would be nice to have these docs easy to find in one place, and I propose that place be this repository.

[Page26] Query1 studyAt points to University not to Company

In the query 1 results there is the following typo:

Person-studyAt->Company.name
...
Person-studyAt->Company-isLocatedIn->City.name

and it should be:

Person-studyAt->University.name
...
Person-studyAt->University-isLocatedIn->City.name

Clarify in queries on how to decide if "content" or "image" should be returned

In queries that return Posts and/or Comments the spec says to return "content" OR "image", but does not say how to know which one should be returned.
If every Post and Comment has content (which is sometimes just an empty string), then this test cannot be based on the existence of the attribute: it always exists, and an empty string may in fact be a valid content value.

Need to clarify and document this.

E.g. Query 2
Given a start Person, find (most recent) Posts and Comments from all of that Person's friends, that were created before (and including) a given date.
Return the top 20 Posts/Comments, and the Person that created each of them.
Sort results descending by creation date, and then ascending by Post identifier.

Parameter

  • Person.id
  • maxDate

Result (for each result return)

  • Person.id
  • Person.firstName
  • Person.lastName
  • Post.id/Comment.id
  • Post.content/Post.imageFile/Comment.content
  • Post.creationDate/Comment.creationDate

Some typos in v.0.1.3

Hi,
Some typos in the new version:
Section 2.2.2
page 16 "Forum: Is forum represents..." should be "Forum: Represents..."
page 17 under "Person": "contains several information" should be "contains information"
page 18 in the Relations table:
hasModerator row: "A Forum and his moderator" should be "A Forum and its moderator"
isLocatedIn row: "A Person and its home City" should be "A Person and their home City"

Keep up the good work
Tomer

Query 5 Sort Order

Given that the Titles of Forums are not unique, the sort order of Query 5 results is insufficient.

Current description is:

    Given a start Person, find the Forums which that Person’s friends and friends of friends 
    (excluding start Person) became Members of after a given date. 
    Return top 20 Forums, and the number of Posts in each Forum that was Created by any of these Persons - 
    for each Forum consider only those Persons which joined that particular Forum after the given date. 
    Sort results descending by the count of Posts, and then ascending by Forum name

Proposed modification (order by Forum.ID instead of Forum.TITLE):

    Given a start Person, find the Forums which that Person’s friends and friends of friends 
    (excluding start Person) became Members of after a given date. 
    Return top 20 Forums, and the number of Posts in each Forum that was Created by any of these Persons - 
    for each Forum consider only those Persons which joined that particular Forum after the given date. 
    Sort results descending by the count of Posts, and then ascending by Forum identifier
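
A small Python sketch of why the title is not a sufficient tie-breaker. The rows are made up for illustration; the point is only that two Forums with the same title and the same Post count can come back in either order, whereas the identifier fixes the order.

```python
# Hypothetical rows: (forum_id, forum_title, post_count)
rows = [(7, "Wall of Ann", 3), (2, "Wall of Ann", 3), (5, "Album 1", 9)]

def by_title(r):
    return (-r[2], r[1])   # count descending, then title ascending

def by_id(r):
    return (-r[2], r[0])   # count descending, then identifier ascending

# The same rows arriving in a different order, e.g. from a different query plan.
shuffled = rows[::-1]

title_a = sorted(rows, key=by_title)
title_b = sorted(shuffled, key=by_title)
id_a = sorted(rows, key=by_id)
id_b = sorted(shuffled, key=by_id)
```

Here title_a and title_b differ in the order of Forums 7 and 2, while id_a and id_b are identical.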

Not clear if all comments should always be in reply to something

In the schema diagram, the cardinality of the replyOf edge is 0..*, so a Comment may not be in reply to anything.
But in update query 7 you have to supply either a replyOf Post or a replyOf Comment. Also, what happens if both are -1? The logic gets circular at some point...

LDBC SNB Interactive Query 12 description

Clarify whether we consider (1) only Comments that are direct (1-hop) replies to Posts or (2) the transitive case too, Comments that are part of a conversation that eventually ends up at a Post.

Current description:

"Given a start Person, find the Comments that this Person's friends made in reply to Posts.
Only consider Posts with a Tag in a given TagClass or in a descendent of that TagClass.
Count the number of these reply Comments, and collect the Tags that were attached to the Posts they replied to.
Return top 20 Persons, the reply count, and the collection of Tags.
Sort results descending by Comment count, and then ascending by Person identifier"

No suggested solution yet, as a discussion needs to take place first.

Query14 description is ambiguous regarding weighted/unweighted paths

Proposed change (Arnau, this is the version that both you and Mirko preferred; it is the version I sent to you on Skype):

Given two Persons, find all (unweighted) shortest paths between these two Persons, in the subgraph induced by the Knows relationship. Then, for each path calculate a weight.
The nodes in the path are Persons, and the weight of a path is the sum of weights between every pair of consecutive Person nodes in the path.
The weight for a pair of Persons is calculated such that every reply (by one of the Persons) to a Post (by the other Person) contributes 1.0, and every reply (by ones of the Persons) to a Comment (by the other Person) contributes 0.5.
Return all the paths with shortest length, and their weights.
Sort results descending by path weight.
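
To illustrate the weighting step of the proposed description, here is a minimal Python sketch. It assumes the unweighted shortest paths have already been found and only computes their weights; the reply tuples, Person names and function names are hypothetical.

```python
def pair_weight(a, b, replies):
    """replies: (replier, original_author, kind) rows, kind is 'post' or 'comment'."""
    contribution = {"post": 1.0, "comment": 0.5}
    return sum(contribution[kind]
               for replier, author, kind in replies
               if {replier, author} == {a, b})

def path_weight(path, replies):
    # Sum the pair weights over consecutive Persons on the path.
    return sum(pair_weight(a, b, replies) for a, b in zip(path, path[1:]))

replies = [
    ("p1", "p2", "post"),     # p1 replied to a Post by p2: 1.0
    ("p2", "p1", "comment"),  # p2 replied to a Comment by p1: 0.5
    ("p2", "p3", "comment"),  # 0.5
    ("p1", "p3", "post"),     # p1 and p3 are never consecutive below, so this is ignored
]
paths = [["p1", "p4", "p3"], ["p1", "p2", "p3"]]
ranked = sorted(((p, path_weight(p, replies)) for p in paths),
                key=lambda pw: pw[1], reverse=True)
```

Both paths have the same length; only their weights (2.0 and 0) decide the order in which they are returned.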

Clarification required for Short Query 2

Hi,
From the description (in TeX) it isn't clear whether to return messages that are not replies to another message and, if we are to return them, what values to put in the original message fields.
Thanks
Tomer

Schema diagram needs updating: Message/Comment/Post

At present, Post does not have a "length" attribute but Comment does.
I think length should be placed in Message so both Comment and Post have it.
Also, Comment and Post both seem to have "content", so it should be moved to Message too.

Does the following make sense?

Message

  • date: DateTime
  • browser: String
  • ip: String
  • length: int
  • content: Text

Post

  • language: String[0..1]
  • image: String[0..1]

Comment

Query 1 return types do not match driver expected return types

Hi,
According to: [https://github.com/ldbc/ldbc_driver/blob/master/src/main/java/com/ldbc/driver/workloads/ldbc/snb/interactive/LdbcQuery1Result.java]
The birthday and creation date should be long, but according to the documentation they should be Date and DateTime respectively.
If these are indeed meant to be long, what long value are you expecting?
Thanks

Q4 description modification

Suggest updating Q4 description from:

4. New topics

Given a start Person, find Tags that are attached to Posts that were created by that Person’s
friends. 
Only include Tags that were attached to Posts created within a given time interval, and that were
never attached to Posts created before this interval. 
Return top 10 Tags, and the count of Posts, which were created within the given time interval, that this Tag was attached to. 
Sort results descending by Post count, and then ascending by Tag name.

To:

4. New topics

Given a start Person, find Tags that are attached to Posts that were created by that Person’s
friends. 
Only include Tags that were attached to friends' Posts created within a given time interval, and that were
never attached to friends' Posts created before this interval. 
Return top 10 Tags, and the count of Posts, which were created within the given time interval, that this Tag was attached to. 
Sort results descending by Post count, and then ascending by Tag name.

The only change is the addition of "friends'" in two places.
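
The effect of restricting both conditions to friends' Posts can be sketched in Python as follows. The sketch assumes the input already contains only Posts created by the start Person's friends, and it treats the interval as half-open (start inclusive, end exclusive), which the description does not specify; the function name and sample data are made up.

```python
from collections import Counter
from datetime import date

def new_topics(friend_posts, start, end, limit=10):
    """friend_posts: (creation_date, tags) rows for Posts created by friends only."""
    before = set()
    within = Counter()
    for created, tags in friend_posts:
        if created < start:
            before.update(tags)       # Tags friends already used before the interval
        elif created < end:
            within.update(set(tags))  # one count per Post per Tag
    fresh = {t: n for t, n in within.items() if t not in before}
    # Descending by Post count, then ascending by Tag name.
    return sorted(fresh.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]

friend_posts = [
    (date(2011, 1, 5), {"A"}),
    (date(2011, 2, 10), {"A", "B"}),
    (date(2011, 2, 20), {"B", "C"}),
    (date(2011, 3, 15), {"D"}),
]
result = new_topics(friend_posts, date(2011, 2, 1), date(2011, 3, 1))
```

Tag A is excluded because a friend already used it in January; a Post by a non-friend using B before the interval would not affect the result, since non-friend Posts never enter the input.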

more documentation regarding the practical parts of validation

We should document where the validation/debugging datasets (a smaller version of validation?) are.
Some of this will go in the ldbc_driver documentation, but perhaps we should write a little more about how to get started, how to debug an implementation, where to find a small/toy dataset (and validation parameters) to do that with, etc.

LDBC SNB Interactive Query 7 description modification

The description is not very clear regarding the data that should be returned.

Currently it is:

Given a start Person, find (most recent) Likes on any of start Person's Posts/Comments.
Return top 20 Persons that Liked your Post/Comment, the Post/Comment they liked, the Like, and the latency between creation of Post/Comment and Like.
Additionally, return a flag indicating whether the liker is a friend of start Person.
Sort results descending by creation time of Like, and then ascending by Person identifier of liker.

Suggest changing it to:

Given a start Person, find (most recent) Likes on any of start Person's Posts/Comments.
Return top 20 Persons that Liked any of start Person's Posts/Comments, the most recent Post/Comment they liked, creation date of that Like, and the latency (in minutes) between creation of Post/Comment and Like.
Additionally, return a flag indicating whether the liker is a friend of start Person.
Sort results descending by creation time of Like, and then ascending by Person identifier of liker.
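
A short Python sketch of the suggested semantics: one row per liker, taken from that liker's most recent Like, with the latency expressed in minutes. Truncating to whole minutes is an assumption, as are the tuple layout and function names.

```python
from datetime import datetime

def like_latency_minutes(message_created, like_created):
    # Whole minutes between creation of the Post/Comment and the Like (truncated).
    return int((like_created - message_created).total_seconds() // 60)

def latest_like_per_person(likes, limit=20):
    """likes: (liker_id, message_id, message_created, like_created) rows."""
    latest = {}
    for like in likes:
        liker = like[0]
        if liker not in latest or like[3] > latest[liker][3]:
            latest[liker] = like
    rows = [(liker, msg, liked, like_latency_minutes(created, liked))
            for liker, msg, created, liked in latest.values()]
    # Stable sorts: ascending by liker id first, then descending by Like creation time.
    rows.sort(key=lambda r: r[0])
    rows.sort(key=lambda r: r[2], reverse=True)
    return rows[:limit]

likes = [
    (2, 100, datetime(2012, 1, 1, 10, 0), datetime(2012, 1, 1, 10, 45, 30)),
    (2, 101, datetime(2012, 1, 2, 9, 0), datetime(2012, 1, 2, 9, 10)),
    (1, 100, datetime(2012, 1, 1, 10, 0), datetime(2012, 1, 1, 12, 0)),
]
result = latest_like_per_person(likes)
```

Person 2 liked two Messages, but only the more recent Like (on Message 101) is returned for that Person.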

LDBC SNB Interactive Query 14 description - no path found case

The current description says nothing about what should be returned in the case when start person = end person.

Suggestion:

  • for the list of person IDs representing the path, return a list containing only 1 element, the ID of the start/end person
  • for weight return 0, even if person has commented on their own comments/posts

Current description:

"Given two Persons, find all weighted paths of the shortest length between these two Persons in
the subgraph induced by the Knows relationship.
The nodes in the path are Persons.
Weight of a path is sum of weights between every pair of consecutive Person nodes in the path.
The weight for a pair of Persons is calculated such that every reply (by one of the Persons) to a Post (by the other Person) contributes 1.0, and every reply (by ones of the Persons) to a Comment (by the other Person) contributes 0.5.
Return all the paths with shortest length, and their weights.
Sort results descending by path weight."

Suggested solution:

"Given two Persons, find all weighted paths of the shortest length between these two Persons in
the subgraph induced by the Knows relationship.
The nodes in the path are Persons.
Weight of a path is sum of weights between every pair of consecutive Person nodes in the path.
The weight for a pair of Persons is calculated such that every reply (by one of the Persons) to a Post (by the other Person) contributes 1.0, and every reply (by ones of the Persons) to a Comment (by the other Person) contributes 0.5.
In the unlikely case that start and end are the same Person, weight is 0.
Return all the paths with shortest length, and their weights.
Sort results descending by path weight."

Suggesting clarification of short query 6

"Given a Message (Post or Comment), retrieve the Forum that contains it and the Person that moderates that forum." add: "Since comments are not directly contained in forums, for comments, return the forum containing the original post in the thread which the comment is replying to. "

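The suggested rule for Comments can be sketched in Python as a walk up the replyOf chain. This is only an illustration: it assumes Message identifiers are unique across Posts and Comments, and the two lookup dictionaries and the function name are hypothetical.

```python
def forum_of(message_id, reply_of, container_of):
    """reply_of: Comment id -> parent Message id; container_of: Post id -> Forum id."""
    current = message_id
    # Follow replyOf edges until reaching the Post that started the thread.
    while current in reply_of:
        current = reply_of[current]
    return container_of[current]

reply_of = {"c2": "c1", "c1": "p1"}   # c2 replies to c1, which replies to Post p1
container_of = {"p1": "f1"}
```

For a Post the loop body never runs, so the same function covers both cases: forum_of("p1", ...) and forum_of("c2", ...) resolve to the same Forum.
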
LDBC SNB Interactive Query 8 description

Clarify whether we consider (1) only immediate replies or (2) the transitive case too.
Outcome from discussion on Skype channel is that we go with case (1).

Current description:

"Given a start Person, find (most recent) Comments that are replies to Posts/Comments of the start Person.
Return the top 20 reply Comments, and the Person that created each reply Comment.
Sort results descending by creation date of reply Comment, and then ascending by identifier of reply Comment."

Suggested description:

"Given a start Person, find (most recent) Comments that are replies to Posts/Comments of the start Person. Only consider immediate (1-hop) replies, not the transitive (multi-hop) case.
Return the top 20 reply Comments, and the Person that created each reply Comment.
Sort results descending by creation date of reply Comment, and then ascending by identifier of reply Comment."

More information:

The question was, do we consider only immediate replies, or the transitive case too?
Should we consider a reply to a reply to a comment/post by start person?
Imagine this:
Person-created->Post<-reply-CommentA<-reply-CommentB<-created-PersonA
Person-created->Post<-reply-CommentA<-created-PersonB

Should CommentA be returned?
Should CommentB be returned?

The outcome was that CommentB should not be returned.
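
The agreed 1-hop behaviour on the example above can be sketched in Python like this. The dictionaries encode exactly the two chains shown; the function name and the integer creation times are made up for illustration.

```python
def immediate_replies(start_person, creator_of, reply_of, created_at, limit=20):
    """creator_of: Message id -> Person; reply_of: Comment id -> parent Message id."""
    # Keep only Comments whose direct parent was created by the start Person.
    hits = [c for c, parent in reply_of.items()
            if creator_of[parent] == start_person]
    # Stable sorts: ascending by Comment id first, then descending by creation date.
    hits.sort()
    hits.sort(key=lambda c: created_at[c], reverse=True)
    return [(c, creator_of[c]) for c in hits[:limit]]

creator_of = {"Post": "Person", "CommentA": "PersonB", "CommentB": "PersonA"}
reply_of = {"CommentA": "Post", "CommentB": "CommentA"}
created_at = {"CommentA": 1, "CommentB": 2}
result = immediate_replies("Person", creator_of, reply_of, created_at)
```

CommentB is a reply to CommentA, which was created by PersonB rather than by the start Person, so it is not returned.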

Typo and Grammar issues in Q14

According to [http://grammarist.com/usage/people-persons/] you should use "people" as the plural of "person". Also, there is a typo: "ones of the person" should be "one person" or, if you insist, "one of the people".

What can we assume about the generated entity IDs?

Hi,
A couple of questions:

  1. Are the IDs unique globally or only within an entity type?
  2. For data load purposes, it would probably be better if the CSV files came sorted by ID. Can you add that (maybe as an option) to the datagen configuration? Since this already runs as a Hadoop job, it should be easier than anything we add in the importers.

Thanks

What does query 14 return when no paths exist?

Hi,
While Q13 addresses the limit conditions of same person and no path exists between the two people supplied, Q14 doesn't. Shall I assume the same as Q13 ([],0) and ([],-1) or should I calculate something else for these cases?
Thanks

[Page28] Query7 isNew issue

What happens in the following case:

  • Person B likes the post at creationTime X
  • Person B-knows->startPerson at creationTime X + 2

Should the variable isNew be false or true?

LDBC SNB Interactive Query 3 description upgrade

The current description is not explicit about which posts/comments should be counted.
Suggest changing the description from this:

Given a start Person, find Persons that are their friends and friends of friends (excluding start Person) that have made Posts/Comments in the given Countries X and Y within a given period.
Only Persons that are foreign to Countries X and Y are considered, that is Persons whose Location is not Country X or Country Y.
Return top 20 Persons, and their Post/Comment counts.
Sort results descending by total number of Posts/Comments, and then ascending by Person identifier.

To this:

Given a start Person, find Persons that are their friends and friends of friends (excluding start Person) that have made Posts/Comments in the given Countries X and Y within a given period.
Only Persons that are foreign to Countries X and Y are considered, that is Persons whose Location is not Country X or Country Y.
Return top 20 Persons, and their Post/Comment counts, in the given countries and period.
Sort results descending by total number of Posts/Comments, and then ascending by Person identifier.

Cypher version of Q3 on page 64 doesn't match the query description on page 26

Cypher:

MATCH (person:Person {id:{person_id}})-[:KNOWS*1..2]-(friend:Person)<-[:HAS_CREATOR]-
(postX:Post)-[:IS_LOCATED_IN]->
(countryX:Country)
WHERE countryX.name={country_x} AND
postX.creationDate>={min_date} AND
postX.creationDate<={max_date}
WITH friend, count(DISTINCT postX) AS xCount
MATCH (friend)<-[:HAS_CREATOR]-(postY:Post)-[:IS_LOCATED_IN]->
(countryY:Country {name:{country_y}})
WHERE postY.creationDate>={min_date} AND postY.creationDate<={max_date}
WITH friend.firstName + ' ' + friend.lastName AS friendName,
xCount, count(DISTINCT postY) AS yCount
RETURN friendName, xCount, yCount, xCount + yCount AS xyCount
ORDER BY

Required output:

Person.id ID
Person.firstName String
Person.lastName String
countx 32-bit Integer // number of Posts/Comments from Country X made by Person within the given time
county 32-bit Integer // number of Posts/Comments from Country Y made by Person within the given time
count 32-bit Integer // countx + county

modify Query 10 description

Propose leaving the query description unchanged but changing the parameters section of the description for query 10.

current version is:

Parameters:
Person.id ID
month1 32-bit Integer // between 1-12
month2 32-bit Integer // month1 + 1, but 12 + 1 = 1

propose changing to:

Parameters:
Person.id ID
month 32-bit Integer // between 1-12

Vendor implementations would then simply compute endMonth = (startMonth % 12) + 1, so that month 12 wraps around to month 1.
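
A minimal Python sketch of that wrap-around, together with the birthday window from the query description. Note that a plain (month + 1) % 12 would map month 11 to 0, so the modulo is applied before adding 1. The function names are hypothetical.

```python
def following_month(month):
    # Months are 1-12, so wrap 12 -> 1 without ever producing 0.
    return month % 12 + 1

def in_birthday_window(birth_month, birth_day, month):
    # Born on or after the 21st of the given month,
    # and before the 22nd of the following month (in any year).
    return ((birth_month == month and birth_day >= 21) or
            (birth_month == following_month(month) and birth_day < 22))
```

For example, with month = 12 the window runs from 21 December up to and including 21 January.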

The reason for the suggestion is that the description already says:
"Given a start Person, find that Person’s friends of friends, who were born on or after the 21st of a given month (in any year) and before the 22nd OF THE FOLLOWING MONTH..."
