elbraulio / ros_gh
Answerers Recommendation System for ROS Answers
License: MIT License
With small doubles the rank evaluates to NaN, so it should be replaced with 0:
Double.isNaN(rank) ? 0d : rank;
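A minimal sketch of that guard as a helper, in case it is useful in more than one place. The class and method names (`RankGuard`, `safeRank`) are hypothetical and do not come from the project; only the `Double.isNaN` expression is taken from the issue.

```java
final class RankGuard {

    private RankGuard() {
        // static helper only
    }

    /**
     * Returns 0 when the computed rank is NaN, otherwise the rank itself.
     * One way a NaN can arise is a floating-point 0/0 division.
     */
    static double safeRank(double rank) {
        return Double.isNaN(rank) ? 0d : rank;
    }
}
```

For instance, `RankGuard.safeRank(0d / 0d)` yields `0.0`, while any ordinary value is returned unchanged.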
Also, Aspirant is returning the wrong ranking; it must be
this.ka*0.75 + this.da*0.25;
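As an illustration only, the weighted ranking could look like the sketch below. `WeightedRank` is a hypothetical stand-in for the project's Aspirant class, and it assumes that `ka` and `da` hold the KA-based and DA-based scores; only the 0.75/0.25 weights come from the issue.

```java
final class WeightedRank {

    private final double ka; // assumed: score from the KA-based approach
    private final double da; // assumed: score from the DA-based approach

    WeightedRank(double ka, double da) {
        this.ka = ka;
        this.da = da;
    }

    /** Combines both scores with the weights proposed in the issue. */
    double rank() {
        return this.ka * 0.75 + this.da * 0.25;
    }
}
```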
the correct Math is
private double vectorSpace(int i, int j, double[][] rut) {
    double sum = 0d;
    double length = 0d;
    for (int k = 0; k < rut[i].length; k++) {
        sum += rut[i][k] * rut[j][k];
        length += rut[i][k];
    }
    return sum/length;
}
must be normalized by doc length like this
        length += rut[i][k];
    }
    return sum/length;
We want to replicate Devrec as described in this paper. It will be implemented on the examples branch using tools committed on the master branch. There is an important difference between the implementation by Zhang et al. and ours: we are looking for someone to answer a question instead of someone to participate in a project.
All the following quotes were extracted from the original paper.
ros_gh
and available here.
UP Connector: This part is to create the association matrix of users and projects based on the activities in GitHub. Here we get a two-value matrix Ru−p, where 1 stands for participation and 0 stands for the opposite.
User Connector: This part is to calculate the association between users based on the user project association matrix using Jaccard algorithm.
Match Engine: In this part, we calculate the association between users and projects according to the user association matrix Ru−u. If we use UAp⟨u1,u2,...,un⟩ to represent users that have already participated in the target project p, we can obtain the match score of each user towards project p using:
U{u1,u2,...,un} to represent users in StackOverflow, Tu = {t1,t2,...,tn} to represent the tags that related to user u, and C(t,u) to represent the number of times tag t relates to user u. Then we can calculate user tag association matrix using
User Connector: After obtaining the user tag association matrix Ru−t, we calculate the association of users using Vector Space Similarity algorithm.
Match Engine: The same as the match engine part in DA-based approach.
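The Jaccard-based User Connector quoted above can be sketched as follows. This is a minimal illustration, not the project's implementation: the class name `UserConnectorSketch`, the method names and the `int[][]` representation of Ru−p are all assumptions, and returning 0 for two users without any participation is a design choice made here to avoid a 0/0 division.

```java
final class UserConnectorSketch {

    private UserConnectorSketch() {
        // static helpers only
    }

    /** Jaccard similarity of two 0/1 participation rows: |A and B| / |A or B|. */
    static double jaccard(int[] a, int[] b) {
        int both = 0;
        int either = 0;
        for (int k = 0; k < a.length; k++) {
            if (a[k] == 1 && b[k] == 1) {
                both++;
            }
            if (a[k] == 1 || b[k] == 1) {
                either++;
            }
        }
        // Assumption: two users with no participation at all get similarity 0
        // instead of the NaN that 0/0 would produce.
        return either == 0 ? 0d : (double) both / either;
    }

    /** Builds the user-user matrix from the binary user-project matrix. */
    static double[][] userUser(int[][] rup) {
        int n = rup.length;
        double[][] ruu = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                ruu[i][j] = jaccard(rup[i], rup[j]);
            }
        }
        return ruu;
    }
}
```

With rows `{1,1,0}` and `{1,0,1}`, the two users share one project out of three touched by either of them, so their similarity is 1/3.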
Launcher is too general a name; it could be renamed DataExtraction or similar.
Check the first aspirant who matches at least half of the tags from the question.
Also rename DefaultAccuracy to StricAcuracy
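One possible reading of the "at least half of the tags" check is sketched below. Everything here is hypothetical: the class `HalfTagMatch`, its methods, and the representation of each ranked aspirant as a set of tag names are assumptions made for illustration, not the project's accuracy classes.

```java
import java.util.List;
import java.util.Set;

final class HalfTagMatch {

    private HalfTagMatch() {
        // static helpers only
    }

    /** True when the aspirant covers at least half of the question's tags. */
    static boolean matchesHalf(Set<String> aspirantTags, Set<String> questionTags) {
        int matched = 0;
        for (String tag : questionTags) {
            if (aspirantTags.contains(tag)) {
                matched++;
            }
        }
        // Integer comparison avoids rounding issues with odd tag counts.
        // Note: a question without tags is matched by every aspirant.
        return matched * 2 >= questionTags.size();
    }

    /** Index of the first ranked aspirant that matches, or -1 if none does. */
    static int firstMatch(List<Set<String>> rankedAspirantTags, Set<String> questionTags) {
        for (int i = 0; i < rankedAspirantTags.size(); i++) {
            if (matchesHalf(rankedAspirantTags.get(i), questionTags)) {
                return i;
            }
        }
        return -1;
    }
}
```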
Querying how many times a given tag is related to all users gives different results when it is queried for each user with FetchTagCount than when it is computed with a single query:
select sum(count) as count
from ros_user_tag
where ros_user_tag.ros_user_id in (select ros_user_id from linked_users) and
ros_user_tag.ros_tag_id = ?;
There are 1277 tags that look truncated, e.g.:
| id | name |
|---|---|
| 5 | turtlebot_dash... |
| 9 | turtlebot_cali... |
| 206 | message_genera... |
| 263 | installation_e... |
| 288 | camera_calibra... |
| 305 | sicktoolbox_wr... |
| 306 | xv_11_laser_dr... |
| 348 | trajectory_fil... |
You can get the complete list with this query:
select *
from ros_tag
where name like '%...'
fix bugs and devrec implementation.
Add this badge to README.md to receive thanks from users.
It is not suitable to keep examples within the project, because they have no unit tests and they increase the project's complexity. The tools must be provided by the project, but without examples hidden inside ignored tests. These packages must be excluded from the main project and can be included on an examples branch:
launcher
GithubInfoTest
ignored test from FetchUsersPageListTest
FetchAnswersTest
IteratePagedContentTest
ParticipantsTest
They do not seem to be used, so they should be removed.
artifact id : rosgh
group id : com.elbraulio
The script used to reset the DB in tests is not multiplatform: it is not compatible with Windows, so those tests fail when they run on Windows.
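One platform-independent option is to do the reset from Java instead of a shell script. The sketch below assumes the test database is a single file that can be restored from a pristine template copy, which may not match how the project's reset script actually works; `ResetDb` and its parameters are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class ResetDb {

    private ResetDb() {
        // static helper only
    }

    /**
     * Overwrites the test database file with a pristine template.
     * Uses only java.nio.file, so it behaves the same on Windows, Linux and macOS.
     */
    static void reset(Path template, Path target) {
        try {
            Files.copy(template, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            // Unchecked so it can be called from test setup methods without a throws clause.
            throw new UncheckedIOException(e);
        }
    }
}
```

A test could call `ResetDb.reset(...)` in its setup method, passing the paths of the template and the working database.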
When indigo or hydro is being processed, an exception is thrown for the ar_sys package.
Will bring more details in future ...
As with builds, we want to know whether the extraction process succeeded or failed. It is important to have logs in order to identify errors and possible cases of missing data.
It is actually sorting in ascending order. Even the test sorts in ascending order.
it should be
<dependencies>
    <dependency>
        <groupId>com.elbraulio</groupId>
        <artifactId>ros_gh</artifactId>
        <version>{version}</version>
    </dependency>
</dependencies>
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>
README.md must list all versions of the extracted data. This information can be added at the bottom of the document together with a change review.
Currently Devrec is outside the artifact domain com.elbraulio.rosgh. That is confusing, because you import it as
import examples.Devrec
when it should be
import com.elbraulio.rosgh.example.Devrec
Update the LICENSE file to 2019.
I need more data about ROS Answers users, specifically:
karma
joined_at
last_seen_at
location
has_avatar (or the url to the avatar and NULL if it has the default)
description
real name (if it exists)
age
badges (and its count)
I'm particularly interested in the first four, so if the rest of the list requires more effort to get, I'd already be happy with having just those four (karma, joined_at, last_seen_at, location).