The ros_gh from elbraulio

device rank

With small doubles the rank evaluates to NaN, so it should be replaced with 0:

Double.isNaN(rank) ? 0d : rank;
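A minimal sketch of how that guard could be wrapped in a helper. `RankGuard` and `safeRank` are hypothetical names used only for illustration, not part of ros_gh. One way a NaN can arise in floating point is a `0.0 / 0.0` division, which is what the example uses.

```java
final class RankGuard {
    // Hypothetical helper: maps a NaN rank to 0 and leaves any other value untouched.
    static double safeRank(double rank) {
        return Double.isNaN(rank) ? 0d : rank;
    }

    public static void main(String[] args) {
        double nanRank = 0d / 0d;                  // floating-point 0/0 evaluates to NaN
        System.out.println(safeRank(nanRank));     // NaN is replaced by 0
        System.out.println(safeRank(0.5d));        // regular values pass through
    }
}
```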

Also, Aspirant is returning the wrong ranking; it must be:

this.ka*0.75 + this.da*0.25;

RuuKA math error

The correct math is:

    private double vectorSpace(int i, int j, double[][] rut) {
        double sum = 0d;
        double length = 0d;
        for (int k = 0; k < rut[i].length; k++) {
            sum += rut[i][k] * rut[j][k];
            length += rut[i][k];
        }
        return sum/length;
    }
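To check the formula by hand, here is a small worked example. It copies the method into a hypothetical `VectorSpaceDemo` class (made static for the demo) and uses made-up matrix values; neither the class name nor the data comes from ros_gh.

```java
final class VectorSpaceDemo {
    // Same computation as above: dot product of rows i and j,
    // divided by the sum of the entries of row i.
    static double vectorSpace(int i, int j, double[][] rut) {
        double sum = 0d;
        double length = 0d;
        for (int k = 0; k < rut[i].length; k++) {
            sum += rut[i][k] * rut[j][k];
            length += rut[i][k];
        }
        // Note: if row i contains only zeros this is 0/0, which evaluates to NaN.
        return sum / length;
    }

    public static void main(String[] args) {
        double[][] rut = {{1d, 2d}, {3d, 4d}};     // made-up values
        // sum = 1*3 + 2*4 = 11, length = 1 + 2 = 3, so the result is 11/3
        System.out.println(vectorSpace(0, 1, rut));
    }
}
```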

ruuKa wrong vector space math

It must be normalized by doc length, like this:

            length += rut[i][k];
        }
        return sum/length;

devrec implementations as example

We want to replicate Devrec as described in this paper. It will be implemented on the examples branch using tools committed on the master branch. There is an important difference between Zhang et al.'s implementation and ours: we are looking for someone to answer a question instead of someone to participate in a project.

All the following quotes were extracted from the original paper.

Data extraction

We use data already extracted with ros_gh, available here.

Developer Recommendation Based on Social Coding Activities

UP Connector: This part is to create the association matrix of users and projects based on the activities in GitHub. Here we get a two-value matrix Ru−p, where 1 stands for participation and 0 stands for the opposite.
User Connector: This part is to calculate the association between users based on the user project association matrix using Jaccard algorithm.
Match Engine: In this part, we calculate the association between users and projects according to the user association matrix Ru−u. If we use UAp⟨u1,u2,...,un⟩ to represent users that have already participated in the target project p, we can obtain the match score of each user towards project p using:
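For the User Connector step, a minimal sketch of Jaccard similarity over the rows of a two-value user-project matrix might look like the following. `JaccardSketch`, `jaccard` and the sample matrix are assumptions made for illustration; they are not taken from the paper or from ros_gh, and the match score formula from the paper is not reproduced here.

```java
final class JaccardSketch {
    // Jaccard similarity between users i and j:
    // |projects both joined| / |projects at least one of them joined|.
    static double jaccard(int i, int j, int[][] rup) {
        int both = 0;
        int either = 0;
        for (int k = 0; k < rup[i].length; k++) {
            if (rup[i][k] == 1 && rup[j][k] == 1) {
                both++;
            }
            if (rup[i][k] == 1 || rup[j][k] == 1) {
                either++;
            }
        }
        // Assumption: two users with no projects at all get similarity 0.
        return either == 0 ? 0d : (double) both / either;
    }

    public static void main(String[] args) {
        int[][] rup = {
            {1, 1, 0},   // user 0 participates in projects 0 and 1
            {1, 0, 1},   // user 1 participates in projects 0 and 2
        };
        // one shared project out of three distinct projects
        System.out.println(jaccard(0, 1, rup));
    }
}
```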

Developer Recommendation Based on Knowledge Sharing Activities

Relation Creator: In this part, we calculate the user tag association matrix. Here we use TF-IDF method. If we use U{u1,u2,...,un} to represent users in StackOverflow, Tu = {t1,t2,...,tn} to represent the tags that related to user u, and C(t,u) to represent the number of times tag t relates to user u. Then we can calculate user tag association matrix using

User Connector: After obtaining the user tag association matrix Ru−t, we calculate the association of users using Vector Space Similarity algorithm.
Match Engine: The same as the match engine part in DA-based approach.

Launcher examples must be named after what they represent

Launcher is too general a name; it could be named DataExtraction or similar.

version on readme wrong link

it should be

and Javadoc should be

accuracy light

Check for the first aspirant who matches at least half of the tags from the question.
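The check described above could be sketched as follows. `AccuracyLightSketch`, `firstMatch` and the way tags are represented (sets of strings) are assumptions for illustration only; the real classes in ros_gh may be shaped differently.

```java
import java.util.List;
import java.util.Set;

final class AccuracyLightSketch {
    // Returns the index of the first aspirant whose tags cover at least
    // half of the question tags, or -1 if no aspirant does.
    static int firstMatch(List<Set<String>> aspirantTags, Set<String> questionTags) {
        for (int i = 0; i < aspirantTags.size(); i++) {
            int matched = 0;
            for (String tag : questionTags) {
                if (aspirantTags.get(i).contains(tag)) {
                    matched++;
                }
            }
            // matched / size >= 1/2, written without division.
            // Assumption: a question without tags is matched by the first aspirant.
            if (matched * 2 >= questionTags.size()) {
                return i;
            }
        }
        return -1;
    }
}
```

Aspirants are expected to be passed in ranking order, so the returned index is the position of the first acceptable recommendation.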

Also rename DefaultAccuracy to StrictAccuracy.

wrong tag counting in devrec

Counting how many times a given tag is related to all users gives different results when it is queried per user with FetchTagCount than when it is computed with a single query:

select sum(count) as count
from ros_user_tag
where ros_user_tag.ros_user_id in (select ros_user_id from linked_users) and
      ros_user_tag.ros_tag_id = ?;

Some tag names in 'ros_tag' table are cut

There are 1277 tags that look cut, e.g.:

id	name
5	turtlebot_dash...
9	turtlebot_cali...
206	message_genera...
263	installation_e...
288	camera_calibra...
305	sicktoolbox_wr...
306	xv_11_laser_dr...
348	trajectory_fil...

You can get the complete list with this query:

select *
from ros_tag
where name like '%...'

release 0.1-beta.1

fix bugs and devrec implementation.

add say thanks badge to README.md

Add this badge to README.md to receive thanks from users.

https://saythanks.io

examples must be excluded from the project scope

It is not suitable to have examples within the project because they have no unit tests and they increase the project's complexity. These tools must be provided by the project, but without examples hidden inside ignored tests. These packages must be excluded from the main project and can be included on an examples branch:

are extras folders still important to the project?

It seems that they are not being used, so they must be removed.

change artifact id and group id

artifact id : rosgh
group id : com.elbraulio

reset db script is not Multiplatform compatible

The script used to reset the db in tests is not ~~windows~~ multiplatform compatible, so those tests fail when they run on Windows.

NullPointerException when requesting some GitHub repos

When indigo or hydro are being processed, it throws an exception for the ar_sys package.
I will bring more details in the future ...

Logs for data extraction

As with builds, we want to know whether the extraction process succeeded or failed, and how. It is important to have logs in order to identify errors and possible cases of missing data.

ByRankDesc does not order desc

It is actually sorting asc. Even the test sorts asc.
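A minimal sketch of a descending comparison, assuming ranks are plain doubles. `ByRankDescSketch` is a hypothetical stand-in and does not reproduce the real `ByRankDesc` class; the key point is that the arguments of `Double.compare` are swapped.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ByRankDescSketch {
    // Descending order: compare b against a instead of a against b.
    static final Comparator<Double> BY_RANK_DESC = (a, b) -> Double.compare(b, a);

    // Returns a sorted copy so the caller's list is left untouched.
    static List<Double> sorted(List<Double> ranks) {
        List<Double> copy = new ArrayList<>(ranks);
        copy.sort(BY_RANK_DESC);
        return copy;
    }
}
```

A test for the real class could assert that the highest rank comes first, for instance that sorting 0.2, 0.9, 0.5 yields 0.9, 0.5, 0.2.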

update dependencies

maven example on Readme is wrong

it should be

<dependencies>
    <dependency>
        <groupId>com.elbraulio</groupId>
        <artifactId>ros_gh</artifactId>
        <version>{version}</version>
    </dependency>
</dependencies>

<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>

links to download data already extracted

README.md must include links to all versions of the extracted data. This info can be added at the bottom of the document along with a change review.

move Devrec example into artifact domain

Currently Devrec is outside the artifact domain com.elbraulio.rosgh. That is confusing because you import it as

import examples.Devrec

when it should be

import com.elbraulio.rosgh.example.Devrec

up to 2019!

update the LICENSE file up to 2019.

Is it possible to execute the scraper to get an updated sample and also more data from the ROS User (e.g. karma, last_seen_at, etc.)?

I need more data from the ROS Answers' user, specifically:

karma
joined_at
last_seen_at
location
has_avatar (or the url to the avatar and NULL if it has the default)
description
real name (if it exist)
age
badges (and its count)

I'm particularly interested in the first 4, so if it requires more effort to get the rest of the list I'd already be happy with having just those four (karma, joined_at, last_seen_at, location).
