ualberta-smr / merganser

Merganser is a scalable and extendable tool for analyzing merge scenarios in git repositories

Home Page: https://ualberta-smr.github.io/merganser/

License: MIT License

Languages: Python 99.53%, Shell 0.47%

Topics: dataset, merge-conflicts, mysql, python, software-development, software-engineering, software-merging, software-research

merganser's Issues

Logging not working in main.py

Add warning about default number of threads used in ReadMe

Pass start date + Check if repository already exists before cloning

It would be good to have a parameter for the starting date of the mining (i.e., mine only commits after date X). In that case, we should check whether the repo is already cloned, and if it is, we shouldn't clone it again.
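A minimal sketch of how the two requests could fit together, assuming the tool shells out to git. The helper names `is_cloned`, `clone_if_missing`, and `merge_log_command` are hypothetical and are not taken from the Merganser code base; `git log --merges` and `--since` are standard git options.

```python
import os
import subprocess


def is_cloned(repo_dir):
    """Return True if repo_dir already looks like a git working copy."""
    return os.path.isdir(os.path.join(repo_dir, ".git"))


def clone_if_missing(url, repo_dir):
    """Clone url into repo_dir unless a clone is already there.

    Returns True if a clone was performed, False if the existing one is reused.
    """
    if is_cloned(repo_dir):
        return False
    subprocess.run(["git", "clone", url, repo_dir], check=True)
    return True


def merge_log_command(repo_dir, start_date=None):
    """Build a git command that lists merge commits, optionally after start_date."""
    cmd = ["git", "-C", repo_dir, "log", "--merges", "--format=%H"]
    if start_date is not None:
        # Assumption: start_date is any date string git understands, e.g. "2019-01-01".
        cmd.append("--since=" + start_date)
    return cmd
```

With this shape, a second run over the same repository skips the clone and only changes the `--since` argument.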

Revise wording

https://github.com/ualberta-smr/code-owhadi-msr19/blob/1724d8a2ff6b334c0775a8498359ce892249d320/merge_excavator/GitUtil.py#L138

Add Contributor list to ReadMe

I would suggest that you encourage people to use Issues for bugs or even questions rather than to email us, since this allows better tracking of things.

I would also suggest adding a Contributors list, where you can have both our names (with yours first as the main author) and contact information.

Change comment

Maybe change to "Temp variables for reading username and repo"

https://github.com/ualberta-smr/code-owhadi-msr19/blob/1724d8a2ff6b334c0775a8498359ce892249d320/merge_excavator/main.py#L71

Package Installation

The current code uses setup.py, which lets us install the package with pip. Another option would be to use Docker.

Do we need to have a Dockerfile too?

Should compile not check code style

Please check this line:

https://github.com/ualberta-smr/code-owhadi-msr19/blob/1724d8a2ff6b334c0775a8498359ce892249d320/merge_excavator/merge_replay.py#L77

Add a comment explaining the reason for the "or"

Document the reason for the "or" (mention the weird scenario you ran into, if you can look it up)

https://github.com/ualberta-smr/code-owhadi-msr19/blob/1724d8a2ff6b334c0775a8498359ce892249d320/merge_excavator/code_quality.py#L28

Log which projects don't have test suites

When running the tests, a project may use Maven but have no test suites. Such projects should be clearly marked, e.g., with -1 as the value of the passing tests or with some other field marked for the project.

Log failure and store -2 in case of problems with build or tests

If you could not run the build or tests for any reason, or the repo doesn't support mvn, then store -2 and log a failure (and update the documentation in the ReadMe or the data schema description).

https://github.com/ualberta-smr/code-owhadi-msr19/blob/1724d8a2ff6b334c0775a8498359ce892249d320/merge_excavator/code_quality.py#L33
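The two sentinel values proposed in this issue and the previous one could be handled in one place. The sketch below is only an illustration: the function `passed_tests_field`, its parameters, and the constant names are hypothetical, and only the values -1 and -2 come from the issues themselves.

```python
import logging

logger = logging.getLogger(__name__)

NO_TEST_SUITE = -1          # project builds but has no test suites
BUILD_OR_TEST_FAILURE = -2  # build or tests could not be run, or no mvn support


def passed_tests_field(repo, build_ok, tests_run, tests_passed):
    """Return the value to store for the number of passing tests."""
    if not build_ok:
        logger.error("%s: build or test run failed", repo)
        return BUILD_OR_TEST_FAILURE
    if tests_run == 0:
        logger.warning("%s: no test suite found", repo)
        return NO_TEST_SUITE
    return tests_passed
```

Keeping the sentinels as named constants makes it easier to document them once in the ReadMe or schema description.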

Quotes for search query?

For the -q flag for the search module, what if you have multiple search terms, should you use quotes? If so, clarify in instructions/example.
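As a general illustration of why quoting matters, the sketch below uses a stand-in `argparse` parser with a single `-q` option. It is an assumption that the search module parses its flag in a similar way; the real module may behave differently. `shlex.split` mimics how a POSIX shell splits the command line.

```python
import argparse
import shlex

parser = argparse.ArgumentParser()
parser.add_argument("-q", "--query")  # hypothetical stand-in for the search flag


def parse(command_line):
    """Split command_line like a shell would, then parse it leniently."""
    return parser.parse_known_args(shlex.split(command_line))


quoted, quoted_extra = parse('-q "merge conflict"')
unquoted, unquoted_extra = parse("-q merge conflict")

print(quoted.query, quoted_extra)      # the quoted form keeps both terms together
print(unquoted.query, unquoted_extra)  # the unquoted form splits them apart
```

In the unquoted case only the first term reaches `-q`, and the second is left over as a separate argument, which is why the instructions should state whether quotes are required.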

Fix typo in comment

https://github.com/ualberta-smr/code-owhadi-msr19/blob/1724d8a2ff6b334c0775a8498359ce892249d320/merge_excavator/validation.py#L18

Encoding Error in Extracting Conflicting Regions

There is an encoding error while parsing the output of git to extract the conflicting regions, since some commit messages use non-ASCII characters.
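One common way to avoid this class of error is to read the git output as bytes and decode it explicitly, replacing anything that is not valid UTF-8. This is a sketch under the assumption that the output is captured through `subprocess`; the helper names are hypothetical and this is not necessarily how the issue was resolved.

```python
import subprocess


def decode_git_output(raw):
    """Decode raw git output, replacing undecodable bytes instead of raising."""
    return raw.decode("utf-8", errors="replace")


def run_git(args, repo_dir):
    """Run a git command in repo_dir and return its output as text."""
    result = subprocess.run(
        ["git", "-C", repo_dir] + list(args),
        stdout=subprocess.PIPE,
        check=True,
    )
    return decode_git_output(result.stdout)
```

Undecodable bytes become the Unicode replacement character, so parsing of the conflicting regions can continue even if a commit message uses a different encoding.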

What are all the prediction config parameters?

What are these:

PREDICTION_CSV_PATH	The directory path for storing the data of conflict prediction
PREDICTION_CSV_DATA_NAME	The file name of the data of conflict prediction
PREDICTION_CSV_LABEL_NAME	The file name of the labels of conflict prediction

They are not described anywhere?

What happens when the MAX parameters are not set?

I'm assuming that if none of the max parameters (e.g., max number of days, max number of merge scenarios, etc.) are set, then all the merge scenarios in a given repo are analyzed. Is that correct?
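The behaviour assumed in this question can be written down as follows. This is an illustration of the assumption only, not a statement of what the tool does; `limit_scenarios` and `max_scenarios` are hypothetical names.

```python
from itertools import islice


def limit_scenarios(scenarios, max_scenarios=None):
    """Return at most max_scenarios items, or all of them when the limit is None."""
    # islice with a stop of None consumes the whole iterable.
    return list(islice(scenarios, max_scenarios))
```

Under this convention an unset parameter means "no limit", which is the reading the question asks to have confirmed.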

Is it a single repos list or multiple ones?

The wiki page says "The directory path for reading the list of repositories, as *.txt files", which suggests that you may have multiple input list files for the set of repos to analyze. The ReadMe instructions, however, suggest that there is only one list.
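If the wiki wording is the intended behaviour, reading every list file in the directory could look like the sketch below. The function name and the one-repository-per-line file format are assumptions made for illustration.

```python
import glob
import os


def read_repository_lists(list_dir):
    """Collect repository names from every *.txt file in list_dir."""
    repos = []
    for path in sorted(glob.glob(os.path.join(list_dir, "*.txt"))):
        with open(path, encoding="utf-8") as handle:
            # Assumption: one repository per line; blank lines are skipped.
            repos.extend(line.strip() for line in handle if line.strip())
    return repos
```

A single list file is then just the special case of a directory that contains one `*.txt` file.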

Why is dataset a tag for the repo?

Verify what git branch --contains returns

I think the command may return multiple branches, and you only consider the first one. It would be good to verify whether it does indeed return multiple branches and whether, for our purposes, considering only the first branch is enough.

https://github.com/ualberta-smr/code-owhadi-msr19/blob/1724d8a2ff6b334c0775a8498359ce892249d320/merge_excavator/GitUtil.py#L108
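`git branch --contains <commit>` prints one line per branch that contains the commit, so more than one branch is possible. The sketch below parses such output into a list so that the choice of "first branch" becomes explicit. The function name is hypothetical and the parsing rules are a simplified reading of the plain-text output format.

```python
def parse_branch_contains(output):
    """Return all branch names listed in `git branch --contains` output."""
    branches = []
    for line in output.splitlines():
        # The current branch is prefixed with "*", a branch checked out
        # in another worktree with "+"; both markers are dropped here.
        name = line.strip().lstrip("*+").strip()
        # Skip empty lines and the "(HEAD detached at ...)" pseudo-entry.
        if name and not name.startswith("("):
            branches.append(name)
    return branches
```

Logging `len(parse_branch_contains(...))` over a sample of merge scenarios would show how often a commit actually belongs to several branches.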

Use verbs for function names

validation_repository_name --> validate_repository_name

Please check all code.

https://github.com/ualberta-smr/code-owhadi-msr19/blob/1724d8a2ff6b334c0775a8498359ce892249d320/merge_excavator/validation.py#L5