mcserep / codecompass

This project forked from ericsson/codecompass

CodeCompass is a software comprehension tool for large scale software written in C/C++ and Java

License: GNU General Public License v3.0

CMake 3.90% C 11.26% C++ 38.13% JavaScript 15.96% Python 0.05% Thrift 1.88% CSS 2.75% Java 10.55% Shell 1.31% HTML 0.78% Dockerfile 0.43% TypeScript 13.02% SCSS 0.01%

codecompass's Issues

Refactor IncrementalStatus into ParserContext

Currently the file status changes related to incremental parsing are calculated in CppParser. Since this information may be utilized by all plugin parsers, the enum type IncrementalStatus and the std::unordered_map<std::string, IncrementalStatus> map of file status changes defined in CppParser should be refactored into the ParserContext type, thus all plugin parsers will be able to query status changes related to incremental parsing.
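
A minimal sketch of what the refactored type could look like. The enumerator names, the fileStatus member name and the needsCleanup() helper are assumptions made for illustration; the real ParserContext has other members that are not shown here.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical enumerators; the issue only names the enum type itself.
enum class IncrementalStatus
{
  MODIFIED,
  ADDED,
  DELETED
};

// Simplified stand-in for ParserContext, reduced to the new member.
struct ParserContext
{
  // File path -> status change detected for incremental parsing.
  std::unordered_map<std::string, IncrementalStatus> fileStatus;
};

// Hypothetical helper showing how any plugin parser could query the map:
// only modified or deleted files have stale database entries to clean up.
inline bool needsCleanup(const ParserContext& ctx_, const std::string& path_)
{
  auto it = ctx_.fileStatus.find(path_);
  return it != ctx_.fileStatus.end() &&
    (it->second == IncrementalStatus::MODIFIED ||
     it->second == IncrementalStatus::DELETED);
}
```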

Calculate competence rate for folders

Folders are currently not parsed and are displayed as "just folders" in the competence diagram. A cumulative percentage should be calculated for each user from their competence rates for the contained files.
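
One possible way to aggregate, sketched under the assumption that the folder value is the unweighted mean of the file values and that a user missing from a file counts as 0 for that file. The issue does not fix the formula, and the type and function names below are hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical representation: user -> competence percentage for one file.
using FileCompetence = std::map<std::string, double>;

// Unweighted mean per user over all files of the folder (an assumption).
std::map<std::string, double> folderCompetence(
  const std::vector<FileCompetence>& files_)
{
  std::map<std::string, double> result;
  if (files_.empty())
    return result;

  // Sum up the rates of each user over the contained files.
  for (const FileCompetence& file : files_)
    for (const auto& entry : file)
      result[entry.first] += entry.second;

  // Divide by the number of files, so missing files count as 0.
  for (auto& entry : result)
    entry.second /= static_cast<double>(files_.size());

  return result;
}
```

Weighting the files by size or by number of commits would be an equally plausible choice.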

Company-level competence diagram

Several (usually open source) projects are developed by multiple companies at once. It would be useful to see the contributions of each company.

Incremental cleanup on file level

Currently the incremental database cleanup is carried out in a single database transaction (per parser), which may create a large database rollback log, especially for the cppparser, where many tables have to be maintained.

Refactor the incremental cleanup (at least for cppparser) to use file-level transactions.

Rewrite self-declared competence rate from website

Self-declared ratio should fit session handling and multiple user data for each file.

No DB table and index creation on incremental parsing

Upon incremental parsing, a valid database already exists, so the main parser (parser/src/parser.cpp) shows warnings on table and index creation, e.g.:

[INFO] Creating tables from file /home/mate/bin/cc/sqlite/share/codecompass/sql/cppedge-odb.sql
[WARNING] Exception when running SQL command: 1: table "CppEdge" already exists
...
[INFO] Creating indexes from file /home/mate/bin/cc/sqlite/share/codecompass/sql/cppheaderinclusion-odb.sql
[WARNING] Exception when running SQL command: 1: index CppHeaderInclusion_includer_i already exists

While this does not cause any issues, it can be confusing and should be handled properly. Therefore the --incremental flag should be elevated from the cppparser (to parser.cpp), and:

if the flag is defined and the specified database exists, the database table and index creation could be omitted;
if the flag is defined, but the specified database does not exist, it should either be created (with tables and indices), or an error could be raised, aborting.
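
The two cases above can be sketched as a small decision function. The enum, the function name and the abortIfMissing_ parameter are hypothetical; the last one stands in for the still open choice between creating the database and aborting.

```cpp
// Hypothetical outcome of the check performed before creating tables.
enum class SchemaAction
{
  CREATE, // run the table and index creation scripts
  SKIP,   // database already valid, nothing to create
  ABORT   // incremental parsing requested without an existing database
};

SchemaAction decideSchemaAction(
  bool incremental_,
  bool databaseExists_,
  bool abortIfMissing_)
{
  if (!incremental_)
    return SchemaAction::CREATE; // regular parse, unchanged behaviour

  if (databaseExists_)
    return SchemaAction::SKIP;   // avoids the "already exists" warnings

  // Flag defined, but no database: either create it or raise an error.
  return abortIfMissing_ ? SchemaAction::ABORT : SchemaAction::CREATE;
}
```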

Read File entities and FileContent hashes from SourceManager cache

During incremental parsing when iterating through all File entities to check whether they still exist or their content hash changed, it is unnecessary to communicate with the database through ODB, since all relevant information is already cached in the SourceManager instance.
This could boost runtime execution performance on larger projects.

Handle workspace directory existence

In case the workspace directory exists, the CodeCompass parser fails with an error and suggests forced reparsing. In case of incremental parsing the existence of the workspace directory is normal and should be handled.

Add legend to diagrams

Competence diagram for single files

Display a plain, simple diagram that shows the rate of understanding of a certain file.

Compute individual user competence

Calculate color code for every user in database

Currently the color code of a user is a randomly generated hex code that is persisted into the database when a new user is found during parsing. Instead, the color code should be calculated by a formula from the email address or, if there is one, the username of the user. Once an email address is connected to a user, the color code should be generated from the username from then on.
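
A minimal sketch of such a formula, assuming a 32-bit FNV-1a hash whose low 24 bits are used as the RGB value. The hash choice and the function names are assumptions, not something the issue prescribes; FNV-1a is used here only because, unlike std::hash, it gives the same result on every platform.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// 32-bit FNV-1a hash of a string.
std::uint32_t fnv1a(const std::string& text_)
{
  std::uint32_t hash = 2166136261u; // FNV offset basis
  for (unsigned char c : text_)
  {
    hash ^= c;
    hash *= 16777619u;              // FNV prime
  }
  return hash;
}

// Hypothetical helper: prefers the username, falls back to the email.
std::string colorCode(
  const std::string& email_,
  const std::string& username_ = "")
{
  const std::string& key = username_.empty() ? email_ : username_;

  char buffer[8]; // '#' + 6 hex digits + terminating zero
  std::snprintf(buffer, sizeof(buffer), "#%06x",
    static_cast<unsigned>(fnv1a(key) & 0xFFFFFFu));
  return buffer;
}
```

Since the value is derived from the key alone, the same user gets the same color in every parsed project, and nothing has to be persisted.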

Determining already parsed files in CppParser

In the CppParser plugin, the BuildAction command hashes are cached in the constructor, into the parsedCommandHashes member. Later, this cache is used to determine which files were already parsed and which were not.
Since incremental parsing modifies the BuildAction entities, this caching should be either:

delayed until the database maintenance is carried out; or
parsedCommandHashes should also be maintained, removing invalid entries.

Detect added files

During incremental parsing, newly added files should be detected. There are three possible solutions:

Not detecting added files, since parsing will detect and parse them by analyzing the build commands. (Investigate whether this is a valid issue, is this information really unimportant for other tasks too?)
Detect new files via filesystem parsing. Drawbacks: files not included in the compilation process might be detected; and the source code path prefix must be known (e.g. by a console parameter).
Detect new files via the build commands. In this case the compilation database should be already built for the incremental parsing file change detection (currently done in the worker() method), so some refactoring is required.

Compute file competence for every dev

Set self-declared competence rate in foreign projects

Now the user can only add custom ratio to those projects where they have already made at least one commit, thus they have at least one email address registered and a new record can be persisted to any file (or an existing record can be updated). When it is possible to assign an email address to the user on the website, foreign projects should be supported as well.

Add a threshold input parameter for commit history interval

The commit history interval is currently fixed at 6 months. It should be made an input parameter (preferably in months or days).

Team view

Display a diagram that shows a file or a directory with its files. The nodes are colored according to the user that is the "most competent" in each file.

Select all corresponding email addresses on website

The logged-in user should be able to select all of their email addresses from the parsed data. It is a one-to-many relationship: one user should be able to have multiple email addresses, but one email address should only belong to one user.

Threshold for incremental parsing

Incremental parsing is only efficient if a small part of the source code in a large (legacy) software project changed, and the cleanup and reparsing of the changed parts is faster than a full clean parse of the codebase.

Add a new option named --incremental-threshold to the CodeCompass_parser console binary to configure a threshold of acceptable change in the source code.

Below the threshold an incremental parsing should be performed.
Above the threshold a full forced parse should be performed.

The quantity of change in the source code can be measured by the number of changed files.
Files can be weighted by the lines of code they contain.
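
The decision could be sketched as follows, assuming the threshold is a ratio between 0 and 1 and files are weighted by their lines of code. The FileChange struct and the function names are hypothetical, and treating a ratio exactly equal to the threshold as "below" is an arbitrary choice.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-file input of the decision.
struct FileChange
{
  std::size_t linesOfCode;
  bool changed;
};

// Changed lines of code divided by all lines of code, in [0, 1].
double changeRatio(const std::vector<FileChange>& files_)
{
  std::size_t total = 0;
  std::size_t changed = 0;
  for (const FileChange& file : files_)
  {
    total += file.linesOfCode;
    if (file.changed)
      changed += file.linesOfCode;
  }
  return total == 0 ? 0.0 : static_cast<double>(changed) / total;
}

// true: incremental parsing; false: full forced parse.
bool shouldParseIncrementally(
  const std::vector<FileChange>& files_,
  double threshold_)
{
  return changeRatio(files_) <= threshold_;
}
```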

Refactor File Diagrams

Refactor file diagram model and creation as already mentioned in #2.

Open a new branch filediagram_refactor from master.
The CppNode struct should be refactored, the only supported domain should be FILE. Therefore the domain field can be omitted, the domainId field can be fileId and its type should be FileId (uint64_t).
The RelationCollector class should be refactored so that only file level relations (USE and PROVIDE) are created and persisted as CppNode and CppEdge entities. Accordingly the CppEdge::Type should be refactored and unused types (DEPEND and IMPLEMENT) can be omitted.
Update the service component of the cppparser (FileDiagram), so that directory level relations are calculated and then visualized on-demand.
Since by this refactoring process, the CppNode type is simplified to only contain a file ID and the CppNodeAttribute type is unused, both of them can be eliminated entirely. Then the CppEdge type may connect 2 File entities instead of CppNode.

Find decent formula for competence calculation

Currently the competence ratio is calculated from the months passed since a commit, and an exponential function is used to calculate the percentage for each user that committed. The commit history is parsed over a fixed interval of 6 months (for now). This results in non-representative data, since devs who committed even a couple of months ago are rated as not remembering the file at all.
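
The exact function in use is not stated here. As an illustration of the effect only, the sketch below assumes a plain half-life decay; the function name and the halfLifeMonths_ parameter are hypothetical.

```cpp
#include <cmath>

// Assumed model: competence halves every halfLifeMonths_ months.
double competencePercent(double monthsSinceCommit_, double halfLifeMonths_)
{
  return 100.0 * std::pow(0.5, monthsSinceCommit_ / halfLifeMonths_);
}
```

With a half-life of one month, a commit from six months ago is already rated at 100 / 64, which is below 2%. Any replacement formula has to decay noticeably slower over the parsed interval to avoid the problem described above.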

Generalize incremental parsing maintenance

All plugin parsers should be able to perform maintenance operations related to incremental parsing.

Extend the AbstractParser type with a virtual void maintain() {} method to support operations related to incremental parsing or other maintenance.
Refactor the incrementalParse() method of the CppParser type into the parser's maintain() method.
In parser.cpp, call the maintain() method for all parsers before calling the parse() method. The plugin parsers should be traversed in a reverse topological order, hence inverting the dependency relations defined between them for parsing.
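
The steps above can be sketched as follows. The AbstractParser shown here is a reduced stand-in for the real type, runParsers() and LoggingParser are hypothetical, and the sketch assumes that the parser list is already sorted topologically for parsing and that all maintain() calls finish before the first parse() call.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Reduced stand-in: only the methods relevant to this issue.
class AbstractParser
{
public:
  virtual ~AbstractParser() = default;
  virtual void maintain() {} // default: nothing to maintain
  virtual bool parse() = 0;
};

// parsers_ is assumed to be in topological (parsing) order.
void runParsers(const std::vector<std::shared_ptr<AbstractParser>>& parsers_)
{
  // Maintenance in reverse order: dependent parsers clean up first.
  for (auto it = parsers_.rbegin(); it != parsers_.rend(); ++it)
    (*it)->maintain();

  // Parsing in the original order.
  for (const auto& parser : parsers_)
    parser->parse();
}

// Hypothetical parser that only records the order of the calls.
class LoggingParser : public AbstractParser
{
public:
  LoggingParser(std::string name_, std::vector<std::string>& log_)
    : _name(std::move(name_)), _log(log_) {}

  void maintain() override { _log.push_back("maintain " + _name); }
  bool parse() override { _log.push_back("parse " + _name); return true; }

private:
  std::string _name;
  std::vector<std::string>& _log;
};
```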

Blocked by #8.

Manage CppNode and CppEdge entities on incremental parsing

During incremental parsing CppNode and CppEdge entities should also be managed, deleting invalidated entries related to the File and CppAstNode entries that are to be deleted.
Note that on parse new CppNode and CppEdge entities are created not only for the parsed file, thus more thorough deletion or refactoring might be required.

Implement incremental parsing

Now incremental parsing in this plugin means throwing away all present data and reparsing the entire file system. Only actually changed files should be parsed when an incremental parsing is executed.

Competence diagram for directories

Display a diagram that shows the rate of understanding of each file in a directory. Since understanding will be shown by color codes, a diagram structure should be figured out, preferably something that doesn't include files other than the current directory.

Maintain SourceManager on incremental parsing

The File and FileContent cache of the SourceManager should also be maintained (or even reloaded) after the incremental parsing was carried out.

Parse commit history of every file in project

Find the git folder in the project. Iterate through every file and walk through its entire commit history.

Implement parallel parsing

Now parsing is executed on one thread. Individual file parsing is easily feasible, since file commit histories are independent of each other. Parsing should be done on multiple threads.

Extend MetricsParser to handle incremental parsing

Extend MetricsParser with a maintain() method and implement incremental parsing maintenance functionalities for the Metrics database table. MetricsParser should be dependent on CppParser if the latter manages the maintenance of the File entities.

Blocked by #9.

Parallel incremental cleanup

Currently incremental cleanup is done sequentially on a single processor core, not utilizing the full capabilities of the machine. Implement the cleanup in a parallelized manner.
Pay attention to the dependencies between the cleanup jobs, since files should be cleaned up in a topological order.
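
One way to respect the ordering is to group the cleanup jobs into "waves" with Kahn's algorithm: every job in a wave depends only on jobs of earlier waves, so the jobs of one wave can run in parallel. This is a sketch under assumptions, not the project's implementation: jobs are identified by index, the function name is hypothetical, and each dependency is expected to be listed once with a valid index.

```cpp
#include <cstddef>
#include <vector>

// dependsOn_[i] lists the jobs that must finish before job i may start.
std::vector<std::vector<std::size_t>> cleanupWaves(
  const std::vector<std::vector<std::size_t>>& dependsOn_)
{
  const std::size_t count = dependsOn_.size();
  std::vector<std::size_t> pending(count);             // unfinished dependencies
  std::vector<std::vector<std::size_t>> dependents(count);

  for (std::size_t i = 0; i < count; ++i)
  {
    pending[i] = dependsOn_[i].size();
    for (std::size_t dep : dependsOn_[i])
      dependents[dep].push_back(i);
  }

  // First wave: jobs without any dependency.
  std::vector<std::size_t> current;
  for (std::size_t i = 0; i < count; ++i)
    if (pending[i] == 0)
      current.push_back(i);

  std::vector<std::vector<std::size_t>> waves;
  while (!current.empty())
  {
    std::vector<std::size_t> next;
    for (std::size_t job : current)
      for (std::size_t dependent : dependents[job])
        if (--pending[dependent] == 0)
          next.push_back(dependent);

    waves.push_back(current);
    current = next;
  }

  // Jobs that are part of a dependency cycle never appear in any wave.
  return waves;
}
```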

Detect build system changes

In case the build system changes (the build commands change), incremental parsing cannot be applied - at least currently.

Therefore incremental parsing should be capable of detecting changes in the build commands and rejecting the parsing in such a case. (Dry-run.)

User handling

Add user and session handling in order to make the competence parser able to determine which blame hunks belong to the current user.
