e-shreve / rulecheck


2.0 1.0 0.0 354 KB

Rulecheck provides a system to check C, C++, C#, and Java code against custom static rules.

Home Page: https://github.com/e-shreve/rulecheck

License: MIT License

Python 96.14% C 3.86%

srcml static-analysis static-code-analysis code-style

rulecheck's Issues

Standard rule setting failure handling

Document and support handling of rules that fail configuration (bad settings).
Maybe rules should print their usage/help text on error? Or maybe they should provide a method that returns that text, and loadRules would print it when a rule's init raises an exception?

Create test module to help rule authors test their rules.

They should be able to specify:
The rule class for rulechecktest to create
A file for input
Maybe support taking in an already created srcml file
Settings dictionary for their rule
Get a list of log calls the rule made
Run standard set of asserts on their rule creation (does it do all the correct things on object creation?)
Provide help text

Add type checking to log wrapper method

To help catch errors made by rule authors, consider having the log wrapper method check the types of its inputs and raise an exception if any of them is incorrect.
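
A minimal sketch of what such a check could look like. The `LogType` enum, the `check_log_args` name, and the argument list are all hypothetical; rulecheck's actual log wrapper signature may differ.

```python
from enum import Enum


class LogType(Enum):
    """Hypothetical severity enum; rulecheck's real type may differ."""
    WARNING = "warning"
    ERROR = "error"


def check_log_args(log_type, line, col, message):
    """Raise TypeError if a rule passes the wrong kind of argument."""
    if not isinstance(log_type, LogType):
        raise TypeError(f"log_type must be LogType, got {type(log_type).__name__}")
    # bool is a subclass of int, so reject it explicitly.
    for name, value in (("line", line), ("col", col)):
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be int, got {type(value).__name__}")
    if not isinstance(message, str):
        raise TypeError(f"message must be str, got {type(message).__name__}")
```

The log wrapper would call this first, so a rule that passes, say, a string line number fails loudly at the call site instead of producing malformed output later.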

Output format not automatically parseable by GCC parsers

GCC's output is slightly different in that it adds spaces after some colons:
hello.cpp:4:1: error: ‘returna’ was not declared in this scope

Update the log output to add these spaces so IDEs and other tools that look for errors/warnings in tool output will be more likely to pick up rulecheck's output.
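
As an illustration, a formatter that follows the GCC line shown above might look like this. The function name and parameters are hypothetical and not taken from rulecheck's logger.

```python
def format_gcc_style(path, line, col, severity, message):
    """Build 'file:line:col: severity: message' with GCC's spacing.

    There is no space inside the file:line:col group, and one space
    after the colon that ends the position and after the severity.
    """
    return f"{path}:{line}:{col}: {severity}: {message}"


print(format_gcc_style("hello.cpp", 4, 1, "error",
                       "‘returna’ was not declared in this scope"))
```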

Provide standard way for rules to provide help text

The system should be able to print help text for any rule. This needs a standard way for rules to report help text (or a standard method that the system runs and that may then print the text itself), plus a command line argument that takes the name of the rule for which to display help information.

Support a per rule werror setting

Implement this so that individual rule implementations don't have to handle it on their own: warnings reported by a rule would be promoted to errors automatically by the system. The werror setting would be specified in the config json.
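
A sketch of the promotion step, assuming the rule's entry in the config json has already been loaded into a dictionary and that `werror` is a boolean key in it. The function name and the severity strings are hypothetical.

```python
def effective_severity(severity, rule_config):
    """Promote a warning to an error when the rule's config sets werror."""
    if severity == "warning" and rule_config.get("werror", False):
        return "error"
    return severity


# Hypothetical config entry for one rule.
rule_config = {"name": "rulepack.myrule", "werror": True}
print(effective_severity("warning", rule_config))
```

Because the promotion happens in one place in the system, rules keep reporting warnings as usual and never need to read the setting.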

Rulecheck doesn't load all xml children on start events

The element passed to a start visitor may not have all children loaded.

Additional integration test

Have an integration test parse the output, using an any_other rule that logs every xml element together with a visit_line that logs every line, and show that the reported position always increases.

"Mute after n" option

"Mute after n" option to print summary for a rule on a file on n+1th message from a rule (to reduce log size)

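One possible shape for the "mute after n" option, as a sketch only. The class, its method names, and the message format are hypothetical; messages are collected in a list here so the behaviour is easy to inspect.

```python
from collections import defaultdict


class MutingLogger:
    """Stop printing a rule's messages for a file after n of them."""

    def __init__(self, mute_after):
        self.mute_after = mute_after
        self.counts = defaultdict(int)  # (rule, path) -> messages seen
        self.lines = []                 # what would be written to the log

    def log(self, rule, path, message):
        key = (rule, path)
        self.counts[key] += 1
        seen = self.counts[key]
        if seen <= self.mute_after:
            self.lines.append(f"{path}: {rule}: {message}")
        elif seen == self.mute_after + 1:
            # The n+1th message is replaced by a one-line summary.
            self.lines.append(
                f"{path}: {rule}: muted after {self.mute_after} messages")
        # Later messages are counted but not printed.
```
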
Support ignoring rule based on a source comment

Support ignore based on line comment.

Possibly like:
// NORCNEXTLINE(rulepack.myrule)
would ignore rulepack.myrule on the line following the comment.
// NORC(rulepack.myrule)
would ignore rulepack.myrule on the line the comment appears on.

Technically, the implementation does not have to restrict its search for the keywords to comments. It may do a plaintext search on a line. This will be simpler and should be good enough.

Any whitespace between start of comment and the keyword 'rulecheck' would be allowed.

Multiple rules may be specified using comma as a separator:
// NORC(rulepack.rule1, rulepack.rule2)

The following would ignore all violations by any rule:
// NORC(*)
// NORCNEXTLINE(*)

Text after NORC/NORCNEXTLINE, other than the contents of the immediately following parentheses, will be ignored. That way the comment may contain a reason/rationale for disabling the rule.

Unrecognized rules will be ignored as the user may simply not be executing rulecheck with the rule activated.

Syntax is inspired by NOLINT and NOLINTNEXTLINE comments for clang-tidy: https://clang.llvm.org/extra/clang-tidy/.
A consistent syntax will help users if they are using both tools.
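
The proposal above could be sketched with a plaintext search per line, as suggested. The function names and the returned data structure are hypothetical; only the NORC and NORCNEXTLINE keywords and the `*` wildcard come from the proposal.

```python
import re

# NORC(...) or NORCNEXTLINE(...); text after the closing parenthesis is ignored.
_DIRECTIVE = re.compile(r"\bNORC(NEXTLINE)?\(([^)]*)\)")


def parse_ignores(lines):
    """Map 1-based line numbers to the set of rule names ignored there."""
    ignored = {}
    for number, text in enumerate(lines, start=1):
        for match in _DIRECTIVE.finditer(text):
            target = number + 1 if match.group(1) else number
            names = {n.strip() for n in match.group(2).split(",") if n.strip()}
            ignored.setdefault(target, set()).update(names)
    return ignored


def is_ignored(ignored, line, rule):
    """True if the rule, or every rule via '*', is ignored on that line."""
    names = ignored.get(line, ())
    return "*" in names or rule in names
```

Unrecognized rule names need no special handling in this sketch: they sit in the set and simply never match a rule that is running.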

If srcml's position's line information decrements, rulecheck will report wrong line information

srcml may report a starting position on line X and then, in subsequent elements, report starting positions on a line Y < X. One known case is the following construct:

#ifdef __cplusplus
extern "C" {
#endif

#include ... // Or other CPP statements

// C statements

#ifdef __cplusplus
}
#endif

In the above case the block_content tag for the extern "C" block will have a starting position where the C statements start instead of where the following CPP statements are.

Provide handling of srcml namespaces

The elements passed to visitors can be searched with XPath, but they all carry the full namespace, which makes searches cumbersome. Either a helper map for the srcml namespaces could be provided, or perhaps the namespaces can be stripped when parsing the srcml output, prior to iterating over the tree. See the most popular answer (not the accepted answer) here: https://stackoverflow.com/questions/18159221/remove-namespace-and-prefix-from-xml-in-python-using-lxml
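
A sketch of the stripping option, using the standard library's ElementTree rather than lxml so that it is self-contained. The helper name is hypothetical, and the namespace URI in the sample document is an assumption about srcml's output.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root):
    """Rewrite '{uri}name' tags to 'name' on every element, in place."""
    for element in root.iter():
        if isinstance(element.tag, str) and element.tag.startswith("{"):
            element.tag = element.tag.split("}", 1)[1]
    return root


# Sample input; the namespace URI is assumed, not taken from the issue.
xml_text = (
    '<unit xmlns="http://www.srcML.org/srcML/src">'
    "<function><name>main</name></function>"
    "</unit>"
)
root = strip_namespaces(ET.fromstring(xml_text))
print(root.find("function/name").text)
```

The trade-off is that elements from different namespaces that share a local name become indistinguishable after stripping. Attribute names are left untouched in this sketch. A helper map passed to the search functions avoids both issues at the cost of prefixed search expressions.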

Add argument to run an ignore list cleanup

The ignore list cleanup option would output a new list with updated line numbers. For example, if the original list had an issue at line 5 but it was found at line 6, it is still a match, and the new list would record line 6.

Per rule verbosity setting

Provide per rule verbose settings. Add print_verbose to Rule class.

Support specifying multiple rules in a single object of a config file

If a rulepack has many rules, it is tedious to create and manage a rule object entry in the json config file for each one. Support the use of wildcards (* and ?) in names. May still require the rulepack to be specified. For example:
name: linuxstyle.*
name: linuxstyle.rule2?

Also, when wildcards pull in multiple rules and a specific rule is also named with its own settings, replace the instance pulled in via the wildcard instead of instantiating the rule again.

NOTE: This last one may need some more thought, as it is a bit inconsistent with how multiple rule instantiation is done today.
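
The wildcard expansion itself could be sketched with Python's `fnmatch` module, which supports `*` and `?`. The function name and the list of available rule names are hypothetical, and the replace-on-explicit-name behaviour is not covered here.

```python
from fnmatch import fnmatchcase


def expand_rule_names(pattern, available):
    """Return the available rule names that match a wildcard pattern."""
    return sorted(name for name in available if fnmatchcase(name, pattern))


# Hypothetical rule names discovered in two rulepacks.
available = [
    "linuxstyle.rule1",
    "linuxstyle.rule2",
    "linuxstyle.rule20",
    "linuxstyle.rule21",
    "other.rule2",
]
print(expand_rule_names("linuxstyle.*", available))
print(expand_rule_names("linuxstyle.rule2?", available))
```

Note that `fnmatch` also treats `[seq]` as a character class, which goes beyond the `*` and `?` named in the proposal.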

Support Python 3.7

Currently Python 3.8 is required. Change components to allow 3.7 to be used so that additional systems can be used. (Some corporate systems may not be allowed to upgrade to 3.8 yet.)

Refactor to remove ignore logic from logger

The logger module's log_violation method currently knows too much about how ignores work. It requests the hash of the ignore and then passes it to various other ignore module methods. Update log_violation and ignore module so that the logger doesn't need to request the hash value unless it is going to show it on the console.

Rulecheck sometimes reads beyond last line of file.

This is caused by this srcml bug: srcML/srcML#1697
The incorrect end position specified by srcml is beyond the last 'real' line in the file.
In the example given in the srcml issue:

1: #if THIS\n
2: main() {\n
3: }\n
4: #endif\n
5:

Line 5 has no content and thus is not read in by rulecheck when it calls Python's readlines function. In other words, readlines will return a list of 4 strings, not 5. But srcml specifies the end position as being on line 5, so rulecheck tries to process lines up to and including line 5, which is beyond the end of the list of lines.
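
One possible workaround, sketched here under the assumption that rulecheck holds the file as the list returned by readlines, is to clamp the end line reported by srcml to the number of lines actually read. The function name is hypothetical.

```python
import io


def clamp_end_line(reported_end_line, lines):
    """Limit a 1-based end line from srcml to the lines actually read."""
    return min(reported_end_line, len(lines))


# The example from the srcml issue: four lines, each ending in a newline.
source = "#if THIS\nmain() {\n}\n#endif\n"
lines = io.StringIO(source).readlines()

end_line = clamp_end_line(5, lines)
for number in range(1, end_line + 1):
    _ = lines[number - 1]  # never indexes past the end of the list
```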

Pass srcml unit _tree_ to file open methods

Consider passing the srcml unit tree to file open so rules can use xpath searches if desired.

Make sure all rule methods are called in a try block

Wrap every call to a rule's visit methods in a try block. If an exception is raised, log a message for the rule and print the exception to stderr.
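
A sketch of a single guarded call site. The `call_visitor` helper, its parameters, and the message wording are hypothetical; `log` stands in for whatever rulecheck uses to log a message for a rule.

```python
import sys
import traceback


def call_visitor(rule_name, visitor, *args, log=print):
    """Call one visit method; report instead of crashing if it raises."""
    try:
        visitor(*args)
        return True
    except Exception:
        log(f"rule {rule_name} raised an exception in {visitor.__name__}")
        traceback.print_exc(file=sys.stderr)
        return False
```

Routing every visit call through one helper like this keeps the try block in a single place, so a faulty rule cannot stop the remaining rules from running.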

Additional summary output

Have the logger keep total counts of errors and warnings (two counts) for each rule (by rule name) over all files and include them in the summary. Print the columns in the order Error Count, Warning Count, Rule Name so that rule name length doesn't impact output formatting.
Multiple instantiations of the same rule would not be counted separately.
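
A sketch of the bookkeeping, keyed by rule name so that several instantiations of one rule share a row. The class name, method names, and column widths are hypothetical.

```python
from collections import Counter


class ViolationTotals:
    """Per-rule error and warning totals across all files."""

    def __init__(self):
        self.errors = Counter()
        self.warnings = Counter()

    def record(self, rule_name, severity):
        if severity == "error":
            self.errors[rule_name] += 1
        elif severity == "warning":
            self.warnings[rule_name] += 1

    def summary_lines(self):
        # Counts come first so a long rule name cannot shift the columns.
        names = sorted(set(self.errors) | set(self.warnings))
        return [f"{self.errors[n]:>6} {self.warnings[n]:>8}  {n}" for n in names]
```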

Ignore file reader can't handle colons in file names

This prevents ignore items from working when an absolute file location is provided to rulecheck on Windows because, for example, C: will show up in the log message.
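
The ignore file format is not described here, so the following is only a sketch that assumes each entry begins with a GCC-style `file:line:col:` prefix. It allows an optional drive letter before the path instead of splitting on the first colon. All names are hypothetical.

```python
import re

# Optional drive letter ("C:"), then a path without colons, then line and column.
_ENTRY = re.compile(
    r"^(?P<path>(?:[A-Za-z]:)?[^:]+):(?P<line>\d+):(?P<col>\d+):\s*(?P<rest>.*)$"
)


def split_entry(text):
    """Return (path, line, col, rest) or None if the text does not match."""
    match = _ENTRY.match(text)
    if match is None:
        return None
    return (match["path"], int(match["line"]), int(match["col"]), match["rest"])


print(split_entry(r"C:\src\hello.cpp:4:1: error: something: with a colon"))
```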

Support "strict" mode to disable ignore directives

Support an argument and per-rule setting to disable ignore via line comments (strict mode).
(Consider and document what will happen if a rule is ignored via the hash lookup method. Maybe also disable that method as well for strict mode, or print warning, or do nothing?)

Provide a visitor to be called once all files have been processed.

This will allow rules to log summary information or check items that require knowledge of all source files.

Support ignore push and pop via source comment

Support line comment to push-disable and pop-last-setting for rule(s) (as opposed to ignoring the next line only)

Ignore via hash should limit line number difference when finding matches

Ignore via hash needs to take the line number into account (+/- n lines), as many lines may hash to the same value when their content is identical. To support this, the hashes must be loaded from the ignore list with duplicates counted, and the count decremented as matches are found in the files being searched. If the count reaches 0, or the line is not within n lines of the ignore row, then the violation is reported.
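
A sketch of the matching logic described above. Each ignore list row is kept as its own entry, so duplicates are counted by the length of the list and used up one at a time. The class name, the `(hash, line)` row shape, and the use of SHA-1 are assumptions for illustration.

```python
import hashlib
from collections import defaultdict


def line_hash(text):
    """Hash of a line's content; the algorithm here is an assumption."""
    return hashlib.sha1(text.strip().encode("utf-8")).hexdigest()


class HashIgnoreList:
    def __init__(self, entries, window):
        # entries: iterable of (hash, line_number) rows, duplicates included.
        self.window = window
        self.remaining = defaultdict(list)
        for digest, line in entries:
            self.remaining[digest].append(line)

    def consume(self, digest, line):
        """Use up one matching row within the window; False means report."""
        candidates = self.remaining.get(digest, [])
        for index, listed_line in enumerate(candidates):
            if abs(listed_line - line) <= self.window:
                del candidates[index]
                return True
        return False
```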

Add rule to language applicability support

Rules need to declare which languages they apply to. The same srcml tags appear across several languages, but a style or other rule might apply to only one language or a few, not all.
Rules can be dynamic in this by taking in a setting and changing what they return in the language getter.
