e-shreve / rulecheck
Rulecheck provides a system to check C, C++, C#, and Java code against custom static rules.
Home Page: https://github.com/e-shreve/rulecheck
License: MIT License
Wrap each call to a rule's visit methods in a try block. If a rule raises an exception, log a message naming the rule and print the exception to stderr.
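A minimal sketch of that guard, assuming rules are plain objects whose visit methods are looked up by name. The helper name `safe_visit` and its return value are hypothetical and not part of rulecheck:

```python
import sys
import traceback


def safe_visit(rule, method_name, *args):
    """Call rule.<method_name>(*args) and report, rather than propagate, any exception.

    Returns True if the call completed (or the rule has no such method),
    False if the rule raised. Hypothetical helper for illustration only.
    """
    method = getattr(rule, method_name, None)
    if method is None:
        return True
    try:
        method(*args)
        return True
    except Exception:
        # Name the offending rule first, then print the full traceback to stderr.
        print(f"rule {type(rule).__name__} raised in {method_name}", file=sys.stderr)
        traceback.print_exc(file=sys.stderr)
        return False
```

With this shape, one misbehaving rule does not stop the remaining rules from visiting the same element.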
Fix for #22 causes last line to not be visited
This would make the executing command shorter. Also, support taking in a path to look for config files in.
Support ignore based on line comment.
Possibly like:
// NORCNEXTLINE(rulepack.myrule)
would ignore rulepack.myrule on the line following the comment.
// NORC(rulepack.myrule)
would ignore rulepack.myrule on the line the comment appears on.
Technically, the implementation does not have to restrict its search for the keywords to comments. It may do a plaintext search on a line. This will be simpler and should be good enough.
Any whitespace between start of comment and the keyword 'rulecheck' would be allowed.
Multiple rules may be specified using comma as a separator:
// NORC(rulepack.rule1, rulepack.rule2)
The following would ignore all violations by any rule:
// NORC(*)
// NORCNEXTLINE(*)
Text after NORC/NORCNEXTLINE, other than the contents of the immediately following parentheses, will be ignored. That way the comment may contain a reason or rationale for disabling the rule.
Unrecognized rules will be ignored as the user may simply not be executing rulecheck with the rule activated.
Syntax is inspired by NOLINT and NOLINTNEXTLINE comments for clang-tidy: https://clang.llvm.org/extra/clang-tidy/.
A consistent syntax will help users if they are using both tools.
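The proposed syntax could be recognized with a plaintext search along these lines. This is only a sketch of the proposal above: the function names `parse_ignores` and `is_ignored` are hypothetical, and the search deliberately does not check that the keyword sits inside a comment.

```python
import re

# NORCNEXTLINE is listed first so the longer keyword wins over the NORC prefix.
_NORC_RE = re.compile(r"\b(NORCNEXTLINE|NORC)\(([^)]*)\)")


def parse_ignores(line):
    """Return (same_line_rules, next_line_rules) as sets of rule names.

    Anything on the line outside the parentheses (for example a rationale)
    is ignored.
    """
    same_line, next_line = set(), set()
    for keyword, body in _NORC_RE.findall(line):
        names = {name.strip() for name in body.split(",") if name.strip()}
        target = next_line if keyword == "NORCNEXTLINE" else same_line
        target.update(names)
    return same_line, next_line


def is_ignored(rule_name, rules):
    """A rule is ignored if it is listed explicitly or '*' is present."""
    return "*" in rules or rule_name in rules
```

Unrecognized rule names need no special handling here, since a name that matches no active rule simply never suppresses anything.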
They should be able to specify:
The rule class for rulechecktest to create
A file for input
Maybe support taking in an already created srcml file
Settings dictionary for their rule
Get a list of log calls the rule made
Run standard set of asserts on their rule creation (does it do all the correct things on object creation?)
Provide help text
Have an integration test parse the output, using an any_other rule that logs every XML element together with a visit_line that logs every line, and show that the reported position always increases.
Support an argument and per-rule setting to disable ignore via line comments (strict mode).
(Consider and document what will happen if a rule is ignored via the hash lookup method. Maybe also disable that method as well for strict mode, or print warning, or do nothing?)
This will allow rules to log summary information or check items that require knowledge of all source files.
Use timeit and print summary of how long each rule ran for and summary of srcml execution time and summary of "other" overhead time.
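One possible shape for the timing bookkeeping, using `timeit.default_timer` from the standard library. The `Timings` class and the label names are assumptions made for illustration:

```python
from collections import defaultdict
from contextlib import contextmanager
from timeit import default_timer


class Timings:
    """Accumulates elapsed wall-clock seconds per label (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def measure(self, label):
        start = default_timer()
        try:
            yield
        finally:
            # Accumulate even if the timed block raises.
            self.totals[label] += default_timer() - start

    def summary_lines(self):
        # Time first, label last, so long labels do not disturb the columns.
        return [f"{seconds:10.4f}s  {label}" for label, seconds in sorted(self.totals.items())]
```

Each rule's visit calls could be wrapped in `with timings.measure(rule_name):`, the srcml invocation in `with timings.measure("srcml"):`, and "other" overhead derived as total run time minus the sum of the recorded entries.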
The system should be able to print help text for any rule. This requires a standard way for rules to report help text (or a standard method that the system runs and that may then print the text), plus a command line argument that takes the name of the rule for which to display help information.
The element passed to a start visitor may not have all children loaded.
This is due to the path separator differences between OSes.
Provide per rule verbose settings. Add print_verbose to Rule class.
Test, and clean up as needed, what happens when: rule path not found, rule not found, source not found, srcml not found, config file not found, srcml returns an error, settings for rule not present in json, no rules specified in config json
To help catch errors made by rule authors, consider having log wrapper method check type of inputs and throw exception if not correct input.
If a rulepack has many rules, it is tedious to create/manage a rule object entry in the json config file. Support the use of wildcards (* and ?) in names. May still require the rulepack to be specified. For example:
name: linuxstyle.*
name: linuxstyle.rule2?
Also, support wildcards pulling in multiple rules but if a specific rule is named with setting then instead of instantiating the rule again, replace the one pulled in via wildcard.
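A sketch of how wildcard expansion with explicit overrides might work, using `fnmatch.fnmatchcase` from the standard library. The entry layout (dicts with `name` and `settings` keys), the function name `resolve_rules`, and the list of available rule names are assumptions for illustration. Note that `fnmatchcase` also treats `[...]` as a character class, which goes slightly beyond the `*` and `?` proposed above.

```python
from fnmatch import fnmatchcase


def _is_pattern(name):
    return "*" in name or "?" in name


def resolve_rules(config_entries, available):
    """Map each selected rule name to its settings.

    Wildcard entries are expanded first; explicitly named entries are applied
    afterwards so they replace a rule pulled in by a wildcard instead of
    instantiating it a second time.
    """
    resolved = {}
    for entry in config_entries:
        if _is_pattern(entry["name"]):
            for rule_name in available:
                if fnmatchcase(rule_name, entry["name"]):
                    resolved.setdefault(rule_name, entry.get("settings", {}))
    for entry in config_entries:
        if not _is_pattern(entry["name"]):
            resolved[entry["name"]] = entry.get("settings", {})
    return resolved
```

For example, with `linuxstyle.*` plus an explicit `linuxstyle.rule1` entry carrying settings, every `linuxstyle` rule is selected once and `rule1` receives the explicit settings.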
The logger module's log_violation method currently knows too much about how ignores work. It requests the hash of the ignore and then passes it to various other ignore module methods. Update log_violation and ignore module so that the logger doesn't need to request the hash value unless it is going to show it on the console.
Have logger keep total count of err and warn (two counts) for each rule (by rule name) over all files and include that in summary. Print Error Count, Warning Count, Rule Name so rule name length doesn't impact output formatting.
Counts would not be kept separately for multiple instantiations of the same rule.
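A minimal sketch of such per-rule counters, keyed by rule name so that multiple instances of one rule share a count. The class name `ViolationCounter` and the severity strings `"error"` and `"warning"` are assumptions:

```python
from collections import Counter


class ViolationCounter:
    """Keeps error and warning totals per rule name across all files."""

    def __init__(self):
        self.errors = Counter()
        self.warnings = Counter()

    def record(self, rule_name, severity):
        # Anything that is not an error is counted as a warning in this sketch.
        bucket = self.errors if severity == "error" else self.warnings
        bucket[rule_name] += 1

    def summary_lines(self):
        # Counts come first and the rule name last, so name length
        # cannot disturb the column alignment.
        names = sorted(set(self.errors) | set(self.warnings))
        return [f"{self.errors[n]:>6} {self.warnings[n]:>8}  {n}" for n in names]
```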
Support pip based install from pypi.
srcml may report a starting position of line X and then in subsequent elements report starting positions of Y < X. One known case is the following construct
#ifdef __cplusplus
extern "C" {
#endif
#include ... // Or other CPP statements
// C statements
#ifdef __cplusplus
}
#endif
In the above case the block_content tag for the extern "C" block will have a starting position where the C statements start instead of where the following CPP statements are.
"Mute after n" option: on the (n+1)th message from a rule on a file, stop printing individual messages and print a summary for that rule on that file instead (to reduce log size).
Document/support rules failing config (bad settings).
Maybe they should print their use/help on error? Or maybe they should provide a method to return that text and loadRules would print it on exception from init of the rule?
This is caused by this srcml bug: srcML/srcML#1697
The incorrect end position specified by srcml is beyond the last 'real' line in the file.
In the example given in the srcml issue:
1: #if THIS\n
2: main() {\n
3: }\n
4: #endif\n
5:
Line 5 has no content and thus is not read in by rulecheck when it calls Python's readlines function. In other words, readlines will read in a list of 4 strings, not 5. But srcml specifies the end position as being on line 5, and rulecheck then tries to process lines up to and including line 5, which is beyond the end of the list of lines.
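The mismatch can be reproduced with the standard library alone, and one possible guard is to clamp the reported end line to the number of lines actually read. The helper name `lines_in_range` is hypothetical, and this is not necessarily how rulecheck resolved the issue:

```python
import io

# The file from the srcml issue: four lines, each ending in a newline.
SOURCE = "#if THIS\nmain() {\n}\n#endif\n"
lines = io.StringIO(SOURCE).readlines()  # a list of 4 strings, not 5


def lines_in_range(lines, start_line, end_line):
    """Return (line_number, text) pairs for a 1-based inclusive range.

    The end is clamped to len(lines), so an end position that points past
    the last real line cannot index beyond the list.
    """
    first = max(start_line, 1)
    last = min(end_line, len(lines))
    return [(n, lines[n - 1]) for n in range(first, last + 1)]


# srcml reports an end position on line 5; only lines 1 to 4 are visited.
visited = lines_in_range(lines, 1, 5)
```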
Currently Python 3.8 is required. Change components to allow 3.7 to be used so that additional systems can be used. (Some corporate systems may not be allowed to upgrade to 3.8 yet.)
Consider passing the srcml unit tree to file open so rules can use xpath searches if desired.
Support line comment to push-disable and pop-last-setting for rule(s) (as opposed to ignoring the next line only)
The elements passed to visitors can be searched with XPath, but they all carry the full namespace, which makes searches cumbersome. Either a helper map for the srcml namespaces could be provided, or the namespaces could be stripped when parsing the srcml output, prior to iterating over the tree. See the most popular answer (not the accepted answer) here: https://stackoverflow.com/questions/18159221/remove-namespace-and-prefix-from-xml-in-python-using-lxml
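Both options can be sketched with the standard library's `xml.etree.ElementTree` (used here instead of lxml so the example has no dependencies). The sample document, the `src` prefix in the helper map, and the function name `strip_namespaces` are assumptions for illustration:

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    '<unit xmlns="http://www.srcML.org/srcML/src">'
    "<function><name>main</name></function>"
    "</unit>"
)

# Option 1: a helper map, so searches can use a short prefix.
NS = {"src": "http://www.srcML.org/srcML/src"}
with_map = [e.text for e in ET.fromstring(SAMPLE).findall(".//src:function/src:name", NS)]


# Option 2: strip the "{uri}" part from every tag once, after parsing.
def strip_namespaces(root):
    """Remove the namespace from element tags in place (attributes are untouched)."""
    for elem in root.iter():
        if isinstance(elem.tag, str) and elem.tag.startswith("{"):
            elem.tag = elem.tag.split("}", 1)[1]
    return root


stripped_root = strip_namespaces(ET.fromstring(SAMPLE))
without_ns = [e.text for e in stripped_root.findall(".//function/name")]
```

Stripping keeps rule code shorter, at the cost of losing the distinction between elements of the same local name in different namespaces.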
This prevents ignore items from working when rulecheck is run on Windows with an absolute file location, because, for example, C: will show up in the log message.
Ignore via hash needs to take the line number +/- n lines into account, as many lines may hash to the same value when their content is identical. With this, the hashes must be loaded from the ignore list, duplicates in the ignore list counted, and the counts decremented as matches are found in the files being searched. If the count reaches 0, or the line is not within n lines of the ignore row, then it is a reported violation.
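A sketch of that bookkeeping follows. What exactly rulecheck hashes is not stated here, so `line_hash` (SHA-1 of the stripped line text) is purely an assumption, as are the class name `IgnoreList` and the `(hash, line_number)` entry layout. Storing one line number per ignore row plays the role of the duplicate count:

```python
import hashlib
from collections import defaultdict


def line_hash(text):
    """Assumed hash: SHA-1 of the stripped line content."""
    return hashlib.sha1(text.strip().encode("utf-8")).hexdigest()


class IgnoreList:
    def __init__(self, entries, window=2):
        # entries: iterable of (hash, line_number) rows from the ignore list.
        # Duplicate rows are kept, so len(list) is the remaining count.
        self.window = window
        self._remaining = defaultdict(list)
        for hash_value, line_no in entries:
            self._remaining[hash_value].append(line_no)

    def consume(self, hash_value, line_no):
        """Use up one ignore row within +/- window lines; True if one was found.

        False means the violation should be reported.
        """
        candidates = self._remaining.get(hash_value, [])
        for index, listed_line in enumerate(candidates):
            if abs(listed_line - line_no) <= self.window:
                del candidates[index]  # decrement the remaining count
                return True
        return False
```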
Rules need to state which languages they parse. The same srcml tags appear across several languages, but a style or other rule might apply only to one language or a few, not all.
Rules can be dynamic in this by taking in a setting and changing what they return in the language getter.
GCC's output is slightly different in that it adds spaces after some colons:
hello.cpp:4:1: error: ‘returna’ was not declared in this scope
Update the log output to add these spaces so IDEs and other tools that look for errors/warnings in tool output will be more likely to pick up rulecheck's output.
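The target format from the GCC line above could be produced by a formatter like this. The function name and parameter list are hypothetical; only the `path:line:column: severity: message` layout is taken from the example:

```python
def format_violation(path, line, column, severity, message):
    """Format a message GCC-style, with a space after the position and severity colons."""
    return f"{path}:{line}:{column}: {severity}: {message}"


example = format_violation(
    "hello.cpp", 4, 1, "error", "‘returna’ was not declared in this scope"
)
```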
Ignore list cleanup option would output a new list with line numbers updated (if original list had issue at line 5, but it was found at line 6 then it is still a match and output new list with line 6 listed)
Implement such that each rule implementation doesn't have to handle this on its own. Warnings reported on a rule would be promoted to errors automatically by the system. werror setting would be specified in the config json.
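One way to keep the promotion out of individual rules is to apply it in the logging layer that every rule already goes through. In this sketch the factory `make_logger`, the list-based sink, and the severity strings are assumptions; only the `werror` name comes from the text above:

```python
def make_logger(sink, werror=False):
    """Return a log function that appends (rule_name, severity, message) to sink.

    When werror is true, warnings are promoted to errors centrally, so rule
    implementations keep reporting plain warnings.
    """
    def log(rule_name, severity, message):
        if werror and severity == "warning":
            severity = "error"
        sink.append((rule_name, severity, message))

    return log


records = []
log = make_logger(records, werror=True)
log("rulepack.myrule", "warning", "line too long")
```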