vulndb / data
User, contributor and developer friendly vulnerability database
License: Other
Create a unittest that makes sure that the description/fix texts are valid markdown.
The guys from Vega have their own DB https://github.com/subgraph/Vega/tree/develop/xml/alerts , but they use http://www.eclipse.org/legal/epl-v10.html which wouldn't be compatible with w3af's GPLv2
Add CWE / OWASP / WASC for existing vulnerabilities at https://github.com/vulndb/data/tree/master/db , it might be a good chance to create a video tutorial on how to contribute, and try to involve more people in the vulndb project.
Create a unittest that will find descriptions that are duplicated/very similar between two files. I'm worried about some of the data we imported from Arachni, namely all the xss_* we have at https://github.com/vulndb/data/tree/master/db . If they are duplicated we should remove them, and the unittest will also help us avoid similar issues in the future.
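A minimal sketch of such a similarity check, using only the standard library. The 0.9 threshold, the function name and the sample file names are assumptions made for illustration; a real test would load the descriptions from the files in db/.

```python
import difflib
import itertools


def find_similar_descriptions(descriptions, threshold=0.9):
    """Return (name_a, name_b, ratio) for every pair at or above threshold.

    `descriptions` maps a file name to its description text.
    """
    similar = []
    pairs = itertools.combinations(sorted(descriptions.items()), 2)
    for (name_a, text_a), (name_b, text_b) in pairs:
        ratio = difflib.SequenceMatcher(None, text_a, text_b).ratio()
        if ratio >= threshold:
            similar.append((name_a, name_b, ratio))
    return similar


# Hypothetical sample data, not taken from the real database
sample = {
    "a-xss.json": "Client-side scripts are used extensively by modern web applications.",
    "b-xss-event.json": "Client-side scripts are used extensively by modern web applications.",
    "c-sqli.json": "SQL injection occurs when user input reaches a database query unescaped.",
}
similar_pairs = find_similar_descriptions(sample)
```

A unittest could then assert that the returned list is empty, printing the offending pairs when it is not.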
Using vulndb in w3af I noticed that there are some missing vulnerabilities which need to be added to the database with low priority
For each vulnerability we need to create a new JSON file inside the db
directory (that looks like this) and make sure it passes all the tests.
Based on andresriancho/w3af#53
Remove Arachni-specific tags:
Arachni has some very interesting data we could use:
https://github.com/Arachni/arachni/blob/master/components/checks/active/file_inclusion.rb
https://github.com/Arachni/arachni/blob/8a7c8cdb2ab00b04a34dc666e3a6607e09b025e2/components/checks/active/xss_event.rb
https://github.com/Arachni/arachni/blob/55f78dbd7d2fc7b53bb7dc5d576662b2810a6b8f/components/checks/active/xss_dom.rb
Agreed with @Zapotek that he'll write a script to export the data out of the arachni tests and into generic JSON files which we can then migrate to the format defined in #5
nmap might be interested in using vulndb/data , they don't seem to have a vulnerability database.
Wait until the vulndb/data is stable so we can present the idea to the mailing list and don't get rejected with: "too new", "spec not well implemented", "duplicates data ..."
Also, it might be a good idea to create a simple Lua SDK (for NSE scripts) for vulndb, so their implementation is straightforward 👍
Write unittest to verify all JSON files comply with schema.json
As a user I complain and say that writing markdown text inside JSON is hard:
"fix": { "$ref": "#/files/fix/123" }
and "description": { "$ref": "#/files/description/123" }
Are 27 and 28 the same vulnerability?
Add WASC references to existing vulnerabilities in DB
Something like "Attribution-ShareAlike CC BY-SA"
https://code.google.com/p/zaproxy/source/browse/trunk/src/lang/vulnerabilities.xml
Simon said it was OK to copy+paste from this database.
In 14-cvs-svn-user-disclosure.json, the cwe IDs go from specific to general:
"cwe": ["527"],
...
"url": "http://cwe.mitre.org/data/definitions/200.html",
In 44-source-code-disclosure.json, the cwe IDs go from general to specific:
"cwe": ["200"],
...
"url": "http://cwe.mitre.org/data/definitions/540.html",
And in 15-directory-listing.json, the cwe IDs are the same:
"cwe": ["548"],
"url": "http://cwe.mitre.org/data/definitions/548.html",
I suppose it will be useful if we add support for template variables in such cases:
The line
Arachni has flagged this not as a vulnerability, but as a ...
is converted to
{{SCANNER}} has flagged this not as a vulnerability, but as a ...
What do you think?
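One way the substitution could work, as a rough sketch. The `render` function, the uppercase-only placeholder syntax and the scanner name passed in are assumptions; only the `{{SCANNER}}` variable comes from the proposal above.

```python
import re

# Matches {{NAME}} where NAME is uppercase letters and underscores
PLACEHOLDER = re.compile(r"\{\{\s*([A-Z_]+)\s*\}\}")


def render(text, variables):
    """Replace {{NAME}} placeholders; unknown names are left untouched."""
    def substitute(match):
        return variables.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(substitute, text)


line = "{{SCANNER}} has flagged this not as a vulnerability, but as a ..."
rendered = render(line, {"SCANNER": "w3af"})
```

Leaving unknown placeholders in place makes a missing variable visible in the output instead of silently dropping text.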
It would be great to have a simple python wrapper that would return an object representation of the JSON object, with all the getters and setters, so that it can be easily used in our software.
This should be implemented in another repo, like vulndb/python-sdk
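A possible starting point for such a wrapper, sketched with a dataclass. The class name, the chosen subset of fields and the `from_json` helper are assumptions, not an existing SDK API.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Vulnerability:
    id: int
    title: str
    severity: str
    description: str = ""
    tags: List[str] = field(default_factory=list)
    cwe: List[str] = field(default_factory=list)

    @classmethod
    def from_json(cls, raw):
        """Build an instance from a JSON string, ignoring unknown keys."""
        data = json.loads(raw)
        known = {name: data[name] for name in cls.__dataclass_fields__ if name in data}
        return cls(**known)


raw = ('{"id": 12345, "title": "Cross-Site Scripting", "severity": "medium", '
       '"tags": ["xss", "client side"], "cwe": ["0003", "0007"], "wasc": ["0003"]}')
vuln = Vulnerability.from_json(raw)
```

Plain attribute access (`vuln.title`, `vuln.severity = "high"`) plays the role of getters and setters here; a fuller wrapper would cover the remaining fields such as references and the fix.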
Think about a solution for duplicated fix guidance texts, which for example are present in these JSON files:
The solution is, of course, to put all the fix information (guidance, effort, etc.) in one place and reference it from the "main" / "vulnerability" JSON. The details are what we need to think about.
Split JSON files in two, one holding the vulnerability information and the other holding the fix.
db/vuln/
db/fix/
{
"generic": {
"effort": 50,
"guidance": [
"The first step to remediation is to identify the context in which the ..."
]
}
}
<id>-<fix-title>.json
"fix": 321,
While reviewing the pull request from @robocoder I noticed that we're duplicating data. The information about which CWE is related to each vulnerability is in two places:
"cwe": ["749"],
"references": [
{
"url": "http://httpd.apache.org/docs/2.2/mod/core.html#limitexcept",
"title": "Apache.org"
},
{
"url": "http://cwe.mitre.org/data/definitions/749.html",
"title": "CWE-749"
It doesn't make sense to duplicate this information. If the SDK developer creates a report where he translates the CWE id to a URL and then gets the references he'll get a report with two links to the same URL (http://cwe.mitre.org/data/definitions/749.html in this case)
This was more noticeable at #34 but was a problem before I asked @robocoder to write this PR (sorry for the extra work that won't be merged!)
My proposal to solve this issue is:
cwe.mitre.org
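The details of the proposal are not preserved here, so the following is only one possible approach and not necessarily the one proposed: keep the ids in the cwe field and let the SDK derive the cwe.mitre.org links, skipping any URL that is already listed. The function name and the merge order are assumptions.

```python
CWE_URL = "http://cwe.mitre.org/data/definitions/%s.html"


def build_references(vulnerability):
    """Return references with one CWE link per id and no repeated URLs."""
    cwe_refs = [{"url": CWE_URL % cwe_id, "title": "CWE-%s" % cwe_id}
                for cwe_id in vulnerability.get("cwe", [])]
    seen = set()
    merged = []
    for ref in vulnerability.get("references", []) + cwe_refs:
        if ref["url"] not in seen:
            seen.add(ref["url"])
            merged.append(ref)
    return merged


entry = {
    "cwe": ["749"],
    "references": [
        {"url": "http://httpd.apache.org/docs/2.2/mod/core.html#limitexcept",
         "title": "Apache.org"},
        {"url": "http://cwe.mitre.org/data/definitions/749.html",
         "title": "CWE-749"},
    ],
}
references = build_references(entry)
```

With this in place the explicit CWE reference could be dropped from the JSON files without changing the generated report.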
I've seen some JSON files where the markdown says 1-, and some others that say 1. Which one is the correct one?
All the data should be stored in individual JSON files, one for each vulnerability. These files should be stored in db/ and the name should be <vuln-id>-<vulnerability-name>.json, where <vuln-id> is a unique ID (integer) that we'll use to reference/find the vulnerabilities and the <vulnerability-name> is a human readable name to make it easier for us to find a vulnerability in the output of an "ls" command. The vulnerability name should be dash separated. Examples:
1-cross-site-scripting.json
2-sql-injection.json
3-missing-hsts-header.json
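The naming convention can be checked mechanically. The sketch below assumes ids are positive integers without leading zeros and that names use only lowercase letters and digits; neither restriction is stated above, and `parse_filename` is a hypothetical helper.

```python
import re

# <vuln-id>-<vulnerability-name>.json with a dash separated name
FILENAME = re.compile(
    r"^(?P<id>[1-9][0-9]*)-(?P<name>[a-z0-9]+(?:-[a-z0-9]+)*)\.json$"
)


def parse_filename(filename):
    """Split a database file name into (vuln_id, vulnerability_name)."""
    match = FILENAME.match(filename)
    if match is None:
        raise ValueError("not a valid vulnerability file name: %r" % filename)
    return int(match.group("id")), match.group("name")


parsed = [parse_filename(name) for name in (
    "1-cross-site-scripting.json",
    "2-sql-injection.json",
    "3-missing-hsts-header.json",
)]
```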
JSON files should look similar to this example:
{
"id": 12345,
"title": "Cross-Site Scripting",
"description": "A very long description for Cross-Site Scripting",
"severity": "medium",
"wasc": ["0003"],
"tags": ["xss", "client side"],
"cwe": ["0003", "0007"],
"owasp_top_10": {
"2010": [1],
"2013": [2]
},
"fix": {
"guidance": "A very long text explaining how to fix XSS vulnerabilities",
"effort": 50
},
"references": [
{"url": "http://foo.com/xss", "title": "First reference to XSS vulnerability"},
{"url": "http://asp.net/xss", "title": "How to fix XSS vulns in ASP.NET"},
{"url": "http://owasp.org/xss", "title": "OWASP desc for XSS"}
]
}
Notes about the JSON file:
JSON has an awful limitation: "No multi-line strings". In our case this is very limiting since we don't want to have awfully long lines in these sections:
"description": "A very long description for Cross-Site Scripting",
"solution": "A very long text explaining how to fix XSS vulnerabilities",
So, the parser should check if the description and solution fields are strings or arrays. If they are arrays, the contents of the array should be joined using an empty space. For example:
"description": ["A very long description for Cross-Site Scripting",
" which has more than one line"]
Will be parsed as A very long description for Cross-Site Scripting which has more than one line
and then if the user wants to enter new lines, he needs to do so explicitly:
"description": ["Line1\n",
"Line2\n"]
Will be parsed as:
Line1
Line2
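A sketch of the string-or-array handling. Note that the two examples above already carry their own leading space and trailing `\n`, so they match a join with an empty string; the sketch defaults to that and leaves the separator configurable. The function name is an assumption.

```python
def normalize_text(value, separator=""):
    """Return `value` as a single string, joining list items with `separator`."""
    if isinstance(value, str):
        return value
    if isinstance(value, list) and all(isinstance(item, str) for item in value):
        return separator.join(value)
    raise TypeError("expected a string or a list of strings")


description = normalize_text(["A very long description for Cross-Site Scripting",
                              " which has more than one line"])
two_lines = normalize_text(["Line1\n", "Line2\n"])
```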
One of the cool things about this architecture is that it will allow us to easily add translations. Just adding a new set of JSON files (keeping the vulnerability IDs) would work.
Implementation details:
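English JSON files are stored in db/
The LANG environment variable is used for setting the language
If LANG is not set, then the English version is used
Translations are stored in db/ru/, db/es/, etc.
For example, the English file says:
"title": "Cross-Site Scripting",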
And then the Russian translation file (for the same vulnerability id) says:
"title": "Межсайтовый скриптинг",
And if the LANG environment variable is set to Russian then the python/ruby/go wrapper should return Cross-Site Scripting in Russian as the title for this vulnerability. Any field from the main JSON file can be overridden (for example you can override a link to wikipedia to point to the XSS description in Russian).
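The override behaviour could look roughly like this in a wrapper. Both helper names are hypothetical, and parsing LANG values such as `ru_RU.UTF-8` into a two-letter code is an assumption about how the variable would be interpreted.

```python
def language_from_env(environ, default="en"):
    """Extract a two-letter language code from a LANG value like 'ru_RU.UTF-8'."""
    lang = environ.get("LANG", "")
    code = lang.split(".")[0].split("_")[0].lower()
    return code if len(code) == 2 and code.isalpha() else default


def localize(base, translations, lang):
    """Overlay the translation for `lang`, if any, on top of the English entry."""
    localized = dict(base)
    localized.update(translations.get(lang, {}))
    return localized


english = {"id": 12345, "title": "Cross-Site Scripting", "severity": "medium"}
translations = {"ru": {"title": "Межсайтовый скриптинг"}}

lang = language_from_env({"LANG": "ru_RU.UTF-8"})  # pass os.environ in real use
russian = localize(english, translations, lang)
```

Fields missing from the translation, such as severity here, fall through to the English values.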
We'll use the content already created for the Arachni scanner, kindly contributed to vulndb/data by Tasos.
tests/ directory
severity field is one of high, medium, low, informational
url and title fields, and that they are not empty
http://owasp.org/foo/<id>, the URL is valid, site is online, not 404
fix_effort value is in the right format
Add a CVSS field.
We already have a severity field with info/low/med/high which is an alternate representation of risk (which is what CVSS scores). Maybe we could:
get_cvss
The migration tool did a great job, but it kept some newlines we don't need. Also, when we '\n'.join(description) we add more newlines, resulting in markdown that's 100% valid but hard for humans to read:
It seems that we need to convert the json files again, finding all simple new lines (\n) and removing them. The "double new lines" (\n\n) should remain untouched.
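A minimal sketch of that conversion. It replaces each lone newline with a space rather than deleting it outright, which is an assumption made here so that the words on either side are not fused together; runs of two or more newlines are left untouched.

```python
import re

# A newline that is neither preceded nor followed by another newline
SINGLE_NEWLINE = re.compile(r"(?<!\n)\n(?!\n)")


def collapse_single_newlines(text):
    """Turn lone newlines into spaces and keep paragraph breaks as they are."""
    return SINGLE_NEWLINE.sub(" ", text)


cleaned = collapse_single_newlines("first line\nsame paragraph\n\nnext paragraph")
```

Markdown lists and code blocks rely on single newlines, so texts containing them would need to be skipped or handled separately.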
What if we add a new field that describes who is affected by the vulnerability?
{
"id": 45,
"title": "SQL Injection",
"affect": "server"
}
or
{
"id": 70,
"title": "Persistent Cross-Site Scripting (XSS)",
"target": "user"
}
possible values: user, server, database, data
Write unittests required by specification , see "#5 Database design"
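Two of the checks from the specification (severity values, non-empty url and title in references) could be sketched as below. The `validation_errors` helper and the test class name are assumptions; a real suite would load every file in db/ instead of using an inline entry.

```python
import unittest

VALID_SEVERITIES = {"high", "medium", "low", "informational"}


def validation_errors(entry):
    """Return a list of human readable problems found in one database entry."""
    errors = []
    if entry.get("severity") not in VALID_SEVERITIES:
        errors.append("invalid severity: %r" % entry.get("severity"))
    for index, ref in enumerate(entry.get("references", [])):
        for key in ("url", "title"):
            value = ref.get(key)
            if not isinstance(value, str) or not value.strip():
                errors.append("reference %d has an empty %s" % (index, key))
    return errors


class TestDatabaseEntry(unittest.TestCase):
    def test_example_entry_is_valid(self):
        entry = {
            "id": 12345,
            "severity": "medium",
            "references": [{"url": "http://owasp.org/xss",
                            "title": "OWASP desc for XSS"}],
        }
        self.assertEqual(validation_errors(entry), [])
```

Returning a list of problems instead of failing on the first one makes the test output more useful when several files are broken at once.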
Using vulndb in w3af I noticed that there are some missing vulnerabilities which need to be added to the database with high priority
The last activity on the repository occurred 2 years ago, and since then there have been no new commits and no issue or pull request updates. So, as I'd like to use this data in a project of mine, I'm wondering if it's worth opening issues and/or submitting pull requests, or if I'd be better off forking it and making whatever changes I like directly in my fork, without worrying about contributing back to this repo.
Join multiline strings with '' or '\n'? That's something I'm not sure about.
If we join using '' the long strings that don't contain new lines will look like:
foo = ["abc def hello world",
" rock stars"]
Result text: abc def hello world rock stars
Rendered markdown: <p>abc def hello world rock stars</p>
Note: The contributor needs to remember to add the empty space before "rock"
And when a contributor wants to add a new line he needs to add it explicitly (which sounds good):
foo = ["abc def hello world\n",
"rock stars"]
Result text: abc def hello world\nrock stars
Rendered markdown:
<p>abc def hello world
rock stars</p>
Note: The contributor needs to remember to add a double \n\n if he wants two paragraphs
This is the case for two different paragraphs
foo = ["abc def hello world\n\n",
"rock stars"]
Result text: abc def hello world\n\nrock stars
Rendered markdown:
<p>abc def hello world</p>
<p>rock stars</p>
If we join with '\n' we're "fixing" the fact that the contributor needs to remember to add an empty space at the beginning of each line. But that's not very explicit and might confuse contributors.
foo = ["abc def hello world",
" rock stars"]
Result text: abc def hello world\n rock stars
Rendered markdown:
<p>abc def hello world
rock stars</p>
foo = ["abc def hello world",
"\n",
" rock stars"]
Result text: abc def hello world\n\n\n rock stars
Rendered markdown:
<p>abc def hello world</p>
<p>rock stars</p>
This seems to be the cleanest.
Ideas? Comments?
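For reference, the two join strategies can be compared directly in Python. The variable names are only for illustration; the input lists are the ones used in the examples above.

```python
plain = ["abc def hello world", " rock stars"]
explicit_paragraphs = ["abc def hello world\n\n", "rock stars"]
separator_item = ["abc def hello world", "\n", " rock stars"]

# Joining with '' keeps exactly what the contributor typed
joined_empty = "".join(plain)
paragraphs_empty = "".join(explicit_paragraphs)

# Joining with '\n' inserts a newline between every pair of items
joined_newline = "\n".join(plain)
paragraphs_newline = "\n".join(separator_item)
```

Note that a lone "\n" item joined with '\n' produces three consecutive newlines, which Markdown still treats as a single paragraph break.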
I looked at the Ruby and Go implementations and whipped up a php sdk. Let me know if it's ok to transfer this repo to this organization.
I didn't embed the vulndb/data as it seems to couple the sdk to the database (i.e., release an updated sdk each time the vulndb changes). But if embedding is the preference, let me know, so I can rectify.
With regard to the tags used by entries in the database, what level of detail should they include? For example the entry on Cross-Site Scripting (XSS) has "xss", "regexp", "injection", "script", which is quite specific. I was wondering if it would be advantageous to add tags related to the domain in which the vulnerability exists, for example "website", "http", "sessions", etc.
Examples of when this might be useful is end-user search engines for the database which are aimed at developers who aren't necessarily infosec pros but want to see what could affect their project. Something where one can type "http", "website", etc and get a list of what they might be up against, allowing them to proactively build with those in mind prior to a security audit.