
Valid markdown unittest

Create a unittest that makes sure that the description/fix texts are valid markdown.

Initial import from Vega

The guys from Vega have their own DB https://github.com/subgraph/Vega/tree/develop/xml/alerts , but they use the Eclipse Public License (http://www.eclipse.org/legal/epl-v10.html), which wouldn't be compatible with w3af's GPLv2.

Add CWE / OWASP for existing vulnerabilities

Add CWE / OWASP / WASC for existing vulnerabilities at https://github.com/vulndb/data/tree/master/db . It might be a good opportunity to create a video tutorial on how to contribute, and to try to involve more people in the vulndb project.

Unittest: Find similar desc/fix

Create a unittest that will find descriptions that are duplicated/very similar between two files. I'm worried about some of the data we imported from arachni, namely all the xss_* we have at https://github.com/vulndb/data/tree/master/db . If they are duplicated we should remove them, and the unittest will also help us avoid similar issues in the future.
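A minimal sketch of such a test using Python's standard difflib.SequenceMatcher, assuming each JSON file in db/ keeps its text under a "description" key (a string, or a list of strings joined with ''). The 0.9 threshold and the helper names are hypothetical and would need tuning against the real data; comparing every pair is quadratic, which should be fine for a few hundred files.

```python
import difflib
import itertools
import json
from pathlib import Path

# Hypothetical threshold: pairs at or above this ratio are reported as near-duplicates.
SIMILARITY_THRESHOLD = 0.9


def load_text(value):
    """Return a description as one string, accepting a string or a list of strings."""
    return "".join(value) if isinstance(value, list) else value


def find_similar(texts, threshold=SIMILARITY_THRESHOLD):
    """Yield (name_a, name_b, ratio) for every pair of texts that are too similar.

    `texts` maps a file name to its description text.
    """
    for (name_a, text_a), (name_b, text_b) in itertools.combinations(sorted(texts.items()), 2):
        ratio = difflib.SequenceMatcher(None, text_a, text_b).ratio()
        if ratio >= threshold:
            yield name_a, name_b, ratio


def load_descriptions(db_dir="db"):
    """Map each JSON file name in db_dir to its description text."""
    return {
        path.name: load_text(json.loads(path.read_text(encoding="utf-8"))["description"])
        for path in Path(db_dir).glob("*.json")
    }


def test_no_similar_descriptions():
    similar = list(find_similar(load_descriptions()))
    assert not similar, f"Near-duplicate descriptions: {similar}"
```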

Add new vulnerabilities to database

Description

Using vulndb in w3af I noticed that there are some missing vulnerabilities which need to be added to the database with low priority

Vulnerability list

Task

For each vulnerability we need to create a new JSON file inside the db directory (that looks like this) and make sure it passes all the tests.

References

Add json-scheme format

Based on andresriancho/w3af#53

Remove Arachni-specific tags

Remove Arachni-specific tags:

rdiff
regexp
differential
timing

Import arachni scanner data

Arachni has some very interesting data we could use:

https://github.com/Arachni/arachni/blob/master/components/checks/active/file_inclusion.rb
https://github.com/Arachni/arachni/blob/8a7c8cdb2ab00b04a34dc666e3a6607e09b025e2/components/checks/active/xss_event.rb
https://github.com/Arachni/arachni/blob/55f78dbd7d2fc7b53bb7dc5d576662b2810a6b8f/components/checks/active/xss_dom.rb

Agreed with @Zapotek that he'll write a script to export the data out of the arachni tests and into generic JSON files which we can then migrate to the format defined in #5

nmap might be interested in using vulndb/data

nmap might be interested in using vulndb/data; they don't seem to have a vulnerability database.

Wait until vulndb/data is stable so we can present the idea to the mailing list without getting rejected with: "too new", "spec not well implemented", "duplicates data ..."

Also, it might be a good idea to create a simple Lua SDK for vulndb (for NSE scripts), so their implementation is straightforward 👍

Write unittest to verify all JSON files comply with schema.json

Move markdown out of JSON files

User story

As a user, I find that writing markdown text inside JSON is hard:

I have to take care of multi-line JSON lists
My text editor doesn't highlight my markdown text, which leads to more syntax mistakes
It's harder to edit online by just opening the file in github

Tasks

Move vulnerability description and fix texts to external files
Reference those files using this syntax: "fix": { "$ref": "#/files/fix/123" } and "description": { "$ref": "#/files/description/123" }
Write unittests that make sure that all referenced files exist
Write unittests that make sure that all files in the "fix" and "description" directories are referenced by at least one JSON
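A minimal Python sketch of the two unittests. It assumes (hypothetically) that "#/files/description/123" resolves to db/description/123.md and "#/files/fix/123" to db/fix/123.md; the directory layout, the .md extension and the helper names are illustrative, not part of the proposal.

```python
import json
from pathlib import Path

# Hypothetical layout: "#/files/description/123" -> db/description/123.md
REF_PREFIX = "#/files/"
TEXT_FIELDS = ("description", "fix")


def ref_to_path(ref, db_dir):
    """Translate a "$ref" string into the markdown file it is assumed to point to."""
    if not ref.startswith(REF_PREFIX):
        raise ValueError(f"Unexpected $ref format: {ref!r}")
    kind, file_id = ref[len(REF_PREFIX):].split("/", 1)
    return Path(db_dir) / kind / f"{file_id}.md"


def referenced_paths(vuln, db_dir):
    """Return the paths referenced by one vulnerability's description/fix fields."""
    paths = set()
    for field in TEXT_FIELDS:
        value = vuln.get(field)
        if isinstance(value, dict) and "$ref" in value:
            paths.add(ref_to_path(value["$ref"], db_dir))
    return paths


def check_refs(db_dir="db"):
    """Return (missing, unreferenced): referenced files that don't exist,
    and files in the description/fix directories that no JSON references."""
    db = Path(db_dir)
    referenced = set()
    for json_file in db.glob("*.json"):
        vuln = json.loads(json_file.read_text(encoding="utf-8"))
        referenced |= referenced_paths(vuln, db_dir)
    existing = {path for field in TEXT_FIELDS for path in (db / field).glob("*.md")}
    missing = {path for path in referenced if not path.exists()}
    return missing, existing - referenced


def test_refs():
    missing, unreferenced = check_refs()
    assert not missing, f"Referenced files that don't exist: {missing}"
    assert not unreferenced, f"Files nobody references: {unreferenced}"
```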

Reviewers

@m0sth8 is this what we talked about yesterday? Anything else to add?

Remove duplicate entries (which differ on technique used to detect vulnerability)

Are 27 and 28 the same vulnerability?

Add WASC references to existing vulnerabilities in DB

Move WASC TC v2 URLs to vulndb/data

To avoid duplication:

Add suitable license

Something like "Attribution-ShareAlike CC BY-SA"

ZAP has its own DB - we can copy+paste from it if needed

https://code.google.com/p/zaproxy/source/browse/trunk/src/lang/vulnerabilities.xml

Simon said it was OK to copy+paste from this database.

Translate database into Russian

cwe IDs

In 14-cvs-svn-user-disclosure.json, the cwe IDs go from specific to general:

"cwe": ["527"],
...
"url": "http://cwe.mitre.org/data/definitions/200.html",

In 44-source-code-disclosure.json, the cwe IDs go from general to specific:

"cwe": ["200"],
...
"url": "http://cwe.mitre.org/data/definitions/540.html",

And in 15-directory-listing.json, the cwe IDs are the same:

"cwe": ["548"],
"url": "http://cwe.mitre.org/data/definitions/548.html",

Add support for template variables

I suppose it will be useful if we add support for template variables in such cases:

The line

Arachni has flagged this not as a vulnerability, but as a ...

is converted to

{{SCANNER}} has flagged this not as a vulnerability, but as a ...
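One way this could work, sketched in Python with a regular expression. The {{SCANNER}} syntax comes from the example above; the render helper, the uppercase-name rule and the decision to raise KeyError for unknown variables are assumptions.

```python
import re

# Matches placeholders such as {{SCANNER}}; names are assumed to be uppercase.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Z_][A-Z0-9_]*)\s*\}\}")


def render(text, variables):
    """Replace every {{NAME}} placeholder in text with variables[NAME].

    Raises KeyError for a placeholder with no value, so typos surface early.
    """
    return PLACEHOLDER.sub(lambda match: variables[match.group(1)], text)


# prints: w3af has flagged this not as a vulnerability, but as a ...
print(render("{{SCANNER}} has flagged this not as a vulnerability, but as a ...",
             {"SCANNER": "w3af"}))
```

Raising on unknown names is a design choice: silently leaving "{{SCANER}}" in a report is harder to notice than a failing test.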

What do you think?

Python sdk

It would be great to have a simple python wrapper that would return an object representation of the JSON object, with all the getters and setters, so that it can be easily used in our software.

This should be implemented in another repo, like vulndb/python-sdk
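A rough sketch of what such a wrapper could look like, using Python dataclasses (plain attributes take the place of explicit getters and setters). The field names come from the "Database design" example; the class and method names are hypothetical and are not the actual vulndb/python-sdk API. Array-valued descriptions are joined with an empty string, following the "Long strings in JSON" proposal.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reference:
    url: str
    title: str


@dataclass
class Vulnerability:
    id: int
    title: str
    description: str
    severity: str
    tags: List[str] = field(default_factory=list)
    cwe: List[str] = field(default_factory=list)
    references: List[Reference] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data):
        """Build a Vulnerability from an already-parsed JSON object."""
        description = data["description"]
        if isinstance(description, list):
            description = "".join(description)
        return cls(
            id=data["id"],
            title=data["title"],
            description=description,
            severity=data["severity"],
            tags=list(data.get("tags", [])),
            cwe=list(data.get("cwe", [])),
            references=[Reference(**ref) for ref in data.get("references", [])],
        )

    @classmethod
    def from_file(cls, path):
        """Load one vulnerability from a JSON file in the database."""
        with open(path, encoding="utf-8") as handle:
            return cls.from_dict(json.load(handle))
```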

Avoid duplicated fix guidance

Task

Think about a solution for duplicated fix guidance texts, which for example are present in these JSON files:

The solution is, of course, to put all the fix information (guidance, effort, etc.) in one place and reference it from the "main" / "vulnerability" JSON. The details are what we need to think about.

Solution

Split json files in two, one holding the vulnerability information and the other holding the fix.

Vulnerabilities are stored in db/vuln/
Fixes are stored in db/fix/
Files in fix directory contain:

{
  "generic": {
    "effort": 50, 
    "guidance": [
      "The first step to remediation is to identify the context in which the ..."
    ]
  }
}

The files in the fix directory have this format: <id>-<fix-title>.json
The vulnerabilities should be changed to point to the fix id:

  "fix": 321,

Notes

Using "generic" in the fix file allows us, in the future, to have different fixes for each language/server (iis/apache/etc)
The python-sdk should make this process completely
An automated process should be able to migrate from current format to the new one
Manual filtering should allow us to group/remove fixes

Unittest

All "fix" entries in vuln must point to valid json files in the fix directory
The unittests that applied to guidance and effort must apply to fix files
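A possible shape for the first unittest in Python, assuming the fix IDs can be read from the <id>-<fix-title>.json file names and that fix titles are lowercase and dash separated (as with vulnerability names); the helper names and the example title are hypothetical. A vulnerability with no "fix" key is also reported, since it points at no fix file.

```python
import json
import re
from pathlib import Path

# Fix files are named <id>-<fix-title>.json, e.g. "321-escape-output.json" (hypothetical title).
FIX_NAME = re.compile(r"^(\d+)-[a-z0-9-]+\.json$")


def fix_ids(fix_dir):
    """Return the set of fix IDs found in the fix directory's file names."""
    ids = set()
    for path in Path(fix_dir).glob("*.json"):
        match = FIX_NAME.match(path.name)
        if match:
            ids.add(int(match.group(1)))
    return ids


def dangling_fixes(vuln_dir, fix_dir):
    """Return {vulnerability file name: fix id} for every fix id with no matching file.

    A missing "fix" key shows up as None, which never matches a fix file.
    """
    known = fix_ids(fix_dir)
    dangling = {}
    for path in Path(vuln_dir).glob("*.json"):
        fix = json.loads(path.read_text(encoding="utf-8")).get("fix")
        if fix not in known:
            dangling[path.name] = fix
    return dangling


def test_fix_references():
    assert dangling_fixes("db/vuln", "db/fix") == {}
```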

Remove duplicated CWE data

While reviewing the pull request from @robocoder, I noticed that we're duplicating data. The information about which CWE is related to each vulnerability is in two places:

  "cwe": ["749"],
  "references": [
    {
      "url": "http://httpd.apache.org/docs/2.2/mod/core.html#limitexcept",
      "title": "Apache.org"
    },
    {
      "url": "http://cwe.mitre.org/data/definitions/749.html",
      "title": "CWE-749"

It doesn't make sense to duplicate this information. If the SDK developer creates a report where he translates the CWE id to a URL and then gets the references, he'll get a report with two links to the same URL (http://cwe.mitre.org/data/definitions/749.html in this case).

This was more noticeable in #34 but was a problem before I asked @robocoder to write this PR (sorry for the extra work that won't be merged!)

My proposal to solve this issue is:

Do not have any links in the "references" section that point to cwe.mitre.org
Write a unittest to assert the above rule
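A minimal sketch of that unittest in Python; the helper names are illustrative. It parses each reference URL with urllib.parse and flags any whose host is cwe.mitre.org. The cwe_url helper shows how an SDK could build the CWE link from the "cwe" field instead, reusing the URL pattern from the example above.

```python
import json
from pathlib import Path
from urllib.parse import urlparse

DB_DIR = "db"


def cwe_url(cwe_id):
    """Build the cwe.mitre.org link for an ID taken from the "cwe" field."""
    return f"http://cwe.mitre.org/data/definitions/{cwe_id}.html"


def cwe_reference_urls(vuln):
    """Return reference URLs pointing at cwe.mitre.org (they duplicate the "cwe" field)."""
    return [
        ref["url"]
        for ref in vuln.get("references", [])
        if urlparse(ref["url"]).hostname == "cwe.mitre.org"
    ]


def test_no_cwe_links_in_references():
    for path in Path(DB_DIR).glob("*.json"):
        vuln = json.loads(path.read_text(encoding="utf-8"))
        assert not cwe_reference_urls(vuln), f"{path.name} links to cwe.mitre.org"
```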

Add reference to CWE/SANS Top 25 (2011)

https://cwe.mitre.org/top25/index.html

Markdown: 1- vs 1.

I've seen some JSON files where the markdown uses "1-" and others that use "1.". Which one is correct?

Database design

JSON backend

All the data should be stored in individual JSON files, one for each vulnerability. These files should be stored in db/ and the name should be <vuln-id>-<vulnerability-name>.json , where <vuln-id> is a unique ID (integer) that we'll use to reference/find the vulnerabilities and the <vulnerability-name> is a human readable name to make it easier for us to find a vulnerability in the output of an "ls" command. The vulnerability name should be dash separated. Examples:

1-cross-site-scripting.json
2-sql-injection.json
3-missing-hsts-header.json
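A small Python sketch of the naming rule. Requiring lowercase names is an assumption (the text only says dash separated), and the helper names are hypothetical; id_matches_filename covers the related requirement, stated in the notes and the unittest list, that the "id" inside the file matches its file name.

```python
import json
import re
from pathlib import Path

# <vuln-id>-<vulnerability-name>.json, name assumed lowercase and dash separated.
FILENAME = re.compile(r"^(?P<id>\d+)-(?P<name>[a-z0-9]+(?:-[a-z0-9]+)*)\.json$")


def parse_filename(filename):
    """Split a database file name into (vuln_id, name); raise ValueError if malformed."""
    match = FILENAME.match(filename)
    if match is None:
        raise ValueError(f"Bad database file name: {filename!r}")
    return int(match.group("id")), match.group("name")


def id_matches_filename(path):
    """True when the "id" inside the JSON file equals the ID in its file name."""
    path = Path(path)
    vuln_id, _ = parse_filename(path.name)
    return json.loads(path.read_text(encoding="utf-8"))["id"] == vuln_id
```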

JSON files should look similar to this example:

{
  "id": 12345,
  "title": "Cross-Site Scripting",
  "description": "A very long description for Cross-Site Scripting",
  "severity": "medium",
  "wasc": ["0003"],
  "tags": ["xss", "client side"],
  "cwe": ["0003", "0007"],
  "owasp_top_10": {
    "2010": [1],
    "2013": [2]
  },
  "fix": {
    "guidance": "A very long text explaining how to fix XSS vulnerabilities",
    "effort": 50
  },
  "references": [
    {"url": "http://foo.com/xss", "title": "First reference to XSS vulnerability"},
    {"url": "http://asp.net/xss", "title": "How to fix XSS vulns in ASP.NET"},
    {"url": "http://owasp.org/xss", "title": "OWASP desc for XSS"}
  ]
}

Notes about the JSON file:

"id" holds the vulnerability's unique ID. The contents of the "id" field must match the ID in the filename
The "owasp_top_10" field points to the OWASP Top 10 category. We have different versions of this (2010 and 2013) to match the different releases of the OWASP Top 10 project
The "effort" value inside "fix" gives the user an idea of how much time it will take to fix this vulnerability; it's stored in minutes
Since a long blob of unformatted text is hard to read for any user, the description and solution text MUST use markdown for formatting. There are various Python libraries that translate markdown to HTML, which will allow us to show the text in the GUI and/or Web UI.

Long strings in JSON

JSON has an awful limitation: "No multi-line strings". In our case this is very limiting since we don't want to have awfully long lines in these sections:

  "description": "A very long description for Cross-Site Scripting",
  "solution": "A very long text explaining how to fix XSS vulnerabilities",

So, the parser should check whether the description and solution fields are strings or arrays. If they are arrays, the contents of the array should be joined using an empty string (''). For example:

  "description": ["A very long description for Cross-Site Scripting",
                  " which has more than one line"]

Will be parsed as "A very long description for Cross-Site Scripting which has more than one line". If the user wants to enter new lines, he needs to do so explicitly:

  "description": ["Line1\n",
                  "Line2\n"]

Will be parsed as:

Line1
Line2
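A minimal sketch of that parsing rule in Python (the join_text name is hypothetical). Joining list items with an empty string reproduces both examples above: spaces and new lines only appear where the contributor wrote them.

```python
def join_text(value):
    """Return a description/solution field as a single string.

    Strings are returned unchanged; lists are joined with an empty string, so
    spaces and new lines must be written explicitly inside the list items.
    """
    if isinstance(value, str):
        return value
    if isinstance(value, list) and all(isinstance(item, str) for item in value):
        return "".join(value)
    raise TypeError(f"Expected a string or a list of strings, got {type(value).__name__}")
```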

Translations

One of the cool things about this architecture is that it will allow us to easily add translations. Just adding a new set of JSON files (keeping the vulnerability IDs) would work.

Implementation details:

The English version will be in db/
The Unix environment variable LANG is used to set the language
If there is no translation for the current LANG then the English version is used
Translations will be in db/ru/ , db/es/ , etc.
The translation json files will override the parts of the main JSON file which is in English, for example, the main file contains:

  "title": "Cross-Site Scripting",

And then the Russian translation file (for the same vulnerability id) says:

  "title": "Межсайтовый скриптинг",

And if the LANG environment variable is set to Russian, the python/ruby/go wrapper should return "Межсайтовый скриптинг" as the title for this vulnerability. Any field from the main JSON file can be overridden (for example, you can override a link to Wikipedia to point to the XSS description in Russian).
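A minimal sketch of the lookup in Python, under several assumptions not spelled out above: translation files reuse the English file name (e.g. db/ru/1-cross-site-scripting.json), the language directory is the part of LANG before "_" (so "ru_RU.UTF-8" maps to ru), and nested objects are merged key by key while lists such as "references" are replaced as a whole. The helper names are hypothetical.

```python
import json
import os
from pathlib import Path


def language_code(lang_env):
    """Turn a LANG value such as "ru_RU.UTF-8" into a directory name such as "ru"."""
    return lang_env.split(".")[0].split("_")[0].lower() if lang_env else ""


def deep_merge(base, override):
    """Return a copy of base where keys in override replace (or recursively merge into) base."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value  # strings, numbers and lists are replaced whole
    return merged


def load_vulnerability(filename, db_dir="db", lang=None):
    """Load db/<filename>, overlaid with db/<lang>/<filename> when that translation exists."""
    db = Path(db_dir)
    data = json.loads((db / filename).read_text(encoding="utf-8"))
    code = language_code(lang if lang is not None else os.environ.get("LANG", ""))
    translated = db / code / filename
    # Fall back to English when LANG is unset, English, or has no translation directory.
    if code and code != "en" and translated.is_file():
        data = deep_merge(data, json.loads(translated.read_text(encoding="utf-8")))
    return data
```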

Content

We'll use the content already created for the Arachni scanner, kindly contributed to vulndb/data by Tasos.

Unittesting

Unittests must be in the tests/ directory
We should write a json-schema for our JSON, and then validate the JSON after each push with a unittest. See https://python-jsonschema.readthedocs.org/en/latest/ for more information about json-schemas
Assert that the severity field is one of high, medium, low, informational
Assert that these fields are present and contain at least 30 chars:
- description
- solution
Assert that these fields are present:
- id
- title
- fix_effort
- severity
Assert that all references have url and title fields, and that they are not empty
For each value in WASC/OWASP, make sure that we can generate a link like http://owasp.org/foo/<id>, the URL is valid, site is online, not 404
For each URL field in the schema, we need to check that:
- It's a valid URL
- The site is online, and no 404 is returned
fix_effort value is in the right format
The markdown in both description and solution are well formed
The ID in the file name is the same as the one inside the JSON file
The lines are not longer than 90 columns
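Several of these checks can be sketched with the Python standard library alone. This sketch uses the field names from this list (fix_effort, solution) rather than those in the JSON example, and the helper names are hypothetical. Only the offline checks are covered; the json-schema, markdown and online URL checks need third-party libraries or network access and are left out.

```python
SEVERITIES = {"high", "medium", "low", "informational"}
REQUIRED = ("id", "title", "fix_effort", "severity")
MIN_TEXT_LENGTH = 30
MAX_LINE_LENGTH = 90


def validate(vuln, raw_text):
    """Return a list of rule violations for one parsed vulnerability and its raw file text."""
    errors = []
    for key in REQUIRED:
        if key not in vuln:
            errors.append(f"missing field: {key}")
    if vuln.get("severity") not in SEVERITIES:
        errors.append(f"invalid severity: {vuln.get('severity')!r}")
    for key in ("description", "solution"):
        value = vuln.get(key) or ""
        text = "".join(value) if isinstance(value, list) else value
        if len(text) < MIN_TEXT_LENGTH:
            errors.append(f"{key} shorter than {MIN_TEXT_LENGTH} chars")
    for ref in vuln.get("references", []):
        if not ref.get("url") or not ref.get("title"):
            errors.append(f"incomplete reference: {ref!r}")
    for number, line in enumerate(raw_text.splitlines(), start=1):
        if len(line) > MAX_LINE_LENGTH:
            errors.append(f"line {number} longer than {MAX_LINE_LENGTH} columns")
    return errors
```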

References

andresriancho/w3af#53

Add CVSS

Idea

Add a CVSS field.

Problems

We already have a severity field with info/low/med/high which is an alternate representation of risk (which is what CVSS scores). Maybe we could:

Remove severity from JSON
Add CVSS
In the SDK create a method that returns the CVSS: get_cvss
In the SDK create a method that returns info/low/med/high according to CVSS ranges, 0-2 is info, 2-3 is low, etc.

Tasks

Decide if we want to have CVSS
Decide if we want to remove severity
Decide the ranges that translate CVSS to severity names
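A sketch of the proposed SDK mapping in Python. Only the first two ranges come from this issue (0-2 informational, 2-3 low); the 7.0 and 10.0 bounds for medium and high are placeholders until the ranges are decided, and the severity names follow the values allowed by the severity field. Ranges are treated as half-open, so a score of exactly 2.0 counts as low.

```python
import bisect

# (upper bound, exclusive; severity name). The first two entries come from the issue;
# the medium/high bounds are PLACEHOLDERS, not agreed values.
SEVERITY_RANGES = [(2.0, "informational"), (3.0, "low"), (7.0, "medium"), (10.0, "high")]


def cvss_to_severity(score, ranges=SEVERITY_RANGES):
    """Map a CVSS base score (0.0-10.0) to a severity name using half-open ranges."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    bounds = [upper for upper, _ in ranges]
    index = bisect.bisect_right(bounds, score)
    # A score of exactly 10.0 falls past the last bound; clamp it into the top range.
    return ranges[min(index, len(ranges) - 1)][1]
```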

Too many new lines

The migration tool did a great job, but it kept some new lines we don't need. Also, when we '\n'.join(description) we add more new lines, resulting in markdown that's 100% valid but hard for humans to read:

It seems that we need to convert the json files again, finding all simple new lines (\n) and removing them. The "double new lines" (\n\n) should remain untouched.
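A sketch of that conversion step using Python's re module. Two choices here are assumptions rather than part of the issue: lone new lines are replaced with a space instead of being deleted outright, so the words on either side aren't glued together, and a new line that starts a markdown list item (-, *, + or 1.) is kept so lists don't collapse into one line. Code blocks or other line-sensitive markdown would need similar care.

```python
import re

# A "\n" not preceded or followed by another "\n" (a single new line), unless the
# next line starts a markdown list item such as "- item", "* item" or "1. item".
SINGLE_NEWLINE = re.compile(r"(?<!\n)\n(?!\n|[ \t]*(?:[-*+]|\d+[.)])[ \t])")


def remove_single_newlines(text):
    """Replace lone new lines with a space, leaving paragraph breaks (\n\n) intact."""
    return SINGLE_NEWLINE.sub(" ", text)
```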

Add field to describe who is affected by the vulnerability

What if we add a new field that describes who is affected by the vulnerability?

{
  "id": 45, 
  "title": "SQL Injection", 
  "affect": "server"

or

{
  "id": 70,
  "title": "Persistent Cross-Site Scripting (XSS)",
  "target": "user"

possible values: user, server, database, data

Write unittests required by specification

Write the unittests required by the specification; see "#5 Database design"

Add new vulnerabilities to database (must-have)

Description

Using vulndb in w3af I noticed that there are some missing vulnerabilities which need to be added to the database with high priority

Vulnerability list

References

Is this project still alive?

The last activity on the repository occurred 2 years ago, and since then there have been no new commits and no issue/pull-request updates. Since I'd like to use this data in a project of mine, I'm wondering whether it's worth opening issues and/or submitting pull requests, or whether I'd be better off forking it and making my changes directly in my fork, without trying to contribute back to this repo.

Join multiline strings with '', ' ' or '\n'?

Join multiline strings with '' or '\n'? That's something I'm not sure about.

Joining with empty strings

If we join using '' the long strings that don't contain new lines will look like:

foo = ["abc def hello world",
        " rock stars"]

Result text: abc def hello world rock stars
Rendered markdown: <p>abc def hello world rock stars</p>

Note: The contributor needs to remember to add the empty space before "rock"

And when a contributor wants to add a new line he needs to add it explicitly (which sounds good):

foo = ["abc def hello world\n",
        "rock stars"]

Result text: abc def hello world\nrock stars
Rendered markdown:

<p>abc def hello world
rock stars</p>

Note: The contributor needs to remember to add a double \n\n if he wants two paragraphs

This is the case for two different paragraphs

foo = ["abc def hello world\n\n",
        "rock stars"]

Result text: abc def hello world\n\nrock stars
Rendered markdown:

<p>abc def hello world</p>
<p>rock stars</p>

Joining with a space

If we join with a space we're "fixing" the fact that the contributor needs to remember to add an empty space at the beginning of each line. But that's not very explicit and might confuse contributors.

Joining with a new line

foo = ["abc def hello world",
        " rock stars"]

Result text: abc def hello world\n rock stars
Rendered markdown:

<p>abc def hello world
rock stars</p>

foo = ["abc def hello world",
        "\n",
        " rock stars"]

Result text: abc def hello world\n\n\n rock stars
Rendered markdown:

<p>abc def hello world</p>
<p>rock stars</p>

This seems to be the cleanest.
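To make the trade-offs concrete, here is a small Python comparison of the three separators on the same input (the JOINERS mapping and join_all_ways helper are illustrative). With a space separator, the contributor's own leading space becomes a double space in the raw text ("world  rock"); with '\n' as the separator, a list item holding only "\n" adds two new lines to the separator's own, giving three in a row, which markdown still renders as a single paragraph break.

```python
JOINERS = {"empty string": "", "space": " ", "new line": "\n"}


def join_all_ways(parts):
    """Return {strategy name: joined text} for each candidate separator."""
    return {name: separator.join(parts) for name, separator in JOINERS.items()}


for name, text in join_all_ways(["abc def hello world", " rock stars"]).items():
    print(f"{name:>12}: {text!r}")
```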

Ideas? Comments?

php sdk

I looked at the Ruby and Go implementations and whipped up a PHP SDK. Let me know if it's OK to transfer this repo to this organization.

https://github.com/vipsoft/vulndb-php
I didn't embed the vulndb/data as it seems to couple the sdk to the database (i.e., release an updated sdk each time the vulndb changes). But if embedding is the preference, let me know, so I can rectify.

Broad/Descriptive Tags On Entries

With regard to the tags used by entries in the database, what level of detail should they include? For example, the entry on Cross-Site Scripting (XSS) has "xss", "regexp", "injection", "script", which is quite specific. I was wondering if it would be advantageous to add tags related to the domain in which the vulnerability exists, such as "website", "http", "sessions", etc.

One example where this might be useful is an end-user search engine for the database, aimed at developers who aren't necessarily infosec pros but want to see what could affect their project. They could type "http", "website", etc. and get a list of what they might be up against, allowing them to proactively build with those threats in mind before a security audit.
