
data's People

Contributors

andresriancho, m0sth8, robocoder, snoopysecurity

data's Issues

Unittest: Find similar desc/fix

Create a unittest that will find descriptions that are duplicated or very similar between two files. I'm worried about some of the data we imported from arachni, namely all the xss_* entries we have at https://github.com/vulndb/data/tree/master/db . If they are duplicated we should remove them, and the unittest will also help us avoid similar issues in the future.
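A minimal sketch of what such a check could look like, using difflib from the standard library. The 0.9 threshold, the file names and the sample texts are assumptions made up for illustration; the real test would load the descriptions from the JSON files in db/.

```python
import difflib
import itertools


def find_similar(texts, threshold=0.9):
    """Return (name_a, name_b, ratio) for every pair of near-duplicate texts.

    `texts` maps a file name to its description text. `threshold` is an
    assumed cut-off, not a value agreed on in this issue.
    """
    similar = []
    pairs = itertools.combinations(sorted(texts.items()), 2)
    for (name_a, text_a), (name_b, text_b) in pairs:
        ratio = difflib.SequenceMatcher(None, text_a, text_b).ratio()
        if ratio >= threshold:
            similar.append((name_a, name_b, ratio))
    return similar


# Hypothetical sample data, not taken from the real database.
descriptions = {
    "xss_a.json": "Client-side scripts are used extensively by modern web applications.",
    "xss_b.json": "Client-side scripts are used extensively by modern web apps.",
    "sqli.json": "SQL injection occurs when user input reaches a database query.",
}
pairs = find_similar(descriptions, threshold=0.9)
```

A unittest would then assert that the returned list is empty, and print the offending pairs when it is not.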

Add new vulnerabilities to database

Description

Using vulndb in w3af I noticed that there are some missing vulnerabilities which need to be added to the database with low priority

Vulnerability list

  • 'Buffer overflow vulnerability'
  • 'MX injection vulnerability'
  • 'Unsafe preg_replace usage'
  • 'ReDoS vulnerability'
  • Server side include vulnerability
  • Persistent server side include vulnerability
  • Basic HTTP credentials
  • Path disclosure vulnerability (maybe it's already in the DB?)
  • Malware identified
  • CSP vulnerability
  • Missing X-Content-Type-Options header
  • Guessable credentials

Task

For each vulnerability we need to create a new JSON file inside the db directory (that looks like this) and make sure it passes all the tests.

References

Import arachni scanner data

nmap might be interested in using vulndb/data

nmap might be interested in using vulndb/data; they don't seem to have a vulnerability database.

Wait until vulndb/data is stable, so that we can present the idea to the mailing list without getting rejected with: "too new", "spec not well implemented", "duplicates data ..."

Also, it might be a good idea to create a simple Lua SDK for vulndb (for NSE scripts), so their implementation is straightforward 👍

Move markdown out of JSON files

User story

As a user I complain and say that writing markdown text inside JSON is hard:

  • I have to take care of multi-line JSON lists
  • My text editor doesn't highlight my markdown text, which leads to more syntax mistakes
  • It's harder to edit online by just opening the file in github

Tasks

  • Move vulnerability description and fix texts to external files
  • Reference those files using this syntax: "fix": { "$ref": "#/files/fix/123" } and "description": { "$ref": "#/files/description/123" }
  • Write unittests that make sure that all referenced files exist
  • Write unittests that make sure that all files in the "fix" and "description" directories are referenced by at least one JSON
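The two unittests above could be built on something like the following sketch. It assumes that "#/files/fix/123" maps to a file named fix/123.md; the issue does not say how a reference maps to a path, so that layout (and the .md extension) is hypothetical.

```python
import re

REF_PATTERN = re.compile(r"^#/files/(fix|description)/(\d+)$")


def ref_to_path(ref):
    """Map '#/files/fix/123' to 'fix/123.md' (the on-disk layout is an assumption)."""
    match = REF_PATTERN.match(ref)
    if match is None:
        raise ValueError("unsupported $ref: %r" % ref)
    kind, file_id = match.groups()
    return "%s/%s.md" % (kind, file_id)


def check_references(vulns, existing_files):
    """Return (missing, orphans).

    missing: files referenced by a JSON entry but absent from existing_files.
    orphans: files in existing_files that no JSON entry references.
    """
    referenced = set()
    for vuln in vulns:
        for field in ("fix", "description"):
            value = vuln.get(field)
            if isinstance(value, dict) and "$ref" in value:
                referenced.add(ref_to_path(value["$ref"]))
    existing = set(existing_files)
    return sorted(referenced - existing), sorted(existing - referenced)
```

Both lists should be empty for a healthy database; each unittest asserts on one of them.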

Reviewers

  • @m0sth8 is this what we talked about yesterday? Anything else to add?

cwe IDs

In 14-cvs-svn-user-disclosure.json, the cwe IDs go from specific to general:

"cwe": ["527"],
...
"url": "http://cwe.mitre.org/data/definitions/200.html", 

In 44-source-code-disclosure.json, the cwe IDs go from general to specific:

"cwe": ["200"],
...
"url": "http://cwe.mitre.org/data/definitions/540.html", 

And in 15-directory-listing.json, the cwe IDs are the same:

"cwe": ["548"],
"url": "http://cwe.mitre.org/data/definitions/548.html", 

Add support for template variables

I suppose it will be useful if we add support for template variables in such cases:

The line

Arachni has flagged this not as a vulnerability, but as a ...

is converted to

{{SCANNER}} has flagged this not as a vulnerability, but as a ...

What do you think?
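To make the idea concrete, here is a small sketch of how an SDK could fill in such variables. The {{NAME}} syntax comes from the example above; the render function, the variable dictionary and the choice to leave unknown names untouched are assumptions.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*([A-Z_]+)\s*\}\}")


def render(text, variables):
    """Replace {{NAME}} placeholders; unknown names are left untouched."""
    def substitute(match):
        return variables.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(substitute, text)


line = "{{SCANNER}} has flagged this not as a vulnerability, but as a ..."
rendered = render(line, {"SCANNER": "w3af"})
```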

Python sdk

It would be great to have a simple python wrapper that would return an object representation of the JSON object, with all the getters and setters, so that it can be easily used in our software.

This should be implemented in another repo, like vulndb/python-sdk

Avoid duplicated fix guidance

Task

Think about a solution for duplicated fix guidance texts, which for example are present in these JSON files:

The solution is, of course, to put all the fix information (guidance, effort, etc.) in one place and reference it from the "main" / "vulnerability" json. The details are what we need to think about.

Solution

Split json files in two, one holding the vulnerability information and the other holding the fix.

  • Vulnerabilities are stored in db/vuln/
  • Fixes are stored in db/fix/
  • Files in fix directory contain:
{
  "generic": {
    "effort": 50, 
    "guidance": [
      "The first step to remediation is to identify the context in which the ..."
    ]
  }
}
  • The files in the fix directory have this format: <id>-<fix-title>.json
  • The vulnerabilities should be changed to point to the fix id:
  "fix": 321,

Notes

  • Using "generic" in fix file to allow us in the future to have different fixes for each language/server (iis/apache/etc)
  • The python-sdk should make this process completely transparent
  • An automated process should be able to migrate from current format to the new one
  • Manual filtering should allow us to group/remove fixes

Unittest

  • All "fix" entries in vuln must point to valid json files in the fix directory
  • The unittests that applied to guidance and effort must apply to fix files
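A sketch of the first check, assuming the fix file names follow the <id>-<fix-title>.json format described above. The function names and the sample file name in the usage note are hypothetical.

```python
def fix_ids_from_filenames(filenames):
    """Extract the numeric id from names shaped like '<id>-<fix-title>.json'."""
    ids = set()
    for name in filenames:
        prefix = name.split("-", 1)[0]
        if name.endswith(".json") and prefix.isdigit():
            ids.add(int(prefix))
    return ids


def dangling_fixes(vulns, fix_filenames):
    """Return the ids of vulnerabilities whose "fix" points to no fix file."""
    known = fix_ids_from_filenames(fix_filenames)
    return [vuln["id"] for vuln in vulns if vuln.get("fix") not in known]
```

For example, with a fix directory holding only 321-escape-output.json, a vulnerability with "fix": 321 passes and one with "fix": 999 is reported.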

Remove duplicated CWE data

While reviewing the pull request from @robocoder I noticed that we're duplicating data. The information about which CWE is related to each vulnerability is in two places:

   "cwe": ["749"],
   "references": [
     {
       "url": "http://httpd.apache.org/docs/2.2/mod/core.html#limitexcept", 
       "title": "Apache.org"
    },
    {
      "url": "http://cwe.mitre.org/data/definitions/749.html",
      "title": "CWE-749"

It doesn't make sense to duplicate this information. If the SDK developer creates a report where he translates the CWE id to a URL and then adds the references, he'll get a report with two links to the same URL (http://cwe.mitre.org/data/definitions/749.html in this case).

This was more noticeable at #34 but was a problem before I asked @robocoder to write this PR (sorry for the extra work that won't be merged!)

My proposal to solve this issue is:

  • Do not have any links in the "references" section that point to cwe.mitre.org
  • Write a unittest to assert the above rule
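A possible shape for that unittest, plus the id-to-URL translation an SDK would do instead. The URL pattern is the one used in the snippet above; the function names are made up for this sketch.

```python
from urllib.parse import urlparse


def cwe_url(cwe_id):
    """Build the cwe.mitre.org URL for a CWE id such as "749"."""
    return "http://cwe.mitre.org/data/definitions/%s.html" % cwe_id


def cwe_reference_violations(vuln):
    """Return the reference URLs in a vulnerability that point to cwe.mitre.org."""
    return [
        ref["url"]
        for ref in vuln.get("references", [])
        if urlparse(ref["url"]).hostname == "cwe.mitre.org"
    ]


vuln = {
    "cwe": ["749"],
    "references": [
        {"url": "http://httpd.apache.org/docs/2.2/mod/core.html#limitexcept",
         "title": "Apache.org"},
        {"url": "http://cwe.mitre.org/data/definitions/749.html",
         "title": "CWE-749"},
    ],
}
violations = cwe_reference_violations(vuln)
```

The unittest would assert that the list of violations is empty for every file in db/.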

Markdown: 1- vs 1.

I've seen some json files where the markdown says 1- and some others that say 1. Which one is the correct one?

Database design

JSON backend

All the data should be stored in individual JSON files, one for each vulnerability. These files should be stored in db/ and the name should be <vuln-id>-<vulnerability-name>.json , where <vuln-id> is a unique ID (integer) that we'll use to reference/find the vulnerabilities and the <vulnerability-name> is a human readable name to make it easier for us to find a vulnerability in the output of an "ls" command. The vulnerability name should be dash separated. Examples:

  • 1-cross-site-scripting.json
  • 2-sql-injection.json
  • 3-missing-hsts-header.json

JSON files should look similar to this example:

{
  "id": 12345,
  "title": "Cross-Site Scripting",
  "description": "A very long description for Cross-Site Scripting",
  "severity": "medium",
  "wasc": ["0003"],
  "tags": ["xss", "client side"],
  "cwe": ["0003", "0007"],
  "owasp_top_10": {
    "2010": [1],
    "2013": [2]
  },
  "fix": {
    "guidance": "A very long text explaining how to fix XSS vulnerabilities",
    "effort": 50
  },
  "references": [
    {"url": "http://foo.com/xss", "title": "First reference to XSS vulnerability"},
    {"url": "http://asp.net/xss", "title": "How to fix XSS vulns in ASP.NET"},
    {"url": "http://owasp.org/xss", "title": "OWASP desc for XSS"}
  ]
}

Notes about the JSON file:

  • "id" holds the vulnerability's unique ID. The contents of the "id" field must match the ID in the file name
  • The "owasp_top_10" field points to the OWASP Top 10 category. It is keyed by release year (2010 and 2013) to match the different releases of the OWASP Top 10 project
  • The "effort" field inside "fix" gives the user an idea of how much time it will take to solve this vulnerability; it's stored in minutes
  • Since a long blob of unformatted text is hard to read for any user, the description and solution text MUST use markdown for formatting. There are various python libraries which translate markdown to HTML, which will allow us to show the text in the GUI and/or web UI.

Long strings in JSON

JSON has an awful limitation: "No multi-line strings". In our case this is very limiting since we don't want to have awfully long lines in these sections:

  "description": "A very long description for Cross-Site Scripting",
  "solution": "A very long text explaining how to fix XSS vulnerabilities",

So, the parser should check if the description and solution fields are strings or arrays. If they are arrays, the contents of the array should be joined using an empty space. For example:

  "description": ["A very long description for Cross-Site Scripting",
                         " which has more than one line"]

Will be parsed as A very long description for Cross-Site Scripting which has more than one line. If the user wants to enter new lines, he needs to do so explicitly:

  "description": ["Line1\n",
                         "Line2\n"]

Will be parsed as:

Line1
Line2
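A rough sketch of the string-or-array handling described above. The two examples only come out as shown when the pieces are concatenated with no separator, so the sketch defaults to an empty string; the separator is a parameter because that choice is an assumption here, and normalize_text is a hypothetical name.

```python
def normalize_text(value, separator=""):
    """Return the text of a field that may be a string or a list of strings."""
    if isinstance(value, str):
        return value
    if isinstance(value, list):
        return separator.join(value)
    raise TypeError("expected str or list, got %s" % type(value).__name__)


one_line = normalize_text(["A very long description for Cross-Site Scripting",
                           " which has more than one line"])
two_lines = normalize_text(["Line1\n",
                            "Line2\n"])
```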

Translations

One of the cool things about this architecture is that it will allow us to easily add translations. Just adding a new set of JSON files (keeping the vulnerability IDs) would work.

Implementation details:

  • The English version will be in db/
  • Unix's environment variable LANG is used for setting the language
  • If there is no translation for the current LANG then the English version is used
  • Translations will be in db/ru/ , db/es/ , etc.
  • The translation json files will override the parts of the main JSON file which is in English, for example, the main file contains:
  "title": "Cross-Site Scripting",

And then the Russian translation file (for the same vulnerability id) says:

  "title": "Межсайтовый скриптинг",

And if the LANG environment variable is set to Russian then the python/ruby/go wrapper should return Cross-Site Scripting in Russian as the title for this vulnerability. Any field from the main JSON file can be overridden (for example you can override a link to wikipedia to point to the XSS description in Russian).
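The override behaviour could be sketched like this. It assumes the wrapper has already loaded the English JSON and the per-language override files for the same vulnerability id into dictionaries; language_code and localized are hypothetical names, and the way LANG is reduced to a two-letter code is an assumption.

```python
def language_code(lang_env):
    """Reduce a LANG value such as 'ru_RU.UTF-8' to 'ru'; None if unset, C or POSIX."""
    code = (lang_env or "").split(".")[0].split("_")[0].lower()
    return code if code and code not in ("c", "posix") else None


def localized(main, translations, lang_env):
    """Overlay the translation for lang_env, if any, on top of the English data."""
    merged = dict(main)
    merged.update(translations.get(language_code(lang_env), {}))
    return merged


main = {"id": 1, "title": "Cross-Site Scripting", "severity": "medium"}
translations = {"ru": {"title": "Межсайтовый скриптинг"}}
russian = localized(main, translations, "ru_RU.UTF-8")
spanish = localized(main, translations, "es_ES.UTF-8")
```

Fields that the translation does not override, like severity here, fall through from the English file, and a language without a translation gets the English version unchanged.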

Content

We'll use the content already created for the Arachni scanner, kindly contributed to vulndb/data by Tasos.

Unittesting

  • Unittests must be in the tests/ directory
  • We should write a json-schema for our JSON, and then validate the JSON after each push with a unittest. See https://python-jsonschema.readthedocs.org/en/latest/ for more information about json-schemas
  • Assert that the severity field is one of high, medium, low, informational
  • Assert that these fields are present and contain at least 30 chars:
    • description
    • solution
  • Assert that these fields are present:
    • id
    • title
    • fix_effort
    • severity
  • Assert that all references have url and title fields, and that they are not empty
  • For each value in WASC/OWASP, make sure that we can generate a link like http://owasp.org/foo/<id>, the URL is valid, site is online, not 404
  • For each URL field in the schema, we need to check that:
    • It's a valid URL
    • The site is online, and no 404 is returned
  • fix_effort value is in the right format
  • The markdown in both description and solution are well formed
  • The ID in the file name is the same as the one inside the JSON file
  • The lines are not longer than 90 columns
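Before a full json-schema exists, a few of these assertions can be sketched in plain Python. This only covers the fields whose names agree with the example JSON above (id, title, severity, description, references); the validate function and its error messages are made up for illustration.

```python
SEVERITIES = {"high", "medium", "low", "informational"}


def validate(filename, data):
    """Return a list of human-readable problems found in one vulnerability."""
    errors = []
    for field in ("id", "title", "severity", "description"):
        if field not in data:
            errors.append("missing field: %s" % field)
    if data.get("severity") not in SEVERITIES:
        errors.append("invalid severity: %r" % data.get("severity"))
    description = data.get("description", "")
    if isinstance(description, list):
        # Multi-line descriptions are stored as a list of strings.
        description = "".join(description)
    if len(description) < 30:
        errors.append("description shorter than 30 chars")
    if filename.split("-", 1)[0] != str(data.get("id")):
        errors.append("id does not match file name")
    for ref in data.get("references", []):
        if not ref.get("url") or not ref.get("title"):
            errors.append("reference without url or title")
    return errors
```

A unittest would call validate for every file in db/ and assert that the returned list is empty.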

References

andresriancho/w3af#53

Add CVSS

Idea

Add a CVSS field.

Problems

We already have a severity field with info/low/med/high which is an alternate representation of risk (which is what CVSS scores). Maybe we could:

  • Remove severity from JSON
  • Add CVSS
  • In the SDK create a method that returns the CVSS: get_cvss
  • In the SDK create a method that returns info/low/med/high according to CVSS ranges, 0-2 is info, 2-3 is low, etc.

Tasks

  • Decide if we want to have CVSS
  • Decide if we want to remove severity
  • Decide the ranges that translate CVSS to severity names
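For discussion, the SDK method that maps a CVSS score to a severity name could look like the sketch below. The ranges are exactly what still has to be decided: only "0-2 is info, 2-3 is low" comes from this issue, the 7.0 boundary is a placeholder I made up, and severity_from_cvss is a hypothetical name.

```python
import bisect


def severity_from_cvss(score, upper_bounds, names):
    """Map a CVSS score (0.0-10.0) to a severity name.

    upper_bounds holds the exclusive upper limit of every range except the
    last one, so len(names) must be len(upper_bounds) + 1.
    """
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS score out of range: %r" % score)
    return names[bisect.bisect_right(upper_bounds, score)]


# Placeholder ranges: the 7.0 boundary is NOT from the issue.
EXAMPLE_BOUNDS = [2.0, 3.0, 7.0]
EXAMPLE_NAMES = ["info", "low", "medium", "high"]
```

Keeping the ranges as data rather than hard-coding them means the last task above only changes two lists.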

Too many new lines

The migration tool did a great job, but it kept some new lines we don't need. Also, when we '\n'.join(description) we add more new lines, resulting in markdown that's 100% valid but hard to read for humans:

(screenshot: too-many-new-lines)

It seems that we need to convert the json files again, finding all simple new lines (\n) and removing them. The "double new lines" (\n\n) should remain untouched.
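The conversion could use a regular expression like the one in this sketch. One assumption: single new lines are replaced with a space rather than dropped, so that the words on either side are not glued together; pass replacement="" to remove them literally.

```python
import re

# A "\n" that is neither preceded nor followed by another "\n".
SINGLE_NEWLINE = re.compile(r"(?<!\n)\n(?!\n)")


def collapse_single_newlines(text, replacement=" "):
    """Replace lone new lines; runs of two or more new lines stay untouched."""
    return SINGLE_NEWLINE.sub(replacement, text)


cleaned = collapse_single_newlines("first line\nsame paragraph\n\nnext paragraph")
```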

Add field to describe who is affected by vulnerability

What if we add a new field that describes who is affected by vulnerability?

{
  "id": 45, 
  "title": "SQL Injection", 
  "affect": "server"

or

{
  "id": 70,
  "title": "Persistent Cross-Site Scripting (XSS)",
  "target": "user"

possible values: user, server, database, data

Add new vulnerabilities to database (must-have)

Description

Using vulndb in w3af I noticed that there are some missing vulnerabilities which need to be added to the database with high priority

Vulnerability list

  • 'Multiple CORS misconfigurations'
  • 'Sensitive and strange CORS methods enabled'
  • 'Sensitive CORS methods enabled'
  • 'Uncommon CORS methods enabled'
  • 'Access-Control-Allow-Origin set to "*"'
  • 'Insecure Access-Control-Allow-Origin with credentials'
  • 'Insecure Access-Control-Allow-Origin'
  • 'Incorrect withCredentials implementation'
  • 'Insecure file upload'
  • 'Insecure Frontpage extensions configuration'
  • 'Phishing vector'
  • 'Insecure SSL version'
  • 'Invalid SSL certificate'
  • Persistent Cross-Site Scripting vulnerability
  • Reflected File Download vulnerability
  • Shell shock vulnerability
  • Unhandled error in web application
  • Missing cache control for HTTPS content
  • Cross-domain javascript source

References

Is this project still alive?

The last activity on the repository occurred 2 years ago; since then there have been no new commits and no issue or pull request updates. So, as I'd like to use this data in a project of mine, I'm wondering if it's worth opening issues and/or submitting pull requests, or if I'd better just fork it and make whatever changes I like directly in my fork, without worrying about contributing back to this repo.

Join multiline strings with '' , ' ' or '\n'?

Join multiline strings with '' or '\n'? That's something I'm not sure about.

Joining with empty strings

If we join using '' the long strings that don't contain new lines will look like:

foo = ["abc def hello world",
        " rock stars"]

Result text: abc def hello world rock stars
Rendered markdown: <p>abc def hello world rock stars</p>

Note: The contributor needs to remember to add the empty space before "rock"

And when a contributor wants to add a new line he needs to add it explicitly (which sounds good):

foo = ["abc def hello world\n",
        "rock stars"]

Result text: abc def hello world\nrock stars
Rendered markdown:

<p>abc def hello world
rock stars</p>

Note: The contributor needs to remember to add a double \n\n if he wants two paragraphs

This is the case for two different paragraphs

foo = ["abc def hello world\n\n",
        "rock stars"]

Result text: abc def hello world\n\nrock stars
Rendered markdown:

<p>abc def hello world</p>
<p>rock stars</p>

Joining with a space

If we join with a space we're "fixing" the fact that the contributor needs to remember to add an empty space at the beginning of each line. But that's not very explicit and might confuse contributors.

Joining with a new line

foo = ["abc def hello world",
        " rock stars"]

Result text: abc def hello world\n rock stars
Rendered markdown:

<p>abc def hello world
rock stars</p>

foo = ["abc def hello world",
        "\n",
        " rock stars"]

Result text: abc def hello world\n\n\n rock stars
Rendered markdown:

<p>abc def hello world</p>
<p>rock stars</p>

This seems to be the cleanest.
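To compare the three options on the same input, here is a quick sketch (join_variants is a hypothetical helper, not part of any SDK):

```python
def join_variants(lines):
    """Join the same list of lines with each candidate separator."""
    return {separator: separator.join(lines) for separator in ("", " ", "\n")}


variants = join_variants(["abc def hello world",
                          " rock stars"])
```

Note that joining with a space gives two spaces before "rock" when the contributor has already added the leading space, which is the kind of confusion mentioned above.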

Ideas? Comments?

php sdk

  1. I looked at the Ruby and Go implementations and whipped up a php sdk. Let me know if it's ok to transfer this repo to this organization.

    https://github.com/vipsoft/vulndb-php

  2. I didn't embed the vulndb/data as it seems to couple the sdk to the database (i.e., release an updated sdk each time the vulndb changes). But if embedding is the preference, let me know, so I can rectify.

Broad/Descriptive Tags On Entries

With regard to the tags used by entries in the database, what level of detail should they include? For example, the entry on Cross-Site Scripting (XSS) has "xss", "regexp", "injection", "script", which is quite specific. I was wondering if it would be advantageous to add tags related to the domain in which the vulnerability exists, such as "website", "http", "sessions", etc.

One place where this might be useful is end-user search engines for the database, aimed at developers who aren't necessarily infosec pros but want to see what could affect their project: something where one can type "http", "website", etc. and get a list of what they might be up against, allowing them to proactively build with those in mind prior to a security audit.
