yahoo / gryffin Goto Github PK

Gryffin is a large scale web security scanning platform.

License: BSD 3-Clause "New" or "Revised" License

Makefile 0.52% Go 53.23% JavaScript 46.26%

gryffin's Introduction

ARCHIVED

Gryffin (beta)

Gryffin is a large scale web security scanning platform. It is not yet another scanner. It was written to solve two specific problems with existing scanners: coverage and scale.

Better coverage translates to fewer false negatives. Inherent scalability translates to capability of scanning, and supporting a large elastic application infrastructure. Simply put, the ability to scan 1000 applications today to 100,000 applications tomorrow by straightforward horizontal scaling.

Coverage

Coverage has two dimensions - one during crawl and the other during fuzzing. In crawl phase, coverage implies being able to find as much of the application footprint. In scan phase, or while fuzzing, it implies being able to test each part of the application for an applied set of vulnerabilities in a deep.

Crawl Coverage

Today a large number of web applications are template-driven, meaning the same code or path generates millions of URLs. For a security scanner, it just needs one of the millions of URLs generated by the same code or path. Gryffin's crawler does just that.

Page Deduplication

At the heart of Gryffin is a deduplication engine that compares a new page with already seen pages. If the HTML structure of the new page is similar to those already seen, it is classified as a duplicate and not crawled further.

DOM Rendering and Navigation

A large number of applications today are rich applications. They are heavily driven by client-side JavaScript. In order to discover links and code paths in such applications, Gryffin's crawler uses PhantomJS for DOM rendering and navigation.

Scan Coverage

As Gryffin is a scanning platform, not a scanner, it does not have its own fuzzer modules, even for fuzzing common web vulnerabilities like XSS and SQL Injection.

It's not wise to reinvent the wheel where you do not have to. Gryffin at production scale at Yahoo uses open source and custom fuzzers. Some of these custom fuzzers might be open sourced in the future, and might or might not be part of the Gryffin repository.

For demonstration purposes, Gryffin comes integrated with sqlmap and arachni. It does not endorse them or any other scanner in particular.

The philosophy is to improve scan coverage by being able to fuzz for just what you need.

Scale

While Gryffin is available as a standalone package, it's primarily built for scale.

Gryffin is built on the publisher-subscriber model. Each component is either a publisher, or a subscriber, or both. This allows Gryffin to scale horizontally by simply adding more subscriber or publisher nodes.

Operating Gryffin

Pre-requisites

Go - go1.13 or later
PhantomJS, v2
Sqlmap (for fuzzing SQLi)
Arachni (for fuzzing XSS and web vulnerabilities)
NSQ ,
- running lookupd at port 4160,4161
- running nsqd at port 4150,4151
- with --max-msg-size=5000000
Kibana and Elastic search, for dashboarding
- listening to JSON over port 5000
- Preconfigured docker image available in https://hub.docker.com/r/yukinying/elk/

Installation

go get -u github.com/yahoo/gryffin/...

Run

(WIP)

TODO

Mobile browser user agent
Preconfigured docker images
Redis for sharing states across machines
Instruction to run gryffin (distributed or standalone)
Documentation for html-distance
Implement a JSON serializable cookiejar.
Identify duplicate url patterns based on simhash result.

Talks and Slides

AppsecUSA 2015: abstract, slide, recording

Credits

Adonis Fung @ Yahoo, for the asynchronous phantomjs based crawler and DOM event navigator.
Simhash algorithm by Moses Charikar
Simhash implementation provided by mfonda/simhash.
Sqlmap
Arachni

Licence

Code licensed under the BSD-style license. See LICENSE file for terms.

gryffin's People

Contributors

Stargazers

Watchers

Forkers

hotelzululima gowebabc zhousichong 4point smoreface bhavyanshu ernestas-poskus mechkov yocruzer kevinqualters evermax james-baud mukteshkrmishra chauhd michalkoczwara crudbug langelee paran0ids0ul helterskelterr tvtritin emadshanab chennqqi gh0st7 andrei3008 endika znanl kartikeyap gamehacker jiangzhw bupt007 fashtimedotcom sdgdsffdsfff git20150901 whugintama duzhanyuan gelahcem maximf webvul jamalmo sxhao pentestit lljgithub thurday lz-securityteam jadore yukinying statik lopi juliustm hattricknz qbektrix minkyoungkook viscript winning1120xx liuqian1990 djmin zhuyue1314 devenlu tdr130 iph0n3 hafeez3000 frontenddeveloping reality9 isserene malei tuhaolam aroksetx codebeautiful alihalabyah securityigi samyoyo malyshkosergey ruyrocha security-goodies barnett-labs faint32 sna-ke wwwiretap binarydreams avldya barseghyanartur securextools cinderalla returnhere lynch1981 ahmatjan tharanga-abeyseela cnbird1999 avaudioplayer jeffreyer tobor ksmaheshkumar easyfmxu bahtiyarb-torba qhwlpg cn27001 sigma-random taogogo 710leo kaizhu256

gryffin's Issues

Results

So, at last I managed to make gryffin-standalone do a full (successful ?) scan.
After the run is finished I get this:

gryffin-standalone http://192.168.1.117:1912/login
=== Running Gryffin ===
{"Service":"Main","Msg":"Started","Method":"GET","Url":"http://192.168.1.117:1912/login"}
{"Service":"Poke","Msg":"Poking","Method":"GET","Url":"http://192.168.1.117:1912/login"}
{"Service":"CrawlAsync","Msg":"Started","Method":"GET","Url":"http://192.168.1.117:1912/login"}
{"Service":"PhantomjsRenderer.Do","Msg":"Running: render.js","Method":"GET","Url":"http://192.168.1.117:1912/login"}
{"Service":"Fingerprint","Msg":"Computed","Method":"GET","Url":"http://192.168.1.117:1912/login"}
{"Service":"IsDuplicatedPage","Msg":"Unique Page","Method":"GET","Url":"http://192.168.1.117:1912/login"}
{"Service":"ShouldCrawl","Msg":"Unique Link","Method":"POST","Url":"http://192.168.1.117:1912/afterLogin"}
{"Service":"Arachni.Scan","Msg":"Run as [arachni --checks xss* --output-only-positives --http-request-concurrency 1 --http-request-timeout 10000 --timeout 00:03:00 --scope-dom-depth-limit 0 --scope-directory-depth-limit 0 --scope-page-limit 1 --audit-with-both-methods --report-save-path /dev/null --snapshot-save-path /dev/null http://192.168.1.117:1912/login]","Method":"GET","Url":"http://192.168.1.117:1912/login"}
{"Service":"SQLMap.Scan","Msg":"Run as [sqlmap --batch --timeout=2 --retries=3 --crawl=0 --disable-coloring -o --text-only -v 0 --level=1 --risk=1 --smart --fresh-queries --purge-output --os=Linux --dbms=MySQL --delay=0.1 --time-sec=1 -u http://192.168.1.117:1912/login]","Method":"GET","Url":"http://192.168.1.117:1912/login"}
{"Service":"CrawlAsync","Msg":"Started","Method":"POST","Url":"http://192.168.1.117:1912/afterLogin"}
{"Service":"PhantomjsRenderer.Do","Msg":"Running: render.js","Method":"POST","Url":"http://192.168.1.117:1912/afterLogin"}
{"Service":"IsDuplicatedPage","Msg":"Duplicate Page","Method":"POST","Url":"http://192.168.1.117:1912/afterLogin"}
{"Service":"PhantomjsRenderer.Do","Msg":"[Cleanup] Terminating the crawl process.","Method":"POST","Url":"http://192.168.1.117:1912/afterLogin"}
{"Service":"Get Links","Msg":"Finished","Method":"POST","Url":"http://192.168.1.117:1912/afterLogin"}
{"Service":"SQLMap.Scan","Msg":"SQLMap return true","Method":"GET","Url":"http://192.168.1.117:1912/login"}
{"Service":"Arachni.Scan","Msg":"Arachni return true","Method":"GET","Url":"http://192.168.1.117:1912/login"}
{"Service":"ShouldCrawl","Msg":"Duplicate Link","Method":"GET","Url":"http://192.168.1.117:1912/afterLogin"}
{"Service":"Get Links","Msg":"Finished","Method":"POST","Url":"http://192.168.1.117:1912/afterLogin"}
=== End Running Gryffin ===

Did the output saves at another place ? is this all of the results ? what info can I get from this ?
Also, it was really too fast to be a full sqlmap run on the login form

P.S
If gryffin detects a form, it should maybe try to run sqlmap with the --forms flag

Add Dockerfile preconfigured Docker image

Just adding so people interested in (including myself) can track any updates.

Login the application through Gryffin.

@yukinying I have run gryffin successfully but wanted to know how should I pass uername and password to Gryffin for login and create session. For scanning of whole application it is needed to login the application through Gryffin.
For EX: We do it in SQLMap like,
./sqlmap.py --url="http://www.example.com/Login" --data="UserName=xyz&Password=xyz@123" --banner
Please suggest answer ASAP.

Cannot establish tcp connection to log listener

I'm pretty new to go, so maybe I missed something, I keep getting this error:

make test-mono 
go run cmd/gryffin-standalone/main.go "http://127.0.0.1:8081"
Cannot establish tcp connection to log listener.
{"Service":"Main","Msg":"Started","Method":"GET","Url":"http://127.0.0.1:8081"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x20 pc=0x47b4e4]

goroutine 1 [running]:
io.(*multiWriter).Write(0xc82000af40, 0xc82008eb40, 0x50, 0x95, 0x50, 0x0, 0x0)
    /usr/lib/go/src/io/multi.go:43 +0xd4
encoding/json.(*Encoder).Encode(0xc820051d48, 0x6b8240, 0xc820014680, 0x0, 0x0)
    /usr/lib/go/src/encoding/json/stream.go:201 +0x16b
github.com/yahoo/gryffin.(*Scan).Log(0xc8200e0bb0, 0x6b8240, 0xc820014680)
    /home/unshadow/go/src/github.com/yahoo/gryffin/gryffin.go:508 +0x96
github.com/yahoo/gryffin.(*Scan).Logm(0xc8200e0bb0, 0x7ca7e0, 0x4, 0x7cb3b8, 0x7)
    /home/unshadow/go/src/github.com/yahoo/gryffin/gryffin.go:495 +0x121
main.main()
    /home/unshadow/Desktop/git-projects/gryffin/cmd/gryffin-standalone/main.go:173 +0x4e4

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
    /usr/lib/go/src/runtime/asm_amd64.s:1745 +0x1
exit status 2
Makefile:31: recipe for target 'test-mono' failed
make: *** [test-mono] Error 1

Adding a screenshot feature

Hello,

Following the very same approach as Netflix Scumblr, it would be great if you could add a screenshot feature for each scanned site (each page ?).
I don't mean to advertise but I got a functional phantomjs script ready for that purpose.

Cheers.

Failed to crawl page: undefined is not an object (evaluating 'f[name].length')

Some pages at my project doesn't crawls at all:(
For example: https://partners.bitrix24.com/application.php
Add a some debug to page.onError in render.js, launched the phantomjs, catched a JS error:

$ phantomjs ./renderer/resource/render.js https://partners.bitrix24.com/application.php | head -1 | python -mjson.tool
{
    "type": "js error",
    "errorString": "TypeError: undefined is not an object (evaluating 'f[name].length')",
    "trace": [
        {
            "file": "./extractors.js",
            "line": 327,
            "function": "getForm"
        },
        {
            "file": "",
            "line": 0,
            "function": "forEach"
        },
        {
            "file": "./extractors.js",
            "line": 441,
            "function": "extractRequests"
        },
        {
            "file": "./extractors.js",
            "line": 452,
            "function": "_gryffin_onMainFrameReady"
        },
        {
            "file": "phantomjs://webpage.evaluate()",
            "line": 3,
            "function": ""
        },
        {
            "file": "phantomjs://webpage.evaluate()",
            "line": 4,
            "function": ""
        }
    ],
    "msgType": "error",
    "signature": "==lXlKfYWch7H9VdJgPCmJ=="
}

Steps to reproduce:

Testing page:

 $ http -b https://www.buglloc.com/static/gryffin/forms-unencoded-fields.html
<html>
<body>
<a href="/foo">foo</a>
<form name="with-query" action="/" method="post">
    <input type="text" name="foo[]" value="some" />
    <select name="bar[]">
        <option value="one">One</option>
        <option value="two">Two</option>
    </select>
    <input type="submit" name="save" value="Save">
</form>
</body>
</html>

Launch pantomjs:

 $ phantomjs ./renderer/resource/render.js https://www.buglloc.com/static/gryffin/forms-unencoded-fields.html | head -1 | python -mjson.tool

{
    "response": {
        "headers": {[...]},
        "contentType": "text/html; charset=utf-8",
        "status": 200,
        "url": "https://www.buglloc.com/static/gryffin/forms-unencoded-fields.html",
        "body": "<html><head></head><body>\n<a href=\"/foo\">foo</a>\n<form name=\"with-query\" action=\"/\" method=\"post\">\n\t<input type=\"text\" name=\"foo[]\" value=\"some\">\n\t<select name=\"bar[]\">\n\t\t<option value=\"one\">One</option>\n\t\t<option value=\"two\">Two</option>\n\t</select>\n\t<input type=\"submit\" name=\"save\" value=\"Save\">\n</form>\n\n\n</body></html>",
        "details": {
            "links": [],
            "forms": []
        }
    },
    "elasped": 584,
    "ok": 1,
    "msgType": "domSteady",
    "signature": "==lXlKfYWch7H9VdJgPCmJ=="
}

No links or forms in details

Efficiency of redundant fuzzers

Both sqlmap and arachni have SQL injection capabilities. Why run both? It doesn't look like there's an attempt to pare down either to a specific configuration or avoid running identical payloads.

Fuzzer integration and XSS testing

The fuzzing integration seems to be limited to launching against a single URL in a per-process model, as noted in #9.

How does this support testing for stored XSS that requires injection into one page, but checking for the payload's display in another page? Is a relevant fuzzer expected to perform its own crawl and output analysis to discover the display page? If so, how does a test against a subsequent URL re-use that fuzzer's state if only STDOUT is captured?

Installation error

Whenever I try to install gryffin, I get this error:

github.com/yahoo/gryffin

src/github.com/yahoo/gryffin/gryffin.go:226:24: error: reference to undefined field or method ‘Timeout’
client.(*http.Client).Timeout = time.Duration(3) * time.Second

Please assist,
Thank you.

Decoupling of crawl and audit (fuzzers)

The broad decoupling of the crawl and audit seems to limit how well this could grow in terms of accuracy and optimizations.

Each fuzzer apparently only receives state (cookies) and a URL. It doesn't benefit from content or patterns learned during the crawl, such as custom error pages. For example, this limits optimizations that would be useful for testing for persistent and stored XSS as noted in #14. It would also be useful for reducing false positives when looking for common paths or page names.

It's not clear how a fuzzer that discovers new links that the crawler missed, such as via content inspection or guessing path or page names, could feed those links back to be crawled and further audited unless the fuzzer explicitly does so.

A fuzzer that goes through steps to profile a page for default content or custom errors wouldn't be able to share that info with another fuzzer. (This is a limitation both of the architecture and the fuzzers -- they'd need to share a data model.)

The performance characteristics (concurrent requests, requests/sec.) of the crawl may differ significantly from the fuzzers unless each one is pre-configured with identical thresholds. This would cause high variance in the impact on the target.

gryffin/html-distance/cmd/html-distance

In https://github.com/yahoo/gryffin/tree/master/html-distance, it says,

to install, do

go install github.com/yahoo/gryffin/html-distance/cmd/html-distance

but I don't see the gryffin/html-distance/cmd/html-distance within this repo.
Please double-check. Thx.

Page deduplication drawbacks?

Hey guys,

The page deduplication approach is clever, but it sets a hard limit on the scanner's issue detection capabilities.

Since you're comparing HTML code to determine uniqueness, it guarantees that certain issues exposed via text nodes will not be able to be identified, as those pages will never make it to the underlying scanner.
For example, disclosure of sensitive information such as credit card numbers, SSN, etc.

In addition, it also prevents you from identifying server-side resources, like sensitive files and directories (backup, backdoors, etc.); the URL of a template-generated page whose path can be manipulated into returning a sensitive file will never make it to the scanner.

I can see that right now you're only interested in SQL and XSS issues, but the architecture seems limited to only identifying active issues.

Was that a conscious decision?

Cheers,
Tasos L.

Performance and coverage comparison of scan models

Hey guys,

I'm assuming that you've done extensive benchmarks while developing Gryffin, since that's pretty much unavoidable in these cases, and I'd like to urge you to publish them.
I'm sure that most of us technical folk would like to see the performance and coverage characteristics of different scan models.

Performance

Since Arachni is the only dependency that overlaps with Gryffin's features, it'd be interesting to see a comparison for a few scenarios:

Gryffin (all fuzzers) vs Arachni (configured with identical checks and scope)
Gryffin (Arachni only) vs Arachni (configured with identical checks and scope)
Gryffin (crawl only) vs Arachni (crawl only -- same scope, no checks loaded)

Along with permutations on the above such as:

Single node scan vs multi-node scan
Single webapp scan vs multi-webapp scan

In a crawl-only situation I'm assuming that Gryffin will do better, in a full scan situation though performance characteristics get trickier as Arachni has no distinct crawl/audit phases.
Since the crawl operations can be seen as a much smaller subset of the audit operations, the crawl becomes an unnecessary redundancy when performing a full scan.

Personally, I'd like to see in which cases Gryffin's distributed crawl and deduplication start overtaking Arachni's no-crawl, on-the-fly scan model.

Coverage

Have you tried Gryffin's crawler on coverage benchmarks such as WIVET?
We already have the score of pretty much every scanner available, so it'd be nice seeing how Gryffin does against existing ones.

Thanks for your time and for publishing Gryffin, looks very interesting. :)

Cheers,
Tasos L.

README.md - Image is missing

On the project page - one of the images ( https://camo2.githubusercontent.com/5b1cc806dab335a9f72fa6421a5ddf2189aa26fa/68747470733a2f2f7472617669732d63692e6f72672f7961686f6f2f6772796666696e2e7376673f6272616e63683d6d6173746572 ) is not loaded correctly due of missing image.

Crawler stopping too soon

I'm trying to run the crawler to extract links from a simple page using the following command:

phantomjs --ssl-protocol=any --ignore-ssl-errors=true --proxy=127.0.0.1:8080 --proxy-type=http render.js http://192.168.0.40:8899/pages/11.php

11.php is part of wivet, a web crawler test application. The response generated when browsing to 11.php is:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
	<head>
		<meta http-equiv="content-type" content="text/html; charset=windows-1250">
		<link type="text/css" rel="stylesheet" href="/style.css" />
  <script type="text/javascript" src="../js/jquery/jquery.js"></script>
  <script type="text/javascript" >
    $(document).ready(function(){
      $("#link").each(function(){this.href = "../innerpages/11_1f2e4.php";});
    });
  </script>
	</head>
	<body  class="body">
    <center>
      <a id="link" href="" target="body">click me</a>
      <a href="javascript:window.open('../innerpages'+'/11_2d3ff.php', 'windowopen', 'resizable=yes,width=500,height=400');">click me 2</a>
    </center>
	</body>
</html>

I see this in the browser I'm running in 127.0.0.1:8080 (see proxy param in the phantomjs call). I also see the jquery.js page being requested.

The output seen in stdout when running the command is:

{"response":{"headers":{"Date":["Thu, 02 Feb 2017 21:14:30 GMT"],"Server":["Apache/2.4.10 (Debian) PHP/5.6.11"],"X-Powered-By":["PHP/5.6.11"],"Set-Cookie":["PHPSESSID=2a81288b4a514a017c4d79bd89de6c51; path=/"],"Expires":["Thu, 19 Nov 1981 08:52:00 GMT"],"Cache-Control":["no-store, no-cache, must-revalidate, post-check=0, pre-check=0"],"Pragma":["no-cache"],"Vary":["Accept-Encoding"],"Content-Length":["726"],"Connection":["close"],"Content-Type":["text/html; charset=UTF-8"]},"contentType":"text/html; charset=UTF-8","status":200,"url":"http://192.168.0.40:8899/pages/11.php","body":"<html><head>\n\t\t<meta http-equiv=\"content-type\" content=\"text/html; charset=windows-1250\">\n\t\t<link type=\"text/css\" rel=\"stylesheet\" href=\"/style.css\">\n  <script type=\"text/javascript\" src=\"../js/jquery/jquery.js\"></script>\n  <script type=\"text/javascript\">\n    $(document).ready(function(){\n      $(\"#link\").each(function(){this.href = \"../innerpages/11_1f2e4.php\";});\n    });\n  </script>\n\t</head>\n\t<body class=\"body\">\n    <center>\n      <a id=\"link\" href=\"../innerpages/11_1f2e4.php\" target=\"body\">click me</a>\n      <a href=\"javascript:window.open('../innerpages'+'/11_2d3ff.php', 'windowopen', 'resizable=yes,width=500,height=400');\">click me 2</a>\n    </center>\n\t\n\n</body></html>","details":{"links":[{"text":"click me","url":"http://192.168.0.40:8899/innerpages/11_1f2e4.php"}],"forms":[],"jsLinkFeedback":true}},"elasped":531,"ok":1,"msgType":"domSteady","signature":"==lXlKfYWch7H9VdJgPCmJ=="}


{"action":"element.triggered","events":["click"],"keyChain":["root","body/center[1]/a[2]"],"childFrames":[{"headers":[{"name":"Accept","value":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"},{"name":"Referer","value":"http://192.168.0.40:8899/pages/11.php"},{"name":"User-Agent","value":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.1.1 Safari/538.1"}],"id":1,"method":"GET","time":"2017-02-02T21:14:30.560Z","url":"http://192.168.0.40:8899/innerpages/11_2d3ff.php","fromMainFrame":true,"navigationType":"Other"}],"msgType":"domChanged","signature":"==lXlKfYWch7H9VdJgPCmJ=="}

The first one is the response for the initial GET request. The second one seems to be a click on one of the links:

      <a id="link" href="" target="body">click me</a>
      <a href="javascript:window.open('../innerpages'+'/11_2d3ff.php', 'windowopen', 'resizable=yes,width=500,height=400');">click me 2</a>

My questions are:

Did you guys run gryffin against WIVET? Any results you can share?
Why is only one of the links clicked?
Why am I not seeing the HTTP request to 11_2d3ff.php in my proxy?
In the JSON printed in stdout I see:

"details":{"links":[{"text":"click me","url":"http://192.168.0.40:8899/innerpages/11_1f2e4.php"}]

If the crawler did not click on the second link, how was that URL extracted?

Is there something I'm doing wrong? I'm using phantomjs 2.1.1

Can you please add steps to run which is in WIP

Could you please add steps to run gryffin

gryffin-standalone: Timeout when rendering js

Steps I do to reproduce:

nsqlookupd -verbose=true
nsqd --max-msg-size=2313820682 --lookupd-tcp-address=127.0.0.1:4160
go run cmd/gryffin-distributed/main.go --storage=memory seed http://mysite.com

Output:

{"Service":"Poke","Msg":"Poking","Method":"GET","Url":"http://mysite.com"}
2015/09/30 13:27:54 INF    1 (127.0.0.1:4150) connecting to nsqd
Seed http://mysite.com injected.
2015/09/30 13:27:54 INF    1 stopping
2015/09/30 13:27:54 INF    1 exiting router

go run cmd/gryffin-distributed/main.go --storage=memory crawl

Output:

2015/09/30 13:29:12 INF    2 [seed/primary] querying nsqlookupd http://127.0.0.1:4161/lookup?topic=seed
2015/09/30 13:29:12 INF    2 [seed/primary] (ano:4150) connecting to nsqd
{"Service":"CrawlAsync","Msg":"Started","Method":"GET","Url":"http://mysite.com"}
{"Service":"PhantomjsRenderer.Do","Msg":"Running: render.js","Method":"GET","Url":"http://mysite.com"}
{"Service":"PhantomjsRenderer.Do","Msg":"[Timeout] Terminating the crawl process.","Method":"GET","Url":"http://mysite.com"}
2015/09/30 13:30:19 INF    2 [seed/primary] querying nsqlookupd http://127.0.0.1:4161/lookup?topic=seed

Am I missing something ?

Incorrect urls from forms with GET method

Noticed strange 404 errors in the logs after launch a gryffin. Looked closer and found bug in form.toScan:

func (f *form) toScan(parent *gryffin.Scan) *gryffin.Scan {
    m := strings.ToUpper(f.Method)
    u := f.Url
    var r io.Reader
    if m == "POST" {
        r = ioutil.NopCloser(strings.NewReader(f.Data))
    } else {
        u += "&" + f.Data
    }

    [...]
}

As you can see if form method not a POST we just appends data fields to url (assuming that it ends with query string). I think this is very strange, for example:

Have a page with various forms with method="GET":

 $ http -b https://www.buglloc.com/static/gryffin/forms.html
<html>
<body>
    <form name="with-query" action="?query=foo&amp;bar=baz" method="GET">
        <input type="hidden" name="how" value="r" />
    </form>
    <form name="empty-query" action="?" method="GET">
        <input type="hidden" name="how" value="r" />
    </form>
    <form name="empty-action" action="" method="GET">
        <input type="hidden" name="how" value="r" />
    </form>
</body>
</html>

Launch gryffin, filter by crawl tasks
Actual results:

 $ go run cmd/gryffin-standalone/main.go https://www.buglloc.com/static/gryffin/forms.html | grep '"CrawlAsync"' 
{"Service":"CrawlAsync","Msg":"Started","Method":"GET","Url":"https://www.buglloc.com/static/gryffin/forms.html","JobID":"6322FB059F4F8DB86217918ECA18B500"}
{"Service":"CrawlAsync","Msg":"Started","Method":"GET","Url":"https://www.buglloc.com/static/gryffin/forms.html\u0026bar=baz\u0026how=r\u0026query=foo","JobID":"6322FB059F4F8DB86217918ECA18B500"}
{"Service":"CrawlAsync","Msg":"Started","Method":"GET","Url":"https://www.buglloc.com/static/gryffin/forms.html?\u0026how=r","JobID":"6322FB059F4F8DB86217918ECA18B500"}
{"Service":"CrawlAsync","Msg":"Started","Method":"GET","Url":"https://www.buglloc.com/static/gryffin/forms.html\u0026how=r","JobID":"6322FB059F4F8DB86217918ECA18B500"}
{"Service":"CrawlAsync","Msg":"Started","Method":"GET","Url":"https://www.buglloc.com/static/gryffin/forms.html\u0026","JobID":"6322FB059F4F8DB86217918ECA18B500"}

Expected:

 $ go run cmd/gryffin-standalone/main.go https://www.buglloc.com/static/gryffin/forms.html | grep '"CrawlAsync"'
{"Service":"CrawlAsync","Msg":"Started","Method":"GET","Url":"https://www.buglloc.com/static/gryffin/forms.html","JobID":"3843AB23E9C79A39BEBB678516A9747B"}
{"Service":"CrawlAsync","Msg":"Started","Method":"GET","Url":"https://www.buglloc.com/static/gryffin/forms.html?bar=baz\u0026how=r\u0026query=foo","JobID":"3843AB23E9C79A39BEBB678516A9747B"}
{"Service":"CrawlAsync","Msg":"Started","Method":"GET","Url":"https://www.buglloc.com/static/gryffin/forms.html?how=r","JobID":"3843AB23E9C79A39BEBB678516A9747B"}

Fuzzer integration issues

Hey guys,

I've noticed a few issues with the way that the fuzzers are currently integrated:

Spawning and monitoring is taking place via an STDOUT channel, which really isn't meant to be a programming interface but a user one[1]. UI output can change at any time and it really shouldn't be something to depend on for communicating data.
1. Keeping the entire output buffer in memory instead of processing it as it arrives and then releasing it can be a problem too, depending on the amount of output.
2. Output seems to be the sole source of data, instead for example also parsing a generated report at the end, this severely limits the amount of information that can be reported to the user.
Fuzzers are being cold-started for every resource, this will introduce a significant latency.

I can't speak for the rest but Arachni is built to be a distributed/integrable system, so you can solve all of these issues by requesting warmed up scanner processes from a Dispatcher (maintains a pool of initialized scanners, no spawn latency) and all integration (scanner dispatch, scan management and monitoring) happens via an RPC API.

[1] UNIX philosophy excluded.

is the project still maintained?

found there is no update in recent years. the documents like installation or usage are not clear. so is this project still maintained?

Fragment identifier treated as data fields for forms with GET method

I can be wrong, but i think that getForm in extractor.js builds a incorrect url for forms with GET method:

// process any parameters given as part of the url
qPos = url.indexOf('?');
if (qPos > 0 && qPos !== url.length - 1) { // latter case means last char not ?
    // url's params will be later combined with those collected in form, and used for deduplication
    urlparams = url.substring(qPos + 1).split('&');
    // url's params are considered of hidden types
    urlparams.forEach(function(param){
        if (param = param.split('=')[0])
            dataType[param] = 'hidden';
    });

    // for GET method, transfer url's params to values for deduplication
    if (method === 'get') {
        values = urlparams.concat(values);
        url = url.substring(0, qPos);
    }
}

Steps to Reproduce:

Have a some page with forms:

 $ http -b https://www.buglloc.com/static/gryffin/forms-fragment.html 
<html>
<body>
    <form name="fragment-only" action="#foo=bar" method="get">
        <input type="hidden" name="how" value="r" />
    </form>
    <form name="with-query" action="?foo=bar#bar=baz&any=foo" method="get">
        <input type="hidden" name="how" value="r" />
    </form>
</body>
</html>

Run phantomjs and looks at crawled forms:

  $ phantomjs ./renderer/resource/render.js https://www.buglloc.com/static/gryffin/forms-fragment.html | head -1 |  python -mjson.tool
{
    "response": {
        "headers": {[...]},
        "contentType": "text/html; charset=utf-8",
        "status": 200,
        "url": "https://www.buglloc.com/static/gryffin/forms-fragment.html",
        "body": "<html><head></head><body>\n\t<form name=\"fragment-only\" action=\"#foo=bar\" method=\"get\">\n\t\t<input type=\"hidden\" name=\"how\" value=\"r\">\n\t</form>\n\t<form name=\"with-query\" action=\"?foo=bar#bar=baz&amp;any=foo\" method=\"get\">\n\t\t<input type=\"hidden\" name=\"how\" value=\"r\">\n\t</form>\n\n\n\n</body></html>",
        "details": {
            "links": [],
            "forms": [
                {
                    "data": "how=r",
                    "dataType": {
                        "how": "hidden"
                    },
                    "method": "get",
                    "url": "https://www.buglloc.com/static/gryffin/forms-fragment.html#foo=bar"
                },
                {
                    "data": "any=foo&foo=bar#bar=baz&how=r",
                    "dataType": {
                        "any": "hidden",
                        "foo": "hidden",
                        "how": "hidden"
                    },
                    "method": "get",
                    "url": "https://www.buglloc.com/static/gryffin/forms-fragment.html"
                }
            ],
            "jsLinkFeedback": true
        }
    },
    "elasped": 549,
    "ok": 1,
    "msgType": "domSteady",
    "signature": "==lXlKfYWch7H9VdJgPCmJ=="
}

In first case (form action without query string) everything is fine:

"data": "how=r"
"method": "get"
"url": "https://www.buglloc.com/static/gryffin/forms-fragment.html#foo=bar"

But in second fragment treated as data fields:

"data": "any=foo&foo=bar#bar=baz&how=r"
"method": "get"
"url": "https://www.buglloc.com/static/gryffin/forms-fragment.html"

I think expected results for second form is:

"data": "foo=bar&how=r"
"method": "get"
"url": "http://bus-win.my/test.html#bar=baz&any=foo"

runtime error: invalid memory address or nil pointer dereference

Hi all,
@yukinying @bararchy @harisec sorry for my delayed response...as "gryffin-standalone: Timeout when rendering js" topic is now closed, I've opened this new one, because for me at least, after running "go get -v -u github.com/yahoo/gryffin/..." I am still facing this nil pointer issue.
Here is the stack trace:

$GOPATH/bin/gryffin-standalone http://zero.webappsecurity.com/
=== Running Gryffin ===
{"Service":"Main","Msg":"Started","Method":"GET","Url":"http://zero.webappsecurity.com/"}
{"Service":"Poke","Msg":"Poking","Method":"GET","Url":"http://zero.webappsecurity.com/"}
{"Service":"CrawlAsync","Msg":"Started","Method":"GET","Url":"http://zero.webappsecurity.com/"}
{"Service":"PhantomjsRenderer.Do","Msg":"Running: render.js","Method":"GET","Url":"http://zero.webappsecurity.com/"}
{"Service":"Fingerprint","Msg":"Computed","Method":"GET","Url":"http://zero.webappsecurity.com/"}
{"Service":"IsDuplicatedPage","Msg":"Unique Page","Method":"GET","Url":"http://zero.webappsecurity.com/"}
{"Service":"SQLMap.Scan","Msg":"Run as [sqlmap --batch --timeout=2 --retries=3 --crawl=0 --disable-coloring -o --text-only -v 0 --level=1 --risk=1 --smart --fresh-queries --purge-output --os=Linux --dbms=MySQL --delay=0.1 --time-sec=1 -u http://zero.webappsecurity.com/]","Method":"GET","Url":"http://zero.webappsecurity.com/"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x8 pc=0x529b5f]

goroutine 26 [running]:
github.com/yahoo/gryffin/fuzzer/sqlmap.(*Fuzzer).Fuzz(0xc820047f88, 0xc8200d3290, 0x0, 0x7f8023f973a0, 0xc820155520)
/usr/local/go/src/src/github.com/yahoo/gryffin/fuzzer/sqlmap/sqlmap.go:84 +0x98f
main.linkChannels.func2.2(0xc82002a118, 0xc82000f590)
/usr/local/go/src/src/github.com/yahoo/gryffin/cmd/gryffin-standalone/main.go:102 +0x30
created by main.linkChannels.func2
/usr/local/go/src/src/github.com/yahoo/gryffin/cmd/gryffin-standalone/main.go:104 +0xfd

goroutine 1 [semacquire]:
sync.runtime_Semacquire(0xc82000f59c)
/usr/local/go/src/runtime/sema.go:43 +0x26
sync.(*WaitGroup).Wait(0xc82000f590)
/usr/local/go/src/sync/waitgroup.go:126 +0xb4
main.linkChannels(0xc8200d3290)
/usr/local/go/src/src/github.com/yahoo/gryffin/cmd/gryffin-standalone/main.go:144 +0x263
main.main()
/usr/local/go/src/src/github.com/yahoo/gryffin/cmd/gryffin-standalone/main.go:176 +0x40a

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1696 +0x1

goroutine 5 [chan receive (nil chan)]:
github.com/yahoo/gryffin.NewGryffinStore.func1(0x0)
/usr/local/go/src/src/github.com/yahoo/gryffin/session.go:41 +0x8f
created by github.com/yahoo/gryffin.NewGryffinStore
/usr/local/go/src/src/github.com/yahoo/gryffin/session.go:44 +0x157

goroutine 6 [chan receive]:
main.linkChannels.func1(0xc82001c360, 0xc82001c3c0, 0xc82000f590, 0xc82001c300)
/usr/local/go/src/src/github.com/yahoo/gryffin/cmd/gryffin-standalone/main.go:55 +0x68
created by main.linkChannels
/usr/local/go/src/src/github.com/yahoo/gryffin/cmd/gryffin-standalone/main.go:89 +0x17e

goroutine 7 [chan receive]:
main.linkChannels.func2(0xc82001c3c0, 0xc82000f590)
/usr/local/go/src/src/github.com/yahoo/gryffin/cmd/gryffin-standalone/main.go:92 +0x68
created by main.linkChannels
/usr/local/go/src/src/github.com/yahoo/gryffin/cmd/gryffin-standalone/main.go:107 +0x1aa

goroutine 8 [chan receive]:
main.linkChannels.func3(0xc82001c300, 0xc82001c360)
/usr/local/go/src/src/github.com/yahoo/gryffin/cmd/gryffin-standalone/main.go:111 +0x68
created by main.linkChannels
/usr/local/go/src/src/github.com/yahoo/gryffin/cmd/gryffin-standalone/main.go:122 +0x1d6

goroutine 18 [syscall]:
syscall.Syscall6(0x3d, 0x4cac, 0xc82004db84, 0x0, 0xc8200605a0, 0x0, 0x0, 0x728620, 0xc82001cc60, 0x3)
/usr/local/go/src/syscall/asm_linux_amd64.s:44 +0x5
syscall.wait4(0x4cac, 0xc82004db84, 0x0, 0xc8200605a0, 0x90, 0x0, 0x0)
/usr/local/go/src/syscall/zsyscall_linux_amd64.go:172 +0x72
syscall.Wait4(0x4cac, 0xc82004dbcc, 0x0, 0xc8200605a0, 0xc82002a1c8, 0x0, 0x0)
/usr/local/go/src/syscall/syscall_linux.go:256 +0x55
os.(_Process).wait(0xc82000b820, 0x15, 0x0, 0x0)
/usr/local/go/src/os/exec_unix.go:22 +0x105
os.(_Process).Wait(0xc82000b820, 0x0, 0x0, 0x0)
/usr/local/go/src/os/doc.go:45 +0x2d
os/exec.(_Cmd).Wait(0xc8200868c0, 0x0, 0x0)
/usr/local/go/src/os/exec/exec.go:380 +0x211
github.com/yahoo/gryffin/renderer.(_PhantomJSRenderer).Do(0xc820118060, 0xc8200d3290)
/usr/local/go/src/src/github.com/yahoo/gryffin/renderer/phantomjs.go:234 +0x76e
created by github.com/yahoo/gryffin.(*Scan).CrawlAsync
/usr/local/go/src/src/github.com/yahoo/gryffin/gryffin.go:340 +0x9b

goroutine 25 [runnable]:
main.linkChannels.func2.1(0xc82002a118, 0xc82000f590)
/usr/local/go/src/src/github.com/yahoo/gryffin/cmd/gryffin-standalone/main.go:95
created by main.linkChannels.func2
/usr/local/go/src/src/github.com/yahoo/gryffin/cmd/gryffin-standalone/main.go:99 +0xd1

goroutine 15 [IO wait]:
net.runtime_pollWait(0x7f8023f96640, 0x72, 0xc82000e1a0)
/usr/local/go/src/runtime/netpoll.go:157 +0x60
net.(_pollDesc).Wait(0xc82004ea70, 0x72, 0x0, 0x0)
/usr/local/go/src/net/fd_poll_runtime.go:73 +0x3a
net.(_pollDesc).WaitRead(0xc82004ea70, 0x0, 0x0)
/usr/local/go/src/net/fd_poll_runtime.go:78 +0x36
net.(_netFD).Read(0xc82004ea10, 0xc8200f3000, 0x1000, 0x1000, 0x0, 0x7f8023f91050, 0xc82000e1a0)
/usr/local/go/src/net/fd_unix.go:232 +0x23a
net.(_conn).Read(0xc82002a130, 0xc8200f3000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/net/net.go:172 +0xe4
net/http.noteEOFReader.Read(0x7f8023f96d10, 0xc82002a130, 0xc8200d34f8, 0xc8200f3000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/net/http/transport.go:1370 +0x67
net/http.(_noteEOFReader).Read(0xc82000b280, 0xc8200f3000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
:126 +0xd0
bufio.(_Reader).fill(0xc82001c900)
/usr/local/go/src/bufio/bufio.go:97 +0x1e9
bufio.(_Reader).Peek(0xc82001c900, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0)
/usr/local/go/src/bufio/bufio.go:132 +0xcc
net/http.(_persistConn).readLoop(0xc8200d34a0)
/usr/local/go/src/net/http/transport.go:876 +0xf7
created by net/http.(*Transport).dialConn
/usr/local/go/src/net/http/transport.go:685 +0xc78

goroutine 20 [chan receive (nil chan)]:
main.linkChannels.func1.2(0xc820118060, 0xc82000f590, 0xc82001c300, 0xc82002a110)
/usr/local/go/src/src/github.com/yahoo/gryffin/cmd/gryffin-standalone/main.go:76 +0x5c
created by main.linkChannels.func1
/usr/local/go/src/src/github.com/yahoo/gryffin/cmd/gryffin-standalone/main.go:85 +0x189

goroutine 16 [select]:
net/http.(_persistConn).writeLoop(0xc8200d34a0)
/usr/local/go/src/net/http/transport.go:1009 +0x40c
created by net/http.(_Transport).dialConn
/usr/local/go/src/net/http/transport.go:686 +0xc9d

goroutine 21 [syscall]:
syscall.Syscall(0x0, 0x6, 0xc820132001, 0x3dff, 0x20, 0x20, 0x6cf120)
/usr/local/go/src/syscall/asm_linux_amd64.s:18 +0x5
syscall.read(0x6, 0xc820132001, 0x3dff, 0x3dff, 0x0, 0x0, 0x0)
/usr/local/go/src/syscall/zsyscall_linux_amd64.go:783 +0x5f
syscall.Read(0x6, 0xc820132001, 0x3dff, 0x3dff, 0xc820150e70, 0x0, 0x0)
/usr/local/go/src/syscall/syscall_unix.go:160 +0x4d
os.(_File).read(0xc82002a1a8, 0xc820132001, 0x3dff, 0x3dff, 0x411069, 0x0, 0x0)
/usr/local/go/src/os/file_unix.go:211 +0x53
os.(_File).Read(0xc82002a1a8, 0xc820132001, 0x3dff, 0x3dff, 0xc8201552c0, 0x0, 0x0)
/usr/local/go/src/os/file.go:95 +0x8a
encoding/json.(_Decoder).refill(0xc82005b860, 0x0, 0x0)
/usr/local/go/src/encoding/json/stream.go:152 +0x287
encoding/json.(_Decoder).readValue(0xc82005b860, 0x1, 0x0, 0x0)
/usr/local/go/src/encoding/json/stream.go:128 +0x41b
encoding/json.(_Decoder).Decode(0xc82005b860, 0x6b3240, 0xc820150f30, 0x0, 0x0)
/usr/local/go/src/encoding/json/stream.go:57 +0x159
github.com/yahoo/gryffin/renderer.(_PhantomJSRenderer).extract(0xc820118060, 0x7f8023f97200, 0xc82002a1a8, 0xc8200d3290)
/usr/local/go/src/src/github.com/yahoo/gryffin/renderer/phantomjs.go:129 +0x14b
created by github.com/yahoo/gryffin/renderer.(*PhantomJSRenderer).Do
/usr/local/go/src/src/github.com/yahoo/gryffin/renderer/phantomjs.go:231 +0x72e

goroutine 22 [select]:
github.com/yahoo/gryffin/renderer.(_PhantomJSRenderer).wait(0xc820118060, 0xc8200d3290)
/usr/local/go/src/src/github.com/yahoo/gryffin/renderer/phantomjs.go:165 +0x17d
created by github.com/yahoo/gryffin/renderer.(_PhantomJSRenderer).Do
/usr/local/go/src/src/github.com/yahoo/gryffin/renderer/phantomjs.go:232 +0x760

I hope this sheds some light, because I really want to see my install working ...:)

Efficiency of page deduplication

(Issue #6 focuses on drawbacks for the test and audit phase. The focuses on applicability to crawling.)

This approach doesn't look very robust to improve crawling against real-world web apps. It requires a link to be requested before the crawler can decide whether the requested link was redundant.

The comparison also seems overly sensitive to tags that do not affect structure or navigation. For example, false positives can come from text nodes with large variations in formatting tags like <b> and <i>, pages that display tables with different amounts of rows, or articles with varying numbers of comments (where a comment may be in its own <div>)

For the flickr examples, the distance doesn't seem to consistently reflect duplicate content. For example, it appears to produce a 100% match for a user's /albums/ and /groups/ content, even though the /groups/ clearly points to additional navigation links that would be important to crawl.

How does the dedupe work for pages that dynamically create content? Is the HTML taken from the HTTP response, or from the version rendered in a browser? If it renders from a browser, at what point is the page considered settled versus an "infinite scroll" or dynamically refreshing page?

Are link+page combinations labeled with an authentication state? The content for a link can change significantly depending on whether the user is logged in.

yahoo / gryffin Goto Github PK

gryffin's Introduction

ARCHIVED

Gryffin (beta)

Coverage

Crawl Coverage

Page Deduplication

DOM Rendering and Navigation

Scan Coverage

Scale

Operating Gryffin

Pre-requisites

Installation

Run

TODO

Talks and Slides

Credits

Licence

gryffin's People

Contributors

Stargazers

Watchers

Forkers

gryffin's Issues

github.com/yahoo/gryffin

Performance

Coverage

Recommend Projects

Recommend Topics

Recommend Org