disclose / diosts
A Go scraper that validates security.txt files and outputs them in the disclose.io JSON format.
License: MIT License
We should decide on a license, add a LICENSE file, and add a simple copyright/license header to each file.
Can we rename the "program_name" field to "security_txt_domain" denoting the domain the security.txt was retrieved from?
"program_name" in diodb is tied to the company, organization, or business unit responsible for the policy and intake channel, which is a looser coupling than the data returned by diosts.
Looking at the merged data, I think we should consider writing the diosts output to a separate data store, to allow for the differences between how security.txt renders this information and what we collect in diodb.
The securitytxt.org standard has recently been updated to make "Expires" a mandatory field, with its value in ISO 8601 format. Details are on the front page of the securitytxt.org website.
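As a rough illustration of what validating the mandatory expiry field could look like, here is a minimal Go sketch. It is not taken from diosts: `parseExpires` is a hypothetical helper, the field is assumed to be spelled `Expires` as in the published format, and `time.RFC3339` is used as a stand-in for the ISO 8601 profile the standard asks for.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// parseExpires is a hypothetical helper (not part of diosts). It scans a
// security.txt body for the first "Expires" field and parses its value
// as an RFC 3339 timestamp, which is a profile of ISO 8601.
func parseExpires(body string) (time.Time, error) {
	for _, line := range strings.Split(body, "\n") {
		parts := strings.SplitN(strings.TrimSpace(line), ":", 2)
		if len(parts) != 2 {
			continue // blank line or not a "Name: value" field
		}
		// Field names are matched case-insensitively; comment lines
		// such as "# Expires: ..." do not match because of the "#".
		if !strings.EqualFold(strings.TrimSpace(parts[0]), "Expires") {
			continue
		}
		value := strings.TrimSpace(parts[1])
		expires, err := time.Parse(time.RFC3339, value)
		if err != nil {
			return time.Time{}, fmt.Errorf("invalid Expires value %q: %v", value, err)
		}
		return expires, nil
	}
	return time.Time{}, errors.New("missing mandatory Expires field")
}

func main() {
	body := "Contact: mailto:security@example.com\nExpires: 2030-12-31T23:59:59Z\n"
	expires, err := parseExpires(body)
	if err != nil {
		fmt.Println("invalid security.txt:", err)
		return
	}
	fmt.Println("expires:", expires.Format(time.RFC3339), "expired:", time.Now().After(expires))
}
```

A validator built along these lines could report a missing field and an unparsable date as two distinct errors, and treat a date in the past as a stale file.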
Conditions:
Running the script using the following list of domains on Ubuntu 18.04 LTS (go version go1.10.4 linux/amd64)
facebook.com
google.com
youtube.com
twitter.com
instagram.com
linkedin.com
microsoft.com
apple.com
wikipedia.org
plus.google.com
Result:
panic: runtime error: index out of range
goroutine 7 [running]:
github.com/disclose/diosts/internal/pkg/discloseio.FromSecurityTxt(0xc4201a8700, 0x7e9d16)
/root/go/src/github.com/disclose/diosts/internal/pkg/discloseio/discloseio.go:71 +0x3a1
github.com/disclose/diosts/internal/app/run.(*Writer).Start.func1(0xc42010e300)
/root/go/src/github.com/disclose/diosts/internal/app/run/writer.go:50 +0x195
created by github.com/disclose/diosts/internal/app/run.(*Writer).Start
/root/go/src/github.com/disclose/diosts/internal/app/run/writer.go:34 +0x5c
Expected:
Normal completion.
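The trace points at `discloseio.FromSecurityTxt`, but the report does not show the code at `discloseio.go:71`, so the cause is not known from this issue alone. One common source of `index out of range` in a conversion step is reading the first element of a field list that turned out to be empty. The Go sketch below shows that guard under this assumption; `securityTxt`, `entry`, `firstOrEmpty`, and `fromSecurityTxt` are hypothetical names and do not mirror the real diosts types.

```go
package main

import "fmt"

// securityTxt is a hypothetical, simplified stand-in for a parsed
// security.txt file. Fields that may repeat are slices and can be empty.
type securityTxt struct {
	Domain  string
	Contact []string
	Policy  []string
}

// entry is a hypothetical, simplified output record.
type entry struct {
	ProgramName string
	ContactURL  string
	PolicyURL   string
}

// firstOrEmpty returns the first value of a slice, or "" when the slice
// is empty, instead of panicking on values[0].
func firstOrEmpty(values []string) string {
	if len(values) == 0 {
		return ""
	}
	return values[0]
}

// fromSecurityTxt converts a parsed file into an output record without
// assuming that any repeated field is present.
func fromSecurityTxt(txt securityTxt) entry {
	return entry{
		ProgramName: txt.Domain,
		ContactURL:  firstOrEmpty(txt.Contact),
		PolicyURL:   firstOrEmpty(txt.Policy),
	}
}

func main() {
	// A file with no Contact and no Policy lines must not crash the writer.
	fmt.Printf("%+v\n", fromSecurityTxt(securityTxt{Domain: "example.com"}))
}
```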
Condition:
go version go1.11.6 linux/arm
cat top10milliondomains.txt | ~/go/bin/diosts -t 100 2>diosts.log >securitytxt.json
Source: https://www.domcop.com/files/top/top10milliondomains.csv.zip
Expected result:
Completion of task
Result:
goroutine 116 [running]:
github.com/disclose/diosts/pkg/securitytxt.baseDomain(0x2706707, 0x12, 0x12, 0x7)
/home/pi/go/src/github.com/disclose/diosts/pkg/securitytxt/domain.go:209 +0x10c
github.com/disclose/diosts/pkg/securitytxt.checkRedirect(0x2dc8f80, 0x2838978, 0x1, 0x2, 0x2c436e0, 0x27)
/home/pi/go/src/github.com/disclose/diosts/pkg/securitytxt/domain.go:190 +0x12c
net/http.(*Client).checkRedirect(0x207ab40, 0x2dc8f80, 0x2838978, 0x1, 0x2, 0x0, 0x600001)
/usr/lib/go-1.11/src/net/http/client.go:416 +0x4c
net/http.(*Client).do(0x207ab40, 0x2dc9380, 0x0, 0x0, 0x0)
/usr/lib/go-1.11/src/net/http/client.go:607 +0x738
net/http.(*Client).Do(0x207ab40, 0x2dc9380, 0x2126f30, 0x27, 0x0)
/usr/lib/go-1.11/src/net/http/client.go:509 +0x24
net/http.(*Client).Get(0x207ab40, 0x2126f30, 0x27, 0x2126f30, 0x27, 0x0)
/usr/lib/go-1.11/src/net/http/client.go:398 +0x7c
github.com/disclose/diosts/pkg/securitytxt.(*DomainClient).GetBody(0x20787d0, 0x2126f30, 0x27, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/pi/go/src/github.com/disclose/diosts/pkg/securitytxt/domain.go:143 +0x94
github.com/disclose/diosts/pkg/securitytxt.(*DomainClient).GetDomainBody(0x20787d0, 0x2304f50, 0x7, 0x1)
/home/pi/go/src/github.com/disclose/diosts/pkg/securitytxt/domain.go:114 +0x15c
github.com/disclose/diosts/pkg/securitytxt.(*DomainClient).GetSecurityTxt(0x20787d0, 0x2304f50, 0x7, 0x0, 0x0, 0x0)
/home/pi/go/src/github.com/disclose/diosts/pkg/securitytxt/domain.go:88 +0xcc
github.com/disclose/diosts/internal/app/run.(*WorkerPool).work(0x207ab60, 0x208e200)
/home/pi/go/src/github.com/disclose/diosts/internal/app/run/workerpool.go:52 +0xac
created by github.com/disclose/diosts/internal/app/run.(*WorkerPool).Run
/home/pi/go/src/github.com/disclose/diosts/internal/app/run/workerpool.go:34 +0x54
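The trace above starts at the goroutine header, so the panic message itself is missing, and the code at `domain.go:209` is not shown. Assuming the crash comes from `baseDomain` indexing into the labels of an unusual redirect host, a defensive version could look like the Go sketch below. `safeBaseDomain` is a hypothetical name, and the "last two labels" rule is a deliberate simplification that ignores multi-label public suffixes such as `co.uk` and does not strip ports.

```go
package main

import (
	"fmt"
	"strings"
)

// safeBaseDomain is a hypothetical helper (not the diosts baseDomain).
// It returns the last two dot-separated labels of a host name, and
// returns the normalized host unchanged when it has two labels or fewer,
// so it never slices or indexes past the available labels.
func safeBaseDomain(host string) string {
	// Normalize case and drop a trailing root dot ("example.com.").
	host = strings.TrimSuffix(strings.ToLower(host), ".")
	labels := strings.Split(host, ".")
	if len(labels) <= 2 {
		return host
	}
	return strings.Join(labels[len(labels)-2:], ".")
}

func main() {
	for _, host := range []string{"www.example.com", "localhost", "", "Example.COM."} {
		fmt.Printf("%q -> %q\n", host, safeBaseDomain(host))
	}
}
```

With a list as large as the 10 million domain input, malformed or single-label redirect targets are likely to appear, so a redirect check built on a helper like this would need to handle them without panicking.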
The main function of diosts is to provide authoritative updates to diodb based on data it parses from security.txt files.
For each URL processed by diosts, it would be ideal if diosts (or another small service that consumes diosts output, and potentially the output of other similar scrapers) could: