
diosts's Issues

New diosts output fields

  • "source" = "diosts-$version" (Identifies the script version and where the record came from within the broader diodb data corpus, and distinguishes it from any other bots we add in the future)
  • "last_update" = Timestamp of when the script was run
  • "contact_email" = Contact email, if present, kept separate from "contact_url"
  • "retrieval_url" = The full, specific URL that the security.txt data was retrieved from (to be used later for is-alive garbage collection checks)
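
A minimal sketch of how these four fields might be emitted as JSON. The `Record` type, the `NewRecord` helper, the `version` constant, and the RFC 3339 timestamp format are all assumptions made for illustration; none of them are taken from the diosts source.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// version is a hypothetical build version; diosts may track this differently.
const version = "0.1.0"

// runStarted is captured once so every record from one run shares a timestamp.
var runStarted = time.Now().UTC()

// Record is a hypothetical output record holding only the proposed new fields.
type Record struct {
	Source       string `json:"source"`
	LastUpdate   string `json:"last_update"`
	ContactEmail string `json:"contact_email,omitempty"` // omitted when absent
	RetrievalURL string `json:"retrieval_url"`
}

// NewRecord fills in the proposed fields for one retrieved security.txt.
func NewRecord(retrievalURL, contactEmail string) Record {
	return Record{
		Source:       "diosts-" + version,
		LastUpdate:   runStarted.Format(time.RFC3339), // assumed format
		ContactEmail: contactEmail,
		RetrievalURL: retrievalURL,
	}
}

func main() {
	rec := NewRecord("https://example.com/.well-known/security.txt", "security@example.com")
	out, err := json.MarshalIndent(rec, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Taking the timestamp once per run, rather than per record, matches the "timestamp that the script was run" wording and makes all records from one run easy to group.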

Rename the "program_name" field to "security_txt_domain"

Can we rename the "program_name" field to "security_txt_domain", to denote the domain the security.txt was retrieved from?

"program_name" in diodb is tied to the company, organization, or business unit responsible for the policy and intake channel, which is a looser coupling than the data returned by diosts.

Looking at the merged data, I think we should consider writing the diosts output to a separate data store, to allow for the differences between how security.txt presents this information and what we collect in diodb.

panic: runtime error: index out of range

Conditions:
Running the script using the following list of domains on Ubuntu 18.04 LTS (go version go1.10.4 linux/amd64)

facebook.com
google.com
youtube.com
twitter.com
instagram.com
linkedin.com
microsoft.com
apple.com
wikipedia.org
plus.google.com

Result:

panic: runtime error: index out of range

goroutine 7 [running]:
github.com/disclose/diosts/internal/pkg/discloseio.FromSecurityTxt(0xc4201a8700, 0x7e9d16)
	/root/go/src/github.com/disclose/diosts/internal/pkg/discloseio/discloseio.go:71 +0x3a1
github.com/disclose/diosts/internal/app/run.(*Writer).Start.func1(0xc42010e300)
	/root/go/src/github.com/disclose/diosts/internal/app/run/writer.go:50 +0x195
created by github.com/disclose/diosts/internal/app/run.(*Writer).Start
	/root/go/src/github.com/disclose/diosts/internal/app/run/writer.go:34 +0x5c

Expected:
Normal completion.
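
The trace does not show what is being indexed at discloseio.go:71, so the following is only a sketch of two general defenses, not a fix for that line. It assumes, hypothetically, that the panic comes from reading the first element of an empty slice; `firstOrEmpty` and `safely` are illustrative names, not diosts functions.

```go
package main

import "fmt"

// firstOrEmpty is a hypothetical helper: it returns the first value,
// or "" when the slice is empty, instead of panicking.
func firstOrEmpty(values []string) string {
	if len(values) == 0 {
		return ""
	}
	return values[0]
}

// safely runs fn and converts a panic into an error, so that one bad
// input does not take down the whole goroutine (and the process).
func safely(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	fn()
	return nil
}

func main() {
	var contacts []string // e.g. a security.txt with no usable contact entries

	// Unguarded indexing panics; safely turns that into a reportable error.
	err := safely(func() { _ = contacts[0] })
	fmt.Println("error:", err)

	// Guarded access simply yields an empty value.
	fmt.Printf("first contact: %q\n", firstOrEmpty(contacts))
}
```

The length check is the real fix where it applies; the recover wrapper is a safety net that lets a long run log the failing domain and continue.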

panic: runtime error: index out of range

Condition:
go version go1.11.6 linux/arm
cat top10milliondomains.txt | ~/go/bin/diosts -t 100 2>diosts.log >securitytxt.json
Source: https://www.domcop.com/files/top/top10milliondomains.csv.zip

Expected result:
Completion of task

Result:

goroutine 116 [running]:
github.com/disclose/diosts/pkg/securitytxt.baseDomain(0x2706707, 0x12, 0x12, 0x7)
	/home/pi/go/src/github.com/disclose/diosts/pkg/securitytxt/domain.go:209 +0x10c
github.com/disclose/diosts/pkg/securitytxt.checkRedirect(0x2dc8f80, 0x2838978, 0x1, 0x2, 0x2c436e0, 0x27)
	/home/pi/go/src/github.com/disclose/diosts/pkg/securitytxt/domain.go:190 +0x12c
net/http.(*Client).checkRedirect(0x207ab40, 0x2dc8f80, 0x2838978, 0x1, 0x2, 0x0, 0x600001)
	/usr/lib/go-1.11/src/net/http/client.go:416 +0x4c
net/http.(*Client).do(0x207ab40, 0x2dc9380, 0x0, 0x0, 0x0)
	/usr/lib/go-1.11/src/net/http/client.go:607 +0x738
net/http.(*Client).Do(0x207ab40, 0x2dc9380, 0x2126f30, 0x27, 0x0)
	/usr/lib/go-1.11/src/net/http/client.go:509 +0x24
net/http.(*Client).Get(0x207ab40, 0x2126f30, 0x27, 0x2126f30, 0x27, 0x0)
	/usr/lib/go-1.11/src/net/http/client.go:398 +0x7c
github.com/disclose/diosts/pkg/securitytxt.(*DomainClient).GetBody(0x20787d0, 0x2126f30, 0x27, 0x0, 0x0, 0x0, 0x0, 0x0)
	/home/pi/go/src/github.com/disclose/diosts/pkg/securitytxt/domain.go:143 +0x94
github.com/disclose/diosts/pkg/securitytxt.(*DomainClient).GetDomainBody(0x20787d0, 0x2304f50, 0x7, 0x1)
	/home/pi/go/src/github.com/disclose/diosts/pkg/securitytxt/domain.go:114 +0x15c
github.com/disclose/diosts/pkg/securitytxt.(*DomainClient).GetSecurityTxt(0x20787d0, 0x2304f50, 0x7, 0x0, 0x0, 0x0)
	/home/pi/go/src/github.com/disclose/diosts/pkg/securitytxt/domain.go:88 +0xcc
github.com/disclose/diosts/internal/app/run.(*WorkerPool).work(0x207ab60, 0x208e200)
	/home/pi/go/src/github.com/disclose/diosts/internal/app/run/workerpool.go:52 +0xac
created by github.com/disclose/diosts/internal/app/run.(*WorkerPool).Run
	/home/pi/go/src/github.com/disclose/diosts/internal/app/run/workerpool.go:34 +0x54
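
The panic originates in baseDomain, called from the redirect check, so a redirect target with an unexpected host shape is a plausible trigger. The actual code at domain.go:209 is not shown here; the sketch below is a hypothetical replacement (`safeBaseDomain` is an invented name) that assumes the function takes the last two dot-separated labels of a host, and it reports failure instead of indexing past the end.

```go
package main

import (
	"fmt"
	"strings"
)

// safeBaseDomain is a hypothetical helper, not the diosts implementation.
// It returns the last two dot-separated labels of host and reports false
// when host has fewer than two labels or contains an empty label.
func safeBaseDomain(host string) (string, bool) {
	host = strings.TrimSuffix(strings.ToLower(host), ".")
	labels := strings.Split(host, ".")
	if len(labels) < 2 {
		return "", false
	}
	for _, label := range labels {
		if label == "" {
			return "", false
		}
	}
	return strings.Join(labels[len(labels)-2:], "."), true
}

func main() {
	for _, host := range []string{"www.example.com", "localhost", "", "a..b"} {
		base, ok := safeBaseDomain(host)
		fmt.Printf("%q -> %q (ok=%v)\n", host, base, ok)
	}
}
```

Taking the last two labels is wrong for suffixes such as co.uk. If that matters, `publicsuffix.EffectiveTLDPlusOne` from golang.org/x/net/publicsuffix returns an error rather than panicking on hosts it cannot handle.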

Allow for comparison and automatic PR against diodb

The main function of diosts is to provide authoritative updates to diodb based on data it parses from security.txt files.

For each URL processed by diosts, it would be ideal if diosts (or another small service that consumes diosts output, and potentially the output of other similar scrapers) could:

  • Check if the insertion of a new object into diodb is appropriate,
  • Check if the updating of a key pair for an existing diodb object is appropriate, and
  • Formulate and push a PR to the diodb repo for review on a per-object basis, rather than in a batch.
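
The first two checks above could be sketched as a small comparison step that runs before any PR is created. Everything here is an assumption for illustration: the `Entry` and `Action` types, the `Plan` function, keying diodb objects by domain, and the rule that empty scraped values never overwrite existing data. Creating the PR itself is out of scope for this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// Action is the hypothetical outcome of comparing scraped data with diodb.
type Action string

const (
	ActionNone   Action = "none"   // nothing to propose
	ActionInsert Action = "insert" // no matching diodb object exists
	ActionUpdate Action = "update" // one or more fields differ
)

// Entry is a hypothetical flat view of one object: field name to value.
type Entry map[string]string

// Plan compares one scraped entry against the existing data and returns
// the proposed action plus the sorted names of the fields that changed.
// Empty scraped values are ignored so they never erase existing data.
func Plan(db map[string]Entry, key string, scraped Entry) (Action, []string) {
	current, ok := db[key]
	if !ok {
		return ActionInsert, nil
	}
	var changed []string
	for field, value := range scraped {
		if value != "" && current[field] != value {
			changed = append(changed, field)
		}
	}
	if len(changed) == 0 {
		return ActionNone, nil
	}
	sort.Strings(changed)
	return ActionUpdate, changed
}

func main() {
	db := map[string]Entry{
		"example.com": {"contact_email": "old@example.com"},
	}
	action, fields := Plan(db, "example.com", Entry{"contact_email": "security@example.com"})
	fmt.Println(action, fields)
}
```

Each non-"none" result would then map to exactly one PR, which keeps review per object as requested.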
