The bulktracker from bsiegert

Fix code scanning alert - Overly permissive regular expression range

Tracking issue for:

https://github.com/bsiegert/BulkTracker/security/code-scanning/7

Package page has broken link to package location

https://releng.netbsd.org/bulktracker/pkg/718432 links to http://pkgsrc.se/joyent/percona56-client, but since the package is local to joyent, the latter page results in a 404 error.

Try go1.12

https://groups.google.com/forum/#!topic/google-appengine-go/MFTqqCSl-po

Set api_version: go1.9 in app.yaml.

http-only links in page templates

BulkTracker/templates/pkg_info.html

Line 7 in 7ad7e84

<a href="http://pkgsrc.se/{{.Category}}{{.Dir}}">{{.Category}}{{.Dir}}</a></dd>

BulkTracker/templates/start_page_lead.html

Line 3 in 7ad7e84

the <a href="http://www.NetBSD.org/">NetBSD</a> package collection.

BulkTracker/templates/start_page_lead.html

Line 6 in 7ad7e84

    
           <a class="btn btn-primary btn-lg" role="button" href="http://www.pkgsrc.org/">Learn more about pkgsrc</a>

Would be nice to use scheme-relative URLs here, or HTTPS URLs.

Clean up older builds in the background

Criteria are to be defined.

The idea would be to have an HTTP endpoint to launch the cleanup. This can be triggered from a cronjob, for example. Or perhaps we have an internal trigger.

Use CloudMailin for incoming emails

write handler that speaks the cloudmailin format
unsubscribe App Engine inbound mail service
subscribe cloudmailin
remove App Engine inbound mail handler

Fix code scanning alert - Incomplete string escaping or encoding

Tracking issue for:

https://github.com/bsiegert/BulkTracker/security/code-scanning/1

Provide a hermetic testing script

There should be a way to run an end-to-end test (or something) that adds a build to the database and does not require net access.

Name the branch on package build page

The page for a package build, such as https://releng.netbsd.org/bulktracker/pkg/17257784, names the:

package path
package name
build status
platform
compiler
build date
builder

But it does not name the branch that the package was built on. It would be nice if it displayed that too.

Improve multi-column package display

There used to be a multicolumn category selector on the front page. I replaced this with the selector widget. It turns out (IMHO) that this is not a good replacement after all.

One problem was always the multi-column display and the way it rendered differently across browsers. Maybe the solution in the end is simply to go with CSS3 multi-column layout.

column-width: 12em

'All builds' page shows error instead of listing all builds

https://releng.netbsd.org/bulktracker/builds shows this error message:

DataTables warning: table id=DataTables_Table_0 - Ajax error. For more information about this error, please see http://datatables.net/tn/7

That URL provides very nice instructions, telling me to look for the HTTP call that failed. In my case, it was:

https://releng.netbsd.org/bulktracker/undefinedjson/allbuilds/?_=1700781773151

The word undefined looks suspicious, and when I remove it, I get the actual JSON data.

Migrate off App Engine by January 2024

The Go runtime we use on App Engine classic will stop accepting new deploys on January 24, 2024.

There should be a production instance using the new SQL backend at that time. Then the App Engine version can forward to the new site.

The NetBSD Foundation has graciously agreed to host the instance! 🎉

Requirements from http://wiki.netbsd.org/projects/project/bulktracker/

track commits (additions, updates), show reports about which packages have not been bulk-built after the update on which platform
summary of broken packages, per platform (similar to #3)

Home page has unnormalized links to categories

https://releng.netbsd.org/bulktracker/ links to https://releng.netbsd.org/bulktracker//archivers/, this URL contains a double slash in the path. While it may be harmless, it doesn't look right.

Improve representation of failures for packages not attempted

Packages won't be built at all if a dependency fails. I'd like to see more information about why, as that is the point of bulk-foo.

An example with not enough info is

https://bulktracker.appspot.com/pkg/ag1zfmJ1bGt0cmFja2VyciILEgVidWlsZBiAgICzkv6bCwwLEgNwa2cYgICA0_nBgQsM

Track number of Datastore API calls for page renders

Some queries use manual joins in application code, and they are probably calling the Datastore APIs way too often.

Consider adding some custom metrics and pushing them to the Stackdriver API.

Fix Travis CI config

The CI tests are currently failing each time. Need to investigate and fix.

App Engine version: build stats broken since 2023-03-04

All the recent builds on https://bulktracker.appspot.com/builds show

0 failed / 0 indirect-failed / 0 ok

This was broken in the deployment on 20230304t165135 and working in the one before, on 20220716t175343.

Link from bulk build page to original report

Sometimes there is more information in the machine-readable bulk report than bulktracker shows. How do I get back to the original machine-readable bulk report, or at least back to the mail-index page referencing it, from a bulk build page like https://releng.netbsd.org/bulktracker/build/645?

(I don't remember offhand whether bulktracker scrapes the page or reads the machine-readable report, but either way it would be nice to get a link to the machine-readable report.)

Cross-correlate single package

The "grid" code is a mess and should be rewritten from scratch.

However, an easy and useful case is tracking all build results of a single package. This should be easy.

Included jQuery is vulnerable

https://snyk.io/vuln/npm:jquery?lh=3.3.1
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-11358

Found by looking at Lighthouse audits, thanks Chrome!

Stop paginating

I pulled up a query, and got 10 results, and then buttons for pages. This breaks searching with browser find, and is generally awkward. I am guessing this is just a default trendy pattern and not intended.

This ticket asks to either stop paginating entirely, or to add a "1000" option and an "all" option and make "all" the default. Or at least make 1000 the default.

Build status chart flickers when hovering over the infobox

On https://releng.netbsd.org/bulktracker/build/42, hover over the build status infobox:

When the mouse cursor is both over the pie chart circle and in the infobox rectangle, the infobox flickers.

Index results by build id and pkg id

The following index dramatically speeds up common queries:

CREATE INDEX results_i_build_pkg ON results (build_id, pkg_id);

It increases the size of the database by about 1/3, but compare, e.g., the GetResultsInCategory query (bottleneck of https://releng.netbsd.org/bulktracker/build/645/meta-pkgs):

Without index:

% time sqlite3 ./BulkTracker.db.~1~ "select r.*, p.* from results r join pkgs p on (r.pkg_id == p.pkg_id) where p.category = 'meta-pkgs/' and r.build_id = 645"
...
sqlite3 ./BulkTracker.db.~1~   2.32s user 0.40s system 99% cpu 2.732 total

With index:

% time sqlite3 ./BulkTracker.db "select r.*, p.* from results r join pkgs p on (r.pkg_id == p.pkg_id) where p.category = 'meta-pkgs/' and r.build_id = 645"
...
sqlite3 ./BulkTracker.db   0.01s user 0.01s system 98% cpu 0.015 total

GetSingleResultByPkgName (bottleneck of https://releng.netbsd.org/bulktracker/pkg/17227701):

Without index:

% time sqlite3 ./BulkTracker.db.~1~ "select r.*, p.* from results r, pkgs p where r.build_id == 644 and r.pkg_id == p.pkg_id and r.pkg_name = 'libreoffice-24.2.1.2nb2'"
17225614|644|7533|libreoffice-24.2.1.2nb2|2||3|7533|misc/|libreoffice
sqlite3 ./BulkTracker.db.~1~   2.31s user 0.46s system 99% cpu 2.801 total

With index:

% time sqlite3 ./BulkTracker.db "select r.*, p.* from results r, pkgs p where r.build_id == 644 and r.pkg_id == p.pkg_id and r.pkg_name = 'libreoffice-24.2.1.2nb2'"
17225614|644|7533|libreoffice-24.2.1.2nb2|2||3|7533|misc/|libreoffice
sqlite3 ./BulkTracker.db   0.02s user 0.01s system 97% cpu 0.024 total

Given the amount of CPU time mollari is spending in bulktracker, I think this couple hundred megabytes of space is worth it.

SQL queries?

Can I do SQL queries against the bulktracker database, or can I download a snapshot to load into sqlite3 for my own queries?

Upgrade to new App Engine Go SDK

If you're currently using the bare appengine packages
(that is, not these ones, imported via google.golang.org/appengine),
then you can use the aefix tool to help automate an upgrade to these packages.

Run go get google.golang.org/appengine/cmd/aefix to install it.

Feature request: query by MAINTAINER

Thanks for BulkTracker! As a person maintaining dozens of packages, I wish BulkTracker could help me tell at a glance whether anything I'm specifically responsible for is currently broken anywhere.

Upgrade to a Supported jQuery Release

Everything before jQuery 3 is no longer supported. There are probably vulnerabilities in the ancient version that BulkTracker uses.

add Migrate Plugin
verify nothing breaks
when safe, remove Migrate Plugin

Read xz compressed bulk reports

Apparently, @jperkin now writes xz compressed bulk build reports. BulkTracker should support them, or we will not have any data from those builds.

Use sensible datastore batch sizes

From StackDriver Insights:

Increase the value of query batch size to reduce the number of datastore.next() calls

URL: /builds

sort categories down first

The list of categories reads across a line and then the next line. This is very unintuitive scanning as I expect columns like ls. Maybe it's just me, but I think aligning with ls sorting would be good.
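If the ordering were done server-side rather than with CSS columns, the ls-style layout could be sketched like this. The helper name and the fixed column count are assumptions for illustration, not existing BulkTracker code:

```go
package main

import "fmt"

// columnMajor arranges sorted items into rows so that reading down each
// column, then moving to the next column, yields the original order (like ls).
func columnMajor(items []string, cols int) [][]string {
	if cols < 1 {
		cols = 1
	}
	rows := (len(items) + cols - 1) / cols // ceiling division
	grid := make([][]string, rows)
	for r := 0; r < rows; r++ {
		for c := 0; c < cols; c++ {
			// Item i sits in column i/rows, row i%rows.
			if i := c*rows + r; i < len(items) {
				grid[r] = append(grid[r], items[i])
			}
		}
	}
	return grid
}

func main() {
	cats := []string{"archivers", "audio", "benchmarks", "biology", "cad", "chat", "comms"}
	for _, row := range columnMajor(cats, 3) {
		fmt.Println(row)
	}
}
```

Each returned row can be emitted as one table row, so the template stays row-oriented while the reader sees sorted columns.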

Link from package build to bulk build it was part of

How do I get from https://releng.netbsd.org/bulktracker/pkg/17257784 to https://releng.netbsd.org/bulktracker/build/645?

It would be nice if the package build had a link back to the bulk build page. (And also the category listing like https://releng.netbsd.org/bulktracker/build/645/meta-pkgs/bulk-test-essential, even though I see I can recover it from the URL.)

Change xz compression library

Use https://pkg.go.dev/github.com/ulikunitz/xz instead of the xi2 one.

it has an actual sensible license (3-clause BSD)
it has had actual development since 2017

Inconsistent logic for results.failed_deps

When ingesting reports, bulktracker stores results.failed_deps as a space-separated string:

BulkTracker/bulk/bulk.go

Line 207 in 7ad7e84

pkgs[i].FailedDeps = strings.Join(f, " ")

But when generating pages, bulktracker parses results.failed_deps as a comma-separated string:

BulkTracker/pages/bulktracker.go

Line 238 in 7ad7e84

failedDeps := strings.Split(res.FailedDeps, ",")

This mismatch means that a result with more than one failed dependency shows nothing, because the space-separated string doesn't appear in any results.pkg_name, like in this example:

https://releng.netbsd.org/bulktracker/pkg/17257784

This should show mutt-2.2.13 and libreoffice-24.2.1.2nb2, but instead it's blank right now.

All that said, it might be better to use a separate table to obviate the need for formatting and parsing and make queries easier to write (with indices to speed up both directions -- find who's breaking a package, find which packages are broken by a package):

CREATE TABLE failed_deps (
    dependent INTEGER REFERENCES results,
    dependency INTEGER REFERENCES results,
    PRIMARY KEY(dependent, dependency)
) WITHOUT ROWID;
CREATE INDEX failed_deps_i_dependent ON failed_deps (dependent);
CREATE INDEX failed_deps_i_dependency ON failed_deps (dependency);

Use direct HTTP Get instead of urlfetch transport

Ideally, this should be done in a way that falls back to urlfetch, at least initially, to not break fetching of builds completely.

The "new" go112 runtime should support direct calls to http.Get just fine, I believe.

Replace delay functions with Cloud Tasks invocations

https://godoc.org/cloud.google.com/go/cloudtasks/apiv2

The existing queue still exists (name: projects/bulktracker/locations/us-central1/queues/default). I get to rewrite the logic from the delay package, yay!

No new build summary records since 2014-11-13

As reported by a user: The latest build shown on the landing page is from 2014-11-13 (which, incidentally, is when the latest push happened). However, pkg entities (individual package results) are still being written. Find out what happened.

Build result page has broken links to categories

On https://releng.netbsd.org/bulktracker/build/42, the 'Categories' list links to https://releng.netbsd.org/build/42/archivers/. This URL is missing the /bulktracker path.

Add a way to run under a subdirectory

The intended URL for the TNF instance of BulkTracker is http://releng.NetBSD.org/bulktracker/.

Support running under a URL other than /. This means fixing all the links to be relative.

Daily Datastore quota limit reached

BulkTracker was updated to use "Cloud Firestore in Datastore Mode". Ever since then (I think), the app has been mostly unavailable.

The reason is that the "daily quota is exceeded", example:

2021/12/02 20:28:07 rpc error: code = ResourceExhausted desc = Quota exceeded.

Checking the Quotas page in Pantheon (https://console.cloud.google.com) says that there are no daily quotas, so this is both unexpected and concerning. I wish I knew what sort of daily quota I am exceeding.

For now, I disabled the Task Queue in Cloud Tasks, which should stop ingestion of new build results.

I probably should do a bulk deletion of data that is older than, say, one year. But even so:

WTF is the type of quota the app is exceeding and why?
Where can I track that particular quota?
Is this due to ingestion of new data?

And of course: How to mitigate and fix?

In a private email conversation, someone reported that https://bulktracker.appspot.com/pkgresults/ucon64 shows results for Firefox instead. While this particular bug is fixed, you only get results if you give both category and name: https://bulktracker.appspot.com/pkgresults/emulators/ucon64.

Strictly speaking, the category is redundant, so it should work without.
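As a sketch, the handler could accept both forms by splitting the path first and filtering on the category only when one is given. The `splitPkgPath` helper is hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// splitPkgPath splits "category/name" or a bare "name" into its parts.
// The category is empty when the path contains only a package name.
func splitPkgPath(p string) (category, name string) {
	p = strings.Trim(p, "/")
	if i := strings.LastIndex(p, "/"); i >= 0 {
		return p[:i], p[i+1:]
	}
	return "", p
}

func main() {
	for _, p := range []string{"emulators/ucon64", "ucon64"} {
		category, name := splitPkgPath(p)
		fmt.Printf("category=%q name=%q\n", category, name)
	}
}
```

With an empty category, the lookup would match on the package name alone.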

Improve handling of bulk build reports

The red error bar saying "try recreating the index" is a pretty bad UI.

store the last fetch error in the database and surface it
use a custom queue to increase the time between fetches
for extra points: make the reindex show the status interactively.

Replace appengine log with logrus + Stackdriver formatter

From https://stackoverflow.com/questions/67554711/replacement-for-go-app-engine-google-golang-org-appengine-log-package-with-lo:

package main

import (
        log "github.com/sirupsen/logrus"
        stackdriver "github.com/TV4/logrus-stackdriver-formatter" // adapter
)

func main() {
        // Emit log entries in the format Stackdriver expects.
        log.SetFormatter(stackdriver.NewFormatter())
        log.SetLevel(log.DebugLevel) // default is Info

        log.Debug("This is debug")
        log.Info("This is info")
        log.Error("This is error")
        log.Warn("This is warn")
}
log.Warn("This is warn")

Check limits on all datastore queries

Unbounded datastore queries can be dangerous. http://bulktracker.appspot.com/build/ag1zfmJ1bGt0cmFja2VychILEgVidWlsZBiAgIDAuJ-TCgw used to have one that returned over 8000 results.

Check the limits on all datastore queries. Add some where missing.

Category page has broken links to packages

https://releng.netbsd.org/bulktracker/pkgtools/

This page links to https://releng.netbsd.org/bulktracker/pkgtools//pkglint. Due to the //, the target URL generates this error message:

Failed to decode the package name from the URL.
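The double slash most likely comes from concatenating a category that already ends in a slash (the stored categories look like `pkgtools/`) with another `/`. A minimal sketch of building such links with `path.Join`, which collapses repeated slashes; the `pkgURL` helper is hypothetical:

```go
package main

import (
	"fmt"
	"path"
)

// pkgURL builds a link below base without doubled slashes, even when the
// category carries a trailing slash (as in "pkgtools/").
func pkgURL(base, category, dir string) string {
	return path.Join("/", base, category, dir)
}

func main() {
	fmt.Println(pkgURL("/bulktracker/", "pkgtools/", "pkglint"))
}
```

Note that `path.Join` also drops a trailing slash, so category links that need one would have to add it back afterwards.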

Use a select2 element (w/ AJAX) for selecting packages

write an endpoint for select2 AJAX autocomplete -- ideally doing full string match on category/pkgname.
nuke the submenu thing on the home page, as it breaks in Chrome when the window is reasonably wide
replace text box with select2

Package page has broken link to build results

https://releng.netbsd.org/bulktracker/pkgtools/pkglint

The links in the 'Package Name' column have the form:

https://releng.netbsd.org/pkg/1507506

This form is missing the /bulktracker. Same for the 'Build Timestamp' column.
