The news from social-protocols

"New" Page

This would have the same ordering as the HN new page, of course.

Plot follow-up tasks

Proper error handling
Show "1" at the top of the rank axis
Horizontal dotted line at 1.0 (upvote-Rate plot)
Show story information at the top of the page: the same information shown on the front page: title, author, submission time, score, estimated upvote rate, HN Rank, QN Rank, etc.
Vertical axis of first chart should always go from 1 to 90
Label of last chart should be "Upvote Rate"
Chart titles. Include story name? Include description, eg. "HN Rank vs QN Rank"
Include orange page header on top of page.
Sans-serif font in charts?
Definition of Expected Upvotes and Upvote Rate, and/or a link to the docs. Maybe a * in the charts where we use those words in the title or legend, and then below the chart: "* [concise definition of expected upvotes]" or "* see the About page for the definition of expected upvotes".
HTML Title and H1: Include story title and maybe id. Maybe: "Story Stats for 12345: Story Title"
Make images fit on mobile screen.
LRU cache plots

Layout Issue on Mobile (Increasing Left-Margin)

See the attached screenshot from my iPhone. Notice how the titles aren't quite aligned on the left:

Take time of day / day of week into account for calculating expectedUpvotes

Score Page with Plots

Make "X quality" into a link to a score page that shows stats and history of the story. Ideas:

Graph of rank over time
Graph of upvotes/attention/quality over time

Add github link to this repo to frontpage

Serve a prerendered, compressed frontpage directly from memory

Requesting the frontpage while a rankcrawler update is in progress is delayed, because sqlite delays that query. We should serve a prerendered page from memory instead.
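
A minimal sketch of the idea, assuming the rankcrawler calls a function like storeFrontpage after each update. All names here (storeFrontpage, loadFrontpage, frontpageHandler) are hypothetical and not taken from the repo. The handler only reads a byte slice from memory, so it never waits on sqlite.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"net/http"
	"strings"
	"sync"
)

// Hypothetical in-memory cache of the gzipped frontpage.
var (
	frontpageMu   sync.RWMutex
	frontpageGzip []byte
)

// storeFrontpage compresses the rendered HTML once and swaps it in.
// Assumption: the crawler calls this after each crawl has finished.
func storeFrontpage(html []byte) error {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(html); err != nil {
		return err
	}
	if err := zw.Close(); err != nil {
		return err
	}
	frontpageMu.Lock()
	frontpageGzip = buf.Bytes()
	frontpageMu.Unlock()
	return nil
}

// loadFrontpage returns the current gzipped page, or nil if none is stored yet.
func loadFrontpage() []byte {
	frontpageMu.RLock()
	defer frontpageMu.RUnlock()
	return frontpageGzip
}

// gunzip is the fallback for clients that do not accept gzip.
func gunzip(compressed []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// frontpageHandler serves the cached page without touching the database.
func frontpageHandler(w http.ResponseWriter, r *http.Request) {
	page := loadFrontpage()
	if page == nil {
		http.Error(w, "frontpage not rendered yet", http.StatusServiceUnavailable)
		return
	}
	w.Header().Set("Content-Type", "text/html; charset=utf-8")
	w.Header().Set("Vary", "Accept-Encoding")
	if strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
		w.Header().Set("Content-Encoding", "gzip")
		w.Write(page)
		return
	}
	plain, err := gunzip(page)
	if err != nil {
		http.Error(w, "internal error", http.StatusInternalServerError)
		return
	}
	w.Write(plain)
}

func main() {
	// In the real server this would be wired up with
	// http.HandleFunc("/", frontpageHandler) and http.ListenAndServe.
	if err := storeFrontpage([]byte("<html>frontpage</html>")); err != nil {
		panic(err)
	}
	fmt.Println("cached bytes:", len(loadFrontpage()))
}
```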

cumulativeAttention calculation should take a time interval

Set Cache-control header for static files

Improve Penalty and Resubmission Time Calculation

The penalty field is populated with the higher of the most recently calculated penalty and the value of the penalty field from the previous crawl. However, if the story did not appear in the previous crawl (e.g. it dropped out of the top 90) but does appear in this crawl, then the previously calculated penalty will not be found. This is because when looking for data from the previous crawl, we select stories where:

sampleTime = (select max(sampleTime) from dataset where sampleTime != (select max(sampleTime) from dataset))

but this value actually is different for each story. The same problem exists for the resubmission time calculation.
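
To illustrate the per-story lookup, here is a small in-memory sketch in Go. The Sample type and the previousSamples function are hypothetical and only model the dataset table; the actual fix would presumably be made in the SQL query. For each story it picks that story's own latest sample before the current crawl, rather than one global "previous crawl" time.

```go
package main

import "fmt"

// Sample is a hypothetical, simplified row of the dataset table.
type Sample struct {
	StoryID    int
	SampleTime int64
	Penalty    float64
}

// previousSamples returns, for each story, its most recent sample taken
// strictly before currentTime. Stories that skipped a crawl still get
// their last known sample.
func previousSamples(samples []Sample, currentTime int64) map[int]Sample {
	prev := make(map[int]Sample)
	for _, s := range samples {
		if s.SampleTime >= currentTime {
			continue
		}
		if p, ok := prev[s.StoryID]; !ok || s.SampleTime > p.SampleTime {
			prev[s.StoryID] = s
		}
	}
	return prev
}

func main() {
	samples := []Sample{
		{StoryID: 1, SampleTime: 100, Penalty: 0.2},
		{StoryID: 2, SampleTime: 100, Penalty: 0.5},
		{StoryID: 1, SampleTime: 160, Penalty: 0.3}, // story 2 missing from this crawl
		{StoryID: 1, SampleTime: 220},
		{StoryID: 2, SampleTime: 220}, // story 2 is back
	}
	for id, s := range previousSamples(samples, 220) {
		fmt.Println(id, s.SampleTime, s.Penalty)
	}
}
```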

"Diff" or "Compare" Page

A version of our "top" page that shows:

The rank that each story has on the hntop page
A "change" icon like in this page: https://www.tiobe.com/tiobe-index/
An "overridden" or "rejected" section with stories in the HN top page but not the QN top page.
Comparative stats: average quality, average age, average attention, sitewide expected upvotes

The difference in sitewide expected upvotes would be one way to measure comparative value created. This would be the sum of (quality*expectedUpvotesAtRank) for all stories on the page. If an upvote is a proxy of value for users, this is a measure of "net value": value per unit of attention consumed.
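
The sum described above could be sketched like this. The Story type and the expectedUpvotesAtRank callback are hypothetical stand-ins, and the 1/rank curve in main is made up purely for illustration; it is not the curve used by the project.

```go
package main

import "fmt"

// Story is a hypothetical, minimal view of a ranked story.
type Story struct {
	Rank    int     // 1-based rank on the page
	Quality float64 // estimated upvote rate
}

// sitewideExpectedUpvotes sums quality * expectedUpvotesAtRank(rank)
// over all stories on a page.
func sitewideExpectedUpvotes(stories []Story, expectedUpvotesAtRank func(rank int) float64) float64 {
	total := 0.0
	for _, s := range stories {
		total += s.Quality * expectedUpvotesAtRank(s.Rank)
	}
	return total
}

func main() {
	// Made-up attention curve for the example only.
	atRank := func(rank int) float64 { return 1.0 / float64(rank) }

	// Same two stories, ordered differently on the two pages.
	hn := []Story{{Rank: 1, Quality: 1.0}, {Rank: 2, Quality: 2.0}}
	qn := []Story{{Rank: 1, Quality: 2.0}, {Rank: 2, Quality: 1.0}}

	diff := sitewideExpectedUpvotes(qn, atRank) - sitewideExpectedUpvotes(hn, atRank)
	fmt.Println("QN minus HN:", diff)
}
```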

Crawler is only getting the first 30 stories on the new page.

The problem is we are crawling:

https://news.ycombinator.com/newest?p=2
https://news.ycombinator.com/newest?p=3

But for the new page it should be:

https://news.ycombinator.com/newest?n=31
https://news.ycombinator.com/newest?n=61
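
A small sketch of building the URLs in the form listed above. The function name newestPageURLs is hypothetical, and this only reproduces the n= pattern from this issue; whether HN needs further query parameters is not checked here.

```go
package main

import "fmt"

// newestPageURLs builds the URLs for the first `pages` pages of the new
// page, assuming `perPage` stories per page and the n= offset pattern
// described in this issue.
func newestPageURLs(pages, perPage int) []string {
	const base = "https://news.ycombinator.com/newest"
	urls := make([]string, 0, pages)
	for p := 0; p < pages; p++ {
		if p == 0 {
			urls = append(urls, base)
			continue
		}
		urls = append(urls, fmt.Sprintf("%s?n=%d", base, p*perPage+1))
	}
	return urls
}

func main() {
	for _, u := range newestPageURLs(3, 30) {
		fmt.Println(u)
	}
}
```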

Final Edits to Readme

Move Causal Model discussion to "Further Improvements" section. Clarify description of "Hypothetical Upvotes".

Add age penalty

How much does cumulative attention correlate with age?

Plot estimated penalties

Basic SEO Stuff (Essential)

Might as well try to capture a little SEO traffic.

Put "Hacker News" in title (e.g. "Quality News: Hacker News Rankings")
Put story title and "Hacker News" in title of /stats pages (e.g. "Hacker News Story Stats: What's New in Python 3.11")
Breadcrumbs. E.g. https://ourdomain > hacker news story stats > what's new in Python 3.11
301 redirect fly.io address to https://news.social-protocols.org/

Ask HN urls are wrong

It's only item?id=33600715, without a url.

Out Of Storage strategy

Since our database is growing indefinitely, we need a strategy to deal with limited storage.

At first, we could just delete old data from the database.
Later see how we can store the huge aggregated dataset.

Reverse Engineer and Apply Penalties

This blog post has some ideas: https://www.righto.com/2013/11/how-hacker-news-ranking-really-works.html

Strange rank data from New page

I am seeing strange newRank data on stories such as this one (newRank going from 31 to 1), then from 60 to >91.

https://social-protocols-news.fly.dev/stats?id=33608886

Replace old Terminology in code

quality -> upvoteRate
attention -> expectedUpvotes

Include Attention in Dataset Table

This will be useful for analysis, as well as creating graphs that show history vs. upvotes.

Add a license

Should we just use MIT?

Serve compressed static assets

Suspected false penalties

I see stories being penalized that don't seem like they should be penalized. Such as this one: a YC launch.

https://social-protocols-news.fly.dev/stats?id=33593456

Frontpage History Table

Table with aggregate stats (average age, score, weighted average quality) for each minute, for both the HN and QN front pages.

Make Links under stories Identical to HN

Username links to user page
Age links to Item page

Also related, when there are 0 comments, link should say "discuss" instead of "zero comments"

SEO Stuff (Not Essential)

Put "Hacker News Rankings" or something in the page content (under Quality News)
Put story title in slug of /stats pages (e.g. /stats/whats-new-in-python-311-31888624)
Structured Data?
Meta descriptions and keywords. Include "ycombinator" and "hacker news" in both. And keywords like "hacker news stats python 3.11"
Set up Google Search Console and webmaster tools
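
For the slug idea in the list above, a possible sketch. The function name storySlug is hypothetical. The rules (lowercase, drop punctuation, collapse whitespace into hyphens, append the id) are an assumption chosen so that the example title matches the example slug.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// storySlug turns a story title and id into a URL slug.
// Letters and digits are kept (lowercased), runs of whitespace or hyphens
// become a single hyphen, and all other characters are dropped.
// Non-ASCII letters are kept as-is and would still need URL escaping.
func storySlug(title string, id int) string {
	var b strings.Builder
	pendingHyphen := false
	for _, r := range title {
		switch {
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			if pendingHyphen && b.Len() > 0 {
				b.WriteByte('-')
			}
			pendingHyphen = false
			b.WriteRune(unicode.ToLower(r))
		case unicode.IsSpace(r) || r == '-':
			pendingHyphen = true
		}
	}
	if b.Len() == 0 {
		return fmt.Sprintf("%d", id)
	}
	return fmt.Sprintf("%s-%d", b.String(), id)
}

func main() {
	fmt.Println("/stats/" + storySlug("What's New in Python 3.11", 31888624))
}
```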

Properly handle flagged and deleted stories.

Deal with flagged stories

Once high quality stories disappear from the official pages, they don't receive any more rank information and therefore don't accumulate more attention.

Cache-Control Header for Frontpage

We can sync these headers with our minutely page generation pattern. The crawl happens every minute on the minute, though it can take a few seconds. Roughly we can tell browsers to cache each page until, say, 10 seconds after the minute mark.

Make margins on mobile more similar to original HN

The larger margins on iPhone result in fewer stories on the front page.

Show upvote button, which links to comments page on HN

Include QN rank in dataset table.

Link page title to frontpage

Add case studies to Readme

Include example of over-ranked and under-ranked story with charts.

Update expectedUpvotes coefficients using causal model

expectedUpvoteShare (in deltaExpectedUpvotes.go) returns the share of upvotes historically received on average at each rank. We want instead the share of upvotes that the average story would receive if we decided to show it at that rank.

This is different for the same reason that the probability that a hospitalized person will die is not the same as the probability that you will die if you choose to visit the hospital. The effect of rank on upvote rate is confounded by the fact that highly upvoted stories are more likely to be placed at rank 1, in the same way that the effect of hospitals on death is confounded by the fact that people who are dying are more likely to be placed in a hospital.

Show original HN ranking

Custom gravity URL parameter

Twitter Cards

About Page

Can use contents of writeup.md

Use moving average upvoteRate estimate

The general idea is to put more weight on more recent data when calculating upvote rate. For example, we could use the last N units of attention (expectedUpvotes). We don't want to just use the last N datapoints, because if a story is receiving little attention, those datapoints provide little information about the true upvote rate. On the other hand, if a story is at rank 1, the number of upvotes during 10 minutes is probably a good estimate of the true upvote rate.
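
One possible way to sketch "the last N units of attention", under some assumptions: samples are ordered oldest first, each sample holds the upvotes and expectedUpvotes gained in its interval, and the oldest sample inside the window is counted proportionally. The Sample type and movingUpvoteRate are hypothetical names.

```go
package main

import "fmt"

// Sample is a hypothetical per-interval record for one story.
type Sample struct {
	Upvotes         float64 // upvotes gained during the interval
	ExpectedUpvotes float64 // attention received during the interval
}

// movingUpvoteRate estimates upvotes per expected upvote over the most
// recent `window` units of attention. It walks backwards from the newest
// sample and takes only a fraction of the sample that crosses the window
// boundary. ok is false when there is no attention to divide by.
func movingUpvoteRate(samples []Sample, window float64) (rate float64, ok bool) {
	var upvotes, attention float64
	for i := len(samples) - 1; i >= 0 && attention < window; i-- {
		s := samples[i]
		if s.ExpectedUpvotes <= 0 {
			continue
		}
		take := s.ExpectedUpvotes
		if attention+take > window {
			take = window - attention
		}
		upvotes += s.Upvotes * take / s.ExpectedUpvotes
		attention += take
	}
	if attention == 0 {
		return 0, false
	}
	return upvotes / attention, true
}

func main() {
	samples := []Sample{
		{Upvotes: 10, ExpectedUpvotes: 2}, // oldest, falls outside a window of 4
		{Upvotes: 3, ExpectedUpvotes: 2},
		{Upvotes: 4, ExpectedUpvotes: 2}, // newest
	}
	rate, ok := movingUpvoteRate(samples, 4)
	fmt.Println(rate, ok)
}
```

A story at a low rank needs many datapoints to fill the window, while a story at rank 1 fills it quickly, which is the behaviour described above.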

Add favicon, manifest

https://realfavicongenerator.net/

serve the static files from a directory and have them cached in the browser for a week.

https://zetcode.com/golang/http-serve-static-files/
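
A minimal sketch using the standard library file server with a one-week Cache-Control header. The /static/ prefix, the directory name and the helper names are assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

const oneWeek = 7 * 24 * time.Hour

// cacheControlValue formats a max-age directive for the given duration.
func cacheControlValue(d time.Duration) string {
	return fmt.Sprintf("public, max-age=%d", int(d.Seconds()))
}

// cacheFor wraps a handler and sets Cache-Control on every response.
func cacheFor(d time.Duration, next http.Handler) http.Handler {
	value := cacheControlValue(d)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Cache-Control", value)
		next.ServeHTTP(w, r)
	})
}

// staticHandler serves files from dir under the /static/ URL prefix.
func staticHandler(dir string) http.Handler {
	files := http.FileServer(http.Dir(dir))
	return http.StripPrefix("/static/", cacheFor(oneWeek, files))
}

func main() {
	// In the real server (directory name is an assumption):
	//   http.Handle("/static/", staticHandler("static"))
	//   http.ListenAndServe(":8080", nil)
	fmt.Println(cacheControlValue(oneWeek))
}
```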

Detect and deal with second chance pool

track page views in backend for statistics

database migrations, instead of deleting the storage volume

Remove Upvote Link

[10:33 AM, 11/16/2022] Jonathan Warden: I think we should remove the upvote button. On HN that button only appears if users are logged in. I think that showing will frustrate some users especially if they are not logged in. It also takes up space.
[10:33 AM, 11/16/2022] Felix: ok, I'm fine with that.
