news's Issues

"New" Page

This would have the same ordering as the HN new page, of course.

Plot follow-up tasks

  • Proper error handling
  • Show "1" at the top of the rank axis
  • Horizontal dotted line at 1.0 (upvote-rate plot)
  • Show story information at the top of the page, i.e. the same information shown on the front page: title, author, submission time, score, estimated upvote rate, HN Rank, QN Rank, etc.
  • Vertical axis of the first chart should always go from 1 to 90
  • Label of the last chart should be "Upvote Rate"
  • Chart titles. Include story name? Include description, e.g. "HN Rank vs QN Rank"
  • Include the orange page header at the top of the page.
  • Sans-serif font in charts?
  • Definitions of Expected Upvotes and Upvote Rate, and/or a link to the docs. Maybe a * in the charts where we use those words in the title or legend, and then below the chart: "* [concise definition of expected upvotes]" or "* see the About page for the definition of expected upvotes".
  • HTML title and H1: Include story title and maybe id. Maybe: "Story Stats for 12345: Story Title"
  • Make images fit on mobile screens.
  • LRU cache plots
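One possible shape for the "LRU cache plots" item, sketched in Go with the standard container/list package. The names (plotCache, cacheEntry, Get, Put) are hypothetical, and the sketch assumes plots are cached as rendered image bytes keyed by story ID.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// cacheEntry is one rendered plot, stored together with its key so that
// eviction can also remove it from the lookup map.
type cacheEntry struct {
	storyID int
	png     []byte
}

// plotCache is a hypothetical LRU cache of rendered plots keyed by story ID.
// The mutex makes it safe to share between concurrent HTTP handlers.
type plotCache struct {
	mu       sync.Mutex
	capacity int
	order    *list.List            // front = most recently used
	items    map[int]*list.Element // story ID -> element in order
}

func newPlotCache(capacity int) *plotCache {
	return &plotCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[int]*list.Element),
	}
}

// Get returns the cached plot for a story and marks it as most recently used.
func (c *plotCache) Get(storyID int) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	el, ok := c.items[storyID]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*cacheEntry).png, true
}

// Put stores a plot, evicting the least recently used one if the cache is full.
func (c *plotCache) Put(storyID int, png []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.capacity <= 0 {
		return // nothing can be cached
	}
	if el, ok := c.items[storyID]; ok {
		el.Value.(*cacheEntry).png = png
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*cacheEntry).storyID)
	}
	c.items[storyID] = c.order.PushFront(&cacheEntry{storyID: storyID, png: png})
}

func main() {
	c := newPlotCache(2)
	c.Put(1, []byte("plot 1"))
	c.Put(2, []byte("plot 2"))
	c.Get(1)                   // story 1 becomes most recently used
	c.Put(3, []byte("plot 3")) // evicts story 2
	_, ok := c.Get(2)
	fmt.Println(ok) // false
}
```

Since a live story's plots change after every crawl, a real cache would presumably also need to invalidate entries, for example by including the latest sample time in the key.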

Score Page with Plots

Make "X quality" into a link to a score page that shows stats and history of the story. Ideas:

  • Graph of rank over time
  • Graph of upvotes/attention/quality over time

Improve Penalty and Resubmission Time Calculation

  • The penalty field is populated with the higher of the most recently calculated penalty and the value of the penalty field from the previous crawl. However, if a story did not appear in the previous crawl (e.g. it dropped out of the top 90) but does appear in this crawl, the calculation will not find its previously calculated penalty. This is because, when looking for data from the previous crawl, we select stories where:

    sampleTime = (select max(sampleTime) from dataset where sampleTime != (select max(sampleTime) from dataset))

but the time of a story's most recent earlier sample actually differs from story to story. The same problem exists for the resubmission time calculation.
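A minimal sketch of a per-story lookup, written as in-memory Go rather than SQL (in SQL the same idea would be a subquery correlated on the story id). The sample type, its field names, and the helper names are hypothetical; the "higher of the two penalties" rule is the one described above.

```go
package main

import "fmt"

// sample is a hypothetical in-memory view of one row of the dataset table.
type sample struct {
	ID         int     // story ID
	SampleTime int64   // Unix time of the crawl
	Penalty    float64 // penalty calculated at that crawl
}

// previousSamples returns, for every story, its most recent sample taken
// strictly before currentTime. Unlike the global "second-latest sampleTime"
// query, this still finds stories that skipped one or more crawls.
func previousSamples(rows []sample, currentTime int64) map[int]sample {
	prev := make(map[int]sample)
	for _, r := range rows {
		if r.SampleTime >= currentTime {
			continue
		}
		if p, ok := prev[r.ID]; !ok || r.SampleTime > p.SampleTime {
			prev[r.ID] = r
		}
	}
	return prev
}

// carriedPenalty keeps the higher of the newly calculated penalty and the
// story's previous penalty, if it has one.
func carriedPenalty(newPenalty float64, prev map[int]sample, id int) float64 {
	if p, ok := prev[id]; ok && p.Penalty > newPenalty {
		return p.Penalty
	}
	return newPenalty
}

func main() {
	rows := []sample{
		{ID: 1, SampleTime: 100, Penalty: 0.3},
		{ID: 2, SampleTime: 100, Penalty: 0.0},
		{ID: 2, SampleTime: 160, Penalty: 0.1}, // story 1 missed the crawl at 160
		{ID: 1, SampleTime: 220, Penalty: 0.1}, // current crawl
		{ID: 2, SampleTime: 220, Penalty: 0.05},
	}
	prev := previousSamples(rows, 220)
	fmt.Println(carriedPenalty(0.1, prev, 1))  // 0.3, found at sampleTime 100
	fmt.Println(carriedPenalty(0.05, prev, 2)) // 0.1, found at sampleTime 160
}
```

The same lookup would serve the resubmission time calculation.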

"Diff" or "Compare" Page

A version of our "top" page that shows:

  1. The rank that each story has on the hntop page
  2. A "change" icon like the ones on this page: https://www.tiobe.com/tiobe-index/
  3. An "overridden" or "rejected" section listing stories that are on the HN top page but not on the QN top page.
  4. Comparative stats: average quality, average age, average attention, sitewide expected upvotes

The difference in sitewide expected upvotes would be one way to measure the comparative value created. Sitewide expected upvotes would be the sum of (quality*expectedUpvotesAtRank) over all stories on the page. If an upvote is a proxy for value to users, this is a measure of "net value": value per unit of attention consumed.
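A rough Go sketch of the comparisons listed above: a TIOBE-style rank change, the "rejected" set, and sitewide expected upvotes as the sum of quality*expectedUpvotesAtRank. The story type, the helper names, and the per-rank numbers in main are hypothetical and illustrative only, not the project's measured coefficients.

```go
package main

import "fmt"

// story is a hypothetical row on a front page.
type story struct {
	ID      int
	Quality float64 // estimated quality, as in quality*expectedUpvotesAtRank
}

// rankDelta returns hnRank - qnRank (positive means QN ranks the story higher
// than HN does). ok is false if the story is not on the HN top page.
func rankDelta(id, qnRank int, hnRanks map[int]int) (delta int, ok bool) {
	hn, ok := hnRanks[id]
	if !ok {
		return 0, false
	}
	return hn - qnRank, true
}

// rejected lists stories on the HN top page that are missing from the QN top page.
func rejected(hnPage, qnPage []story) []int {
	onQN := make(map[int]bool, len(qnPage))
	for _, s := range qnPage {
		onQN[s.ID] = true
	}
	var out []int
	for _, s := range hnPage {
		if !onQN[s.ID] {
			out = append(out, s.ID)
		}
	}
	return out
}

// sitewideExpectedUpvotes sums quality*expectedUpvotesAtRank over a page,
// where page[i] is shown at rank i+1.
func sitewideExpectedUpvotes(page []story, expectedUpvotesAtRank []float64) float64 {
	total := 0.0
	for i, s := range page {
		if i >= len(expectedUpvotesAtRank) {
			break
		}
		total += s.Quality * expectedUpvotesAtRank[i]
	}
	return total
}

func main() {
	atRank := []float64{1.0, 0.5, 0.25} // illustrative values only
	hn := []story{{1, 0.8}, {2, 1.5}, {3, 0.4}}
	qn := []story{{2, 1.5}, {1, 0.8}, {4, 1.2}}

	hnRanks := map[int]int{}
	for i, s := range hn {
		hnRanks[s.ID] = i + 1
	}
	for i, s := range qn {
		if d, ok := rankDelta(s.ID, i+1, hnRanks); ok {
			fmt.Printf("story %d: %+d\n", s.ID, d)
		} else {
			fmt.Printf("story %d: new\n", s.ID)
		}
	}
	fmt.Println("rejected:", rejected(hn, qn))
	fmt.Println("difference:", sitewideExpectedUpvotes(qn, atRank)-sitewideExpectedUpvotes(hn, atRank))
}
```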

Final Edits to Readme

Move the causal model discussion to the "Further Improvements" section. Clarify the description of "Hypothetical Upvotes".

Basic SEO Stuff (Essential)

Might as well try to capture a little SEO traffic.

  • Put "Hacker News" in title (e.g. "Quality News: Hacker News Rankings")
  • Put story title and "Hacker News" in title of /stats pages (e.g. "Hacker News Story Stats: What's New in Python 3.11")
  • Breadcrumbs. E.g. https://ourdomain > hacker news story stats > what's new in Python 3.11
  • 301-redirect the fly.io address to https://news.social-protocols.org/

Out Of Storage strategy

Since our database grows indefinitely, we need a strategy for dealing with limited storage.

At first, we could simply delete old data from the database.
Later, we can work out how to store the huge aggregated dataset.

Frontpage History Table

Table with aggregate stats (average age, score, weighted average quality) for each minute, for both the HN and QN front pages.
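A sketch of how one row of such a table might be computed for one page at one minute. The issue does not say what the quality average is weighted by; this sketch assumes a per-rank attention weight (for example, expected upvote share at each rank). All type, field, and function names are hypothetical.

```go
package main

import "fmt"

// frontpageEntry is a hypothetical story as it appeared on one page at one minute.
type frontpageEntry struct {
	AgeMinutes float64
	Score      int
	Quality    float64
}

// historyRow is one row of the proposed table: one page ("hn" or "qn") at one minute.
type historyRow struct {
	Page            string
	Minute          int64 // Unix time of the minute mark
	AvgAge          float64
	AvgScore        float64
	WeightedQuality float64
}

// aggregate builds a row. page[i] is shown at rank i+1, and rankWeight[i] is
// an assumed attention weight for that rank.
func aggregate(pageName string, minute int64, page []frontpageEntry, rankWeight []float64) historyRow {
	row := historyRow{Page: pageName, Minute: minute}
	if len(page) == 0 {
		return row
	}
	var ageSum, scoreSum, weightedQ, weightSum float64
	for i, e := range page {
		ageSum += e.AgeMinutes
		scoreSum += float64(e.Score)
		if i < len(rankWeight) {
			weightedQ += rankWeight[i] * e.Quality
			weightSum += rankWeight[i]
		}
	}
	n := float64(len(page))
	row.AvgAge = ageSum / n
	row.AvgScore = scoreSum / n
	if weightSum > 0 {
		row.WeightedQuality = weightedQ / weightSum
	}
	return row
}

func main() {
	page := []frontpageEntry{{60, 200, 1.4}, {180, 90, 0.7}}
	row := aggregate("qn", 1668600000, page, []float64{0.75, 0.25})
	fmt.Printf("%+v\n", row)
}
```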

SEO Stuff (Not Essential)

  • Put "Hacker News Rankings" or something in the page content (under Quality News)
  • Put story title in slug of /stats pages (e.g. /stats/whats-new-in-python-311-31888624)
  • Structured Data?
  • Meta descriptions and keywords. Include "ycombinator" and "hacker news" in both. And keywords like "hacker news stats python 3.11"
  • Set up Google Search Console and webmaster tools
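The slug format from the example above (/stats/whats-new-in-python-311-31888624) can be produced by lowercasing the title, dropping everything except letters, digits and spaces, and joining the words and the story id with hyphens. A sketch in Go; storySlug and storyIDFromSlug are hypothetical names, and the character rules are an assumption inferred from that one example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode"
)

// storySlug builds the /stats path segment from a story title and its HN id.
// Assumption: letters and digits are kept, spaces and hyphens separate words,
// and everything else (apostrophes, dots, ...) is dropped.
func storySlug(title string, id int) string {
	var b strings.Builder
	for _, r := range strings.ToLower(title) {
		switch {
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			b.WriteRune(r)
		case unicode.IsSpace(r) || r == '-':
			b.WriteRune(' ')
		}
	}
	words := append(strings.Fields(b.String()), strconv.Itoa(id))
	return strings.Join(words, "-")
}

// storyIDFromSlug recovers the id from the last hyphen-separated segment,
// so the handler can ignore the title part when looking up the story.
func storyIDFromSlug(slug string) (int, error) {
	i := strings.LastIndex(slug, "-")
	return strconv.Atoi(slug[i+1:])
}

func main() {
	slug := storySlug("What's New in Python 3.11", 31888624)
	fmt.Println(slug) // whats-new-in-python-311-31888624
	id, err := storyIDFromSlug(slug)
	fmt.Println(id, err) // 31888624 <nil>
}
```

Looking the story up by the trailing id alone means old or edited titles still resolve to the right page.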

Deal with flagged stories

Once high-quality stories disappear from the official pages, they no longer receive rank information and therefore stop accumulating attention.

Cache-Control Header for Frontpage

We can sync these headers with our once-a-minute page generation. The crawl happens every minute on the minute, though it can take a few seconds. Roughly, we can tell browsers to cache each page until, say, 10 seconds after the minute mark.

Update expectedUpvotes coefficients using causal model

expectedUpvoteShare (in deltaExpectedUpvotes.go) returns the share of upvotes historically received on average at each rank. We want instead the share of upvotes that the average story would receive if we decided to show it at that rank.

This is different for the same reason that the probability that a hospitalized person will die is not the same as the probability that you will die if you choose to visit the hospital. The effect of rank on upvote rate is confounded by the fact that highly upvoted stories are more likely to be placed at rank 1, in the same way that the effect of hospitals on death is confounded by the fact that people who are dying are more likely to be placed in a hospital.
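In causal-inference notation, this is the difference between conditioning on a rank and intervening on it. As a sketch, assume a story's quality $Q$ is the only confounder between its rank $R$ and the upvotes $U$ it receives (an assumption; age, for example, may also matter). The backdoor adjustment then gives:

```latex
\begin{aligned}
\underbrace{\mathbb{E}[U \mid R = r]}_{\text{historical share}}
  &= \sum_q \mathbb{E}[U \mid R = r, Q = q]\, P(Q = q \mid R = r) \\
\underbrace{\mathbb{E}[U \mid \operatorname{do}(R = r)]}_{\text{what we want}}
  &= \sum_q \mathbb{E}[U \mid R = r, Q = q]\, P(Q = q)
\end{aligned}
```

The two expressions differ only in their weights: at rank 1, $P(Q = q \mid R = 1)$ is concentrated on high-quality stories, so the historical share overstates what the average story would receive there.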

Use moving average upvoteRate estimate

The general idea is to put more weight on recent data when calculating the upvote rate. For example, we could use the last N units of attention (expectedUpvotes). We don't want to just use the last N datapoints, because if a story is receiving little attention, those datapoints provide little information about the true upvote rate. On the other hand, if a story is at rank 1, the number of upvotes it receives over 10 minutes is probably a good estimate of its true upvote rate.
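A sketch of the "last N units of attention" idea in Go: walk backwards from the newest datapoint, accumulating upvotes and expectedUpvotes until the attention window is full, counting the datapoint that straddles the boundary proportionally. The datapoint type, recentUpvoteRate, and the window parameter (the N above) are hypothetical.

```go
package main

import "fmt"

// datapoint is a hypothetical per-crawl observation for one story.
type datapoint struct {
	Upvotes         float64 // upvotes gained since the previous crawl
	ExpectedUpvotes float64 // attention received in that interval
}

// recentUpvoteRate estimates the upvote rate from the most recent `window`
// units of attention. history is ordered oldest to newest. The datapoint that
// crosses the window boundary contributes only the fraction that fits.
// ok is false if no attention was observed at all.
func recentUpvoteRate(history []datapoint, window float64) (rate float64, ok bool) {
	var upvotes, attention float64
	for i := len(history) - 1; i >= 0 && attention < window; i-- {
		d := history[i]
		frac := 1.0
		if remaining := window - attention; d.ExpectedUpvotes > remaining {
			frac = remaining / d.ExpectedUpvotes
		}
		upvotes += frac * d.Upvotes
		attention += frac * d.ExpectedUpvotes
	}
	if attention == 0 {
		return 0, false
	}
	return upvotes / attention, true
}

func main() {
	history := []datapoint{{10, 10}, {2, 4}, {6, 3}}
	// Uses all of the newest point (6 upvotes, 3 attention) and half of the
	// previous one (1 upvote, 2 attention): 7 / 5.
	rate, ok := recentUpvoteRate(history, 5)
	fmt.Println(rate, ok) // 1.4 true
}
```

A story at rank 1 fills the window within a few crawls, while a story deep on the page draws on a much longer stretch of its history, which is the behavior described above.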

Remove Upvote Link

[10:33 AM, 11/16/2022] Jonathan Warden: I think we should remove the upvote button. On HN that button only appears if users are logged in. I think that showing will frustrate some users especially if they are not logged in. It also takes up space.
[10:33 AM, 11/16/2022] Felix: ok, I'm fine with that.

Misc. UI Changes For Consistency with HN

  • Highlight "hntop" in white on the hntop page (the original HN does the same for all pages but the front page)
  • Show "Discuss" instead of "0 Comments" when there are no comments
  • "N minutes ago" should link to the story page
  • Username should link to user page
  • Show domain after story title
  • Mobile/small viewports:
    • slightly larger right margin (just a few pixels, it appears to me)
    • wrap after a |
  • Do we want the same margins around the page content (grey background) as the original?
  • When you hover over a relative time (e.g. "2 days ago") on HN, it shows the ISO timestamp.
  • Usernames of new users appear in green.
