dailybruin / kerckhoff
Flatpage/static site manager for the Daily Bruin (proxy for Google Drive)
Home Page: https://kerckhoff.dailybruin.com
When a package is deleted, it should not delete the corresponding Google Drive folder.
Also, pressing the delete button should ask for confirmation before deleting the package.
If a fetch fails on the frontend, the package model is left with processing=True, which makes it look like the package is perpetually fetching. Reset the processing field on a failed request (timeout, bad request, etc.).
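A minimal sketch of one fix, using a try/finally so the flag is cleared no matter how the fetch ends. The `Package` class and `fetch_package` helper here are simplified stand-ins, not the actual Kerckhoff models:

```python
class Package:
    """Stand-in for the real package model (assumption, for illustration)."""
    def __init__(self):
        self.processing = False

def fetch_package(package, do_fetch):
    """Run a fetch, guaranteeing the processing flag is cleared on failure."""
    package.processing = True
    try:
        do_fetch(package)
    finally:
        # Runs on success *and* on timeout/bad-request exceptions, so the UI
        # never shows a permanently "fetching" package.
        package.processing = False
```

In the real Django model the `finally` block would also need to save the field back to the database.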
Tried setting up Kerckhoff for the first time and got this error after running docker-compose up:
I noticed that webpack-stats.json is in both .gitignore and .dockerignore. I'm not familiar with it, but I assume it is supposed to be autogenerated by webpack. Why, then, was it not autogenerated on my machine during setup? My guess is that something looks for webpack-stats.json before webpack runs.
For this flatpage, External Sites needed to upload a video to S3. They requested a way for Kerckhoff to upload videos to S3, similar to how it does for images.
The one difference is that the video should be "streamable". I believe S3 has an option called "publicly playable" which should allow video streaming, but I haven't looked into this.
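As far as I know there is no literal "publicly playable" switch in S3; serving the object with the correct `Content-Type` (plus public read access) is usually what lets browsers stream it. A hedged sketch of building the upload arguments for boto3's `upload_file` — bucket and key names below are placeholders:

```python
def video_upload_args(filename):
    """Build the ExtraArgs for a browser-streamable S3 video upload (sketch).

    Assumption: setting the right Content-Type and public-read ACL is enough
    for progressive playback in the browser.
    """
    content_types = {
        ".mp4": "video/mp4",
        ".webm": "video/webm",
        ".mov": "video/quicktime",
    }
    ext = filename[filename.rfind("."):].lower()
    return {
        "ContentType": content_types.get(ext, "application/octet-stream"),
        "ACL": "public-read",
    }

# Hypothetical usage with boto3:
# boto3.client("s3").upload_file(
#     "clip.mp4", "kerckhoff-media", "videos/clip.mp4",
#     ExtraArgs=video_upload_args("clip.mp4"))
```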
YAML is hard to write for non-technical people, and is prone to parsing errors that are hard to debug and track down.
Short-term solution: use AML to parse frontmatter.
Long-term solution: enforce and extend metadata schemas at the PackageSet level.
Thanks @yyc for the suggestion!
Right now, assets are downloaded before we calculate an MD5 to determine whether they have changed, which is not cheap on bandwidth. One idea is to create a .kerckhoff-meta file in the Google Drive folder to keep track of this; the API also offers some interesting metadata extensions to make this easier.
It turns out Google Drive stores last-edited values; we should use those.
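The Drive v3 API exposes a `modifiedTime` field (an RFC 3339 UTC string, e.g. via `files().get(fileId=..., fields="modifiedTime")`), so a cheap pre-download check could look like this sketch; the function name and caching scheme are assumptions:

```python
def needs_refetch(drive_modified_time, cached_modified_time):
    """Decide whether to re-download an asset based on Drive's modifiedTime.

    Both arguments are RFC 3339 timestamp strings as returned by the
    Drive v3 API; in UTC form they compare correctly as plain strings.
    A missing cached value means we have never fetched the asset.
    """
    if cached_modified_time is None:
        return True
    return drive_modified_time > cached_modified_time
```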
Also, fill article.aml with some example AML. You can find some good example AML in the package repo dev.
When Sarthak uploaded a GIF to Kerckhoff, it did not play.
https://drive.google.com/file/d/1IEH2hYZWveh3P-ru7NuIToVUMklluk7Y/view
turned into a static image (screenshot missing from this report).
Sarthak (the 2020-21 External Sites editor) said that there is a setting in AWS.
For the new PR #89: update the footer to only have relevant links.
Currently, images are all kept in memory, so if you process a lot of images at the same time, the server crashes. Refactor this so that only a few images are kept in memory at once and the rest are written to disk, so we won't encounter out-of-memory issues.
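One simple approach, sketched below with the standard library: buffer each image through `tempfile.SpooledTemporaryFile`, which keeps small files in memory but automatically spills to disk past a size threshold. The function name and 5 MB threshold are assumptions, not the existing pipeline:

```python
import shutil
import tempfile

def buffer_image(stream, max_in_memory=5 * 1024 * 1024):
    """Copy an incoming image stream into a buffer that spills to disk
    once it exceeds max_in_memory bytes (sketch, not the real pipeline)."""
    buf = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    shutil.copyfileobj(stream, buf)
    buf.seek(0)  # rewind so downstream code can read from the start
    return buf
```

Bounding how many such buffers are open at once (e.g. with a semaphore or a worker pool) would cap total memory regardless of how many uploads arrive together.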
When generating a "Drive folder ID" from a "Drive folder URL", the current parser does not properly handle the ?open or ?share sections of the Google sharing URL, and they end up in the ID.
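A minimal sketch of a parser that cannot leak query parameters into the ID, because `urlparse` separates the query string before the ID is extracted (function name is an assumption):

```python
import re
from urllib.parse import urlparse

def drive_folder_id(url):
    """Extract the folder ID from a Drive sharing URL, dropping any
    query string (?usp=sharing, ?open, etc.) so it never taints the ID."""
    path = urlparse(url).path  # query and fragment are discarded here
    match = re.search(r"/folders/([^/]+)", path)
    return match.group(1) if match else None
```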
The current library we're using for parsing AML doesn't handle a number of edge cases properly. The ArchieML spec is actually rather straightforward, and it might be helpful to write a parser ourselves using something like https://github.com/dabeaz/sly or PLY.
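To give a sense of scale, here is a toy sketch of just the flat key:value part of ArchieML in plain Python. A real replacement parser (hand-written or built with sly/ply) would also need scopes (`{...}`), arrays (`[...]`), multi-line values, and `:end` markers:

```python
import re

def parse_flat_aml(text):
    """Toy parser for flat ArchieML key:value lines only (illustration).

    Lines that do not look like `key: value` are ignored, mirroring
    ArchieML's tolerance of free-form text between data lines.
    """
    result = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([a-zA-Z0-9_.-]+)\s*:\s*(.*)$", line)
        if match:
            result[match.group(1)] = match.group(2).strip()
    return result
```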
This is cool and I want to contribute, but I don't know how 😢. Can you add a README with details on how to get this set up on my computer?
The build hook should be able to call some API endpoint whenever the AML changes.
The purpose is to rebuild static site pages whenever the data changes. As of now, the static sites fetch from Kerckhoff every time a user loads them. If we instead build the static sites with the data baked in, the frontend would not have to fetch at load time, so pages would load faster and be more efficient overall. And even if Kerckhoff crashes, the static site would still work.
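A sketch of the hook's client side, constructing the POST that would notify a static-site builder when a package's AML changes. The endpoint URL, payload shape, and function name are all assumptions:

```python
import json
import urllib.request

BUILD_HOOK_URL = "https://example.com/build-hook"  # placeholder endpoint

def build_hook_request(package_slug, url=BUILD_HOOK_URL):
    """Build the POST request that tells the static-site builder to rebuild.

    The {"package": slug} payload is a hypothetical shape; a real hook
    (e.g. a Netlify build hook) might take no body at all.
    """
    body = json.dumps({"package": package_slug}).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it would be: urllib.request.urlopen(build_hook_request("my-package"))
```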
This is something experimental we were considering for design.dailybruin.com.
Currently, print statements are used, but their output won't show up in the Kubernetes logs. We should replace print statements with proper logger calls (e.g. logger.info).
Example: https://drive.google.com/drive/u/4/folders/1mta9hfI-HmzWQSJJOTlqJ1NofDIXetDi
In Google Drive, the file IMG_7274.JPG is present, and according to the metadata in article.md it is the cover image. However, in https://kerckhoff.dailybruin.com/api/packages/prime-old/still-breathing-still-building/ , the entry is not present in images.s3.
The old AML parser has some weird bugs with certain edge cases.
Tianyu made a better AML parser; we have integrated it into the new Kerckhoff but not the old one. Here is a link to the code: dailybruin/kerckhoff-server#23
If you log into kerckhoff.dailybruin.com, you will see "internal server error".
What shows up on the user side: they put a photo in Google Drive, but it never shows up on kerckhoff.dailybruin.com.
In the Rancher logs, these lines are printed, even for the photos that aren't shown as uploaded on kerckhoff.dailybruin.com:
4/28/2021 7:21:46 PM2021-04-29 02:21:46,031 kerckhoff INFO BakurMadini.jpg has not been modified since last fetch.
4/28/2021 7:21:46 PM2021-04-29 02:21:46,031 kerckhoff INFO CarlKing.jpg has not been modified since last fetch.
4/28/2021 7:21:46 PM2021-04-29 02:21:46,034 kerckhoff INFO BrandonMcLelland.jpg has not been modified since last fetch.
4/28/2021 7:21:46 PM2021-04-29 02:21:46,034 kerckhoff INFO AngelinaQuint.jpg has not been modified since last fetch.
4/28/2021 7:21:46 PM2021-04-29 02:21:46,034 kerckhoff INFO AngelinaQuint.jpg has not been modified since last fetch.
4/28/2021 7:21:46 PM2021-04-29 02:21:46,035 kerckhoff INFO ArtharvaKulkarni.jpg has not been modified since last fetch.
Right now, the files in the Kerckhoff Google Drive are owned by whoever created the package. However, when that person leaves, their media email gets deleted, so these files are lost. Creating the files with an admin email would prevent this issue.
Currently, if a lot of packaging requests are sent to Kerckhoff, peak CPU usage exceeds the Kubernetes resource limit (as happened last night), which causes worker timeouts.
We could add a task scheduler so that large packaging jobs, such as those for PRIME, are done asynchronously; this would improve our availability.
#82 is also needed: currently, if a fetch fails, we cannot easily re-fetch it.
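The scheduler idea can be sketched with an in-process job queue: requests enqueue packaging work, and a single worker drains it so only one heavy job runs at a time. A production version would use a real task queue (e.g. Celery with retries) rather than this stdlib toy; all names here are assumptions:

```python
import queue

# Single shared job queue; one worker drains it so only one heavy
# packaging job runs at a time, keeping peak CPU off the web process.
tasks = queue.Queue()

def enqueue_fetch(job):
    """Queue a packaging job instead of running it in the request cycle."""
    tasks.put(job)

def worker():
    """Run queued jobs one at a time; a None sentinel shuts the worker down."""
    while True:
        job = tasks.get()
        if job is None:
            break
        job()
```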
Some packages take very long to fetch, making it seem like the fetch has failed even though it is still in progress. Add a progress bar, or some indicator of fetch progress (e.g. "fetching image 1 of 4"), on the frontend so that users know not to refresh the page.
In the same vein as #82 (issues with slow/failing fetches)