Comments (16)
I can increase the resolution of the images too when I do this, since they will no longer need to be kept small to save space.
I can use the til.simonwillison.net S3 bucket for this.
from til.
Images are currently generated by shot-scraper, run from this Python script (lines 15 to 36 in e2e4819).
Huh... those are PNGs. I bet they'd be a lot smaller if they were JPEGs, and even retina JPEGs might be smaller while still displaying well.
Ran this locally:
datasette . --get /sqlite/multiple-indexes > generate.html
Then:
shot-scraper shot generate.html -w 800 -h 400 --retina
Got this 216KB image:
Tried a JPEG too - quality 80 was almost as big, but this got a smaller image (159KB):
shot-scraper shot generate.html -w 800 -h 400 --retina --quality 60
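Since the screenshots are driven from a Python script, the command above could be wrapped roughly as follows. This is only a sketch: the function names and the output path are illustrative, the `-w`, `-h`, `--retina` and `--quality` flags are the ones used above, and `-o` is assumed here as the option for choosing the output file.

```python
import subprocess
from typing import List


def build_shot_command(html_path: str, output_path: str, quality: int = 60) -> List[str]:
    """Build the shot-scraper command line for an 800x400 retina JPEG."""
    return [
        "shot-scraper", "shot", html_path,
        "-w", "800",
        "-h", "400",
        "--retina",
        "--quality", str(quality),  # passing a quality value selects JPEG output
        "-o", output_path,          # assumed output option; path is illustrative
    ]


def take_shot(html_path: str, output_path: str, quality: int = 60) -> None:
    """Run shot-scraper; raises CalledProcessError if the command fails."""
    subprocess.run(build_shot_command(html_path, output_path, quality), check=True)
```

Keeping the command construction separate from the call that runs it makes the flags easy to inspect without launching a browser.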
Biggest question to decide is how to tell if an image has been created in S3 or not.
I'm tempted to do it based on the filename: use the shot hash as the name, do a quick list-files operation to see which files already exist, then create the ones that don't.
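A minimal sketch of that filename-based check, under some assumptions: the hash is taken over the HTML to be screenshotted, MD5 is used only because the keys in the bucket are 32 hex characters, and `pages` and `existing_keys` are hypothetical inputs standing in for the rendered posts and the bucket listing.

```python
import hashlib
from typing import Dict, Iterable, List


def shot_hash(html: str) -> str:
    """Hash the HTML that would be screenshotted (MD5 is an assumption)."""
    return hashlib.md5(html.encode("utf-8")).hexdigest()


def missing_shots(pages: Dict[str, str], existing_keys: Iterable[str]) -> List[str]:
    """Return the names of pages whose <hash>.jpg is not yet in the bucket."""
    existing = set(existing_keys)  # one listing, then cheap membership tests
    return [
        name
        for name, html in pages.items()
        if shot_hash(html) + ".jpg" not in existing
    ]
```

Because the key is derived from the content, an unchanged post maps to a file that already exists and is skipped, while any edit produces a new key and therefore a new screenshot.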
That should run in GitHub Actions and generate JPEGs for every post and upload them to S3.
https://github.com/simonw/til/actions/runs/4842339363/jobs/8629221973
It's working...
% s3-credentials list-bucket til.simonwillison.net
[
{
"Key": "0cf1e455f161435a4aea07480c27da89.jpg",
"LastModified": "2023-04-30 03:54:06+00:00",
"ETag": "\"c1ef69673fda4ebf1cd1cfa41d8dc255\"",
"Size": 90039,
"StorageClass": "STANDARD"
},
{
"Key": "1447c8cdd4caa68e5514a1bb5b9f9f49.jpg",
"LastModified": "2023-04-30 03:54:12+00:00",
"ETag": "\"4adfdd03def8e54c651451f5b56e43b9\"",
"Size": 111841,
"StorageClass": "STANDARD"
},
{
"Key": "14e4b902d5511a639a6c8d1e91d3dabb.jpg",
"LastModified": "2023-04-30 03:54:35+00:00",
"ETag": "\"2d3e29f3eaca62ba688c04a82d923fba\"",
"Size": 118002,
"StorageClass": "STANDARD"
},
Generated image example: http://s3.amazonaws.com/til.simonwillison.net/f19a4a99ca28b20786ed7e35d8f9a8e7.jpg
To see how many are done:
% s3-credentials list-bucket til.simonwillison.net | jq length
43
That's out of 410 total.
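The same count could be done in Python from the JSON that list-bucket prints. This is a sketch that assumes the output is a JSON array of objects with a "Key" field, as in the listing shown earlier; `count_done` is a hypothetical helper name.

```python
import json


def count_done(listing_json: str, total: int) -> str:
    """Summarise progress from the JSON array printed by list-bucket."""
    objects = json.loads(listing_json)
    # Only count JPEGs, in case the bucket holds other files too.
    done = sum(1 for obj in objects if obj["Key"].endswith(".jpg"))
    return f"{done} of {total} done"
```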
Partial logs from that GitHub Actions run:
Stored 96126 byte JPEG for github-actions_grep-tests.md shot hash 3e71efb58ec2d72ce37d6c93d7ace74e
Stored 70990 byte JPEG for github-actions_commit-if-file-changed.md shot hash 3b4a2012993962434fc8f5853cf5396b
Stored 72935 byte JPEG for bash_loop-over-csv.md shot hash d06963c31326ae773a8e7face614668c
It finished. All 410 images should be there now.
This query shows all the images on one page:
select
  json_object(
    'img_src',
    'https://s3.amazonaws.com/til.simonwillison.net/' || shot_hash || '.jpg',
    'width',
    400
  ) as img
from
  til
https://til.simonwillison.net/tils
I scrolled through and they all look good. This one was a favourite: https://s3.amazonaws.com/til.simonwillison.net/990ce33b65e40356be0035f185b3484c.jpg
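To try the query outside Datasette, something like the following should work with Python's built-in sqlite3 module, provided the SQLite build includes the JSON functions. The in-memory table is a stand-in with only a shot_hash column, and `image_rows` is a hypothetical helper, not part of the site's code.

```python
import json
import sqlite3

BASE_URL = "https://s3.amazonaws.com/til.simonwillison.net/"


def image_rows(conn: sqlite3.Connection) -> list:
    """Run the query and decode each json_object() result into a dict."""
    sql = """
        select json_object(
            'img_src', ? || shot_hash || '.jpg',
            'width', 400
        ) as img
        from til
    """
    return [json.loads(row[0]) for row in conn.execute(sql, (BASE_URL,))]


# Tiny in-memory stand-in for the real til table (only shot_hash is needed).
conn = sqlite3.connect(":memory:")
conn.execute("create table til (shot_hash text)")
conn.execute("insert into til values ('990ce33b65e40356be0035f185b3484c')")
print(image_rows(conn))
```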
Last steps:
- Remove the datasette-media plugin and configuration
- Delete the old cached images
- Update the template to reference the new ones (oh no! That's going to require regenerating them all since the template hash will change)
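The regeneration problem in the last step follows from hashing the template along with the content. A rough illustration, assuming the hash covers both inputs and using MD5 and the name `shot_hash_with_template` purely for the sketch:

```python
import hashlib


def shot_hash_with_template(content_html: str, template_text: str) -> str:
    """Hash the rendered content together with the template that frames it."""
    h = hashlib.md5()
    h.update(content_html.encode("utf-8"))
    h.update(template_text.encode("utf-8"))
    return h.hexdigest()


# Same post, edited template: the hash (and so the S3 key) changes.
before = shot_hash_with_template("<p>post</p>", "<img src='old'>")
after = shot_hash_with_template("<p>post</p>", "<img src='new'>")
print(before != after)
```

Every post shares the template, so a single template edit changes every key at once and none of the existing files match any more.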
Oops, broke it:
Traceback (most recent call last):
  File "generate_screenshots.py", line 92, in <module>
    generate_screenshots(root)
  File "generate_screenshots.py", line 55, in generate_screenshots
    shot_html_hash.update(filepath.read_text().encode("utf-8"))
  File "/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/pathlib.py", line 1236, in read_text
    with self.open(mode='r', encoding=encoding, errors=errors) as f:
  File "/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/pathlib.py", line 1222, in open
    return io.open(self, mode, buffering, encoding, errors, newline,
  File "/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/pathlib.py", line 1078, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/til/til/main/templates/row.html'
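One way to make this kind of failure easier to read would be to check the template paths before hashing them. This is a hedged sketch, not the fix that was actually applied; `hash_templates` is a hypothetical helper and MD5 is an assumption.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def hash_templates(paths: Iterable[Path]) -> str:
    """Hash template files, failing early with one message listing every missing path."""
    paths = list(paths)
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError("Missing template(s): " + ", ".join(missing))
    h = hashlib.md5()
    for p in paths:
        h.update(p.read_text(encoding="utf-8").encode("utf-8"))
    return h.hexdigest()
```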
That's deployed now.
Wrote this up as a TIL: https://til.simonwillison.net/shot-scraper/social-media-cards