standardebooks / web
The source code for the Standard Ebooks website.
Home Page: https://standardebooks.org
License: Creative Commons Zero v1.0 Universal
The sync-ebooks script has a two-step process: first, update any existing directories, then clone any new repositories.
In the update portion, it updates each directory by doing a git fetch. As I'm sure you know, but I didn't (I've learned enough git to use it for SE, but I've never seen or needed a fetch), fetch gets the changes but doesn't update the working directory; you have to do a subsequent git merge to do that.
I discovered this because I ran a sync tonight and didn't get any new changes. Or at least it looked that way to me. After doing some investigation, I found that I had the changes, I just had to do a git merge in every directory to see them. Which seems… inefficient.
My first question is: is that for purposes of bare repositories, i.e. should only a fetch be done when updating a server?
My next question is: for the general use case of keeping a set of WD clones, wouldn't we want the script to do both the fetch and the merge (or a pull)? Is there any reason not to do that? (From what I've read, using fetch/merge is "safer" than doing a "git pull", although if we're going to do both automatically I don't know that it makes a difference.)
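To make the fetch-then-merge question concrete, here is a minimal Python sketch of what "do both" could look like for one clone. The real sync-ebooks script is a shell script; the function names `update_commands` and `update_clone`, and the choice of a fast-forward-only merge, are assumptions made for illustration rather than what the script does.

```python
import subprocess


def update_commands(repo_dir):
    """Return the git commands that bring one working-directory clone up to date."""
    repo = str(repo_dir)
    return [
        # Download new commits without touching the working directory.
        ["git", "-C", repo, "fetch"],
        # Fast-forward the checked-out branch to its upstream (assumed policy).
        ["git", "-C", repo, "merge", "--ff-only"],
    ]


def update_clone(repo_dir):
    """Run fetch, then merge; raise if either command fails."""
    for command in update_commands(repo_dir):
        subprocess.run(command, check=True)
```

A bare repository has no working directory, so in that case only the fetch step would apply.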
The pagination buttons on the collections page redirect to the ebooks page.
https://standardebooks.org/collections/modern-librarys-100-best-novels
I tend to view pagination as an anti-pattern: a way of solving the website’s lack of resources at the expense of usability. Now obviously there are use cases for pagination (large numbers of results, expense of generating each result, cost of rendering each result to the user) but I don’t think any apply here.
The cost to the user of pagination is the inability to find in page, and the wait for each further page to load instead of just scrolling down. A typical alternative would be infinite scroll (automatic or “Click to load more results”) but I don’t even think we need that here.
At the moment the library is <250 books. As an optimistic estimate, this may grow to ~1000 titles over the next 5 years. The rendering cost for each result is pretty much all in the images, and they’re a fixed size for each one. So I’d like to suggest the following course of action:
Add width and height attributes to the images so that the browser knows the aspect ratio for layout (width can still be overridden with CSS but the browser can calculate aspect ratio from that for offscreen layout).
Thoughts? Obviously I’m happy to PR this if people want it.
Hello,
The Google mailing list is inaccessible for me, giving the error:
This group either doesn't exist, or you don't have permission to access it. If you're sure this group exists, contact the owner of the group and ask them to give you access.
As an alternative to Google's data harvesting tool mailing list you might like to consider:
🙂
I was surprised that it was not possible to filter by the language of the book (English, Spanish, German...).
OPDS entries are ordered. For example, in the main link from the first OPDS feed (All standard ebooks), we see the following first few books:
The Moon Pool
30 October 2017, 18:17
A group of adventurers search for their friends, who were lost while exploring the otherworldly secrets of a monument discovered on a chain of island ruins.
The Mysterious Affair at Styles
13 November 2017, 19:34
Hercule Poirot solves the mystery of a murder in an English country manor.
The Secret Adversary
29 October 2017, 23:13
Tommy and Tuppence try to solve a mystery, only to find themselves embroiled in schemes of murder, millionares, and diplomats.
That's all fine and dandy, but notice how the dates are completely out of order?
It would make much more sense to order those by date. This way this entry could reflect the sort order in the main webpage: by date.
Or we could sort the entries in alphabetical order. This could be a separate feed too.
Another feed could categorize entries by author: the first level would list authors and then books would be listed after.
For example, the Gutenberg OPDS feed offers to sort by popularity, "latest" or "random". The Manybooks OPDS feed is even better organized; you can see "New titles", "Authors", "Titles" and "Genres" categorizations.
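The two orderings suggested here can be sketched in a few lines of Python. This is only an illustration, using the three titles and timestamps from the excerpt above; the dictionary layout and the function names `newest_first` and `by_title` are hypothetical and say nothing about how the feed is actually generated.

```python
from datetime import datetime

# Titles and timestamps as they appear in the feed excerpt above.
entries = [
    {"title": "The Moon Pool", "updated": datetime(2017, 10, 30, 18, 17)},
    {"title": "The Mysterious Affair at Styles", "updated": datetime(2017, 11, 13, 19, 34)},
    {"title": "The Secret Adversary", "updated": datetime(2017, 10, 29, 23, 13)},
]


def newest_first(items):
    """Order entries by date, most recent first, like the main web page."""
    return sorted(items, key=lambda entry: entry["updated"], reverse=True)


def by_title(items):
    """Order entries alphabetically by title, ignoring case."""
    return sorted(items, key=lambda entry: entry["title"].casefold())
```

Each ordering could back its own feed, so clients can offer "latest" and "titles" views side by side.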
I stumbled upon the web page, and started looking for information on which languages are supported. That is, must all books on there be in English, and if so, what are the sources for the books that were not originally written in English but are still present?
Having information on this on the site would be nice 🙂
(And thanks for the great work done here!)
a697175 updated the regex for the “everyone” interactive-sr from /\v([Ee])very one(\s+of)@\!/\1veryone/ to /\v([Ee]ach and )@*<\!([Ee])very one(\s+of)@\!/\1veryone/. This seems to have broken the command on the default Mac install of vim with the following error (2nd line repeated for each file):
Error detected while processing command line:
E59: invalid character after @
The default report for vim --version (on macOS 11 Big Sur) is:
VIM - Vi IMproved 8.2 (2019 Dec 12, compiled Nov 23 2020 06:06:21)
macOS version
Included patches: 1-850
Compiled by [email protected]
plus a big listing of compiled features.
I’m a vim newbie and have no idea how to start debugging this. Is it something you’ve seen before?
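For anyone trying to work out what the updated pattern is meant to do, here is a hedged Python rendering of its apparent intent: join "every one" into "everyone" unless it is preceded by "each and " or followed by "of". This is not a fix for the vim error, since Python's lookaround syntax differs from vim's; the names `EVERYONE` and `fix_everyone` are made up for this sketch.

```python
import re

# Negative lookbehind for "each and ", negative lookahead for " of".
EVERYONE = re.compile(r"(?<![Ee]ach and )([Ee])very one(?!\s+of)")


def fix_everyone(text):
    """Replace "every one" with "everyone", keeping the original capitalisation."""
    return EVERYONE.sub(r"\1veryone", text)
```

Comparing vim's behaviour against a reference like this on a few sample sentences may help narrow down which part of the new pattern the older vim build rejects.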
We are Interaction Design Students and currently have a project where we aim to find ways to improve the User Experience of Standard Ebooks.
Is it possible to make the source code of the website available?
Hello! Awesome project, thanks for your hard work in creating quality ebooks!
I've found this bit of advice on the website, but it's not entirely true. I use Calibre to transfer KEPUBs to my Kobo, and my Kobo correctly identifies them as KEPUB. I don't know why/where this came from, but I can imagine someone adding .kepub.epub files to Calibre without success, because Calibre doesn't recognise these as KEPUB -- it does, however, recognise .kepub. And so, whenever I download a KEPUB from Standard Ebooks, I just remove the .epub extension, add it to Calibre, and send it to my Kobo.
web/www/help/how-to-use-our-ebooks.php
Line 67 in eed3355
Currently, https://standardebooks.org/tags/science-fiction yields no results. This is the url that the tags link on book pages uses. Other tags work fine. I suspect this is because science fiction is the only two word tag.
Also, https://standardebooks.org/tags/science+fiction (changing the - to a + in the tag name) seems to work.
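If the cause is that the hyphen in the URL is never turned back into a space, one possible approach is to normalise the slug before comparing it with tag names. The Python sketch below is only a guess at the shape of such a fix; the site's real lookup code is not shown here, and `tag_from_slug`, `ebooks_with_tag` and the sample data are hypothetical.

```python
from urllib.parse import unquote_plus


def tag_from_slug(slug):
    """Map a URL slug to a tag name, treating both '-' and '+' as spaces."""
    return unquote_plus(slug).replace("-", " ").strip().lower()


def ebooks_with_tag(slug, ebooks):
    """Return the ebooks whose tag list contains the tag named by the slug."""
    wanted = tag_from_slug(slug)
    return [book for book in ebooks if wanted in (tag.lower() for tag in book["tags"])]
```

One caveat: a tag whose name contains a real hyphen would need different handling, for example by comparing slugified tag names instead of de-slugifying the URL.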
It would be nice to have a link to the author page (for instance https://standardebooks.org/ebooks/maurice-leblanc/) on the pages for the specific books (https://standardebooks.org/ebooks/maurice-leblanc/the-eight-strokes-of-the-clock/alexander-teixeira-de-mattos for example). The author's name is already listed under the title, so it may be a good idea to just make that a hyperlink. That would also mirror the way it works in the search results, so it should be intuitive.
Search engines are indexing the text and dropping people onto chapters or other parts of books. For example, I just ended up on https://standardebooks.org/ebooks/evelyn-underhill/practical-mysticism/text/halftitlepage
Do we want/need some sort of mini navigation to rescue people who can’t use the back button to proceed?
I’d be happy to do this, but a quick check to see if it’s wanted first.
schema.org defines book microdata: https://schema.org/Book . This is used by search engines to help categorise and display data. We can add this (probably most easily as JSON-LD) to each book page to help people find us.
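As a rough idea of what the JSON-LD could look like, here is a small Python sketch that builds a schema.org Book object and wraps it in a script tag. The properties used (name, author, url, inLanguage) are standard schema.org ones, but the function name `book_json_ld`, the choice of fields and the default language are assumptions; the site itself is PHP, so this only illustrates the output shape.

```python
import json


def book_json_ld(title, author, url, language="en"):
    """Return a <script> element holding schema.org Book data for one ebook."""
    data = {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": title,
        "author": {"@type": "Person", "name": author},
        "url": url,
        "inLanguage": language,
    }
    # Escape "</" so a value can never close the script element early.
    payload = json.dumps(data).replace("</", "<\\/")
    return '<script type="application/ld+json">' + payload + "</script>"
```

Further properties, such as a description or cover image, could be added in the same way once it is decided which ones the book pages should expose.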
From the Standard Ebooks Manual of Style, 8.2.11.2
The names of publications, music, and art that can stand alone are italicized; additionally, the names of transport vessels are italicized. These include, but are not limited to: …
It seems reasonable that all of the items listed to be italicized also appear in the SE vocabulary so that they can be semanticated (assuming they're not in EPUB® 3 or Z39.98), but there are a few missing: se:name.music.album, se:broadcast.radio-show, and se:name.publication.ballad (or possibly se:name.music.ballad).
Further, some clarification may be in order. 8.2.11.3 lists names that should be enquoted, but there is no indication if they should be semanticated or what that would look like (“<span epub:type="se:name.music.song">Happy Birthday</span>”?). Some of these listed enquotable names (song, short-story, novella) are in the vocabulary, but not all.
In short, three proposals:
At the moment it’s just number of words with a scaling factor.
On this page:
https://github.com/standardebooks/web/blob/master/www/contribute/accepted-ebooks.php
you write: "This includes obscure histories that are not otherwise notable and have been superceded by modern research..."
"Superceded" is misspelled. It should be "superseded".
See https://standardebooks.org/contribute/typography#quotation-marks
According to TeX book (and https://en.wikipedia.org/wiki/Thin_space, which refers to it), it should be thin space,  
See also https://theeditorsblog.net/2017/04/14/quotes-within-quotes/
The publication date of a book is very useful information when selecting a book to read.
It would be very helpful to show it in the books section and allow ordering by it.
The sponsor ebook donation buttons on https://standardebooks.org/donate all go to "Page not found" on Fractured Atlas.
I guess you’ve got some Github API integration to rebuild the library for every release, but without it I’m finding it difficult (even with the README documentation on expected formats) to build the expected hierarchy of ebooks and ebook data. Would it be possible to add a script that, given a folder containing a set of cloned SE repos, could build the expected tree of data and potentially copy it into place?
Open question: do we want to opt our users out of Google tracking?
https://spreadprivacy.com/block-floc-with-duckduckgo/ for more info, but the gist is Google trialing a new system that attempts to track small groups of users rather than individuals, by assigning them on-device into cohorts, potentially preserving user privacy.
A more cynical reading is that Apple and Mozilla have / are blocking 3rd party cookies, leaving the ad industry up shit creek, and Google is desperately trying to do the minimum possible to look like a reasonable actor, without breaking their ad business.
Anyway, we can, with the Permissions-Policy: interest-cohort=() header, choose to opt our users out of FLoC tracking, meaning that standardebooks.org won’t be used as a targeting mechanism for the cohort generation. This sounds like a good idea to me, but thoughts?
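To show what sending that header on every response amounts to, here is a minimal Python WSGI middleware sketch. It is purely illustrative: response headers elsewhere on this page show the site is served by Apache, so the real change would presumably be a one-line addition to the server configuration, and the name `floc_opt_out` is invented for this example.

```python
def floc_opt_out(app):
    """Wrap a WSGI app so every response carries the FLoC opt-out header."""

    def wrapped(environ, start_response):
        def start_with_header(status, headers, exc_info=None):
            # Append the opt-out header to whatever the app already set.
            headers = list(headers) + [("Permissions-Policy", "interest-cohort=()")]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_header)

    return wrapped
```

The header value `interest-cohort=()` is the one quoted above; an empty allowlist means no origin may use the feature on these pages.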
Currently, each entry has four acquisition links, but since the link to EPUB and the link to EPUB 3 have the same type, it's hard for the client or user to tell which is which.
Would it be possible to add a title attribute to the links containing short labels, like "EPUB", "EPUB 3", "Kobo", "Kindle"? I don't think the spec mentions this, but I've seen some OPDS feeds do this.
If there's a title, clients that support titles can provide a more useful way for the user to select a link, instead of showing two links with identical types.
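Here is a small Python sketch of what labelled acquisition links might look like when generated. The four labels come from the request above and the EPUB media type is the standard one, but the Kobo and Kindle media types, the hrefs and the name `acquisition_links` are assumptions for illustration only, not a description of the current feed.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ACQUISITION = "http://opds-spec.org/acquisition/open-access"

# (label, media type, hypothetical href) for each download format.
FORMATS = [
    ("EPUB", "application/epub+zip", "/example/book.epub"),
    ("EPUB 3", "application/epub+zip", "/example/book.epub3"),
    ("Kobo", "application/kepub+zip", "/example/book.kepub.epub"),
    ("Kindle", "application/x-mobipocket-ebook", "/example/book.azw3"),
]


def acquisition_links():
    """Build an Atom entry whose acquisition links each carry a title label."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    for label, media_type, href in FORMATS:
        ET.SubElement(
            entry,
            f"{{{ATOM}}}link",
            {"rel": ACQUISITION, "type": media_type, "href": href, "title": label},
        )
    return entry
```

With the labels in place, the two links that share the application/epub+zip type can still be told apart by their titles.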
In the other repo, I opened an issue on loading the book files and was pointed to the excellent page at https://standardebooks.org/help/how-to-use-our-ebooks. I would not have found that page otherwise. I suggest (without doing a PR because I don't know where y'all would want to put it) that a pointer to how-to-use-our-ebooks be made more prominent on the web site.
The sync-ebooks script excludes the tools, web, manual, and sublime-text-se-plugin repositories from syncing. It should also exclude standard-blackletter.
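The requested change amounts to one more name in an exclusion list. As a sketch only (sync-ebooks itself is a shell script, and `EXCLUDED` and `repos_to_sync` are names invented here), the filtering could look like this in Python:

```python
# Non-ebook repositories that should never be cloned or updated.
EXCLUDED = {
    "tools",
    "web",
    "manual",
    "sublime-text-se-plugin",
    "standard-blackletter",  # the addition proposed above
}


def repos_to_sync(repo_names):
    """Return the repository names that are ebooks, preserving their order."""
    return [name for name in repo_names if name not in EXCLUDED]
```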
Hello,
Thank you for your great work, I love the initiative.
In here you're saying that Epub files are compatible with "All devices and apps except Amazon Kindle and Kobo".
I'm curious where this comes from. I've personally been using .epub files straight from Gutenberg or other publishers' sites with various Kobo devices for a long time and never encountered a problem, and the Kobo official store will also let you download files as Epub.
Are most files from publishers actually kepub files with a .epub extension?
Thanks
We could potentially add metadata markup to our search results and book pages up using the schema.org Book metadata: https://schema.org/Book . This would allow search engines and other automated systems to accurately identify and consume our data. Thoughts?
I have spotted a few more:
https://standardebooks.org/help/how-to-use-our-ebooks#transferring-to-your-ereader
This page describes copying the thumbnail files over to the system folder on the Kindle. There is a slight inaccuracy here, at least for Windows 10. The system folder isn't a hidden folder (at least on the 10th generation Kindle) but a "protected operating system file." To properly see it, you must uncheck "Hide protected operating system files" in the File Explorer advanced options.
Unfortunately switching from perl to sed has broken macOS compatibility, as --in-place is a GNU thing. Short of requiring mac users to brew install gnu-sed there don’t seem many clean options (see https://stackoverflow.com/questions/5694228/sed-in-place-flag-that-works-both-on-mac-bsd-and-linux).
What was the rationale for switching in the first place?
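For comparison, an in-place substitution that behaves the same on macOS and Linux can be written without sed at all. The Python sketch below is a swapped-in technique, not what the scripts used before or after the change; `sub_in_place` is a hypothetical helper, and Python's regex dialect is not identical to sed's or perl's.

```python
import re
from pathlib import Path


def sub_in_place(path, pattern, replacement):
    """Replace every regex match in a file in place; return the number of matches."""
    file = Path(path)
    text = file.read_text(encoding="utf-8")
    new_text, count = re.subn(pattern, replacement, text)
    if count:
        # Only rewrite the file when something actually changed.
        file.write_text(new_text, encoding="utf-8")
    return count
```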
I think it would be great to have a Vagrant/Docker file that automatically sets up a test server, because that would really make things easier for newcomers. Especially with small changes like #16, setting up the server is more work than the actual contribution itself. The Vagrant/Docker setup files would also provide more detailed and complete instructions on how to set up the server (at least on Ubuntu) than the Readme.
I have just looked at your excellent manual for typography. It would be great to work more closely with one another to maximize the aesthetic outcome for both projects. Would you mind reviewing Foliate in terms of how well Standard Ebooks are rendered and raise any usability issues that you may come across in this process? You would probably notice any formatting and typography issues much quicker and your feedback on this topic would be greatly appreciated.
Often I see a nice cover in the search results and would like to look at it more closely in the browser, as opposed to my monochrome e-ink reader. But although the full cover is accessible from the web and linked from the OPDS feed (e.g., The Red House Mystery), it’s not exposed on the website proper. In fact, although I can right‐click → “View Image” in the search results to view a small thumbnail, on the book’s page itself there is no link to the full cover except the Kindle thumbnail, which won’t display in the browser anyway.
Sorry I’m not confident enough in my CSS ability to come up with a pull request. I tried wrapping the existing cover crop (the <img> labeled “The cover for the Standard Ebooks edition of such‐and‐such”) in a <a> but it seemed to mess up the design in a way I couldn’t comprehend.
These are the headers I see when viewing https://standardebooks.org/images/covers/f-scott-fitzgerald_the-great-gatsby-3ea4090f-cover.avif:
HTTP/1.1 200 OK
Date: Thu, 11 Mar 2021 05:47:48 GMT
Server: Apache
Strict-Transport-Security: max-age=15768000
Upgrade: h2,h2c
Connection: Upgrade, close
Last-Modified: Wed, 10 Feb 2021 23:33:53 GMT
Accept-Ranges: bytes
Content-Length: 5298
X-UA-Compatible: IE=edge
X-Frame-Options: sameorigin
X-Content-Type-Options: nosniff
X-Xss-Protection: 1; mode=block
Referrer-Policy: no-referrer-when-downgrade
ETag: "e6d0-14b2-5bb03d60a53e0-gzip"
Content-Security-Policy: default-src 'self';
Without a Content-Type, it shows up as text in Chromium when viewed directly (with, e.g., “Open image in new tab”).
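The missing piece is a mapping from the .avif extension to its media type, image/avif. The following Python sketch shows that mapping in isolation; it is not the server's configuration (the headers above show Apache, where this would be a config setting), and `content_type_for` is a name chosen for this example.

```python
import mimetypes

# Register AVIF explicitly; older type tables may not know the extension.
mimetypes.add_type("image/avif", ".avif")


def content_type_for(filename, default="application/octet-stream"):
    """Return the media type for a file name, or a generic default if unknown."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or default
```

With a correct Content-Type present, the nosniff header no longer matters for these images, because the browser has nothing left to guess.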
The https://readium.org/about/applications.html/ link in the following line currently 404s:
web/www/help/how-to-use-our-ebooks.php
Line 21 in 670f524
readium.org, but didn't find an equivalent page. The closest was https://readium.org/awesome-readium/.
There are several series / collections available on the site, e.g. the Sherlock Holmes and Arsène Lupin stories. Would be good to make these available under individual URLs.
Proposal:
Part 1 should be fairly easily accomplished, so in the best agile tradition I’ll start with that.
Given the frequency of his involvement, should we add a snippet to create-draft to prefill it in the transcribers section?
The Standard Ebooks website doesn't render correctly in some e-reader browsers. For instance the Kobo Libra H2O:
Since the Kobo can download ebooks directly through the browser it would be great if the site was easier to navigate. Would it be possible to create a static "lite" version of the site, similar to the mobile websites that used to be common?
In a post on the mailing list, a visitor suggested allowing the user to sort the Browse Ebooks page by reading ease. In addition, I also suggest allowing the user to sort by word count.
I took a look at the EBook class to see if there were any other properties that would be good for sorting, but I didn't really see any others. I considered suggesting reading time, but it looks like this is simply word count divided by 275 words per minute for all books, so sorting by reading time would be identical to sorting by word count. (Although, users may not realize that. I always assumed that reading time was calculated on both word count and reading ease!)
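The point about reading time can be checked with a few lines of Python. The 275 words per minute figure is the one quoted above; the word counts and book names are made up for illustration, and `reading_minutes` is not the EBook class's actual method.

```python
WORDS_PER_MINUTE = 275  # the fixed rate mentioned above


def reading_minutes(word_count):
    """Estimate reading time as word count divided by a constant rate."""
    return word_count / WORDS_PER_MINUTE


# Hypothetical word counts, for illustration only.
books = {"Book A": 82_500, "Book B": 27_500, "Book C": 165_000}

by_words = sorted(books, key=lambda title: books[title])
by_time = sorted(books, key=lambda title: reading_minutes(books[title]))
```

Because dividing by a positive constant never changes the relative order of two values, `by_words` and `by_time` always come out the same, which is why a separate reading-time sort would add nothing.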
To keep things simple, I think it's sufficient to have a single sort direction for these new criteria:
The content on each ebook on the website is very detailed and interesting. It would be nice if the blurb on the OPDS was as detailed, or indeed used the same text.
https://github.com/standardebooks/web/tree/master/config/ssl
Surely this should be generated when needed and not stored in git?
https://standardebooks.org/collections should show all of the collections on the site.
(from this comment https://news.ycombinator.com/item?id=25788256)
Building upon #60 and a request for Foliate to make the ebook selection process easier for users, linking to johnfactotum/foliate#443
In the Catalogs component of Foliate, the epub selection is a bit unclear. According to the Foliate developer, these labels are provided by the OPDS acquisition feed. Why is the first one recommended over the second one? Both entries also show epub+zip in the popover. What makes it more compatible than the other?
Maybe it would be better to phrase it differently and to specify what devices they are for (e.g. recommended for desktop viewers). I think the current implementation is confusing for a user (at least for me it is). Could this be made clearer in the menu entries?
Actually, it’s currently broken on Firefox Nightly due to a bug there, but even after the fix for that the combination of no Content-Type and nosniff headers will apparently cause problems:
https://bugzilla.mozilla.org/show_bug.cgi?id=1547076#c7
I guess an easy fix would be to specify Content-Type text/plain?
There are a number of broken links to https://standardebooks.org/contribute/semantics (which no longer exists) in the step-by-step guide. They should be replaced with equivalent links in the new manual.
It would be great to have the option to search the Standard Ebooks OPDS feed. Foliate has the capability to perform searches. Would you consider implementing this so that we can increase ebook discovery in both projects? For now, the option to search the SE feed has been deactivated in Foliate until implemented on your side.