Comments (17)
If the data quota is limiting us, we can do the following.
After a PR is merged, Jenkins will:
- copy all files to depot
- provide a unique URI for each dataset
- replace all occurrences of the dataset in the markdown files with the URI
- replace the dataset with a markdown file that embeds this URI/image so that it can be displayed on GitHub
- use some git magic to get rid of the original files
- commit everything to master
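The link-replacement step in the list above could be sketched roughly as follows. This is only an illustration, not an existing Jenkins job: the function name `replace_dataset_refs`, the `uri_by_path` mapping, and the `depot.example.org` URI are all hypothetical, and it only rewrites standard markdown link targets.

```python
import re

# Matches the target part of a markdown link or image: "](target)"
LINK_TARGET = re.compile(r"\]\(([^)\s]+)\)")


def replace_dataset_refs(markdown: str, uri_by_path: dict) -> str:
    """Swap in-repo dataset paths for their (hypothetical) depot URIs.

    Targets that are not keys of uri_by_path are left untouched.
    """
    def swap(match):
        target = match.group(1)
        return "](" + uri_by_path.get(target, target) + ")"

    return LINK_TARGET.sub(swap, markdown)


# Hypothetical mapping produced by the "copy to depot" step.
mapping = {"/attachments/talk.pdf": "https://depot.example.org/abc123/talk.pdf"}
page = "See [slides](/attachments/talk.pdf) and [home](/index.md)."
print(replace_dataset_refs(page, mapping))
```

Doing the replacement in a single regex pass avoids rewriting a path that happens to appear inside a URI that was just substituted.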
from galaxy-hub.
As an update: we got in touch with GitHub and got additional storage for free as an academic organisation.
from galaxy-hub.
We're using LFS. Jenkins does not like LFS, though, so that needs to be fixed.
from galaxy-hub.
LFS is currently being used for images. We did that because many of the images from the wiki could be losslessly compressed a great deal more (and we didn't think about this until after we'd copied the originals).
I think there's a good argument for leaving most images in the repo (I think only a few are bigger than 1 meg) and for loading more in. Images in other websites go away, causing our older pages to look bad.
However, attachments like PDFs, PowerPoints, Word Docs and so on can be enormous, and we do have a lot of them. Wherever we put these, I don't think we need versioning for them. (If we do need versioning, then GitHub seems like a good place, even if the attachment is huge.)
We could use commercial services (know of any?). We could also use something like https://depot.galaxyproject.org/ (@natefoo, @InitHello). Either way, we need to work out a method for people to upload the files.
Do we want to continue uploading large files (anywhere) as a best practice?
Or should we link to large files using the sites they are already on, whenever possible?
I used to do this with the old wiki because then our Google custom searches would also search the attachments. That is worth a lot, IMO. However, I stopped doing that for things that were already on the web a couple of years ago. It's work (mostly time) to upload them and it is a lot of storage.
My answer to this question depends on how much pain an upload mechanism would cost us, both in terms of time and money.
from galaxy-hub.
I'm not sure I understand the argument against LFS (or if you are even making one). It works, people can upload images using it, images don't change a lot so having history probably isn't a big deal... What am I missing?
from galaxy-hub.
@jxtx I think we're definitely set on using LFS (and, well, we already are), the question is just what to do with very large files that we've attached to the wiki in the past, files which range upwards of 100s of megabytes.
from galaxy-hub.
Separate repo?
from galaxy-hub.
Looking into GitHub, it appears to limit repositories to 1GB each. I can't find a user/project limit. So, in theory, as long as our attachments sum to < 1 GB, this might be possible. (Hmm. We might actually approach that limit after a while.)
I can see a couple of ways to implement this:
- Have links in the markup point to the actual files on GitHub. The links would be long and ugly, and would basically be cut and pasted into the page, but they would work when viewing in GitHub, on the website, and on locally generated websites (once the attachment is committed).
  That might look like:
  [Get the Starforge Logo](https://raw.githubusercontent.com/galaxyproject/starforge/master/docs/starforge_logo.png)
  Renders in GitHub as
  Get the Starforge Logo
- Have links to attachments use a special syntax that is then replaced by the build process with the full link to the file in GitHub. In an ideal world, that syntax might render a link to a page that explains how to add an attachment.
  [text to display in rendered page](/ATTACH?/Root/Relative/Path/AttachmentRepo/file.pdf)
  Renders in GitHub as
  text to display in rendered page
Inside the repo on GitHub clicking on that would land you on the /ATTACH page in that repo, which could explain how this whole thing works. However, on a local install that would point to what may appear to be a random link buried deep in GitHub.
If we go this route, I'd favor the first option: just paste in the full URL and be done.
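For comparison, the build-time replacement described in the second option might look something like the sketch below. It assumes the `/ATTACH?` prefix from the example above. The function name `expand_attach_links` and the base URL are hypothetical, and the real build (Metalsmith) would need an equivalent plugin rather than this Python.

```python
import re

# Matches "](/ATTACH?/some/path)" and captures the root-relative path.
ATTACH_LINK = re.compile(r"\]\(/ATTACH\?(/[^)\s]+)\)")


def expand_attach_links(markdown: str, base_url: str) -> str:
    """Replace /ATTACH? link targets with full URLs under base_url."""
    base = base_url.rstrip("/")
    return ATTACH_LINK.sub(lambda m: "](" + base + m.group(1) + ")", markdown)


# Hypothetical base URL for wherever the attachment repo is served from.
source = "[text to display](/ATTACH?/Root/Relative/Path/AttachmentRepo/file.pdf)"
print(expand_attach_links(source, "https://example.org/attachments/"))
```

Ordinary links are left alone, so pages that paste in full URLs (the first option) would pass through unchanged.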
from galaxy-hub.
> Hmm. We might actually approach that limit after a while.
I take that back. I bet we are already past that limit.
from galaxy-hub.
Hmm. Is it possible to support uploading of large files to this repo, and then automatically move them to, for example, S3, drop the uploaded file from GitHub, store the URL where the uploaded file was, and then have Metalsmith replace the in-repo URL with the S3 url at render time? Uploaded large files would effectively become redirects to S3.
Accessing the attachment in a rendered site would just work. Accessing the attachment from within the GitHub UI would take you to the attachment page, which relatively quickly would be replaced with text displaying the S3 URL. The open question is how a local install of Metalsmith would deal with an attachment that has not yet been committed and pushed to GitHub.
The cost of storing 10GB on S3 is about $2.76 / year.
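As a back-of-the-envelope check, the figure above is consistent with a flat rate of $0.023 per GB per month. That rate is an assumption chosen because it reproduces the quoted number; actual pricing depends on region and storage class, changes over time, and does not include request or transfer charges.

```python
def yearly_storage_cost(gigabytes: float, usd_per_gb_month: float) -> float:
    """Storage-only cost per year, ignoring requests and data transfer."""
    return gigabytes * usd_per_gb_month * 12


# Assumed rate: $0.023 per GB-month (not taken from the thread).
cost = yearly_storage_cost(10, 0.023)
print(f"${cost:.2f} per year")  # prints "$2.76 per year"
```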
from galaxy-hub.
I'd vote for using the standard links directly in the markup (i.e., [link text](https://some.url)) and keeping the files wherever is most convenient for the author. The standard links allow people to quickly hop on the GitHub in-browser editor and see the output in the Preview panel vs. having to wait for a build.
For the storage backend, S3 sounds like a reasonable option; Jetstream's object store is another option that won't cost anything but possibly isn't as persistent as S3 over the long term, and LFS is available for content that wants to be versioned.
from galaxy-hub.
Yep. I like leaving this up to the author. Right now there are a lot of broken links, but eventually we'll be able to easily make check and verify external contents (and flag when they need to be updated), etc.
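A check of that kind could start from something like the following sketch. It is not an existing target in the repo: the function names are hypothetical, it only looks at standard markdown links with `http(s)` targets, and a `HEAD` request is a rough proxy for "still exists" (some servers reject `HEAD`).

```python
import re
import urllib.request

# Captures external targets of markdown links: "](https://...)"
EXTERNAL_LINK = re.compile(r"\]\((https?://[^)\s]+)\)")


def external_links(markdown: str) -> list:
    """Return all http(s) link targets found in the markdown text."""
    return EXTERNAL_LINK.findall(markdown)


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Best-effort reachability test using a HEAD request."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # URLError, HTTPError and timeouts are all OSError subclasses.
        return False


def broken_links(markdown: str, check=is_reachable) -> list:
    """List external links for which check(url) is false."""
    return [url for url in external_links(markdown) if not check(url)]
```

Passing the checker in as a parameter keeps the link extraction testable without network access.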
from galaxy-hub.
I don't know enough about Jenkins to comment on the details, but this sounds like a great solution to me.
from galaxy-hub.
A possible issue here is that if we put all the attachments inside the repo, then it becomes a very large repo, consisting mostly of attachments. I don't think we want that.
from galaxy-hub.
@tnabtaf what kind of attachments are you thinking about? Slides in general should be hosted elsewhere and just linked ...
from galaxy-hub.
@bgruening We may be talking past each other here. Slides are an example of what I think this issue is about. How do we support storing them, without causing the repo to bloat? I'm happy to link to slides that are already online, but if people don't want to get in the slide publishing business, I also want to give them a place to put slides / documents that are useful to the community.
Also, anything stored in our attachments dump (wherever that ends up being) can be included in the custom searches.
Almost all of our images are small and I favor keeping them in the hub.
from galaxy-hub.
Closing this issue. To recap discussions had by myself, @tnabtaf, and many others here and elsewhere (mailing lists, IRC, chat, etc), we're going to use LFS for smallish files (< 10 MB). Large files like videos and slides will be hosted elsewhere at the author's discretion. Most of this was @tnabtaf posting things previously, and he's going to use depot.galaxyproject.org or some similarly accessible resource for storage.
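The size rule in this recap could be applied mechanically with a small helper like the one below. This is a sketch only: the names are hypothetical, and reading the threshold as 10 MiB (10 × 1024 × 1024 bytes) is an assumption.

```python
from pathlib import Path

# Assumed interpretation of the "< 10 MB" threshold from the recap.
LFS_LIMIT_BYTES = 10 * 1024 * 1024


def storage_for(size_bytes: int, limit: int = LFS_LIMIT_BYTES) -> str:
    """Return 'lfs' for files under the limit, 'external' otherwise."""
    return "lfs" if size_bytes < limit else "external"


def classify_tree(root: str) -> dict:
    """Map each regular file under root to its suggested storage."""
    return {
        str(path): storage_for(path.stat().st_size)
        for path in Path(root).rglob("*")
        if path.is_file()
    }
```

For example, `storage_for(200 * 1024 * 1024)` returns `"external"`, matching the decision to host large slides and videos elsewhere.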
from galaxy-hub.