
Comments (17)

bgruening commented on May 30, 2024

If the data quota is limiting us, we can do the following.
After merging a PR, Jenkins would:

  • copy all files to depot
  • provide a unique URI for this dataset
  • replace all occurrences of this dataset in the markdown files with the URI
  • replace the dataset with a markdown file that embeds this URI/image so that it can be displayed on GitHub
  • use some git magic to get rid of the original files
  • commit everything to master
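The markdown-rewrite step above could be sketched roughly like this; the depot base URI and file layout here are assumptions, not the real Jenkins pipeline:

```python
import re

# Assumed depot base URI; the real one would be chosen by the Jenkins job.
DEPOT_BASE = "https://depot.galaxyproject.org/hub"

def rewrite_dataset_links(md_text, dataset_path):
    """Replace markdown links to an in-repo dataset with its depot URI."""
    uri = f"{DEPOT_BASE}/{dataset_path}"
    pattern = r"\((?:\./)?" + re.escape(dataset_path) + r"\)"
    return re.sub(pattern, f"({uri})", md_text)

md = "See the deck: [slides](talks/big-deck.pdf)"
print(rewrite_dataset_links(md, "talks/big-deck.pdf"))
# -> See the deck: [slides](https://depot.galaxyproject.org/hub/talks/big-deck.pdf)
```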

from galaxy-hub.

bgruening commented on May 30, 2024

As an update: we got in touch with GitHub and received additional storage for free as an academic organisation.


dannon commented on May 30, 2024

We're using LFS. Jenkins does not like LFS, though, so that needs to be fixed.


tnabtaf commented on May 30, 2024

LFS is currently being used for images. We did that because many of the images from the wiki could be losslessly compressed a great deal more (and we didn't think about this until after we'd copied the originals).

I think there's a good argument for leaving most images in the repo (I think only a few are bigger than 1 MB) and for uploading more. Images on other websites go away, causing our older pages to look bad.

However, attachments like PDFs, PowerPoints, Word Docs and so on can be enormous, and we do have a lot of them. Wherever we put these, I don't think we need versioning for them. (If we do need versioning, then GitHub seems like a good place, even if the attachment is huge.)

We could use commercial services (know of any?). We could also use something like https://depot.galaxyproject.org/ (@natefoo, @InitHello) . Either way, we need to work out a method for people to upload the files.

Do we want to continue uploading large files (anywhere) as a best practice?

Or should we link to large files using the sites they are already on, whenever possible?

I used to do this with the old wiki because then our Google custom searches would also search the attachments. That is worth a lot, IMO. However, a couple of years ago I stopped doing that for things that were already on the web. It's work (mostly time) to upload them, and it takes a lot of storage.

My answer to this question depends on how much an upload mechanism would cost us, both in time and money.


jxtx commented on May 30, 2024

I'm not sure I understand the argument against LFS (or if you are even making one). It works, people can upload images using it, images don't change a lot so having history probably isn't a big deal... What am I missing?


dannon commented on May 30, 2024

@jxtx I think we're definitely set on using LFS (and, well, we already are); the question is just what to do with the very large files we've attached to the wiki in the past, files that range up to hundreds of megabytes.


natefoo commented on May 30, 2024

Separate repo?


tnabtaf commented on May 30, 2024

Looking into GitHub, it appears to limit repositories to 1 GB each. I can't find a per-user or per-project limit. So, in theory, as long as our attachments sum to < 1 GB, this might be possible. (Hmm. We might actually approach that limit after a while.)

I can see a couple of ways to implement this:

  1. Have links in the markup point to the actual files on GitHub. The links would be long and ugly, and would basically be cut and pasted into the page, but they would work when viewing on GitHub, on the web site, and on locally generated websites (once the attachment is committed).

    That might look like:
    `[Get the Starforge Logo](https://raw.githubusercontent.com/galaxyproject/starforge/master/docs/starforge_logo.png)`
    which renders on GitHub as a link labeled "Get the Starforge Logo".

  2. Have links to attachments use a special syntax that is then replaced by the build process with the full link to the file in GitHub. In an ideal world, that syntax might render a link to a page that explains how to add an attachment.

    `[text to display in rendered page](/ATTACH?/Root/Relative/Path/AttachmentRepo/file.pdf)`
    which renders on GitHub as a link labeled "text to display in rendered page".
    Inside the repo on GitHub, clicking that would land you on the /ATTACH page in that repo, which could explain how this whole thing works. However, on a local install it would point to what may appear to be a random link buried deep in GitHub.

If we go this route, I'd favor the first option: just paste in the full URL and be done.
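For what it's worth, the build-time substitution in the second option could be a single regex pass; the attachments repo URL below is hypothetical:

```python
import re

# Hypothetical attachments repo; /ATTACH? is the special marker syntax above.
ATTACH_REPO = "https://raw.githubusercontent.com/galaxyproject/hub-attachments/master"

def expand_attach_links(md_text):
    """Rewrite (/ATTACH?/path/to/file) link targets to full raw-GitHub URLs."""
    return re.sub(r"\(/ATTACH\?(/[^)]+)\)", "(" + ATTACH_REPO + r"\1)", md_text)

src = "[Get the slides](/ATTACH?/Events/GCC2016/slides.pdf)"
print(expand_attach_links(src))
# -> [Get the slides](https://raw.githubusercontent.com/galaxyproject/hub-attachments/master/Events/GCC2016/slides.pdf)
```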


tnabtaf commented on May 30, 2024

> Hmm. We might actually approach that limit after a while.

I take that back. I bet we are already past that limit.


tnabtaf commented on May 30, 2024

Hmm. Is it possible to support uploading large files to this repo, and then automatically move them to, for example, S3: drop the uploaded file from GitHub, store the URL where the uploaded file was, and then have Metalsmith replace the in-repo URL with the S3 URL at render time? Uploaded large files would effectively become redirects to S3.

Accessing the attachment on a rendered site would just work. Accessing the attachment from within the GitHub UI would take you to the attachment page, which relatively quickly would be replaced with text displaying the S3 URL. The open problem is how a local install of Metalsmith deals with an attachment that has not been committed and pushed to GitHub yet.

The cost of storing 10GB on S3 is about $2.76 / year.
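That figure assumes S3 Standard pricing of roughly $0.023 per GB per month:

```python
# Rough S3 Standard pricing at the time: ~$0.023 per GB per month.
gb_stored = 10
price_per_gb_month = 0.023
annual_cost = gb_stored * price_per_gb_month * 12
print(f"${annual_cost:.2f} / year")  # -> $2.76 / year
```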


afgane commented on May 30, 2024

I'd vote for using standard links directly in the markup (i.e., [link text](https://some.url)) and keeping the files wherever is most convenient for the author. Standard links let people quickly hop into the GitHub in-browser editor and see the output in the Preview panel, versus having to wait for a build.
For the storage backend, S3 sounds like a reasonable option; Jetstream's object store is another option that won't cost anything but possibly isn't as persistent as S3 in the long term, and LFS is available for content that needs versioning.


dannon commented on May 30, 2024

Yep. I like leaving this up to the author. Right now there are a lot of broken links, but eventually we'll be able to easily check and verify external content (and flag links when they need to be updated), etc.
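A first pass at such a checker only needs to extract the external link targets; probing them is then one HEAD request per URL. This is just a sketch, with made-up function names:

```python
import re
import urllib.request

def extract_external_links(md_text):
    """Pull every http(s) link target out of a markdown string."""
    return re.findall(r"\]\((https?://[^)\s]+)\)", md_text)

def link_ok(url, timeout=5.0):
    """Best-effort check that a URL still resolves (HEAD request)."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except Exception:
        return False

md = "[hub](https://galaxyproject.org/) and [local page](/events/)"
print(extract_external_links(md))  # -> ['https://galaxyproject.org/']
```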


tnabtaf commented on May 30, 2024

I don't know enough about Jenkins to comment on the details, but this sounds like a great solution to me.


tnabtaf commented on May 30, 2024

A possible issue here is that if we put all the attachments inside the repo, it becomes a very large repo consisting mostly of attachments. I don't think we want that.


bgruening commented on May 30, 2024

@tnabtaf what kind of attachments are you thinking about? Slides in general should be hosted elsewhere and just linked ...


tnabtaf commented on May 30, 2024

@bgruening We may be talking past each other here. Slides are an example of what I think this issue is about: how do we support storing them without causing the repo to bloat? I'm happy to link to slides that are already online, but if people don't want to get into the slide-publishing business, I also want to give them a place to put slides/documents that are useful to the community.

Also, anything stored in our attachments dump (wherever that ends up being) can be included in the custom searches.

Almost all of our images are small and I favor keeping them in the hub.


dannon commented on May 30, 2024

Closing this issue. To recap discussions between myself, @tnabtaf, and many others here and elsewhere (mailing lists, IRC, chat, etc.): we're going to use LFS for smallish files (< 10 MB). Large files like videos and slides will be hosted elsewhere at the author's discretion. Most of the previously posted files were @tnabtaf's, and he's going to use depot.galaxyproject.org or a similarly accessible resource for storage.

