
Comments (29)

alexbilbie commented on September 13, 2024

I don't like this for three reasons:

  1. You're going to clutter the filesystem of your webserver with images and folders
  2. It won't work if you've got distributed webservers (only one webserver would have the cached image)
  3. The cache filesystem plus caching in league/flysystem (with Redis/Memcache) should (at least in my testing) result in a speedy response to the second request.

There is, however, a genuine flooding problem if a second request is made for a single image whilst the first request for that image is still processing:

   Client 1             Client 2             Server                      

+------+-------+     +------+-------+    +------+------+                 
       |                    |                   |                        
       |    First           |                   |                        
       |    request         |                   |                        
       |                    |                   |                        
       | +------------------------------------> | +-------+              
       |                    |                   |         |              
       |                    |   Second          |         |              
       |                    |   request         |         |              
       |                    | +---------------> | +----+  |  Processing  
       |                    |                   |      |  |  first       
       |                    |                   |      |  |  request     
       |   Return first     |                   |      |  |              
       |   response         |                   |      |  |              
       | <------------------------------------+ | <-------+              
       |                    |                   |      |                 
       |                    |    Return second  |      |  Duplicated work
       |                    | <--response-----+ | <----+  by server      
       +                    +                   +                        

This could be mitigated by something along the lines of the following:

  1. First request received by the server
  2. Server checks if cached image is available
  3. Check for "processing flag" for that image
  4. "processing flag" isn't present
  5. Set "processing flag" for that image
  6. Start generating new image
  7. Second request received by the server
  8. Second request check if cached image is available
  9. Cached image isn't available so check if "processing flag" has been set
  10. "processing flag" is set, so check every second whether the flag has been lifted; once it has, return the cached image
  11. Server finishes rendering image for first request, new image is cached
  12. Server removes processing flag
  13. Return cached image for first request

Now there is a chance that the first request dies, so this "processing flag" should have a TTL of a few seconds. The server should update the TTL whilst rendering so that the flag doesn't disappear.
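
The flagged flow above can be sketched as follows. This is a minimal, hedged Python sketch (the thread is about PHP, but the logic is language-agnostic); `CacheStub` stands in for Redis/Memcached and `make_image` for Glide's rendering, and none of these names are real Glide APIs. Note that against real Redis the flag check-and-set should be atomic (e.g. `SET key 1 NX EX 5`); the stub below leaves a small race window.

```python
import time

class CacheStub:
    """In-process stand-in for Redis/Memcached: key -> (value, expires_at)."""
    def __init__(self):
        self.store = {}

    def set(self, key, value, ttl):
        self.store[key] = (value, time.monotonic() + ttl)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None or entry[1] < time.monotonic():
            return None
        return entry[0]

    def delete(self, key):
        self.store.pop(key, None)

def get_image(cache, path, make_image, poll=0.05, timeout=5.0):
    """Return the cached image, generating it if needed (steps 1-13 above)."""
    flag = "processing:" + path
    cached = cache.get("img:" + path)
    if cached is not None:
        return cached                              # steps 2/8: cache hit
    if cache.get(flag) is None:
        cache.set(flag, 1, ttl=5)                  # steps 3-5: claim the work
        try:
            # A real implementation would refresh this TTL during long renders.
            image = make_image(path)               # step 6: render
            cache.set("img:" + path, image, ttl=3600)  # step 11: cache result
        finally:
            cache.delete(flag)                     # step 12: lift the flag
        return image                               # step 13
    # Steps 9-10: another request is rendering; poll until the flag lifts.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        cached = cache.get("img:" + path)
        if cached is not None:
            return cached
        time.sleep(poll)
    raise TimeoutError("image was not generated in time")
```

The TTL on the flag is what makes a crashed first request recoverable: once the flag expires, the next waiter's poll loop times out and a fresh request can claim the work.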

from glide.

reinink commented on September 13, 2024

Yes, the flooding issue @alexbilbie notes is real. I'm personally a little less concerned about outputting images with PHP. Yes, it's certainly not as fast as Apache or Nginx, but it's still fine for most projects. And if you have a really high-traffic site, then working out the flooding issue is probably more of a priority.

Having said that, I know that you can use Nginx's X-Accel-Redirect functionality to still output the image using Nginx. This is done by basically passing the file path to Nginx using headers in your PHP script. Nginx then picks up those headers and takes care of the rest.

I believe Apache has similar functionality, called X-Sendfile.


reinink commented on September 13, 2024

@alexbilbie Curious how you saw the "processing flag" being set. Something like Redis or MySQL?


alexbilbie commented on September 13, 2024

I'd use Redis or Memcached


sagikazarmark commented on September 13, 2024

Personally I think that a caching layer BETWEEN the client and Glide would be better: either a CDN or something like Varnish. This solves the too-many-files problem mentioned by @alexbilbie. However, it still leaves the flooding question open.

These kinds of performance-related issues should be collected under one issue, where all the possible problems can be addressed, so that issues mentioning one segment of the problem can be closed.


reinink commented on September 13, 2024

I really like your idea @alexbilbie. I'm thinking of creating a standalone limiter library that can handle this. Essentially it has to be able to limit tasks across all users, so unlike many other rate limiters out there, this would be an application-level limiter. Very roughly, it would look something like this:

// Create limiter
// Possible adapters: Disk, APC, Memcached, Redis, MySQL
$limiter = new Limiter(new APCAdapter());

// Check if cache exists
if (!$glide->cacheFileExists()) {

    // Add task to the limiter
    $limiter->addTask(function () use ($glide, $request) {
        $glide->makeImage($request);
    });
}

// Output the image
$glide->outputImage($request);

This would all occur synchronously, and the limiter would restrict the tasks at an application level, using a flag (set via the adapter) and the sleep() function. The limiter could have an option to determine how often the flag is checked. Like:

$limiter->setRefresh(500); // in milliseconds
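
Roughly, such a limiter could behave as sketched below. This is a hypothetical Python illustration of the proposed (not actually existing) library: the adapter here is a plain dict, whereas a real adapter (APC, Memcached, Redis, MySQL) would share the flag across processes, and the claim step would need to be atomic.

```python
import time

class Limiter:
    """Application-level limiter: duplicate tasks for a key wait, not re-run."""
    def __init__(self, adapter):
        self.adapter = adapter        # shared key/value store (dict stand-in)
        self.refresh = 0.5            # seconds between flag checks

    def set_refresh(self, milliseconds):
        # Mirrors the proposed setRefresh(500) API above.
        self.refresh = milliseconds / 1000.0

    def add_task(self, key, task, timeout=5.0):
        deadline = time.monotonic() + timeout
        # Sleep while another request holds the flag for this key.
        while self.adapter.get(key) and time.monotonic() < deadline:
            time.sleep(self.refresh)
        self.adapter[key] = True      # claim (not atomic; fine for a sketch)
        try:
            return task()
        finally:
            del self.adapter[key]     # release so waiters can proceed
```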


reinink commented on September 13, 2024

@sagikazarmark By the way I think Varnish is an excellent idea as well for caching purposes. The less PHP does, the faster. But yeah, it doesn't really solve the bigger flooding problem.


sagikazarmark commented on September 13, 2024

@reinink Look at the proposed library metaphore, which solves a similar use case.


reinink commented on September 13, 2024

@sagikazarmark Almost! Good thinking, but it still works a little differently. What we need for Glide is something that literally delays (sleeps) a request, whereas metaphore returns old cached content. Since Glide doesn't have any available cache, this won't really work.


sagikazarmark commented on September 13, 2024

I didn't say it's perfect, but the two cases are very similar; that package could probably be improved to solve several use cases related to the dogpile effect, like this one.


reinink commented on September 13, 2024

Good point.


reinink commented on September 13, 2024

/cc @sobstel for comment. :)


sagikazarmark commented on September 13, 2024

// Possible adapters: Disk, APC, Memcached, Redis, MySQL

I suggest you avoid implementing this logic for the zillion-and-first time. ;) We talked with @sobstel about his package also using BlackBox, which is going to be an abstraction layer for storage. Hopefully we can make it something standard-like.


sobstel commented on September 13, 2024

@reinink metaphore handles this case; there's a NoStaleCache callback which could be used to delay a request.

https://github.com/sobstel/metaphore#no-stale-cache

$cache->onNoStaleCache(function (NoStaleCacheEvent $event) {
    sleep(100);
});

But I think that for a high-traffic app, you should simply let the first request generate the image, and other requests should get a 404 header until it's generated. If you delay the other requests, your web server will quickly die, flooded by too many hanging requests.

So it should be like this:

$cache->onNoStaleCache(function (NoStaleCacheEvent $event) {
    header('HTTP/1.1 404 File Not Found');
});
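
The fail-fast behaviour described here can be sketched like this. A hedged Python illustration of the idea only, not metaphore's actual API; `store` stands in for the shared cache and `render` for image generation:

```python
def handle_request(store, path, render):
    """First request for a missing image renders it; concurrent ones get 404."""
    if path in store:
        return 200, store[path]          # cached: serve it
    flag = "rendering:" + path
    if store.get(flag):
        return 404, None                 # someone else is rendering: fail fast
    store[flag] = True                   # claim the work (sketch; not atomic)
    try:
        store[path] = render(path)       # generate and cache
    finally:
        store.pop(flag, None)            # lift the flag even on failure
    return 200, store[path]
```

The trade-off versus the sleep-and-poll approach: clients briefly see broken images, but the server never accumulates hanging connections.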


reinink commented on September 13, 2024

If you delay other requests, your web server will quickly die flooded by too many hanging requests.

How real is this problem? For example, if you had a web server with 100 delayed requests like this, would it crash?

The reason I ask is because returning a 404 response is not really an option in this sort of setup, since images are generated on the fly. Seeing a whole bunch of broken image links isn't really ideal.


sobstel commented on September 13, 2024

@reinink It really depends on the server you have and the usage scenario. We used a delay strategy for a regular cache on a high-traffic site and it brought the web servers down. The whole problem is that a single request to a page with 100 images can create 100 hanging requests. You'll hit the max connections limit quickly, and subsequent requests might simply be dropped completely.


reinink commented on September 13, 2024

Yeah, when I was considering this strategy I hadn't really thought about the max connections limit. Hmm....


sagikazarmark commented on September 13, 2024

Obviously the no-stale-cache approach also needs some sort of storage to know whether the cache is being generated or is completely missing. What if we add a max-connections limit handled by the cache strategy itself? It could then return a timeout (or something like that) when 20-30 requests come in at the same time.


sobstel commented on September 13, 2024

I do believe it's a wider problem that cannot easily be solved by this library itself (better handled by the before-mentioned Varnish, or anything else that suits your current setup and/or infrastructure).

Metaphore will save you from the dogpile effect, as advertised: if many users want the same content, you won't have multiple processes generating the same content (just one), which, when big traffic hits, would otherwise cause a snowball effect that kills your server. It's just better to send a 404 then. When you have a high-load website you cannot afford waiting (hanging web server requests) or wasting resources (many processes generating the same thing).


sagikazarmark commented on September 13, 2024

@sobstel You are probably right. But if you have a high-load website, your environment is probably powerful enough to generate the content quickly. If not, you obviously have to either scale your environment or choose another strategy (like Varnish).


reinink commented on September 13, 2024

@sobstel That all makes sense. I sort of feel that if you have a really high-traffic website, then using something like Glide may not be appropriate. Or maybe it still is, but it might make more sense to set up Glide on its own web server. That way, if it happens to go down, it won't take the whole website down with it.

If it were set up on its own server you could increase the max connections limit, while having something in place to prevent dog-piling (like your library).

And I agree with using Varnish in front of Glide in a situation like this, but that doesn't really help with the initial generation of the images, which would bypass Varnish.


sobstel commented on September 13, 2024

@sagikazarmark It's not that easy, I'm afraid. You can be prepared for high load, but sometimes you unexpectedly get 100x more traffic, and even if you have automated scripts to scale quickly in the cloud, it can still be an issue. But I think it's out of the scope of this library. When building high-load apps there are so many challenges that you usually need to handle them specifically for the app. It's just not the responsibility of a library like this.

@reinink I agree 100%.


sagikazarmark commented on September 13, 2024

@sobstel Of course there is a point where you cannot use a library like this anymore. But it is very hard to find that point, so starting with a solution like this and migrating to a distributed caching system when needed is the path I would choose.

@reinink It seems to me that Varnish is smart enough to handle this situation. It will queue the rest of the requests for a while and only pass them to the backend if it turns out the response cannot be cached. (Reference, but not the best one, though)


reinink commented on September 13, 2024

Came across this package by @davedevelopment recently, which might help with throttling as well:

https://github.com/davedevelopment/stiphle

Doesn't look like it's in active development anymore, though.


davedevelopment commented on September 13, 2024

'tis true it's not really active, purely because I'm no longer using it and it served its purpose as is. If it is something you want to pursue, ping me and I'll be happy to explain things or help with a new implementation, etc.


alexbilbie commented on September 13, 2024

Stiphle looks interesting but I think we need something slightly more specific that releases requests when a lock is removed.

When I get some free time I will try and create a proof of concept.


reinink commented on September 13, 2024

Hey all, I'm going to close this ticket. Thanks for participating in this discussion and sharing your ideas on how to avoid server flooding.

One other solution I recently thought of is eager manipulations. See #72 for more on that.

Overall, I'm just not sure this is a problem that Glide should try to solve itself.


bramus commented on September 13, 2024

A bit late to the party, but here's how I implemented this:

A script that resizes images using Glide is mounted at /img-resized/$width/$height/$crop/$filename.$ext. The script uses a custom ResponseFactory, StreamResponseFactory, which does nothing but return the stream (which itself contains the resized image data).

The stream returned by the StreamResponseFactory is then:

  1. dumped onto the filesystem into /img-resized/$width/$height/$crop/$filename.$ext
  2. returned as a response

Since the file is dumped onto disk at the location/path where it was originally requested, a subsequent request to the same /img-resized/$width/$height/$crop/$filename.$ext will return said dumped version directly via the webserver, instead of going through another PHP + Glide cycle :)
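
A minimal sketch of this write-through idea (in Python for illustration; `resize` stands in for the Glide manipulation and the docroot layout is an assumption, not bramus's actual code):

```python
import os

def serve_resized(docroot, request_path, resize):
    """Generate a resized image, dump it at the requested path, return it."""
    target = os.path.join(docroot, request_path.lstrip("/"))
    if os.path.exists(target):              # normally the webserver catches this
        with open(target, "rb") as f:
            return f.read()
    data = resize(request_path)             # run the Glide-style manipulation
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as f:           # dump to the requested location
        f.write(data)
    return data                             # next hit is served as a static file
```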


sagikazarmark commented on September 13, 2024

A common problem with this approach is that you are not protected against the dogpile effect: what happens if there are 10 simultaneous requests to the same image?

Do you have some kind of rate limiting for the requests? Because if you expose Glide directly, you are also exposed to attacks that send thousands of conversion requests (which of course only works if you have a lot of images).

