Describe the bug This is a discussion ticket related to ongoing is

(Discussion) High CPU usage in DSpace frontend related to Angular Server Side Rendering (SSR)

tdonohue commented on August 10, 2024

from dspace-angular.

Comments (15)

pnbecker commented on August 10, 2024 4

A person with some Angular experience (not a complete expert) asked me why we make so many small requests to the backend instead of fewer, larger ones. One example is the submission form, where we send one request for each section of the form, one request for each vocabulary, and more besides, instead of a single request for all the information we need to build the form.

tdonohue commented on August 10, 2024 4

Today, I've moved a number of the performance issues described above into separate tickets. They are not directly related to this SSR issue (or only very loosely related), but obviously are performance issues themselves which require fixing:

If there are other obvious areas where DSpace is (inefficiently) making many requests, then I'd recommend others create additional tickets as well. While these issues are all semi-related, they all may need to be solved separately, as the solutions may not be identical in all cases.

mwoodiupui commented on August 10, 2024 1

We tried Apache HTTPD mod_cache here and ran into indiscriminate sharing of Shibboleth tickets across distinct users' sessions. I haven't had time to investigate, so it might be due to unrelated changes that were hastily backed out at the same time that we disabled the external cache. If the cache was the culprit, I do not know whether a smarter configuration would fix that.

atarix83 commented on August 10, 2024 1

@tdonohue @pnbecker

«A person with some Angular experience (not a complete expert) asked me why we make so many small requests to the backend instead of fewer, larger ones. One example is the submission form, where we send one request for each section of the form, one request for each vocabulary, and more besides, instead of a single request for all the information we need to build the form.»

The multiple requests made by the submission form are not relevant to this problem, since the submission form is not rendered during SSR:
https://github.com/DSpace/dspace-angular/blob/main/src/app/submission/edit/submission-edit.component.ts#L150

That said, I agree that we should reduce the number of requests to the server wherever possible.
I reckon that when we have multiple requests to the same endpoint referring to different DSpace objects, we could try to group them into a single one.
The authorizations endpoint is one example: it has a search method which takes multiple UUIDs for the same authorization feature request (https://github.com/4Science/Rest7Contract/blob/main/authorizations.md#objects). This search method should be used wherever we handle a set of DSpace objects, as in the search component.

We did something along these lines in DSpace-CRIS with the item endpoint (https://github.com/4Science/Rest7Contract/blob/main-cris/items.md#findallbyids), to avoid making multiple requests to the item endpoint just to fetch embedded objects, as the search does for thumbnails or other objects.

This approach helps not only SSR but also the search functionality in general, because it speeds up the loading of search results.
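The grouping idea described above can be sketched roughly as follows. This is a minimal illustration, not DSpace code: `buildBatchedUrls`, `chunk`, the `searchUrl` argument, the parameter names `type`, `feature` and `uuid`, and the default limit of 20 UUIDs per request are all assumptions made for the example, and the real contract linked above should be checked before relying on any of them.

```typescript
// Hypothetical helper: split a list into groups of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) {
    throw new Error("size must be at least 1");
  }
  const groups: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Hypothetical helper: build one URL per group of UUIDs instead of one URL
// per object. The query parameter names are assumptions for illustration.
function buildBatchedUrls(
  searchUrl: string,
  type: string,
  feature: string,
  uuids: string[],
  maxPerRequest: number = 20
): string[] {
  // Drop duplicate UUIDs so the same object is never requested twice.
  const unique = uuids.filter((id, index) => uuids.indexOf(id) === index);
  return chunk(unique, maxPerRequest).map((group) => {
    const pairs = [
      "type=" + encodeURIComponent(type),
      "feature=" + encodeURIComponent(feature),
    ].concat(group.map((id) => "uuid=" + encodeURIComponent(id)));
    return searchUrl + "?" + pairs.join("&");
  });
}
```

With a page of ten search results this would produce one request for a given feature instead of ten, at the cost of a slightly longer URL.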

More generally, beyond the high number of REST requests, SSR itself can introduce overhead, especially for resource-intensive apps that require heavy server-side processing or complex data computations. In such cases, SSR may strain server resources and slow down rendering.
To mitigate these issues, we should profile the resource-intensive code with a Node monitoring tool and optimize it.

paulo-graca commented on August 10, 2024 1

Thank you @tdonohue. I agree, I'm proposing a big change that perhaps can't easily be added to DS8 and DS7. I've also added a comment to #3231; I agree that it should be configurable. My only point is that, for SEO, it matters to have search results available via SSR to crawlers and also to the Internet archives out there.

paulo-graca commented on August 10, 2024

I want to share that we saw the same behavior described here when we went live with our first DS7 deployment.

We are still experiencing constant CPU demand compared with the previous version, mainly in the Angular processes and due to crawler or robot activity.

We use Apache mod_proxy in front of the service. We have created some Apache rules that answer requests to the following URLs with a 404 (page not found):
/simple-search?query= and /handle/HANDLE_PREFIX/ID/simple-search?query=, so that Angular does not render a result for those broken URLs.

Like:

<Location "/simple-search">
  Order Deny,Allow
  Deny from all
  Allow from 127.0.0.1
  ErrorDocument 404 "Page not found"
  RewriteEngine On
  RewriteRule .* - [R=404,L]
</Location>

<LocationMatch "/handle/(.*)/(.*)/simple-search">
  Order Deny,Allow
  Deny from all
  Allow from 127.0.0.1
  ErrorDocument 404 "Page not found"
  RewriteEngine On
  RewriteRule .* - [R=404,L]
</LocationMatch>

Also, we use Apache mod_evasive with some strict restrictions on traffic (we are still fine-tuning it), and we intend to try out some of the above suggestions (like inlineCriticalCss) and to tune the Angular cache values further.

mwoodiupui commented on August 10, 2024

The built-in memory cache isn't even shared across processes on the same server.

The Redis approach is interesting to me because, while Redis is another in-memory cache, it can be shared across processes.

hutattedonmyarm commented on August 10, 2024

We (Universität Konstanz) also have made some customizations.

We've opted to use varnish (instead of redis/mod_cache, or built-in cache). This has a few advantages for us:

The cache is shared between all instances
We run a multilingual repo and the built-in cache does not distinguish by language
Finer control over what gets cached when
Things like static assets don't even need to touch the node processes most of the time. This is especially useful as long as the frontend is under high load
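To illustrate the language point in the list above, here is a rough sketch of a cache key that varies by language, so that pages in different languages do not overwrite each other in a shared cache. It is not the Varnish configuration used at Konstanz, which is not shown in this thread. The `PageRequest` shape, the `langCookie` field, and the functions `pickLanguage` and `cacheKey` are hypothetical, and the `Accept-Language` parsing deliberately ignores q-values to stay short.

```typescript
// Hypothetical request shape; field names are assumptions for illustration.
interface PageRequest {
  path: string;
  query: string;
  acceptLanguage?: string; // raw Accept-Language header, if any
  langCookie?: string;     // value of a (hypothetical) language cookie, if any
}

// Pick the language a page would be rendered in: an explicit cookie wins,
// then the first supported primary tag in Accept-Language, then the fallback.
function pickLanguage(req: PageRequest, supported: string[], fallback: string): string {
  if (req.langCookie && supported.indexOf(req.langCookie) !== -1) {
    return req.langCookie;
  }
  const header = req.acceptLanguage || "";
  for (const part of header.split(",")) {
    // "de-DE;q=0.9" -> "de"
    const tag = part.split(";")[0].trim().toLowerCase().split("-")[0];
    if (supported.indexOf(tag) !== -1) {
      return tag;
    }
  }
  return fallback;
}

// Prefix the key with the language so each language gets its own cache entry.
function cacheKey(req: PageRequest, supported: string[], fallback: string): string {
  const suffix = req.query ? "?" + req.query : "";
  return pickLanguage(req, supported, fallback) + ":" + req.path + suffix;
}
```

Normalizing to a small set of supported languages, rather than keying on the raw header, keeps the number of cache variants per page bounded.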

We've also disabled SSR for logged-in users, because they don't need it and it saves resources for us.
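One way such a check could look is sketched below. This is only an assumption about the approach: the thread does not say how logged-in users are detected, so the sketch simply tests for the presence of an authentication cookie whose name is passed in by the caller. `hasCookie` and `shouldServerRender` are hypothetical names.

```typescript
// Return true if the raw Cookie header contains a cookie with this name.
function hasCookie(cookieHeader: string | undefined, name: string): boolean {
  if (!cookieHeader) {
    return false;
  }
  return cookieHeader
    .split(";")
    .some((pair) => pair.trim().split("=")[0] === name);
}

// Decide whether to render on the server. Requests that carry the
// authentication cookie are treated as logged-in (an assumption) and would
// get the client-side rendered app instead.
function shouldServerRender(cookieHeader: string | undefined, authCookieName: string): boolean {
  return !hasCookie(cookieHeader, authCookieName);
}
```

A server could call `shouldServerRender` before invoking the SSR engine and fall back to serving the plain index page when it returns `false`.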

«4Science has updated DSpace-CRIS to exclude the Search component from SSR. This search component is not necessary for SEO, so it may make sense to disable SSR for this component.»

We did something similar, but only for the "related publications list" on person entity pages, as that took quite a while to load. We may extend this to all searches in the future though

We extended the robots.txt to disallow additional robots we don't want

michdyk commented on August 10, 2024

We have written documentation that you may find helpful. There you can find the solution we used to replace server-side rendered pages with Redis, described as a case study on the Jagiellonian University Repository:
Replace the build-in Server-Side Rendered pages with Redis.pdf

hutattedonmyarm commented on August 10, 2024

We've been wondering about it as well. Same with /api/authz/authorizations/. So many requests. Same thing with search facets. Loading the configuration, facets, facet values all in separate requests

tdonohue commented on August 10, 2024

There was some good discussion in today's Dev Mtg about areas where we should consider "bundling" many smaller REST Requests into a single (or fewer) requests. They include:

Submission form (as noted by @pnbecker above. Created new ticket: #3162)
Authorization checks (as noted by @hutattedonmyarm above. Created new ticket #3161)
Search facets (as noted by @hutattedonmyarm above. Created new ticket #3163)
Configuration checks (as noted in DSpace/DSpace#9056)

Those in attendance agreed that we should work on efficiency of these requests. Ideally we should bundle requests better and send fewer requests per page.

That said, this sort of work requires help/volunteers/investigation. I'll work on creating some tickets for each of these, as I think they are each separate issues which will often require individual solutions.

(These many requests are semi-related to performance of SSR as they likely can contribute to the slowness. However, these are less likely to be "quick fixes", as some may require a new backend endpoint to be created and the UI code to be refactored.)

hutattedonmyarm commented on August 10, 2024

There's another one: Browsing the landing page of the demo and clicking to go to one of the communities also loads the data twice:

The first one is a 302 redirect, but already contains the community object. Then the actual object is requested again.
#458 also loads data twice.

It's not only the number of requests but also their size. E.g. clicking on a publication to go to its page requests the whole "owningCollection" object in a separate request. It's unclear what it is needed for; if it is needed, it should be embedded in the item instead of fetched in a separate request.

mwoodiupui commented on August 10, 2024

May I suggest that removing redundant requests, and bundling related requests, are important and should be addressed, but that we may need a second ticket for issues that are unrelated to SSR.

paulo-graca commented on August 10, 2024

«4Science has updated DSpace-CRIS to exclude the Search component from SSR. This search component is not necessary for SEO, so it may make sense to disable SSR for this component.»

I tend to disagree here. Search results are relevant for SEO, since faceted results can give your repository/site relevance in certain topics. Imagine that your repository covers the topic of "horses". You could have faceted results per species, and that would give your repository more relevance for the "horse" topic and for each species itself. If you pay attention to your repository traffic, you will see several bots trying combinations of search results. If we remove the search from SSR, crawlers will not be able to index those results, and they will not promote your site for specific topics.

I think there is room for improvement regarding the search results....
Performing a single request to a search result page, retrieving just one result (one single item per page), our system performs 38 different database queries:

8 from resourcepolicy table
7 from collection table
6 from item table
4 from metadatavalue table
3 from public.community table
1 from handle table
1 from epersongroup table
1 from relationship_type table
1 from relationship table
1 from community2collection table
1 from community2community table
1 from bundle2bitstream table
1 from fileextension table
...

If I change it to retrieve 10 records per page, the system performs more than 500 database queries. This is not scalable. What if we just used Solr and retrieved the data stored in Solr for search results?

There is a trade-off here: more time spent indexing documents versus more time and performance degradation when retrieving them.
I propose a big change to DSpace here: DSpace/DSpace#9736, extending the concept of virtual metadata in DSpace. In practice, we could store structured data in Solr and retrieve it later for search results, avoiding extra DB queries for additional or related data.

tdonohue commented on August 10, 2024

@paulo-graca : I think your comment may (partially) belong over on #3231, which is the ticket created specifically for this task.

As I've clarified on that ticket, the reason I noted that Search results are "not necessary for SEO" is because we already tell crawlers in our robots.txt file to ignore the search page & facets: https://github.com/DSpace/dspace-angular/blob/main/src/robots.txt.ejs#L11 So, it's odd to have this component available in SSR when (well-behaved) crawlers are being told not to use it.

That said, I do agree that we may want to make this configurable, so that sites can choose to allow search results via SSR. Overall the goal of #3231 is to see if there is a "quick fix" performance improvement, at least until we can determine ways to improve performance of search in general.

I agree with you that performance of search is NOT ideal, but that also seems like a larger redesign / discussion. Any larger redesign would not help the performance of sites on 7.x or 8.x (as it wouldn't be possible to backport). However, a small change to optionally exclude the search.component from SSR could be backported to 7.x/8.x, as it might be considered a performance fix for those platforms.

UPDATE: Also, it's worth noting we have a separate ticket that is about smaller fixes to search performance: #3163 However, this ticket is more about decreasing the number of REST API requests (from the UI) to perform a single search...which hopefully should decrease the database queries as well.
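As a rough illustration of what a configurable exclusion could look like, the sketch below checks a request path against a list of path prefixes that a site has chosen to exclude from SSR. It is not the implementation proposed in #3231. `isSsrExcluded` and the example prefix `/search` are hypothetical, and a real solution might exclude a component rather than a path.

```typescript
// Hypothetical check: return true if the request path falls under one of the
// configured prefixes, in which case the server would skip SSR for it.
function isSsrExcluded(url: string, excludedPrefixes: string[]): boolean {
  // Ignore the query string and fragment; only the path is compared.
  const path = url.split("?")[0].split("#")[0];
  return excludedPrefixes.some(
    (prefix) => path === prefix || path.indexOf(prefix + "/") === 0
  );
}
```

A site that wants search results rendered for crawlers would leave the list empty, while a site under heavy bot load could add its search path to it.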
