Comments (15)
A person with some Angular experience (not a complete expert) asked me why we make so many small requests to the backend instead of fewer but larger ones. One example is the submission form, where we send one request for each section of the form, one request for each vocabulary, and even more requests, instead of a single request for all the information we need to build the form.
from dspace-angular.
Today, I've moved a number of the performance issues described above into separate tickets. They are not directly related to this SSR issue (or only very loosely related), but obviously are performance issues themselves which require fixing:
If there are other obvious areas where DSpace is (inefficiently) making many requests, then I'd recommend others create additional tickets as well. While these issues are all semi-related, each may need to be solved separately, as the solutions may not be identical in all cases.
We tried Apache HTTPD mod_cache here and ran into indiscriminate sharing of Shibboleth tickets across distinct users' sessions. I haven't had time to investigate, so it might be due to unrelated changes that were hastily backed out at the same time that we disabled the external cache. If the cache was the culprit, I do not know whether a smarter configuration would fix that.
«A person with some Angular experience (not a complete expert) asked me why we make so many small requests to the backend instead of fewer but larger ones. One example is the submission form, where we send one request for each section of the form, one request for each vocabulary, and even more requests, instead of a single request for all the information we need to build the form.»
The many requests made by the submission form are not relevant to this problem, since the submission form is not rendered during SSR:
https://github.com/DSpace/dspace-angular/blob/main/src/app/submission/edit/submission-edit.component.ts#L150
Anyway, I agree that we should reduce the number of requests to the server where possible.
I reckon that when we have multiple requests to the same endpoint referring to different DSpace objects, we could try to group them into a single one.
The Authorizations endpoint is one example: it has a search method that takes multiple UUIDs for the same authorization feature (https://github.com/4Science/Rest7Contract/blob/main/authorizations.md#objects). This search method should be used wherever we handle a set of DSpace objects, such as in the search component.
We did something along these lines in DSpace-CRIS with the item endpoint (https://github.com/4Science/Rest7Contract/blob/main-cris/items.md#findallbyids), to avoid sending multiple requests to the item endpoint just to retrieve embedded objects, as happens in search results for thumbnails and other objects.
This approach helps not only SSR but also the search functionality in general, because it speeds up the loading of search results.
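To make the grouping idea concrete, here is a minimal TypeScript sketch of a request batcher: per-object lookups issued during the same synchronous turn are collected and sent as one multi-UUID request. The `fetchMany` callback is a hypothetical hook; wiring it to the authorizations `search/objects` method or the DSpace-CRIS `findAllByIds` method linked above is an assumption, not existing dspace-angular code.

```typescript
// Hypothetical hook: given several UUIDs, fetch all of them in ONE request and
// return a map from UUID to the loaded object (missing UUIDs are simply absent).
type FetchMany<T> = (ids: string[]) => Promise<Map<string, T>>;

interface Waiter<T> {
  resolve: (value: T | undefined) => void;
  reject: (reason: unknown) => void;
}

class RequestBatcher<T> {
  private pending = new Map<string, Waiter<T>[]>();
  private scheduled = false;
  private readonly fetchMany: FetchMany<T>;

  constructor(fetchMany: FetchMany<T>) {
    this.fetchMany = fetchMany;
  }

  // Callers ask for one object at a time; the lookup is queued, not sent yet.
  load(id: string): Promise<T | undefined> {
    return new Promise<T | undefined>((resolve, reject) => {
      const waiters = this.pending.get(id) ?? [];
      waiters.push({ resolve, reject });
      this.pending.set(id, waiters); // duplicate ids share one slot in the batch
      if (!this.scheduled) {
        this.scheduled = true;
        // Flush once the current synchronous code has queued all its lookups.
        Promise.resolve().then(() => this.flush());
      }
    });
  }

  private flush(): void {
    const batch = this.pending;
    this.pending = new Map();
    this.scheduled = false;
    this.fetchMany(Array.from(batch.keys())).then(
      (results) => batch.forEach((waiters, id) => waiters.forEach((w) => w.resolve(results.get(id)))),
      (error: unknown) => batch.forEach((waiters) => waiters.forEach((w) => w.reject(error))),
    );
  }
}
```

A component rendering a page of search results could call `load(uuid)` once per result, while only one request per batch reaches the backend.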
More generally, in addition to the high number of REST requests, SSR itself can introduce overhead, especially for resource-intensive apps that require heavy server-side processing or complex data computations. In such cases, SSR may strain server resources and slow down rendering.
To mitigate these issues, we should profile and optimize resource-intensive code using a Node.js monitoring tool.
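A full Node.js monitoring or APM tool is the thorough way to profile SSR. As a lightweight complement, render times can be recorded per route (for example by taking timestamps around each SSR render; how you collect them is an assumption here) and aggregated so the most expensive pages stand out. The TypeScript sketch below does only the aggregation, using nearest-rank percentiles.

```typescript
// One aggregated row per route.
interface RouteStats {
  route: string;
  count: number;
  p50: number; // median render time (ms)
  p95: number; // 95th-percentile render time (ms)
}

// Nearest-rank percentile of an ascending-sorted array (NaN for an empty array).
function percentile(sorted: number[], p: number): number {
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[Math.min(rank, sorted.length) - 1] ?? Number.NaN;
}

// samples: [route, renderTimeMs] pairs, collected however you like.
function summarizeRenderTimes(samples: Array<[string, number]>): RouteStats[] {
  const byRoute = new Map<string, number[]>();
  for (const [route, ms] of samples) {
    const times = byRoute.get(route) ?? [];
    times.push(ms);
    byRoute.set(route, times);
  }
  const stats: RouteStats[] = [];
  byRoute.forEach((times, route) => {
    const sorted = times.slice().sort((a, b) => a - b);
    stats.push({ route, count: sorted.length, p50: percentile(sorted, 50), p95: percentile(sorted, 95) });
  });
  // Slowest routes (by p95) first.
  return stats.sort((a, b) => b.p95 - a.p95);
}
```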
Thank you @tdonohue. I agree; I'm proposing a big change that perhaps can't easily be added to DS8 and DS7. I've also added a comment to #3231, and I agree that it should be configurable. My only point is that, in terms of SEO, it matters to have search results available via SSR for crawlers, and also for web archives such as the Internet Archive.
I want to share that we saw the same behavior described here when we went live with our first DS7 deployment.
We are still experiencing constantly higher CPU demand compared to the previous version, mainly in the Angular processes, due to crawler and robot activity.
We use Apache mod_proxy in front of the service. We have created some Apache rules that answer requests for the following broken URLs with a 404 "Page not found", so that Angular does not have to render results for them:
/simple-search?query=
/handle/HANDLE_PREFIX/ID/simple-search?query=
For example:
<Location "/simple-search">
Order Deny,Allow
Deny from all
Allow from 127.0.0.1
ErrorDocument 404 "Page not found"
RewriteEngine On
RewriteRule .* - [R=404,L]
</Location>
<LocationMatch "/handle/(.*)/(.*)/simple-search">
Order Deny,Allow
Deny from all
Allow from 127.0.0.1
ErrorDocument 404 "Page not found"
RewriteEngine On
RewriteRule .* - [R=404,L]
</LocationMatch>
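Note that Apache's `<Location>` and `<LocationMatch>` match against the URL path, not the query string, so these blocks catch every `/simple-search` request regardless of `?query=`. Also, as written, the `Deny from all` lines likely make Apache answer clients other than 127.0.0.1 with 403 Forbidden before the rewrite to 404 is reached. The TypeScript sketch below only approximates which request paths the two blocks target, e.g. for quickly checking sample URLs; it is an illustration, not a reimplementation of Apache's matching rules.

```typescript
// Same (unanchored) pattern as the <LocationMatch> block above.
const HANDLE_SIMPLE_SEARCH = /\/handle\/(.*)\/(.*)\/simple-search/;

// Apache matches <Location>/<LocationMatch> against the path only, so drop the query string.
function pathOf(url: string): string {
  const q = url.indexOf("?");
  return q === -1 ? url : url.slice(0, q);
}

// Approximation: <Location "/simple-search"> is treated as matching the path itself
// or anything beneath it.
function isBlockedLegacySearch(url: string): boolean {
  const path = pathOf(url);
  const simpleSearch = path === "/simple-search" || path.startsWith("/simple-search/");
  return simpleSearch || HANDLE_SIMPLE_SEARCH.test(path);
}
```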
We also use Apache mod_evasive with some strict traffic restrictions (we are still fine-tuning it), and we intend to try out some of the suggestions above (like inlineCriticalCss) and to further tune the Angular cache values.
The built-in memory cache isn't even shared across processes on the same server.
The Redis approach is interesting to me because, while Redis is another in-memory cache, it can be shared across processes.
We (Universität Konstanz) also have made some customizations.
We've opted to use Varnish (instead of Redis, mod_cache, or the built-in cache). This has a few advantages for us:
- The cache is shared between all instances
- We run a multilingual repo and the built-in cache does not distinguish by language
- Finer control over what gets cached when
- Static assets and similar files don't even need to touch the Node processes most of the time. This is especially useful while the frontend is under high load
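The language point above can be illustrated with a minimal TypeScript sketch of a page cache whose key includes the language as well as the URL, so that an English and a German rendering of the same page never overwrite each other. It is in-memory and per-process (exactly the limitation that Varnish or Redis removes), and how the language is obtained, e.g. from a cookie or the Accept-Language header, is left to the caller as an assumption.

```typescript
interface Entry {
  html: string;
  expiresAt: number; // epoch ms
}

// The language is part of the key, so each language gets its own cached copy.
function pageCacheKey(url: string, lang: string): string {
  return `${lang.toLowerCase()}|${url}`;
}

// Illustrative only: lives in one process, so it is NOT shared across processes.
class PageCache {
  private readonly entries = new Map<string, Entry>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  get(url: string, lang: string, now: number): string | undefined {
    const key = pageCacheKey(url, lang);
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return entry.html;
  }

  set(url: string, lang: string, html: string, now: number): void {
    this.entries.set(pageCacheKey(url, lang), { html, expiresAt: now + this.ttlMs });
  }
}
```

In Varnish, the equivalent is typically to include the language in the cache hash (for example with `hash_data()` in `vcl_hash`).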
We've also disabled SSR for logged-in users, because they don't need it and it saves resources for us.
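The thread does not show how the logged-in check was implemented. One minimal way, sketched in TypeScript below, is to decide per request from the Cookie header whether to run SSR or send the client-side shell. The cookie name `dsAuthInfo` is an assumption about what dspace-angular sets for an authenticated session; verify it for your version.

```typescript
// ASSUMPTION: name of the cookie present for an authenticated session.
const AUTH_COOKIE_NAME = "dsAuthInfo";

// True if the raw Cookie header contains a cookie with exactly this name.
function hasCookie(cookieHeader: string | undefined, name: string): boolean {
  if (!cookieHeader) return false;
  return cookieHeader
    .split(";")
    .some((part) => part.trim().split("=")[0] === name);
}

// Decide per request whether to run Angular SSR or just send the client-side shell.
function shouldServerRender(cookieHeader: string | undefined): boolean {
  // Logged-in users are not crawlers, so SSR for them costs CPU without SEO benefit.
  return !hasCookie(cookieHeader, AUTH_COOKIE_NAME);
}
```

In the Express server of dspace-angular's `server.ts`, the `false` branch would typically serve the plain `index.html`, leaving rendering to the browser.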
«4Science has updated DSpace-CRIS to exclude the Search component from SSR. This search component is not necessary for SEO, so it may make sense to disable SSR for this component.»
We did something similar, but only for the "related publications list" on person entity pages, as that took quite a while to load. We may extend this to all searches in the future, though.
We extended the robots.txt to disallow additional robots we don't want
We have written documentation that you may find helpful. It describes the solution we used on the Jagiellonian University Repository to replace the built-in Server-Side Rendered pages with Redis, as a case study:
Replace the build-in Server-Side Rendered pages with Redis.pdf
We've been wondering about it as well. Same with /api/authz/authorizations/: so many requests. The same goes for search facets: the configuration, the facets, and the facet values are all loaded in separate requests.
There was some good discussion in today's Dev Mtg about areas where we should consider "bundling" many smaller REST Requests into a single (or fewer) requests. They include:
- Submission form (as noted by @pnbecker above. Created new ticket: #3162)
- Authorization checks (as noted by @hutattedonmyarm above. Created new ticket #3161)
- Search facets (as noted by @hutattedonmyarm above. Created new ticket #3163)
- Configuration checks (as noted in DSpace/DSpace#9056)
Those in attendance agreed that we should work on efficiency of these requests. Ideally we should bundle requests better and send fewer requests per page.
That said, this sort of work requires help/volunteers/investigation. I'll work on creating tickets for each of these, as I think they are separate issues which will often require individual solutions.
(These many requests are semi-related to SSR performance, as they likely contribute to the slowness. However, they are less likely to be "quick fixes", as some may require a new backend endpoint to be created and the UI code to be refactored.)
There's another one: browsing the landing page of the demo and clicking to go to one of the communities also loads the data twice:
The first one is a 302 redirect, but already contains the community object. Then the actual object is requested again.
#458 also loads data twice.
It's not only the number of requests but also their size. For example, clicking on a publication to go to its page requests the whole "owningCollection" object in a separate request. It's unclear what it is needed for; if it is needed, it should be embedded in the item instead of requested separately.
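For the double-loading cases above, the real fix is to reuse the object the first response already contains, or to embed related objects in the item response. A complementary client-side guard, sketched here in TypeScript with a generic `fetcher` callback (an assumption, not dspace-angular's actual data services), is to share one in-flight response per URL for a short time window.

```typescript
type Fetcher<T> = (url: string) => Promise<T>;

// Repeated GETs for the same URL within ttlMs share one promise instead of
// hitting the REST API again.
class DedupingClient<T> {
  private readonly cache = new Map<string, { value: Promise<T>; expiresAt: number }>();
  private readonly fetcher: Fetcher<T>;
  private readonly ttlMs: number;

  constructor(fetcher: Fetcher<T>, ttlMs: number) {
    this.fetcher = fetcher;
    this.ttlMs = ttlMs;
  }

  get(url: string, now: number): Promise<T> {
    const hit = this.cache.get(url);
    if (hit !== undefined && hit.expiresAt > now) return hit.value;
    const value = this.fetcher(url);
    this.cache.set(url, { value, expiresAt: now + this.ttlMs });
    // Do not keep failed requests around, so a later call retries;
    // only evict if the entry still belongs to this request.
    value.catch(() => {
      if (this.cache.get(url)?.value === value) this.cache.delete(url);
    });
    return value;
  }
}
```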
May I suggest that removing redundant requests and bundling related requests are important and should be addressed, but that we may need a second ticket for the issues that are unrelated to SSR.
«4Science has updated DSpace-CRIS to exclude the Search component from SSR. This search component is not necessary for SEO, so it may make sense to disable SSR for this component.»
I tend to disagree here. Search results are relevant for SEO, since faceted results can give your repository/site relevance for particular topics. Imagine that your repository covers the "horses" topic. You could have faceted results per species, which would give your repository more relevance for the "horse" topic and for each species itself. If you pay attention to your repository traffic, you will see several bots trying combinations of search results. If we remove search from SSR, crawlers would not be able to index these results, nor would they promote your site for specific topics.
I think there is room for improvement regarding the search results.
A single request for a search results page that retrieves just one result (one item per page) makes our system perform 38 different database queries:
- 8 from resourcepolicy table
- 7 from collection table
- 6 from item table
- 4 from metadatavalue table
- 3 from public.community table
- 1 from handle table
- 1 from epersongroup table
- 1 from relationship_type table
- 1 from relationship table
- 1 from community2collection table
- 1 from community2community table
- 1 from bundle2bitstream table
- 1 from fileextension table
- ...
If I change it to retrieve 10 records per page, the system performs more than 500 database queries. This is not scalable. What if we just used Solr and retrieved the data stored in Solr for search results?
There is a trade-off here: more time spent indexing documents, versus more time and degraded performance when retrieving them.
I propose a big change to DSpace here: DSpace/DSpace#9736, which extends the concept of virtual metadata in DSpace. In practice, we could store structured data in Solr and retrieve it later for search results, avoiding extra DB queries for additional or related data.
@paulo-graca : I think your comment may (partially) belong over on #3231, which is the ticket created specifically for this task.
As I've clarified on that ticket, the reason I noted that Search results are "not necessary for SEO" is that we already tell crawlers in our robots.txt file to ignore the search page & facets: https://github.com/DSpace/dspace-angular/blob/main/src/robots.txt.ejs#L11 So it's odd to have this component available in SSR when (well-behaved) crawlers are being told not to use it.
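For reference, robots.txt rules are matched as path prefixes; under RFC 9309 the most specific (longest) matching rule wins, and an Allow rule wins a tie with an equivalent Disallow rule. The TypeScript sketch below checks whether a crawler honoring such rules would fetch a given path. It handles plain prefixes only (no `*` or `$` wildcards) and does not parse the robots.txt file itself.

```typescript
interface RobotsRule {
  allow: boolean; // true for "Allow:", false for "Disallow:"
  path: string;   // plain path prefix, e.g. "/search"
}

// Returns true if a crawler following these rules may fetch `path`.
function isAllowed(path: string, rules: RobotsRule[]): boolean {
  let best: RobotsRule | undefined;
  for (const rule of rules) {
    // An empty rule value matches nothing; otherwise the rule must be a prefix.
    if (rule.path === "" || !path.startsWith(rule.path)) continue;
    const longer = best === undefined || rule.path.length > best.path.length;
    const tieWonByAllow = best !== undefined && rule.path.length === best.path.length && rule.allow;
    if (longer || tieWonByAllow) best = rule;
  }
  // No matching rule means the path is allowed.
  return best === undefined ? true : best.allow;
}
```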
That said, I do agree that we may want to make this configurable, so that sites can choose to allow search results via SSR. Overall, the goal of #3231 is to see if there is a "quick fix" performance improvement, at least until we can determine ways to improve the performance of search in general.
I agree with you that performance of search is NOT ideal, but that also seems like a larger redesign / discussion. Any larger redesign would not help the performance of sites on 7.x or 8.x (as it wouldn't be possible to backport). However, a small change to optionally exclude the search.component from SSR could be backported to 7.x/8.x, as it might be considered a performance fix for those platforms.
UPDATE: It's also worth noting that we have a separate ticket about smaller fixes to search performance: #3163. However, that ticket is more about decreasing the number of REST API requests (from the UI) needed to perform a single search, which hopefully should decrease the database queries as well.
Related Issues (20)
- Managing Community Hierarchy: reuse the existing code to the UI
- Enable edition/customization of “News” and the default “Non-Exclusive License”
- Possibility to edit custom metadata - backend
- Improve/refactor Administrator Reports (DS8 Beta feature)
- Improve the usability and design of the file editing area and the permissions themselves
- Export matadata in other bibliographic formats besides .csv (.ris, .enl, .bib)
- Enable edition/customization of “Privacy Policy” and “End User Agreement”
- UX improvement of file permissions (Requires discussion and suggestions)
- Error when Editing Groups: 'admin.access-control.groups.form.members-list.notification.failure' and 404 Not Found Exceptions in DSpace 7.6 with DSpace CRIS
- Crosswalk for Highwire Press (Google Scholar) citation_ metadata tags should be configurable
- Discussion: Do we need a UI/UX guide for developers?
- dspace-angular-dspace-8.0 i18n add traditional Chinese!
- Discussion: should we use http forwards for requests like /handle/<prefix>/<suffix>?
- Faulty submission form: arrange more than 3 lines
- Cannot use ordinary file browser/selector in Metadata Import (CSV)
- Mapped collection themes are never displayed when viewing an item
- Consider update of ng-dynamic-forms library to latest version
- Consider excluding Search component from Angular SSR (Server Side Rendering)
- There needs to be a field to add alt text for logos when adding collections
- URI link on simple item page always opens in a new page/tab