ukwa / ukwa-ui

A new user interface for the UK Web Archive

License: BSD 3-Clause "New" or "Revised" License

Java 35.49% CSS 2.39% JavaScript 27.02% HTML 8.23% Dockerfile 0.09% Shell 0.13% SCSS 26.65%

web-archiving web-archives

ukwa-ui's Introduction

UKWA UI

User interface for the UK Web Archive

How to Run the Code

Install a git client of your choice.
Clone this repository.

You can then run the code using the run-against-docker.sh helper script. This script uses Docker Compose to start up two Solr services and then runs the UI using Maven.

There are scripts to populate the Solr services with example data, but the example data is not stored in this repository and is currently available upon request.

Running from IntelliJ

Alternatively, you can run the code from IntelliJ and configure it to use the same Solr services. Note that the username and password are not needed to run against local services.

Install IntelliJ IDEA.
Open the code in IntelliJ IDEA.
Go to Run -> Edit Configurations.
Click + in the upper-left corner of the dialog and add a Maven configuration.
On the "Parameters" tab, enter "spring-boot:run" in the "Command Line" field.
Specify Solr credentials:
- On the "Runner" tab, click the three dots next to "Environment variables".
- Click the "plus" button to add two new variables: SOLR_USERNAME and SOLR_PASSWORD.
- Click OK to close the dialog.
Click OK to close the Run/Debug Configurations dialog.
Go to Run -> Run (Shift-F10). Wait for the build to finish.
Navigate to http://localhost:8080 in your browser.
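The credentials above reach the application as environment variables. As an illustration only (this is not the project's own code, and `SolrCredentials` is a hypothetical name), the sketch below assumes the Solr services use HTTP Basic authentication. It reads `SOLR_USERNAME` and `SOLR_PASSWORD` and produces an `Authorization` header only when both are set, which is one way to let the same build run against local, unauthenticated Solr services.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: turns SOLR_USERNAME / SOLR_PASSWORD into an HTTP Basic
// "Authorization" header value, or returns Optional.empty() when either is
// missing (e.g. when running against local Solr services).
class SolrCredentials {

    static Optional<String> basicAuthHeader(Map<String, String> env) {
        String user = env.get("SOLR_USERNAME");
        String pass = env.get("SOLR_PASSWORD");
        if (user == null || user.isEmpty() || pass == null || pass.isEmpty()) {
            return Optional.empty(); // no credentials: query Solr anonymously
        }
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + pass).getBytes(StandardCharsets.UTF_8));
        return Optional.of("Basic " + token);
    }

    public static void main(String[] args) {
        // System.getenv() returns the variables set in the IntelliJ run configuration.
        System.out.println(basicAuthHeader(System.getenv()).isPresent()
                ? "Solr credentials supplied"
                : "No Solr credentials (fine for local services)");
    }
}
```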

Supporting content translation

For the accessibility statements, most of the work was done by converting the original DOCX to HTML:

pandoc -f docx -t html --ascii acc-cy.docx > acc-cy.html

The --ascii flag ensures special characters are emitted as HTML/ampersand-escaped entities rather than relying on the file's character encoding.

Properties files need a bit more care, but can be mapped via spreadsheets if needed.
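One way to do the spreadsheet mapping is to lay out each key with its English value and its translation as CSV columns. The sketch below assumes the messages files are read as text with their encoding already handled. `MessagesToCsv` is a hypothetical helper, not part of the repository. It uses `Properties.load(Reader)` because `Properties.load(InputStream)` decodes the input as ISO-8859-1.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical helper: lays out message keys side by side (key, English, translation)
// as CSV so that translators can work in a spreadsheet.
class MessagesToCsv {

    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            // load(Reader) keeps characters as given; load(InputStream) assumes ISO-8859-1.
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // RFC 4180-style field: wrap in double quotes and double any embedded quotes.
    static String quote(String s) {
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }

    static String toCsv(String englishProps, String translatedProps) {
        Properties en = parse(englishProps);
        Properties tr = parse(translatedProps);
        StringBuilder csv = new StringBuilder("key,en,translation\n");
        for (String key : new TreeSet<>(en.stringPropertyNames())) { // sorted, stable order
            csv.append(quote(key)).append(',')
               .append(quote(en.getProperty(key))).append(',')
               .append(quote(tr.getProperty(key, ""))) // empty cell = not yet translated
               .append('\n');
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        System.out.print(toCsv("home.title=Home\nsearch=Search", "home.title=Hafan"));
    }
}
```

Keys missing from the translation come out as empty cells, which also gives a quick list of untranslated strings.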

ukwa-ui's People

Contributors

Watchers

Forkers

jasonwebber-bl min2ha jsdelivrbot archivo-lt uk-gov-mirror ldbiz

ukwa-ui's Issues

Various malformed links arising from " being used in the messages files.

As raised by Somaya, some links are coming out double-quoted, e.g. https://beta.webarchive.org.uk/en/ukwa/noresults

Looking at the source, it seems the messages files contain 'over-escaped' HTML fragments, e.g.

https://github.com/ukwa/ukwa-ui/blob/master/src/main/resources/i18n/messages.properties#L73

Which creates a link to https://beta.webarchive.org.uk/en/ukwa/"cookies" when the browser tries to make sense of the HTML.
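The effect can be reproduced with `java.net.URI`. Assuming the stray quote characters survive into the `href` value (browsers percent-encode them as `%22` in a path), the value is treated as a relative path and resolved against the current page. The page URL is the example above; `QuotedHrefDemo` is a hypothetical name.

```java
import java.net.URI;

// Shows how a relative href that still contains quote characters resolves
// against the page it appears on.
class QuotedHrefDemo {

    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String page = "https://beta.webarchive.org.uk/en/ukwa/noresults";
        System.out.println(resolve(page, "cookies"));       // intended link
        System.out.println(resolve(page, "%22cookies%22")); // link with stray quotes
    }
}
```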

Maintain WCT-era links under the new website

The WCT-era links, especially for Targets and Collections, need to be supported by the new website. The WCT IDs are stored in W3ACT but are not yet exposed in the same way.

Some of these look rather complicated, e.g. https://www.webarchive.org.uk/ukwa/target/28180500/collection/26312782/source/collection because the latter part of the URL is being used to store the context the target is being viewed in.

Others are complicated because of paging or other parameters: https://www.webarchive.org.uk/ukwa/collection/100757/page/1/source/collection

In practice, it should suffice to match the primary ID (discarding the remainder of the URL).
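Matching only the primary ID could be done with a single pattern. A minimal sketch, assuming the legacy paths always start with `/ukwa/target/{id}` or `/ukwa/collection/{id}`; `LegacyWctPath` is a hypothetical name, and looking the extracted WCT ID up in W3ACT is not shown.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for WCT-era paths such as
//   /ukwa/target/28180500/collection/26312782/source/collection
//   /ukwa/collection/100757/page/1/source/collection
// Only the leading type and ID are kept; the rest of the path is ignored.
class LegacyWctPath {

    private static final Pattern PRIMARY =
            Pattern.compile("^/ukwa/(target|collection)/(\\d+)(?:/.*)?$");

    // Returns (type, WCT ID), e.g. ("target", 28180500), or empty if the path does not match.
    static Optional<Map.Entry<String, Long>> parse(String path) {
        Matcher m = PRIMARY.matcher(path);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(Map.entry(m.group(1), Long.parseLong(m.group(2))));
    }

    public static void main(String[] args) {
        System.out.println(parse("/ukwa/target/28180500/collection/26312782/source/collection"));
        System.out.println(parse("/ukwa/collection/100757/page/1/source/collection"));
    }
}
```

The new site could then issue a permanent redirect to whatever page corresponds to that WCT ID, once W3ACT exposes the mapping.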

We should also review the main navigational URLs and provide mappings in the new site, e.g.

etc.

The current version of ukwa-ui does not provide pages for Targets. As the design settles down, I suggest we plan to augment the ukwa-ui app with a statically-generated site that covers Targets and Collections, and reduce ukwa-ui over time to just the dynamic search part.

Should we provide some kind of embed for web archive instances?

Thinking about using web archives for storytelling, it occurs to me that one way this is done for other media is via embeds, like YouTube videos embedded in a blog, or Twitter Cards so more info can be embedded in Twitter conversations. We should consider adding some kind of embedded version of a web page, e.g. an autogenerated thumbnail API and oEmbed hook?

Inconsistent naming of language options

The language labels (English/Welsh/Scottish) appear to be displayed inconsistently in the different languages.

I would expect each label to be displayed in the language it selects.

So on an English or Scottish page, "Welsh" should be displayed as "Cymraeg", not "Welsh".

This is the case on the English and Scottish pages, but on the Welsh page:

"English" is displayed as "Saesneg" - the Welsh for "English". Should be "English".

"Scottish" is displayed as "Gaeleg yr Alban", the Welsh for "Scottish Gaelic". It should be the Scottish Gaelic name for Scottish Gaelic, presumably "Gàidhlig", as that's what appears on the English page.
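Put another way, each option should show its own name for itself (its endonym) regardless of the page language. A minimal sketch using a fixed table, assuming the `en`, `cy` and `gd` codes that appear in the site's URLs; `LanguageLabels` is a hypothetical name, and the Gaelic label is the "Gàidhlig" form quoted above.

```java
import java.util.Map;

// Language options are labelled with their endonyms, so the label for a given
// option is the same on every page, whatever language that page is in.
class LanguageLabels {

    private static final Map<String, String> ENDONYMS = Map.of(
            "en", "English",
            "cy", "Cymraeg",
            "gd", "G\u00e0idhlig"); // "Gàidhlig", escaped to avoid source-encoding issues

    static String label(String optionLang) {
        return ENDONYMS.getOrDefault(optionLang, optionLang);
    }

    public static void main(String[] args) {
        for (String lang : new String[] {"en", "cy", "gd"}) {
            System.out.println(lang + " -> " + label(lang));
        }
    }
}
```

The design point is that these labels would not be looked up in the per-language messages files, since translating them is exactly what produces "Saesneg" on the Welsh page.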

Import Advanced Search UI and logic from Shine

We need to add an advanced search page, and the proposal is to base this on the corresponding page in Shine. Can you work with Chris P. to pursue this?

Direct to earliest holding instead of latest

Currently, links to playback put a timestamp of 29991231999999 in the URL to direct to the latest copy. This seems to expose dead sites far too easily.

As our users are usually interested in the past, it makes more sense to default to showing our earliest copy rather than the latest, when we have no other timestamp information to go on.

So, can we use {playback-prefix}10001231999999/{url} instead?

Do we need to add a random part to avoid locking problems? e.g. {playback-prefix}10001231######/{url} ?
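A sketch of the proposed link format, assuming `{playback-prefix}` is passed in with its trailing slash (e.g. `https://www.webarchive.org.uk/wayback/archive/`, taken from the Wayback URLs quoted elsewhere in these issues). It relies on Wayback serving the capture closest to the requested timestamp. The randomised variant is only one reading of the `######` idea, and `PlaybackLinks` is a hypothetical name.

```java
import java.util.Locale;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical link builder for the proposed "earliest copy" default.
class PlaybackLinks {

    // Far-past 14-digit timestamp; the closest capture to it is the earliest one held.
    static final String EARLIEST = "10001231999999";

    static String earliest(String playbackPrefix, String url) {
        return playbackPrefix + EARLIEST + "/" + url;
    }

    // Variant from the issue: keep the date part, randomise the last six digits.
    static String earliestRandomised(String playbackPrefix, String url) {
        int suffix = ThreadLocalRandom.current().nextInt(1_000_000); // 0..999999
        return playbackPrefix + "10001231" + String.format(Locale.ROOT, "%06d", suffix) + "/" + url;
    }

    public static void main(String[] args) {
        String prefix = "https://www.webarchive.org.uk/wayback/archive/";
        System.out.println(earliest(prefix, "http://www.1010uk.org/"));
        System.out.println(earliestRandomised(prefix, "http://www.1010uk.org/"));
    }
}
```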

Error 500 on link search

Enter http://henryjacksonsociety.org/ into the search box and you get 'error 500 server error'.

Enter henryjacksonsociety.org/ and it works fine (the expected website is the top result).
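One plausible cause, not confirmed from the logs, is that the `:` in `http://` reaches Solr unescaped, so the standard query parser reads `http` as a field name. The sketch below escapes user input before it is embedded in a query. The escaped character set mirrors SolrJ's `ClientUtils.escapeQueryChars`, which would be the natural choice if SolrJ were used directly; `SolrEscape` is a hypothetical name.

```java
// Backslash-escapes characters that have special meaning in Solr/Lucene query syntax.
class SolrEscape {

    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    static String escape(String input) {
        StringBuilder sb = new StringBuilder(input.length() * 2);
        for (char c : input.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\'); // escape operators, field separators and whitespace
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("http://henryjacksonsociety.org/"));
    }
}
```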

Project does not build, fails during the test phase

One of a set of questions that arose during an internal code review.

This codebase appears to have no functioning tests. There is one test, but it does not appear to test anything, and breaks the build. Please remove or improve the test.

Search within Special Collections appears not to work

Example:

https://dev.webarchive.org.uk/en/ukwa/collection/1107

Black and Asian Britain

First site in the list:
A.M. Qattan Foundation

Search on "Qattan" in the Special Collection Search Box under the Collection Title

No results.

(I've checked that it's not because the site above is in Arabic; another example is "Evaristo", which appears in the content of the Title of the same name further down the collection.)

Another example is searching on "Brexit" within the Brexit collection.

I don't know if it's a configuration, indexing or coding issue.

Should link to a BETA Wayback with consistent theming

We should link to a Beta OpenWayback that uses the new theme/colours.

Mindy has been looking at this. See ukwa/waybacks#1

A Beta Wayback service is now running on the access server, but needs connecting up to https://beta.webarchive.org.uk/wayback (as defined in https://github.com/ukwa/ukwa-access-services/blob/master/docker-compose.yml#L12)

Survey Link font needs to be thinned and text is out of alignment

See design for issue #31

The fix is not quite right but the exact rendering has been deferred to fix in this issue.

Add BL Staff LAN to access list

Staff browsers are connecting directly to the HTTPS website rather than going via the staff proxy. So, to offer staff the right link we need to whitelist the whole staff LAN address range.

I'm not sure how this works for the IPv6-based users coming in on their laptops, so this might not resolve the whole problem.
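Whitelisting a whole LAN means matching client addresses against address ranges rather than single IPs. A sketch of CIDR matching that handles both IPv4 and IPv6 literals, so IPv6 staff prefixes could be covered too if they are known. `CidrRange` is a hypothetical name, and the ranges in the example are documentation ranges (RFC 5737 and RFC 3849), not the BL's.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Minimal CIDR range check for IPv4 and IPv6 literals.
// InetAddress.getByName does a DNS lookup if given a host name, so pass IP literals only.
class CidrRange {

    private final byte[] network;
    private final int prefixLength;

    CidrRange(String cidr) {
        String[] parts = cidr.split("/");
        this.network = toBytes(parts[0]);
        this.prefixLength = Integer.parseInt(parts[1]);
    }

    private static byte[] toBytes(String ipLiteral) {
        try {
            return InetAddress.getByName(ipLiteral).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not an IP address: " + ipLiteral, e);
        }
    }

    boolean contains(String ipLiteral) {
        byte[] addr = toBytes(ipLiteral);
        if (addr.length != network.length) {
            return false; // IPv4 address vs IPv6 range, or vice versa
        }
        int fullBytes = prefixLength / 8;
        for (int i = 0; i < fullBytes; i++) {
            if (addr[i] != network[i]) {
                return false;
            }
        }
        int remainingBits = prefixLength % 8;
        if (remainingBits == 0) {
            return true;
        }
        int mask = (0xFF << (8 - remainingBits)) & 0xFF; // top remainingBits bits set
        return (addr[fullBytes] & mask) == (network[fullBytes] & mask);
    }

    public static void main(String[] args) {
        CidrRange lan = new CidrRange("192.0.2.0/24");
        System.out.println(lan.contains("192.0.2.17"));
    }
}
```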

Minor Translation Tweaks

List of (possible?) omissions:

Main Heading/Banner: UK Web Archive and sub heading
Selected Filter information
The nominate items

Basically I've identified those from looking at the missing tags in the properties file, hope that's enough to go on.

Advanced Search - Fields and Labels are out of alignment

Allow users to see which other archives hold copies of a particular URL

NOTE: This is not necessarily part of this website engine, and is a proposal held here because it's a part of the front-end workflow.

We need to provide a URL resolution endpoint (for e.g. Document Harvester or other links in the catalogues) that bounces the user to the right end point. i.e. given this:

https://www.webarchive.org.uk/access/resolve/20170403111155/https://www.gov.uk

The resolver should look it up in the Open Access Wayback and then, if it's a 451, redirect to the appropriate LDL Reading Room system if the user is on a known Reading Room IP. If we have it, it should redirect to Open Access Wayback. If we don't have it, we could offer links to holdings in other web archives (via Memento). This latter functionality could be embedded in the Wayback page, but it's probably simpler to take the Mementos component and either update it (and run it standalone) or merge the logic into this website.

For extra credit, the system should check if the item is available in the Reading Room playback system and be apologetic if the item is known but not yet available.

In the meantime, we could just use Apache to redirect the user to re-written URLs based on IP.
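The resolver's decision step can be stated compactly. A sketch, assuming the Open Access Wayback lookup has already produced an HTTP status and that Reading Room membership of the client IP is already known. `AccessResolver` and its enum values are hypothetical. Sending off-site users with a 451 to a "reading room only" notice is an assumption, since the proposal does not say what they should see. Choosing which LDL Reading Room system to use, and the "known but not yet available" check, are not shown.

```java
// Hypothetical decision step for the proposed /access/resolve/{timestamp}/{url} endpoint.
class AccessResolver {

    enum Destination { OPEN_WAYBACK, READING_ROOM, READING_ROOM_ONLY_NOTICE, OTHER_ARCHIVES }

    static Destination decide(int openWaybackStatus, boolean onReadingRoomIp) {
        if (openWaybackStatus == 451) {
            // Held, but not openly available (451 Unavailable For Legal Reasons).
            return onReadingRoomIp ? Destination.READING_ROOM : Destination.READING_ROOM_ONLY_NOTICE;
        }
        if (openWaybackStatus >= 200 && openWaybackStatus < 400) {
            return Destination.OPEN_WAYBACK; // we have it and it is open
        }
        return Destination.OTHER_ARCHIVES; // not held: offer Memento links to other archives
    }

    public static void main(String[] args) {
        System.out.println(decide(451, true));
        System.out.println(decide(200, false));
        System.out.println(decide(404, false));
    }
}
```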

Search box not accepting characters on mobile

Home page search box wouldn't accept characters when typed.

Also, the tooltip indicating the empty field is misplaced, it points above the field, not to it.

Moto g4, Android 7, Chrome

Tablet rendering: main page subtitle overlaps

"Special Collections" overlaps the text.

https://dev.webarchive.org.uk/en/ukwa/collection

Samsung galaxy Tab 10 inch screen SM-T520
Android 4.4.2
Chrome browser

Why does the code re-implement a Solr client rather than use solrj?

One of a set of questions that arose during an internal code review.

Although solrj is on the classpath and in the pom.xml, it does not appear to be used anywhere. The code itself however re-implements a lot of the SolrJ client logic, like building HTTP requests, setting query parameters, escaping values and so on.

Why did you re-invent this logic rather than re-using SolrJ?

(see https://cwiki.apache.org/confluence/display/solr/Using+SolrJ for more details on using SolrJ)

Filter is buggy when switching languages

Home Page >
Enter a search term (no filter) >
n results returned >
Change language (e.g. change from English to Welsh) >

Issue 1: 0 results returned for "" (keyword has been reset) >

Enter a keyword and search >

Issue 2: 0 results returned for "" (but a keyword has been supplied) >

Reset filters >
Enter search term >
n results returned

Issue 3: Why do filters need to be reset? We haven't set them!
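Issues 1 and 2 suggest the search state is dropped when the language changes. One way to avoid that, assuming URLs have the form `/{lang}/ukwa/...?query`, is to build the language-switch link by swapping only the first path segment and keeping the query string intact. `switchLanguage` is a hypothetical helper, and the `text`/`page` parameters in the example are illustrative, not the site's actual parameter names.

```java
import java.net.URI;

// Hypothetical helper: builds the language-switch link by swapping only the
// leading /{lang} path segment, keeping the rest of the path and the query
// string (search text, filters, page) unchanged.
class LanguageSwitch {

    static String switchLanguage(String currentUrl, String newLang) {
        URI uri = URI.create(currentUrl);
        String path = uri.getRawPath() == null ? "" : uri.getRawPath(); // e.g. /en/ukwa/search
        int secondSlash = path.indexOf('/', 1);
        String rest = secondSlash >= 0 ? path.substring(secondSlash) : "";
        String query = uri.getRawQuery(); // raw, so existing percent-encoding is preserved
        return "/" + newLang + rest + (query != null ? "?" + query : ""); // server-relative
    }

    public static void main(String[] args) {
        System.out.println(switchLanguage("/en/ukwa/search?text=brexit&page=2", "cy"));
    }
}
```

Returning a server-relative link also avoids any absolute-URL rewriting.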

Code has very little logging

(Initially raised in #9)

@anjackson, previously we had this requirement for logging:

1. Detailed specification of the back-end URLs being called should be available in the logs on every request, at DEBUG level.
2. Overall configuration/settings/etc. should be reported at INFO on start-up.
3. Any error cases should be reported with stack traces at ERROR level. Exceptions should not be swallowed silently.

We have implemented 1 and 3.

What is implied by 2? Shall we just dump the contents of application.properties to the log file? After removing passwords of course.

When you say "Code has very little logging", how much logging do you expect? It currently logs Solr queries as per (1) above and stack traces as per (3). What else would you like logged?
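For point 2, rather than dumping `application.properties` verbatim, the effective settings could be logged at INFO with sensitive values masked. A sketch, assuming keys containing `password`, `secret` or `token` are the sensitive ones; `ConfigLogger` and the `solr.*` keys are hypothetical, and `java.util.logging` is used only to keep the example dependency-free.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;
import java.util.logging.Logger;

// Hypothetical start-up reporter: logs every setting at INFO, masking values
// whose key looks sensitive.
class ConfigLogger {

    private static final Logger LOG = Logger.getLogger(ConfigLogger.class.getName());

    static boolean isSensitive(String key) {
        String k = key.toLowerCase(Locale.ROOT);
        return k.contains("password") || k.contains("secret") || k.contains("token");
    }

    // Sorted copy of the settings with sensitive values replaced by a mask.
    static Map<String, String> redacted(Properties props) {
        Map<String, String> out = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            out.put(key, isSensitive(key) ? "********" : props.getProperty(key));
        }
        return out;
    }

    static void logStartupConfig(Properties props) {
        redacted(props).forEach((key, value) -> LOG.info(key + " = " + value));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("solr.url", "http://localhost:8983/solr");
        props.setProperty("solr.password", "not-logged");
        logStartupConfig(props);
    }
}
```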

Empty Collections/Sub Collections shouldn't be displayed.

Alternatively, the number of Titles should be displayed with the Collection Title.

Example

https://dev.webarchive.org.uk/en/ukwa/collection/1035
then
https://dev.webarchive.org.uk/en/ukwa/collection/1037
is empty

Missing archived date

There was a glitch (which I can't recreate at the mo) whereby a collection target did not have an "archived date".

Is this something we need to account for? E.g. when filtering search results by date, shall we provide a "no date" option?

Miscellaneous issues arising during code review

One of a set of questions that arose during an internal code review.

Some minor issues that arose in general:

Code has very little logging
Code has no tests
Code has very few comments, making large code blocks difficult to understand. A bit more vertical spacing and some comments about what the major sections were doing would be good.

in the Java:

Can we use server-relative URLs throughout, rather than needing the replace-with-https code and other manipulations involving absolute URLs?
There are quite a lot of hard-coded 'magic numbers/strings' that should really be declared in static properties. e.g. here
Hard-coded per-facet logic means adding a new facet is relatively cumbersome.
Almost no JavaDoc - it would be useful to have some on major public API components.
Often uses a 'passing mutable state' approach when an 'immutable state management' style would be cleaner. e.g. here

in the templates:

there seems to be a lot of repeated HTML fragments, e.g. per facet, that would be better done as sub-templates (e.g. this also makes adding a new facet cumbersome)
there seems to be rather a lot of programmatic logic in the JSPs that might be better done in helper methods or ameliorated by more re-use of template code/sub-templates.

and in the JavaScript:

Also quite a lot of hard-coded 'magic numbers'.
Why are events being swallowed in the JavaScript? (various event propagation blocks)
Some repeated jQuery calls should be replaced for performance reasons (e.g. these repeated calls to get .collection-description)
the code implements its own accordion sidebar logic - is this necessary when e.g. bootstrap components are available?
Date range validation does not ensure the start date is before the end date.
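The missing check is small. It is sketched here in Java, for example as a server-side guard alongside the JavaScript, assuming ISO `yyyy-MM-dd` input; `DateRangeCheck` is a hypothetical name, and the client-side version would apply the same comparison.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Hypothetical server-side counterpart to the JavaScript date-range check.
class DateRangeCheck {

    // True when both ISO dates (yyyy-MM-dd) parse and start is not after end.
    // Equal dates are accepted, treating a one-day range as valid.
    static boolean isValidRange(String start, String end) {
        try {
            return !LocalDate.parse(start).isAfter(LocalDate.parse(end));
        } catch (DateTimeParseException e) {
            return false; // unparseable input is rejected rather than ignored
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRange("2010-01-01", "2012-12-31"));
        System.out.println(isValidRange("2015-06-01", "2014-01-01"));
    }
}
```

Whether a same-day range should be allowed is a design choice; use `isBefore` instead if it should not.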

The new site needs a BETA banner

Could we implement something like the attached mock-up to make it clear that this is the beta? It’s also an opportunity to retain the survey link after they’ve dismissed the popup.

(also the old Access Tool needs a banner pointing users to the new site, but that must be done after this ticket.)

Filter Help Tooltips have redundant suffix or are placeholders

If you hover over the filter Help symbols (the Question Mark symbols, not the chevrons), the tooltip displayed is suffixed "tooltip", e.g. "Access facet help tooltip"

The last word shouldn't be there.

I suspect these are intended as placeholders for more useful tooltips though.

Advanced Search - The Advanced Search form should replace the simple search

Currently, switching to Advanced Search adds the right fields, but also requires the user to enter a standard query in the 'Enter search phrase...' box. The Advanced Search form should replace that box, or at least render that box optional.

Secondarily, the search is sometimes returning unexpected results, but I think it's because the Solr endpoint is defaulting to OR logic when searching rather than AND. As indicated here, this can be changed by extending the query with &q.op=AND or by changing the Solr server configuration.
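Forcing AND per request only requires appending `q.op=AND`, the Solr parameter that sets the default operator for the standard query parser. A minimal sketch; the core URL and `SolrQueryUrl` are hypothetical, and the user's query is URL-encoded before it is appended.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SolrQueryUrl {

    // Builds a /select URL that asks Solr to combine terms with AND rather than OR.
    static String selectUrl(String solrCoreUrl, String userQuery) {
        return solrCoreUrl + "/select?q=" + URLEncoder.encode(userQuery, StandardCharsets.UTF_8)
                + "&q.op=AND";
    }

    public static void main(String[] args) {
        System.out.println(selectUrl("http://localhost:8983/solr/ukwa", "jack russell"));
    }
}
```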

Add Dockerfile and setup a Docker-based deployment

As discussed, we need this component set up to build via Docker Hub and to be deployable and configurable via environment variables, etc.

Advanced Search - advanced options hidden after search results returned

The search display reverts to basic - UX query: is this ok?

Replace logo for NLS

The NLS logo in the footer needs replacing (I've emailed the new JPG to Mindy and Lee)

Change Survey Link

We (the BL) have changed accounts with surveymonkey. Can you replace the old link with this one:

https://www.surveymonkey.co.uk/r/ukwasurvey01

Raised by @JasonNotOnHereYet

Links to reading rooms on 'Resource only available in a reading room' page all broken

Links to all the reading rooms are broken

Minor tweaks to tidy translated fields and banner

Placeholder for commits below:

5191ceb
9321968

Localised pages should direct user to the localised Wayback

Wayback Landing Page is always English.

Example:

From the Welsh version of:

Special Collections > Climate Change Debates > 10:10uk

https://dev.webarchive.org.uk/cy/ukwa/collection/369 >
https://dev.webarchive.org.uk/cy/ukwa/wayback/OA/29991231999999/http://www.1010uk.org/

We land on

https://www.webarchive.org.uk/wayback/archive/20170630224615/https://1010uk.org/

It should be

https://www.webarchive.org.uk/wayback/archive-cy/20170630224615/https://1010uk.org/
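A sketch of choosing the Wayback collection from the page language. Only the English (`archive`) and Welsh (`archive-cy`) forms appear in this issue; treating other languages the same way (e.g. `archive-gd`) is an assumption that depends on how the Wayback is deployed. `LocalisedWayback` is a hypothetical name.

```java
// Hypothetical mapping from page language to Wayback collection path.
class LocalisedWayback {

    static final String WAYBACK_BASE = "https://www.webarchive.org.uk/wayback/";

    static String playbackUrl(String lang, String timestamp, String url) {
        // "archive" for English and "archive-cy" for Welsh come from the issue;
        // "archive-" + lang for other languages is an assumption.
        String collection = "en".equals(lang) ? "archive" : "archive-" + lang;
        return WAYBACK_BASE + collection + "/" + timestamp + "/" + url;
    }

    public static void main(String[] args) {
        System.out.println(playbackUrl("cy", "20170630224615", "https://1010uk.org/"));
    }
}
```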

Advanced Search - some criteria lack end quotes when displayed

ReCaptcha is not localised

Example:

https://dev.webarchive.org.uk/gd/ukwa/contact

Scots page, reCaptcha in English.

I don't know if there are country codes for Welsh and Scots, if not, don't worry about hacking it, this can be closed.

Single field parsing – search data could be separated by different symbols

What symbols should users use when specifying search data in a single field, so that it can be transformed into a field list and passed to the Solr query?
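One possible answer is to accept several separators and normalise them, so users need not remember a single one. The separator set here (comma, semicolon or newline) is an assumption, not a decision recorded in this issue, and `SingleFieldParser` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser: splits single-field input on commas, semicolons or
// newlines and drops empty entries, giving the list of values for the Solr query.
class SingleFieldParser {

    static List<String> parse(String input) {
        List<String> values = new ArrayList<>();
        for (String part : input.split("[,;\\n]+")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                values.add(trimmed);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parse("bbc.co.uk, gov.uk;\nnls.uk"));
    }
}
```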

Search across Collections and Web Sites as well as resources and facet on type

The idea is that the general search should simultaneously search across both the Collections and Targets Solr service and the resource-level full-text index. The type facet could then be used to filter the faceted results.

Depends on whether the schemas are able to be cross-searched.

FAQ Page Issues

The following comments were noted in review:

• The FAQ page is too full on: there are too many sparse titles all over the page. It should be presented as an expandable list.
• Its formatting is out of sync with the standard links in the footer.
• When you click on the individual links, it can be quite jumpy, which doesn't look good.

https://dev.webarchive.org.uk/en/ukwa/info/faq

Update Cambridge logo

We need to update the Cambridge logo.

Switch to default collection images on the server side?

One of a set of questions that arose during an internal code review.

This part of common.js appears to be modifying image tags when there is no image given.

Surely this would be better done server-side? Switching to the default image if none is set? Or is there a reason this can't be done?
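Server-side, the fallback is a one-line decision made before the template renders. A minimal sketch; `CollectionImages` and the default image path are hypothetical placeholders, since the actual default image location is not given here.

```java
// Hypothetical server-side fallback for collections without an image.
class CollectionImages {

    // Placeholder path; the real default image location is not given in the issue.
    static final String DEFAULT_IMAGE = "/img/collection-default.jpg";

    static String imageOrDefault(String imageUrl) {
        return (imageUrl == null || imageUrl.trim().isEmpty()) ? DEFAULT_IMAGE : imageUrl;
    }

    public static void main(String[] args) {
        System.out.println(imageOrDefault(null));
        System.out.println(imageOrDefault("/img/brexit.jpg"));
    }
}
```

With this in place, the template always receives a usable URL and the client-side image patching becomes unnecessary.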

Move Solr spec and Docker Hub build into this project

Later on, we should move the Solr setup into this repo and close down that old one. We should then fix up the Docker Hub build.

Improve error handling

One of a set of questions that arose during an internal code review.

When an error occurs, such as one of the Solr back-ends being unavailable, a response is given that is generic and unhelpful.

All error pages should be themed as normal.

Also, ideally, the codebase should have a development mode where stack traces are reported through the UI, and a production mode where the stack traces are collapsed/hidden and a user-friendly message is returned. The production mode should also email errors to a configurable email account.
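The development/production split can be reduced to a single switch on how an error is rendered. A sketch with a hypothetical `ErrorPages` class and `Mode` enum; in practice the mode would come from configuration, and theming and emailing errors are not shown.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical switch between a development mode (full stack trace in the page)
// and a production mode (friendly message only).
class ErrorPages {

    enum Mode { DEVELOPMENT, PRODUCTION }

    static String body(Mode mode, Throwable error) {
        if (mode == Mode.PRODUCTION) {
            return "Sorry, something went wrong. Please try again later.";
        }
        StringWriter trace = new StringWriter();
        error.printStackTrace(new PrintWriter(trace)); // full trace for developers
        return trace.toString();
    }

    public static void main(String[] args) {
        Exception e = new IllegalStateException("Solr back-end unavailable");
        System.out.println(body(Mode.PRODUCTION, e));
        System.out.println(body(Mode.DEVELOPMENT, e));
    }
}
```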

Advanced Search - facets drop below

When you click on Advanced Search, the facets drop down the page with the rest of it. I understand why this is happening from a technical point of view, but I don't think it's good UX:

a) the facets are less accessible
b) a load of white space is introduced.

Before:

After:

Special Collection Paging via URL Incorrect

Requesting a page beyond the last one via the URL results in a page that ...
1. ... gives no indication that it's not the page requested, or that the requested page doesn't exist,
2. ... has 2 sets of page buttons, OR
3. ... has no buttons, twice.

Examples:

https://beta.webarchive.org.uk/en/ukwa/collection/329?page=3
gives no indication the page specified doesn't exist (there are only 2 pages).

https://beta.webarchive.org.uk/en/ukwa/collection/329?page=4
Shows 2 sets of buttons

https://beta.webarchive.org.uk/en/ukwa/collection/329?page=10
Shows the back arrow (with no page buttons) twice.

Lumped together as I'm assuming it's the same (or a related) underlying cause.
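All three symptoms would be avoided by validating `page` against the total before rendering. A sketch with a hypothetical `Paging` class: it detects out-of-range requests and clamps them, so the template can show one consistent set of page buttons plus a notice.

```java
// Hypothetical guard for ?page=N on collection pages.
class Paging {

    static boolean isOutOfRange(int requested, int totalPages) {
        return requested < 1 || requested > Math.max(totalPages, 1);
    }

    // Page actually rendered: requested page clamped into 1..totalPages.
    static int effectivePage(int requested, int totalPages) {
        return Math.max(1, Math.min(requested, Math.max(totalPages, 1)));
    }

    public static void main(String[] args) {
        int totalPages = 2; // e.g. collection 329 in the examples above
        for (int requested : new int[] {2, 3, 4, 10}) {
            System.out.println("page=" + requested + " -> render " + effectivePage(requested, totalPages)
                    + (isOutOfRange(requested, totalPages) ? " with a 'page not found' notice" : ""));
        }
    }
}
```

Returning a 404 for out-of-range pages instead of clamping is an equally reasonable choice.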

Advanced Search - direct url link not available

If I click on Advanced Search, the url stays the same. Not necessarily a bug (because it displays and hides a frame within the current page), just questioning this from a UX point of view.

Non-English translated content indicating encoding issues.

Example:

...webarchive.org.uk/cy/ukwa/info/nominate

Question mark appearing in the middle of words.

"Diogelwch wefan o?r DG"
"hail-gynllunio?n sylweddol"

etc.
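The pattern fits the Welsh apostrophe (’, U+2019) being encoded into a character set that cannot represent it, such as ISO-8859-1, for example when a response or file is written in that charset. That diagnosis is a guess. The snippet only shows that the symptom can be reproduced this way; `ApostropheDemo` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

// U+2019 has no ISO-8859-1 encoding, so String.getBytes substitutes '?'.
class ApostropheDemo {

    static String roundTripLatin1(String s) {
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1); // unmappable chars become '?'
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(roundTripLatin1("Diogelwch wefan o\u2019r DG"));
    }
}
```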

Advanced Search - Spelling Error

Jack Russell, not Jack Russel

More maintainable method for knowing whether the user can access Ericom

Currently we use a hard-coded IP address mapping table to work out whether to present the user with links to Ericom. This is a pain to maintain over time.

Alternatively, could we actually test for access in the web page itself? i.e. default to no-access and then use JavaScript to check which (if any) of the secure playback gateways are visible to the user. If they can see one of them, change the page to offer direct access (probably storing the result in a cookie to avoid re-checking over and over again).

Add support for pulling highlights and collection images from W3ACT

Once ukwa/ukwa-manage#25 and dependent tasks are complete, we will need to update the UI to make use of them.

NLS link in footer is broken

The link to the NLS site in the footer goes to https://www.uk/ not https://www.nls.uk/. Please fix that!
