ukwa-ui's Introduction

UKWA UI

User interface for the UK Web Archive

How to Run the Code

  • Install a git client of your choice.
  • Clone this repository.

You can then run the code using the run-against-docker.sh script. This helper script uses Docker Compose to start up two Solr services and then runs the UI using Maven.

There are scripts to populate the Solr services with example data, but the example data is not stored in this repository and is currently available upon request.

Running from IntelliJ

Alternatively, you can run the code from IntelliJ and configure it to use the same Solr services. Note that the username and password are not needed when running against local services.

  • Install IntelliJ IDEA.
  • Open the code in IntelliJ IDEA.
  • Go to Run -> Edit Configurations.
  • Click + in the upper-left corner of the dialog and add a Maven configuration.
  • On the "Parameters" tab, enter "spring-boot:run" in the "Command Line" field.
  • Specify Solr credentials:
    • On the "Runner" tab, click the three dots next to "Environment variables".
    • Click the "plus" button to add two new variables: SOLR_USERNAME and SOLR_PASSWORD.
    • Click OK to close the dialog.
  • Click OK to close the dialog.
  • Go to Run -> Run (Shift-F10). Wait for the build to finish.
  • Navigate to http://localhost:8080 in your browser.

Supporting content translation

For the accessibility statements, most of the work was done by converting the original DOCX to HTML:

pandoc -f docx -t html --ascii acc-cy.docx > acc-cy.html

The --ascii flag ensures that special characters are written as HTML character references (ampersand escapes) rather than relying on the file's character encoding.
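To illustrate what ASCII-only output means, here is a minimal Java sketch of the same idea. It is not pandoc's implementation and pandoc may choose different (e.g. named or hexadecimal) references; it simply replaces every non-ASCII code point with a decimal numeric character reference, so the result decodes identically under any encoding. The class name AsciiEscaper is hypothetical.

```java
// Hypothetical helper illustrating ASCII-only HTML output; not pandoc's actual code.
class AsciiEscaper {
    /** Replaces every non-ASCII code point with a decimal numeric character reference. */
    static String escapeNonAscii(String html) {
        StringBuilder out = new StringBuilder(html.length());
        html.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.appendCodePoint(cp);                  // plain ASCII passes through unchanged
            } else {
                out.append("&#").append(cp).append(';'); // e.g. U+0175 (w circumflex) becomes &#373;
            }
        });
        return out.toString();
    }
}
```

Because the escaping works on code points rather than chars, characters outside the Basic Multilingual Plane become a single reference instead of two broken surrogate halves.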

Properties files need a bit more care, but can be mapped via spreadsheets if needed.

ukwa-ui's People

Contributors

anjackson, bzaar, gilhoggarth, jasonwebber-bl, ldbiz, matejdrabik, min2ha

ukwa-ui's Issues

Various malformed links arising from " being used in the messages files

As raised by Somaya, some links are coming out double-quoted. e.g. https://beta.webarchive.org.uk/en/ukwa/noresults

Looking at the source, it seems the messages files contain 'over-escaped' HTML fragments, e.g.

https://github.com/ukwa/ukwa-ui/blob/master/src/main/resources/i18n/messages.properties#L73

Which creates a link to https://beta.webarchive.org.uk/en/ukwa/"cookies" when the browser tries to make sense of the HTML.
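Affected entries could be found mechanically by scanning the messages files. The sketch below assumes the faulty pattern is an attribute written as href=&quot;...&quot;: such a value is unquoted, so the browser decodes the entities and resolves a relative link that includes the quote characters. The class and method names are hypothetical; the fix itself would be to use plain double quotes in the property value.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical checker: flags messages whose HTML uses &quot; where a literal quote is intended.
class OverEscapedMessages {
    static List<String> findOverEscaped(Reader messages) {
        Properties props = new Properties();
        try {
            props.load(messages); // standard .properties parsing (escapes, line continuations)
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> offenders = new ArrayList<>();
        for (String key : new TreeSet<>(props.stringPropertyNames())) { // sorted for a stable report
            // href=&quot;cookies&quot; is an unquoted attribute: the browser decodes it to "cookies"
            // (quotes included) and resolves that as a relative link.
            if (props.getProperty(key).contains("=&quot;")) {
                offenders.add(key);
            }
        }
        return offenders;
    }
}
```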

Maintain WCT-era links under the new website

The WCT-era links, especially for Targets and Collections, need to be supported by the new website. The WCT IDs are stored in W3ACT but are not yet exposed in the same way.

Some of these look rather complicated, e.g. https://www.webarchive.org.uk/ukwa/target/28180500/collection/26312782/source/collection because the latter part of the URL is being used to store the context the target is being viewed in.

Others are complicated because of paging or other parameters: https://www.webarchive.org.uk/ukwa/collection/100757/page/1/source/collection

In practice, it should suffice to match the primary ID (discarding the remainder of the URL).
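A minimal sketch of matching the primary ID and discarding the remainder, assuming the old paths always start /ukwa/target/{id} or /ukwa/collection/{id} as in the examples above (the input is the URL path only). The class WctLegacyUrls and the LegacyLink record are hypothetical; mapping the WCT ID to a page on the new site would still need the WCT IDs held in W3ACT.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for WCT-era paths: keeps only the first type/ID pair.
class WctLegacyUrls {
    record LegacyLink(String type, long wctId) {}

    // Everything after the first numeric ID (context, paging, source...) is ignored.
    private static final Pattern PRIMARY_ID =
            Pattern.compile("^/ukwa/(target|collection)/(\\d+)(?:/.*)?$");

    static Optional<LegacyLink> parse(String path) {
        Matcher m = PRIMARY_ID.matcher(path);
        if (!m.matches()) {
            return Optional.empty(); // not a WCT-era Target/Collection link
        }
        return Optional.of(new LegacyLink(m.group(1), Long.parseLong(m.group(2))));
    }
}
```

With this approach /ukwa/target/28180500/collection/26312782/source/collection resolves to the Target 28180500, and the viewing context is simply dropped.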

We should also review the main navigational URLs and provide mappings in the new site, e.g.

etc.

The current version of ukwa-ui does not provide pages for Targets. As the design settles down, I suggest we plan to augment the ukwa-ui app with a statically-generated site that covers Targets and Collections, and reduce ukwa-ui over time to just the dynamic search part.

Should we provide some kind of embed for web archive instances?

Thinking about using web archives for storytelling, it occurs to me that one way this is done for other media is via embeds, like YouTube videos embedded in a blog, or Twitter Cards so more info can be embedded in Twitter conversations. We should consider adding some kind of embedded version of a web page, e.g. an autogenerated thumbnail API and oEmbed hook?

Inconsistent naming of language options

The language labels (English/Welsh/Scottish) appear to be displayed inconsistently in the different languages.

I would expect each label to be in the language it selects.

So on an English or Scottish page, "Welsh" should be displayed as "Cymraeg", not "Welsh".

This is the case on the English and Scottish pages, whereas on the Welsh page:

"English" is displayed as "Saesneg" (the Welsh for "English"). It should be "English".

"Scottish" is displayed as "Gaeleg yr Alban" (the Welsh for "Scottish Gaelic"). It should be the Scottish Gaelic for "Scottish Gaelic", presumably "Gàidhlig", as that's what the English page shows.
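One way to make this consistent is to label each option with its endonym (the language's name for itself) and never translate those labels per page. A sketch, assuming the site's locale codes are en, cy and gd; the class name is hypothetical. The point of the design is that these labels are not looked up in the per-locale messages files, which is what lets the Welsh page translate them.

```java
import java.util.Map;

// Hypothetical language-switcher labels that never vary with the page locale.
class LanguageSwitcher {
    private static final Map<String, String> ENDONYMS = Map.of(
            "en", "English",
            "cy", "Cymraeg",
            "gd", "G\u00e0idhlig"); // Gaidhlig, with a grave accent on the a

    /** The label depends only on the target language, never on the current page language. */
    static String labelFor(String targetLanguage) {
        String label = ENDONYMS.get(targetLanguage);
        if (label == null) {
            throw new IllegalArgumentException("Unsupported language: " + targetLanguage);
        }
        return label;
    }
}
```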

Direct to earliest holding instead of latest

Currently, links to playback put a timestamp of 29991231999999 in the URL to direct to the latest copy. This seems to expose dead sites far too easily.

As our users are usually interested in the past, it makes more sense to default to showing our earliest copy rather than the latest, when we have no other timestamp information to go on.

So, can we use {playback-prefix}10001231999999/{url} instead?

Do we need to add a random part to avoid locking problems? e.g. {playback-prefix}10001231######/{url} ?
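A sketch of both variants proposed above. The playback prefix is passed in unchanged, the class name is hypothetical, and the randomised form simply fills the six ###### positions (the time part of the 14-digit yyyyMMddHHmmss timestamp) with random digits; whether that actually avoids the locking problem has not been tested.

```java
import java.util.Locale;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper for "earliest copy" playback links.
class PlaybackLinks {
    // 14-digit Wayback timestamp in the year 1000, i.e. earlier than any capture.
    static final String EARLIEST = "10001231999999";

    static String earliest(String playbackPrefix, String url) {
        return playbackPrefix + EARLIEST + "/" + url;
    }

    /** Keeps the date part (10001231) and replaces the last six digits with random ones. */
    static String earliestRandomised(String playbackPrefix, String url) {
        int suffix = ThreadLocalRandom.current().nextInt(1_000_000); // 0..999999
        // Locale.ROOT guarantees ASCII digits whatever the server's default locale is.
        return playbackPrefix + EARLIEST.substring(0, 8)
                + String.format(Locale.ROOT, "%06d", suffix) + "/" + url;
    }
}
```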

Project does not build, fails during the test phase

One of a set of questions that arose during an internal code review.

This codebase appears to have no functioning tests. There is one test, but it does not appear to test anything, and breaks the build. Please remove or improve the test.

Search within Special Collections appears not to work

Example:

https://dev.webarchive.org.uk/en/ukwa/collection/1107

Black and Asian Britain

First site in the list:
A.M. Qattan Foundation

Search on "Qattan" in the Special Collection Search Box under the Collection Title

No results.

(I've checked that it's not because the site above is in Arabic; another example is "Evaristo", which appears in the content of the title of the same name further down the collection.)

Another example is searching on "Brexit" within the Brexit collection.

I don't know if it's a configuration, indexing or coding issue.

Related Cards listed on
https://trello.com/c/sQ1KwFI2/212-snapshots-vs-titles-search-considerations

Add BL Staff LAN to access list

Staff browsers are connecting directly to the HTTPS website rather than going via the staff proxy. So, to offer staff the right link we need to whitelist the whole staff LAN address range.

I'm not sure how this works for the IPv6-based users coming in on their laptops, so this might not resolve the whole problem.
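Whitelisting a whole LAN is normally expressed as CIDR ranges, and the same check can cover IPv6 laptop users if their prefix is known. A minimal, dependency-free sketch of a CIDR match for IPv4 and IPv6 literals; the class name is hypothetical and the ranges in the test are documentation ranges, not the real BL address ranges.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical CIDR matcher for IPv4 and IPv6 literals; real ranges would come from configuration.
class CidrRange {
    private final byte[] network;
    private final int prefixLength;

    CidrRange(String cidr) {
        String[] parts = cidr.split("/");
        this.network = toBytes(parts[0]);
        this.prefixLength = Integer.parseInt(parts[1]);
        if (prefixLength < 0 || prefixLength > network.length * 8) {
            throw new IllegalArgumentException("Bad prefix length: " + cidr);
        }
    }

    boolean contains(String address) {
        byte[] candidate = toBytes(address);
        if (candidate.length != network.length) {
            return false; // an IPv4 address never matches an IPv6 range, and vice versa
        }
        int fullBytes = prefixLength / 8;
        for (int i = 0; i < fullBytes; i++) {
            if (candidate[i] != network[i]) {
                return false;
            }
        }
        int remainingBits = prefixLength % 8;
        if (remainingBits == 0) {
            return true;
        }
        int mask = (0xFF << (8 - remainingBits)) & 0xFF; // top remainingBits bits of the next byte
        return (candidate[fullBytes] & mask) == (network[fullBytes] & mask);
    }

    private static byte[] toBytes(String ipLiteral) {
        try {
            // For IP literals this only parses; a host name here would trigger a DNS lookup.
            return InetAddress.getByName(ipLiteral).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not an IP address: " + ipLiteral, e);
        }
    }
}
```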

Minor Translation Tweaks

List of (possible?) omissions:

Main Heading/Banner: UK Web Archive and sub heading
Selected Filter information
The nominate items

Basically I've identified those from looking at the missing tags in the properties file, hope that's enough to go on.
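Missing tags like these can be found mechanically by comparing the key sets of the default messages file and each translation. A sketch using only java.util.Properties; the class name is hypothetical, and the translated file names (e.g. messages_cy.properties alongside i18n/messages.properties) are an assumption about this repository's naming.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical check: keys present in the default bundle but absent from a translation.
class MissingTranslations {
    static Set<String> missingKeys(Reader defaultMessages, Reader translatedMessages) {
        Set<String> missing = new TreeSet<>(load(defaultMessages).stringPropertyNames());
        missing.removeAll(load(translatedMessages).stringPropertyNames());
        return missing; // sorted, so the report is stable between runs
    }

    private static Properties load(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```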

Allow users to see which other archives hold copies of a particular URL

NOTE: This is not necessarily part of this website engine, and is a proposal held here because it's a part of the front-end workflow.

We need to provide a URL resolution endpoint (e.g. for Document Harvester or other links in the catalogues) that bounces the user to the right endpoint, i.e. given this:

https://www.webarchive.org.uk/access/resolve/20170403111155/https://www.gov.uk

The resolver should look it up in the Open Access Wayback and then, if it's a 451, redirect to the appropriate LDL Reading Room system if the user is on a known Reading Room IP. If we have it, it should redirect to Open Access Wayback. If we don't have it, we could offer links to holdings in other web archives (via Memento). This latter functionality could be embedded in the Wayback page, but it's probably simpler to take the Mementos component and either update it (and run it standalone) or merge the logic into this website.
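The decision logic above can be sketched as a pure function, which keeps it testable separately from the HTTP plumbing. Everything here is hypothetical: the enum, the method name, and the assumption that the Open Access Wayback lookup is reduced to a status code (451 held but restricted, 2xx/3xx held and open, anything else not held). The paragraph above does not say what an off-site user should see for a 451, so RESTRICTED_NOTICE is only a placeholder.

```java
// Hypothetical decision table for the /access/resolve/{timestamp}/{url} endpoint.
class ResolveDecision {
    enum Target { OPEN_ACCESS_WAYBACK, READING_ROOM, RESTRICTED_NOTICE, OTHER_ARCHIVES }

    static Target decide(int openWaybackStatus, boolean userOnReadingRoomIp) {
        if (openWaybackStatus == 451) {
            // Held, but only viewable in an LDL Reading Room.
            return userOnReadingRoomIp ? Target.READING_ROOM : Target.RESTRICTED_NOTICE;
        }
        if (openWaybackStatus >= 200 && openWaybackStatus < 400) {
            return Target.OPEN_ACCESS_WAYBACK; // held and openly available
        }
        return Target.OTHER_ARCHIVES; // not held here: offer Memento links to other archives
    }
}
```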

For extra credit, the system should check if the item is available in the Reading Room playback system and be apologetic if the item is known but not yet available.

In the meantime, we could just use Apache to redirect the user to re-written URLs based on IP.

Search box not accepting characters on mobile

Home page search box wouldn't accept characters when typed.

Also, the tooltip indicating the empty field is misplaced, it points above the field, not to it.

Moto g4, Android 7, Chrome

Why does the code re-implement a Solr client rather than use solrj?

One of a set of questions that arose during an internal code review.

Although solrj is on the classpath and in the pom.xml, it does not appear to be used anywhere. However, the code itself re-implements a lot of the SolrJ client logic, such as building HTTP requests, setting query parameters and escaping values.

Why did you re-invent this logic rather than re-using SolrJ?

(see https://cwiki.apache.org/confluence/display/solr/Using+SolrJ for more details on using SolrJ)

Filter is buggy when switching languages

Home Page >
Enter a search term (no filter) >
n results returned >
Change language (e.g. change from English to Welsh) >

Issue 1: 0 results returned for "" (keyword has been reset) >

Enter a keyword and search >

Issue 2: 0 results returned for "" (but a keyword has been supplied) >

Reset filters >
Enter search term >
n results returned

Issue 3: Why do filters need to be reset? We haven't set them!
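One possible cause (not confirmed) is a language switcher that rebuilds the link without the current query string. A sketch of a switcher that swaps only the leading locale segment and keeps the raw query intact; it assumes paths of the form /{lang}/ukwa/... as in the URLs above, and the class name is hypothetical. It also returns a server-relative URL, in line with the code-review suggestion elsewhere in these issues.

```java
import java.net.URI;

// Hypothetical: builds the "switch language" link without losing search or filter state.
class LanguageSwitchLink {
    static String switchTo(String currentUri, String newLang) {
        URI uri = URI.create(currentUri);
        String path = uri.getRawPath();                 // e.g. /en/ukwa/search
        int secondSlash = path.indexOf('/', 1);
        String rest = secondSlash < 0 ? "" : path.substring(secondSlash);
        String query = uri.getRawQuery();               // kept verbatim, still percent-encoded
        return "/" + newLang + rest + (query == null ? "" : "?" + query);
    }
}
```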

Code has very little logging

(Initially raised in #9)

@anjackson, previously we had this requirement for logging:

  1. A detailed specification of the back-end URLs being called should be available in the logs on every request, at DEBUG level.
  2. Overall configuration/settings/etc. should be reported at INFO on start up.
  3. Any error cases should be reported with stack traces at ERROR level. Exceptions should not be swallowed silently.

We have implemented 1 and 3.

What is implied by 2? Shall we just dump the contents of application.properties to the log file? After removing passwords of course.

When you say "Code has very little logging", how much logging do you expect? It currently logs Solr queries as per (1) above and stack traces as per (3). What else would you like logged?
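For requirement 2, logging the effective settings at INFO with secrets masked would probably be enough. A sketch using java.util.logging to stay self-contained (the application itself would more likely log through SLF4J); the class name is hypothetical, and treating any key containing "password", "secret" or "token" as sensitive is an assumption.

```java
import java.util.Locale;
import java.util.Properties;
import java.util.TreeSet;
import java.util.logging.Logger;

// Hypothetical start-up reporter for requirement 2.
class StartupConfigReport {
    private static final Logger LOG = Logger.getLogger(StartupConfigReport.class.getName());

    static String mask(String key, String value) {
        String k = key.toLowerCase(Locale.ROOT);
        boolean sensitive = k.contains("password") || k.contains("secret") || k.contains("token");
        return sensitive ? "********" : value;
    }

    static void logAtStartup(Properties settings) {
        for (String key : new TreeSet<>(settings.stringPropertyNames())) { // sorted for readable logs
            LOG.info(key + " = " + mask(key, settings.getProperty(key)));
        }
    }
}
```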

Missing archived date

There was a glitch (which I can't recreate at the mo) whereby a collection target did not have an "archived date".

[screenshot]

Is this something we need to account for? E.g. when filtering search results by date, shall we provide a "no date" option?

Miscellaneous issues arising during code review

One of a set of questions that arose during an internal code review.

Some minor issues that arose in general:

  • Code has very little logging
  • Code has no tests
  • Code has very few comments, making large code blocks difficult to understand. A bit more vertical spacing and some comments about what the major sections were doing would be good.

in the Java:

  • Can we use server-relative URLs throughout, rather than needing the replace-with-https code and other manipulations involving absolute URLs?
  • There are quite a lot of hard-coded 'magic numbers/strings' that should really be declared in static properties. e.g. here
  • Hard-coded per-facet logic means adding a new facet is relatively cumbersome.
  • Almost no JavaDoc - it would be useful to have some on major public API components.
  • Often uses a 'passing mutable state' approach when an 'immutable state management' style would be cleaner. e.g. here

in the templates:

  • there seems to be a lot of repeated HTML fragments, e.g. per facet, that would be better done as sub-templates (e.g. this also makes adding a new facet cumbersome)
  • there seems to be rather a lot of programmatic logic in the JSPs that might be better done in helper methods or ameliorated by more re-use of template code/sub-templates.

and in the JavaScript:

The new site needs a BETA banner

Could we implement something like the attached mock-up to make it clear that this is the beta? It’s also an opportunity to retain the survey link after they’ve dismissed the popup.

[mock-up image: untitled-1]

(also the old Access Tool needs a banner pointing users to the new site, but that must be done after this ticket.)

Filter Help Tooltips have redundant suffix or are placeholders

If you hover over the filter Help symbols (the Question Mark symbols, not the chevrons), the tooltip displayed is suffixed "tooltip", e.g. "Access facet help tooltip"

The last word shouldn't be there.

I suspect these are intended as placeholders for more useful tooltips though.

Advanced Search - The Advanced Search form should replace the simple search

Currently, switching to Advanced Search adds the right fields, but also requires the user to enter a standard query in the 'Enter search phrase...' box. The Advanced Search form should replace that box, or at least render that box optional.

Secondarily, the search sometimes returns unexpected results, but I think that's because the Solr endpoint defaults to OR logic when searching rather than AND. As indicated here, this can be changed by extending the query with &q.op=AND or by changing the Solr server configuration.
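A sketch of the first option, appending q.op=AND to a select request. q.op is a standard Solr query parameter, but the core URL in the test is a placeholder and the helper is hypothetical; with q.op=AND, terms without an explicit operator must all match, while an explicit OR in the query still works.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical: builds a Solr select URL that defaults to AND between query terms.
class SolrQueryUrl {
    static String select(String solrCoreUrl, String userQuery) {
        return solrCoreUrl + "/select"
                + "?q=" + URLEncoder.encode(userQuery, StandardCharsets.UTF_8) // spaces become '+'
                + "&q.op=AND";
    }
}
```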

Replace logo for NLS

The NLS logo in the footer needs replacing (I've emailed the new JPG to Mindy and Lee).

Localised pages should direct user to the localised Wayback

FAQ Page Issues

The following comments were noted in review:

• The FAQ page is too full-on, with too many sparse titles all over the page; it should be presented as an expandable list.
• Its formatting is out of sync with the standard links in the footer.
• When you click on the individual links, the page can be quite jumpy, which doesn't look good.

https://dev.webarchive.org.uk/en/ukwa/info/faq

Improve error handling

One of a set of questions that arose during an internal code review.

When an error occurs, such as one of the Solr back-ends being unavailable, a response is given that is generic and unhelpful.

All error pages should be themed as normal.

Also, ideally, the codebase should have a development mode where stack traces are reported through the UI, and a production mode where the stack traces are collapsed/hidden and a user-friendly message is returned. The production mode should also email errors to a configurable email account.
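A sketch of the development/production switch, assuming a boolean setting (called showStackTraces here purely for illustration) and a hypothetical class name. Emailing errors in production is left out; in a Spring Boot application this logic would more naturally sit in a global exception handler that renders the normal themed error page.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical: decides how much of an exception a themed error page may show.
class ErrorDetails {
    static String forPage(Throwable error, boolean showStackTraces) {
        if (!showStackTraces) {
            // Production: friendly message only; the full trace should still be logged at ERROR.
            return "Sorry, something went wrong. Please try again later.";
        }
        StringWriter trace = new StringWriter();
        error.printStackTrace(new PrintWriter(trace)); // development: full stack trace in the UI
        return trace.toString();
    }
}
```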

Advanced Search - facets drop below

When you click on Advanced Search, the facets drop down the page with the rest of it. I understand why this is happening from a technical point of view, but I don't think it's good UX:

a) the facets are less accessible
b) a load of white space is introduced.

Before: [screenshot]

After: [screenshot]

Special Collection Paging via URL Incorrect

Requesting a page beyond the last one via the URL results in a page that...
1. ... gives no indication that it's not the page requested, or that the requested page doesn't exist;
2. ... shows two sets of page buttons; or
3. ... shows no buttons, twice.

Examples:

https://beta.webarchive.org.uk/en/ukwa/collection/329?page=3
gives no indication the page specified doesn't exist (there are only 2 pages).

https://beta.webarchive.org.uk/en/ukwa/collection/329?page=4
Shows 2 sets of buttons

https://beta.webarchive.org.uk/en/ukwa/collection/329?page=10
Shows the back arrow (with no page buttons) twice.

Lumped together as I'm assuming it's the same or a related underlying cause.
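A sketch of a server-side guard: compute the page count once, reject out-of-range page numbers, and render a single pager from that result. The class and method names are hypothetical, and whether an out-of-range request should show a "no such page" message or redirect to the last page is a design choice the issue leaves open.

```java
import java.util.OptionalInt;

// Hypothetical paging guard for collection pages (?page=N).
class Paging {
    static int totalPages(long totalResults, int pageSize) {
        // Ceiling division; at least one (possibly empty) page.
        return (int) Math.max(1, (totalResults + pageSize - 1) / pageSize);
    }

    /** Returns the page to render, or empty if the requested page does not exist. */
    static OptionalInt resolvePage(int requestedPage, long totalResults, int pageSize) {
        int last = totalPages(totalResults, pageSize);
        if (requestedPage < 1 || requestedPage > last) {
            return OptionalInt.empty(); // caller shows a "page does not exist" message
        }
        return OptionalInt.of(requestedPage);
    }
}
```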

Advanced Search - direct url link not available

If I click on Advanced Search, the URL stays the same. Not necessarily a bug (because it displays and hides a frame within the current page), just questioning this from a UX point of view.

More maintainable method for knowing whether the user can access Ericom

Currently we use a hard-coded IP address mapping table to work out whether to present the user with links to Ericom. This is a pain to maintain over time.

Alternatively, could we actually test for access in the web page itself? i.e. default to no-access and then use JavaScript to check which (if any) of the secure playback gateways are visible to the user. If they can see one of them, change the page to offer direct access (probably storing the result in a cookie to avoid re-checking over and over again).
