dilcisboard / e-ark-aip
E-ARK AIP Specification
Home Page: https://earkaip.dilcis.eu/
We recommended to use "as a prefix an internationally recognized standard identifier for the institution from which the SIP originates. This may lead to problems with smaller institutions, which do not have any such internationally recognized standard identifier. We propose in that case, to start the prefix with the internationally recognized standard identifier of the institution, where the AIP is created, augmented by an identifier for the institution from which the SIP originates."
This recommendation doesn't make sense, because the example given right after it ignores the recommendation entirely.
/mets/@OBJID="urn:uuid:123e4567-e89b-12d3-a456-426655440000"
I would remove this entire paragraph. The recommendation should be to use an international standard scheme for identifiers (not institution IDs).
Maybe ask Sven or Bjorn for some?
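To make the suggestion concrete, here is a minimal sketch of generating and checking an identifier in the `urn:uuid:` form used in the example above. The function names `make_objid` and `is_urn_uuid` are hypothetical and not part of the specification; only Python's standard `uuid` module is used.

```python
import uuid


def make_objid() -> str:
    """Return a new random identifier in urn:uuid form."""
    # UUID.urn yields the string 'urn:uuid:' followed by the canonical UUID.
    return uuid.uuid4().urn


def is_urn_uuid(value: str) -> bool:
    """True if value is 'urn:uuid:' followed by a canonical, lower-case UUID."""
    prefix = "urn:uuid:"
    if not value.startswith(prefix):
        return False
    candidate = value[len(prefix):]
    try:
        # uuid.UUID is lenient (it accepts braces or missing hyphens),
        # so compare against the canonical string form to be strict.
        return str(uuid.UUID(candidate)) == candidate
    except ValueError:
        return False
```

With this check, the OBJID from the example passes, while an identifier such as `ID77146c6c-c8c3-4406-80b5-b3b41901f9d0` does not.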
Review comment (Ch. 5.2.3): Does updating the original submission actually mean that we can eventually have different forks of the metadata? E.g. when first creating a new representation of the package or its metadata, and then updating the original submission?
Update following DILCISBoard/E-ARK-CSIP#710 needed
Review comment (Chapter 3): We believe that AIP export functionality does not really reduce costs, since the destination repository needs to handle the exported AIP in the same way as a new SIP in order to update its internal data structures. Further, the destination repository may need two ingest workflows, one for SIPs and one for AIPs, which most likely increases development costs.
Too focused on the E-ARK project. It should be a proper executive summary without references to E-ARK.
Originally raised by Miguel Ferreira
We are missing some important diplomatic attributes such as authorship of the document.
See how PREMIS clearly identifies the authors and contributors of the document. http://www.loc.gov/standards/premis/v2/premis-2-0.pdf
This is important for reference purposes.
Review comment: Figure 11 (rep-001): In METS.xml: "/submission/representations/file.odt" should be "/submission/representations/rep-001/file.odt". Same with "rep-002".
In Sweden we have not created separate FGSs for SIP, AIP and DIP. We recommend that all extra elements for SIP, AIP and DIP be included in the CSIP, but as optional information. Documents are then needed that describe how to create a SIP, AIP and DIP and how to use the elements at EU level, in a way that can be adapted for local use in different countries.
Proposed element Preservation status. Status for retention, disposal or preservation. S=Save, preserve (swe. bevara), DI=Disposal, retention (swe. gallra), PA=PAused (swe. parkerad), UN=UNknown (swe. okänt). Element type: String. Allowed values: S, DI, PA, UN
Section 2 starts with the following sentence "This AIP format specification is based on E-ARK deliverable, D4.4 “Final version of SIP-AIP conversion component”. It relates to part A of this deliverable which is the AIP format specification. "
Mentioning this without a chapter describing what E-ARK was is, in my view, pointless. This is a spec, not a project deliverable. Context must be given.
Review comment (Page 23): Extra ">" characters in the text.
Requirements related to the AIP format are not derived from a METS profile. Requirements need to be defined in a METS profile, and the profile must be used for generating the requirements tables to be included in the AIP specification.
Update following DILCISBoard/E-ARK-CSIP#709 needed
Requirement 14. The root directory of the package MUST contain a “submission” directory which is a container for the original submission and might eventually contain SIP updates which are submitted after the AIP was created.
I might be wrong here... but shouldn’t this be a COULD? I read before that the submission folder is optional.
Review comment (Figure 11): In METS.xml: "/submission/representations/file.odt" should be "/submission/representations/rep-001/file.odt". Same with "rep-002".
The specification states that the AIP is continuously supplied with preservation metadata and that this is what distinguishes this package type from the SIP and DIP. In practical archival care, we currently manage generations of AIPs, where preservation actions such as conversion to a preservation format lead to the creation of a new IP, and where the relation to the original IP needs to be preserved. Generations are preserved where necessary and, where appropriate, discarded. There is also a need for variants of AIP (AIC / AIU). We have not studied the specifications in detail but would like to emphasize the need for support for different versions and variants of AIP.
See section 5.4.1 on page 43.
Provide information on why exactly we should care to implement this! Just add a paragraph that explains the motivation to do it.
In this document we fail to explain the main purpose of having an AIP format, which is to mitigate a potential preservation risk of institutional or repository meltdown and to implement a repository succession strategy in a simple way.
Repository systems are not expected to implement this AIP format; however, they are expected to be able to generate it in a simple way for a set of AIPs.
There might be a flaw in Figure 5 - both folders on the same level are named "binary"?
Proposed element Preservation date. Date after which disposal of the AIP shall occur if PreservationStatus="DI". That is, preservation shall only occur up to and including this date, i.e. the package shall be retained until this date. E.g. "2020-01-01" means that the AIP shall be destroyed (directly) after this date (or retained until this date, depending on one's viewpoint). Element type: YYYY-MM-DD. Shall be given if PreservationStatus="DI". Otherwise it is optional.
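The dependency between this element and the proposed Preservation status element (allowed values S, DI, PA, UN) could be checked roughly as follows. This is only a sketch of the rule as worded in the two proposals; the function name `check_preservation` and the error messages are assumptions, not part of any profile.

```python
import re
from datetime import date
from typing import Optional

# Allowed values taken from the proposed Preservation status element.
ALLOWED_STATUS = {"S", "DI", "PA", "UN"}


def check_preservation(status: str, preservation_date: Optional[str]) -> list[str]:
    """Return a list of problems; an empty list means the pair is acceptable."""
    errors = []
    if status not in ALLOWED_STATUS:
        errors.append(f"unknown preservation status: {status!r}")
    if preservation_date is None:
        # The date is mandatory only when the status is DI (disposal).
        if status == "DI":
            errors.append("preservation date is required when status is DI")
    else:
        # Enforce the literal YYYY-MM-DD shape, then check it is a real date.
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", preservation_date):
            errors.append("preservation date must use the form YYYY-MM-DD")
        else:
            try:
                date.fromisoformat(preservation_date)
            except ValueError:
                errors.append("preservation date is not a valid calendar date")
    return errors
```

For example, `check_preservation("DI", None)` reports a missing date, while `check_preservation("S", None)` reports nothing.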
The preface image needs to be changed to the updated one.
Review comment: the directory structure is too complex. For example, representations in /AIP/submission/representations are in different level than in /AIP/representations.
In the sentence "We will explain in section 3.3.1 in more detail how the referencing of METS.xml files must be implemented if this alternative is chosen." (page 13), I don’t think this section number is correct. I’m not sure which section is the correct one.
Review comment (Page 37): Using a plus sign (+) instead of a colon (:) might not be a good idea, since in HTTP query strings the space character is usually encoded as a plus sign. REST interfaces, which are typically used in machine-to-machine communication, use the HTTP protocol.
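The concern can be reproduced with Python's standard `urllib.parse` module. The identifier below is a made-up example of a `urn:uuid:` value with the colons replaced by plus signs; it is an assumption for illustration, not a value taken from the specification.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical identifier that uses "+" where a URN would use ":".
identifier = "urn+uuid+123e4567-e89b-12d3-a456-426655440000"

# Placed unescaped into a query string, every "+" is decoded as a space.
raw = parse_qs("id=" + identifier)["id"][0]

# Percent-encoded first ("+" becomes "%2B"), the value survives the round trip.
encoded_query = urlencode({"id": identifier})
safe = parse_qs(encoded_query)["id"][0]

print(repr(raw))
print(encoded_query)
print(repr(safe))
```

So the plus sign is usable only if every client escapes it consistently; a client that forgets to do so silently sends a different identifier.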
The main purpose of having an AIP format is not explained sufficiently (in "Scope of this document").
The purpose is to mitigate a potential preservation risk of institutional or repository meltdown and to implement a repository succession strategy in a simple way.
Repository systems are not expected to implement this AIP format; however, they are expected to be able to generate it in a simple way for a set of AIPs.
Originally raised by @jmaferreira
Review comment (Page 31): AIP-PREMIS-CHARACTERIZATION: It is stated that JHOVE output could be embedded, but could similar output from other software optionally be embedded too?
Review comment: Figures 10 and 11 seem to be very complex.
When an AIP is created during ingest, it receives an unalterable identifier, which defines the AIP as one consistent logical entity. This identifier is also used to derive the name of the physical storage container.
This is so short that it hardly deserves its own section. Nonetheless, what exactly does this sentence mean? Does it mean that we should name the folder with the same name as the identifier? If so, just give that example.
Proposed element Preservation reference. Law or other regulation that determines preservation, disposal or retention. E.g. "RA-FS 2030:12". Element type: Free text. Min 1 character. Max 255 characters.
Words "folder" vs. "directory" should be uniform. Technically, "folder" is different from "directory" (folder may be a physical or a virtual folder, whereas directory is always physical directory in a file system).
Proposed element Classification. Security classification of information in package. P=Publik (eng. public or declassified), BS=Begränsat Skyddsvärde (eng. restricted), HS=Högt Skyddsvärde (eng. confidential), EJ KLASSAT=Ej Klassat, okänd klassning (eng. unclassified, unknown). Element type: String. Allowed values: P, BS, HS, EJ KLASSAT. "EJ KLASSAT" is default. Values could be extended to support different organisations' requirements.
Requirements related to preservation metadata are not formally defined. A PREMIS profile needs to be created which allows validators to include the verification of these requirements.
The overall structure should be simplified to make it easier to implement.
The documents for the different IPs should be standardized so that they are structured in a similar way and the references should keep the same ID for the same element (CSIP7 should always be the same element in all documents).
Review comment (Page 41): Footnote 21 done incorrectly.
Good day!
I noticed that the specification published at https://earkaip.dilcis.eu/ contains the same image describing two different cases of versioning packages.
6.5. Appendix E - Naming scheme examples
The images for figure 9 and figure 10 have the identical source:
"https://earkaip.dilcis.eu/figs/ditaa/ditaa_appendix_e_migration_segmented.png"
Proposed element Keywords. A list of keywords used for searching. E.g. "TMJ, BB, varumärkesintrång, värde2variabel" or "TMJ BB varumärkesintrång värde2variabel". Element type: Free text. Max 20 words separated by (space) or (comma). Max 2047 characters.
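A minimal sketch of how the proposed limits could be enforced when reading such a keyword list. The function name `parse_keywords` is hypothetical, and treating runs of spaces and commas as a single separator is an assumption about how the two example forms should be read.

```python
import re

MAX_WORDS = 20    # "Max 20 words" from the proposal
MAX_CHARS = 2047  # "Max 2047 characters" from the proposal


def parse_keywords(text: str) -> list[str]:
    """Split a keyword string on spaces and/or commas and enforce the limits."""
    if len(text) > MAX_CHARS:
        raise ValueError(f"keyword string exceeds {MAX_CHARS} characters")
    # Any run of commas and whitespace counts as one separator.
    words = [w for w in re.split(r"[,\s]+", text) if w]
    if len(words) > MAX_WORDS:
        raise ValueError(f"more than {MAX_WORDS} keywords given")
    return words
```

Under this reading, both example strings from the proposal yield the same four keywords.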
Example on page 27 for a file ID
ID77146c6c-c8c3-4406-80b5-b3b41901f9d0
This example does not follow the recommendation provided earlier. It should be
urn:uuid:77146c6c-c8c3-4406-80b5-b3b41901f9d0
Requirement 16. If the “submission” folder contains one or several sub-folders, the sub-folders MUST contain IPs.
Should it be IPs or SIPs?
Hi,
I think the E-ARK specs are really wonderful. One thing I'm trying to get my head around is that the root of the E-ARK AIP is the Intellectual Entity. I see why this makes sense from a logical sense, and even from a physical sense. However, I think that there is a potential valid reason for having a representation as the root.
for example:
So I suppose my question is, was it ever considered that a representation could be a root, and were there other reasons other than the ones I've listed that might have accounted for the physical AIP always having an IE at its root? The more I think about it, having the physical AIP reflecting the logical AIP makes a lot of sense, and it maps to the PREMIS object types quite well. My main concern is that some archives may rely on basic tape storage, which doesn't allow for ease of updating packages, so E-ARK might be out of reach as a result.
Best,
Kieran O'Leary
National Library of Ireland
At the moment it is too focused on the E-ARK project. It should be a proper executive summary without references to E-ARK.
Review comment (Ch. 5.4.2): Revise the usage of ":" and "+". Should some of the ":" characters be "+" characters?
Review comment: Updating an AIP that conforms to the E-ARK specification: What about a case where AIPs exist in a tape library, but overwriting is practically impossible, since that action reduces the lifecycle of the tapes? In this situation, changes are usually saved as physically separate incrementals.
The use case is as follows:
Sometimes it is useful to resubmit a particular SIP that will replace the information previously placed in a given AIP, or update the representations of that AIP with additional files. For that, a repository should receive a SIP with the status UPDATE that identifies the AIP or the ID of the previously submitted SIP. The repository should be able to recognise an AIP UPDATE request by looking at the SIP status, and to identify the AIP to which the update should be applied.
In order to support this feature, the AIP spec should be updated to describe what should happen at the AIP level if an update is received via the ingest process.
@luis100 may be consulted to provide more details on current implementations.
This issue is related to DILCISBoard/E-ARK-SIP#6
Important attributes such as authorship of the document are missing.
Raised by @jmaferreira
The spec publisher uses the old figure 1 image.