"Trigger events": How to ensure the content will be accessible even if a journal discontinues?
One of the goals of long-term archiving is to ensure that content remains accessible even when its publisher can no longer provide it (e.g., because a journal was discontinued or the hosting of its website is no longer paid for). If the content is stored in a "dark archive", it can be re-published after a "trigger event" occurs. The following aspects should be checked so that a journal's content remains findable and accessible if the journal ceases publication or encounters other major changes and challenges:
- Under which licence was the content published?
This is important to determine whether the preservation service has the right to republish the content. This should be declared in a contract or through the use of a relevant Creative Commons licence. The same applies to repositories: Are they allowed to pass this content on to a preservation service if they use one? And may the content be republished?
- Does the preservation service publish/display the archived content?
This could include, for example, a presentation infrastructure. Here, it may be of interest what the access modalities are.
For example, the German National Library archives web content and OA journals, but these can only be consulted offline in the library and are not fully listed in its catalogue. For OA journals archived there, it can therefore be advisable to find an additional service to increase the accessibility and findability of content in case of discontinuation.
- How is a "trigger event" defined?
It is recommended to create a workflow for informing the preservation service about potential challenges and for the period until the republication of the content. (Check the downloadable Table in Part 2 for the biggest preservation services.)
- In the event that a journal or a publisher has ceased publication, who remains in contact with the digital preservation service?
In order for the preservation service to publish the content, it is essential that the journal manager (or the responsible person according to the workflow) contacts it after the discontinuation of the journal. For PKP PN, the process was described in (Sprout & Jordan, 2018, p. 249) as follows: “the trigger event can be notification by the journal manager or cessation of deposits into the PN after a period of inactivity. If a potential trigger event is detected, PKP staff will contact the journal to confirm its publication status, and if it is confirmed, approve the importing of preserved content into a special OJS instance – an “access node” operated by the PN where it will be openly accessible”. For CLOCKSS, “content must be unavailable for 6 months before it is triggered. It is important to observe that this delay is mandatory only when content is being triggered without the consent of the publisher. In all cases so far content has been triggered at the request of the publisher, and the technical process described below has taken 2-4 weeks.” (CLOCKSS: Extracting Triggered Content - CLOCKSS Trusted Digital Repository Documents, n.d.). The final sentence also emphasises the significance of communication with the publisher. Furthermore, the discontinuation of a title and its unavailability on the web are not straightforward to determine and require additional work. This is not easily accomplished automatically (Laakso et al., 2021; Lightfoot, 2016), so publishers should not rely on a preservation service to detect a discontinuation without being contacted. Hence, even seemingly mundane steps should not be overlooked: for instance, keep in mind that the discontinuation of a journal may coincide with the journal manager changing their job position and/or contact email.
In this case, it is vital to leave an alternative contact address that would remain operational even in such an event, so that the preservation service can continue communicating with a journal's representative and publish their content.
Let others archive and use your OA content!
Before going into greater detail on issues to keep in mind for digital archiving in the following section, this section explains easy steps by which a journal may increase the coverage, archivability and findability of its content:
- Use persistent identifiers - not only are they critical for discoverability, but they are also used for metadata crawling by many of the major databases that could promote your content and store copies of it. For example, content with Crossref DOIs is harvested by OpenAlex and Dimensions, and can also be crawled and stored by the Internet Archive. OpenAlex preserves metadata years after journals cease publication. It is also gaining popularity in the bibliometric community, which can be beneficial for your journal's ranking.
- Publish a self-archiving policy on the journal's website and in Sherpa Romeo
- Encourage your authors to deposit copies of their work in repositories
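As a quick check on the persistent-identifier step above, a journal can verify that its Crossref DOIs resolve and that their metadata is exposed for harvesting. The following minimal sketch uses the public Crossref REST API; the DOI shown is a placeholder, not a real identifier:

```python
import json
import urllib.request

CROSSREF_API = "https://api.crossref.org/works/"

def crossref_metadata_url(doi: str) -> str:
    """Build the Crossref REST API URL for a DOI record."""
    return CROSSREF_API + doi

def fetch_metadata(doi: str) -> dict:
    """Fetch the Crossref metadata record for a DOI (makes a network call)."""
    with urllib.request.urlopen(crossref_metadata_url(doi)) as resp:
        return json.load(resp)["message"]

# Example usage (placeholder DOI - replace with one of your own):
# meta = fetch_metadata("10.1234/example.doi")
# print(meta.get("title"), meta.get("ISSN"))
```

Running such a check periodically over all of a journal's DOIs will flag records that fail to resolve before readers or harvesters notice.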
How can publishers make the content more archivable?
This section addresses aspects which influence successful long-term archiving, which also depends on the data quality of the content being archived. Unfortunately, some files are damaged upon arrival at a digital preservation service. Format conversion and file transfer are common points of failure as well. These issues are best addressed at the earliest stage of the process. It is, therefore, advisable that editors take action to ensure their data is archivable and to avoid content loss. In the absence of a preservation service, this is of particular importance, as the files are stored in an unmonitored environment and the content producers may be unaware of the issues.
Basic preconditions for archiving digital content
The first steps to ensure successful long-term archiving include:
- Ensure that documents are not password protected
- Provide metadata on the journal and the article level
- Use known file formats where possible
- Use established tools for format conversion and content creation so that your files are of good technical quality and can be saved, opened and migrated to another format
File format validation
Format validation is an automated process that ensures that a file is of a certain standard and technical quality. It answers, for instance, the following questions:
- Is this file really of a format it seems to be?
- Can the file be opened and read?
- Can the file be correctly converted into another format? (E.g., if you are only planning to start offering a new format like XML, you may want to convert your files later. So this can be relevant even if you do not use a preservation service.)
- Are all the fonts embedded so that they will work in another computer environment?
As formats differ, different tools may be used for file validation. For PDF files, for example, the following tools can be used:
- pdfcpu
- open tools from the German National Library of Science and Technology (TIB): Pre-Ingest Analyzer ; Format Validator
- PdfInfo, ExifTool, etc.
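A first-pass version of such checks can also be scripted in-house. The sketch below (stdlib-only Python, written for this guide) tests two very basic properties from the questions above: does the file really start with the PDF signature, and does it end with an end-of-file marker? It is no substitute for a full validator such as pdfcpu or the TIB tools, only a cheap filter for mislabelled or truncated files:

```python
from pathlib import Path

def looks_like_pdf(path: str) -> bool:
    """Very basic PDF sanity check: header signature and EOF marker.

    A real validator checks far more (structure, fonts, encryption);
    this only catches files that are mislabelled or truncated.
    """
    data = Path(path).read_bytes()
    has_header = data.startswith(b"%PDF-")   # PDF magic bytes
    has_eof = b"%%EOF" in data[-1024:]       # trailer marker near the end
    return has_header and has_eof
```

Files that fail even this check should be regenerated before any deposit is attempted.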
Checksum
Files may also be damaged during upload or transfer. When you transfer large packages of files, there is a chance that not all the files will be transferred due to a technical error. One way to check this is with a checksum - a sequence of numbers and letters generated from a file's contents that can be used to confirm that the file has not been altered in any way. It is worth creating checksums to check the technical integrity of files, especially if you have workflows that involve many files. Some software already includes this feature for uploaded files. There are specialised programs for this, but a command-line script may also be employed. One way to create checksums regularly is to use Total Commander. If you use checksums, it is recommended to display them in the frontend so that users can verify them themselves when downloading files from your site.
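As an illustration of the command-line-script option, this Python sketch (stdlib only, written for this guide) computes SHA-256 checksums for every file in a folder. The resulting manifest can be stored alongside a transfer package and re-generated on the receiving side for comparison:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def checksum_manifest(folder: str) -> dict:
    """Map each file (as a relative path) under `folder` to its checksum."""
    root = Path(folder)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
```

Comparing the manifests produced before and after a transfer reveals both altered files (a differing checksum) and missing files (an absent entry).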
Supplements and enhanced formats
It is important to check whether the data is complete and whether all metadata, supplements, images, etc. are included in the publication package.
If you are using enhanced content, be aware that it presents a major challenge for interoperability, readability and archivability, and is best addressed at the production stage. For instance, it is advisable not to depend on external content remaining permanently accessible. Instead, make sure that the publication contains all the material needed, even if the original source of the content (for example, a video on YouTube embedded via iframe) is no longer available. If this applies to you, consult the Guidelines for Preserving New Forms of Scholarship (Greenberg et al., 2021) for more information.
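A simple automated completeness check can catch missing supplements before deposit. The sketch below (Python, written for this guide; the file names in the example are hypothetical) compares the files actually present in a package folder against a list of expected files, e.g. derived from the package's metadata:

```python
from pathlib import Path

def missing_files(package_dir: str, expected: list) -> list:
    """Return the expected files (relative paths) absent from the package."""
    root = Path(package_dir)
    return [name for name in expected if not (root / name).is_file()]

# Example usage with hypothetical file names:
# missing_files("package/", ["article.pdf", "article.xml", "figures/fig1.png"])
```

An empty result means every expected file is present; anything returned should be added (or the metadata corrected) before the package is archived.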
If you would like to learn more about checking the quality of content to be archived, take these resources into account:
Guidelines for Preserving New Forms of Scholarship (Greenberg et al., 2021)
Digital Preservation Handbook: Creating digital materials
An overview of potential challenges is provided in the video Preservation of New Forms of Scholarship, which covers enhanced content and the methods employed by preservation services to ensure the long-term viability of such content (Millman, 2020)
Recommendations from Internet Archive on optimising content for web crawling