Attention: On 21-May-2020 an optional, but recommended, sixth step was added to this workflow in the form of a new Drush command: islandora_mods_post_processing, an addition to my previous work in islandora_mods_via_twig. See my new post, Islandora MODS Post Processing for complete details.
A 5-Step Workflow
This document is a follow-up, with technical details, to Exporting, Editing, & Replacing MODS Datastreams, post 069, in my blog. In case you missed it, that post was written specifically for metadata editors working on the 2020 Grinnell College Libraries review of Digital Grinnell MODS metadata.
Attention: This document uses a shorthand
./ in place of the frequently referenced
//STORAGE/LIBRARY/ALLSTAFF/DG-Metadata-Review-2020-r1/ directory. For example,
./social-justice is equivalent to the Social Justice collection sub-directory at
Briefly, the five steps in this workflow are:
Export of all
grinnell:* MODS datastreams using
drush islandora_datastream_export. This step, last performed on April 14, 2020, was responsible for creating all of the
grinnell_<PID>_MODS.xml exports found in
Execute my Map-MODS-to-MASTER Python 3 script on iMac MA8660 to create a
mods.tsv file for each collection, along with associated
grinnell_<PID>_MODS.remainder files for each object. The resultant
./<collection-PID>/mods.tsv files are tab-separated-value (.tsv) files, and they are key to this process.
Edit the MODS .tsv files. Refer to Exporting, Editing, & Replacing MODS Datastreams for details and guidance.
Run drush islandora_mods_via_twig in each ready-for-update collection to generate new .xml MODS datastream files. For a specified collection, this command will find and read the
./<collection-PID>/mods-imvt.tsv and create one
./<collection-PID>/ready-for-datastream-replace/grinnell_<PID>_MODS.xml file for each object.
Run the drush islandora_datastream_replace command once for each collection. This command will process each
./<collection-PID>/ready-for-datastream-replace/grinnell_<PID>_MODS.xml file and replace the corresponding object’s MODS datastream with the contents of the .xml file. The digital_grinnell branch version of the
islandora_datastream_replace command also performs an implicit update of the object’s “Title”, a transform of the new MODS to DC (Dublin Core), and a re-indexing of the new metadata in Solr.
The remainder of this document provides technical details, frequently in the form of command lines used to build and use the aforementioned tools.
Step 1a - Installation of Drush
To help implement this process efficiently and effectively I first turned to Exporting, Editing, & Replacing MODS Datastreams, a workflow developed by the good folks at The California Historical Society. I initiated the workflow by installing two Drush tools on my local/development instance of ISLE on my Mac workstation.
The command line process in my local host/workstation terminal looked like this:
Local tests of these commands were successful so I proceeded to install them in the production instance of Digital Grinnell at dgdocker1.grinnell.edu. Before doing that I needed to change the definition of
Apache to reflect the production instance of our Apache container, like so
Created a Fork of Islandora Datastream Replace
I also chose to “fork” the islandora_datastream_replace project so that I could do a little Digital.Grinnell customization of it. The fork I’m working with is here and my work is limited to the digital_grinnell branch of that fork.
In the digital_grinnell branch I modified the behavior of the islandora_datastream_replace command so that it implicitly performs an
UpdateFromMODS operation that lives in our idu, or Islandora Drush Utilities module. The
UpdateFromMODS, performed immediately after each datastream replace operation, does the following:
- Updates the object “Title”, one of its properties, to match the new value of
- Invokes the
iduF DCTransform operation, which runs the default XSLT transform of the new MODS to DC (Dublin Core) and creates a new “DC” datastream for the object.
The iduF DCTransform operation also concludes with an implicit
iduF IndexSolr operation to ensure that the new object metadata is properly indexed in Solr.
Step 1b - Installation of Drush
islandora_datastream_replace Commands in Production
To install the commands in production I opened a terminal to dgdocker1.grinnell.edu as user islandora and executed the following commands there:
Step 1c - Mounting //STORAGE to DGDocker1
Attention! This step, and some that come later, will require that the network storage path
//STORAGE/LIBRARY/ALLSTAFF/DG-Metadata-Review-2020-r1 be accessible to our production instance of Digital.Grinnell. To make that possible I had to run this sequence on DGDocker1:
docker exec -it isle-apache-dg bash
mount -t cifs -o username=mcfatem //storage.grinnell.edu/LIBRARY/ALLSTAFF/DG-Metadata-Review-2020-r1 /mnt/metadata-review
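Before running any of the later steps, it may be worth confirming that the share is really visible inside the container. The following is only a minimal sketch: the helper name require_readable_dir is hypothetical, and the mount point is assumed to be the /mnt/metadata-review directory used in the mount command above.

```shell
# Hypothetical helper: succeed only if the given directory exists and is readable.
require_readable_dir() {
  if [ -d "$1" ] && [ -r "$1" ]; then
    return 0
  fi
  echo "Not available: $1" >&2
  return 1
}

# Example (assumes the mount point used above):
# require_readable_dir /mnt/metadata-review && ls /mnt/metadata-review
```

A non-zero return here would suggest that the mount needs to be repeated before any drush command is pointed at that directory.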
Step 1d - Using Drush
The islandora_datastream_export results in my local test were woefully incomplete… NONE of the child objects with a compound parent were exported. I’m still not entirely sure why child objects were omitted, since the query I used should have captured all objects. In testing I did find that this seems to be a flaw in the islandora_datastream_export command, and specifically in its implementation of any Solr query.
Fortunately, the aforementioned command also has a SPARQL query option, and after some trial-and-error I got it to work properly. To do so I created an
export.sh bash script, shown below, and used it on dgdocker1.grinnell.edu like so:
The export.sh script is:
In the case of the Digital Grinnell social-justice collection, for example, this script produced 32 .xml files, the correct number. Each collection’s set of exported .xml files can be found in the collection-specific subdirectory of
//STORAGE/LIBRARY/ALLSTAFF/DG-Metadata-Review-2020-r1/ and all have filenames of the form:
grinnell_<PID>_MODS.xml. Note that objects which have no MODS datastream were not exported.
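A quick way to confirm a count like the 32 files mentioned above, and to see which object a given file belongs to, might look like the sketch below. Both function names are hypothetical, and the mapping from grinnell_<PID>_MODS.xml to a namespace:number PID is my assumption based on the filename pattern; the number 1234 in the comments is made up for illustration.

```shell
# Count exported MODS files directly inside one collection directory.
count_mods_exports() {
  find "$1" -maxdepth 1 -type f -name 'grinnell_*_MODS.xml' | wc -l | tr -d ' '
}

# Turn an export filename such as grinnell_1234_MODS.xml into grinnell:1234.
# Assumption: the text before the first underscore is the namespace.
pid_from_export() {
  name="$(basename "$1" _MODS.xml)"            # e.g. grinnell_1234
  printf '%s:%s\n' "${name%%_*}" "${name#*_}"  # e.g. grinnell:1234
}
```

For example, count_mods_exports /mnt/metadata-review/social-justice would report the number of exports in that collection directory, assuming the share is mounted there.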
Step 2 - Map-MODS-to-MASTER Python 3 Script
The Map-MODS-to-MASTER script was developed, in Python 3, on iMac MA8660 at
~/GitHub/Map-MODS-to-MASTER to facilitate generation of
mods.tsv and accompanying
.log files for each Digital Grinnell collection from the
.xml files found in subdirectories of
The Map-MODS-to-MASTER project can be found in the master branch of https://github.com/DigitalGrinnell/Map-MODS-to-MASTER. I chose to execute it using PyCharm from iMac MA8660 since the directory holding all of the
.xml files and folders is already mapped to
/Volumes/LIBRARY/ALLSTAFF/DG-Metadata-Review-2020-r1 on that iMac. Note that this
//STORAGE location was chosen because the
./ALLSTAFF directory, and its subordinates, are accessible to all staff in the Grinnell College Libraries.
It should not be necessary to run this script ever again…NEVER. However, if it becomes necessary to look back at this code and process, details can be found in Map-MODS-to-MASTER. Note: If it should ever become necessary to repeat the Map-MODS-to-MASTER process it might be wise to look at replacing the Python 3 script with a new Drush command, maybe
islandora_map_mods_to_master, written in PHP and installed directly into the production instance of Digital.Grinnell.
Step 3 - Editing the MODS .tsv Files
Please refer to Exporting, Editing, & Replacing MODS Datastreams, post 069 in my blog, for details and guidance.
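Hand-editing tab-separated files makes it easy to add or drop a tab by accident, so a column-count check after editing can be useful. This is only a sketch: the function name check_tsv_columns is hypothetical, and it assumes that the first line of the .tsv is a header row containing every column.

```shell
# Report any line whose tab-separated field count differs from the header's.
# Exits non-zero if at least one such line is found.
check_tsv_columns() {
  awk -F '\t' '
    NR == 1        { expected = NF; next }
    NF != expected { printf "line %d: %d fields, expected %d\n", NR, NF, expected; bad = 1 }
    END            { exit bad + 0 }
  ' "$1"
}

# Example (assumes the share is mounted at /mnt/metadata-review):
# check_tsv_columns /mnt/metadata-review/social-justice/mods-imvt.tsv
```

Any line it reports is worth opening in the editor before the file is handed to the next step.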
Step 4 - Run
As each individual collection
mods-imvt.tsv file is made ready-for-update, it will be necessary to run a
drush islandora_mods_via_twig command to process the .tsv data. Running
--help with that command produces:
[islandora@dgdocker1 ~]$ docker exec -it isle-apache-dg bash
root@122092fe8182:/# cd /var/www/html/sites/default/
root@122092fe8182:/var/www/html/sites/default# drush -u 1 islandora_mods_via_twig --help
Generate MODS .xml files from the mods-imvt.tsv file for a specified collection.

Examples:
 drush -u 1 islandora_mods_via_twig social-justice    Process ../social-justice/mods-imvt.tsv, for example.

Arguments:
 collection    The name of the collection to be processed. Defaults to "social-justice".

Aliases: imvt
So, my command sequence to run
islandora_mods_via_twig for the “Social Justice” collection, as an example, was:
[islandora@dgdocker1 ~]$ docker exec -it isle-apache-dg bash
root@122092fe8182:/# cd /var/www/html/sites/default/
root@122092fe8182:/var/www/html/sites/default# drush -u 1 islandora_mods_via_twig social-justice
When the islandora_mods_via_twig command is run, it processes the corresponding
//STORAGE/LIBRARY/ALLSTAFF/DG-Metadata-Review-2020-r1/<collection-PID>/mods-imvt.tsv file and creates one
//STORAGE/LIBRARY/ALLSTAFF/DG-Metadata-Review-2020-r1/<collection-PID>/ready-for-datastream-replace/grinnell_<PID>_MODS.xml file for each object.
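To see at a glance which collections currently have a mods-imvt.tsv file, something like the following sketch could be run against the review directory. The function name list_ready_collections is hypothetical, and treating the mere presence of mods-imvt.tsv as "ready-for-update" is my assumption, not a rule stated in this workflow.

```shell
# List collection sub-directories of a review root that contain mods-imvt.tsv.
list_ready_collections() {
  for tsv in "$1"/*/mods-imvt.tsv; do
    [ -f "$tsv" ] || continue          # skip when the glob matched nothing
    basename "$(dirname "$tsv")"
  done
}

# Example (assumes the share is mounted at /mnt/metadata-review):
# list_ready_collections /mnt/metadata-review
```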
Step 5 - Run
The whole point of this entire process is to get us back to this point with a set of reviewed and modified .xml files in a
//STORAGE/LIBRARY/ALLSTAFF/DG-Metadata-Review-2020-r1/<collection-PID>/ready-for-datastream-replace/ collection-specific subdirectory so that we can replace existing object MODS datastreams with new data, and we use the
drush islandora_datastream_replace command to do this.
Running --help for the aforementioned command produced this:
root@122092fe8182:/var/www/html/sites/default# drush -u 1 islandora_datastream_replace --help
Replaces a datastream in all objects given a file list in a directory.

Examples:
 drush -u 1 islandora_datastream_replace --source=/mnt/metadata-review/social-justice/ready-for-datastream-replace --dsid=MODS --namespace=grinnell
   Replacing MODS datastream for objects in --source using the digital_grinnell branch of code.

Options:
 --dsid         The datastream id of the datastream. Required.
 --namespace    The namespace of the pids. Required.
 --source       The directory to get the datastreams and pid# from. Required.

Aliases: idre
It’s worth noting that this command looks for any files named MODS in whatever ABSOLUTE directory is named with the
--source parameter. The command shown below was executed inside the Apache container, isle-apache-dg, on node DGDocker1, in order to process Digital Grinnell‘s social-justice collection.
root@122092fe8182:drush -u 1 islandora_datastream_replace --source=/mnt/metadata-review/social-justice/ready-for-datastream-replace --dsid=MODS --namespace=grinnell
The same command could have been executed directly from node DGDocker1 like so:
docker exec -w /var/www/html/sites/default isle-apache-dg drush -u 1 islandora_datastream_replace --source=/mnt/metadata-review/social-justice/ready-for-datastream-replace --dsid=MODS --namespace=grinnell
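Since the command differs only by collection name, a small helper that prints it for review before anything is executed may save some typing. This is a sketch under assumptions: the function name replace_command is hypothetical, and the container name, working directory, and mount point are simply the ones used in this post.

```shell
# Print (do not run) the host-side replace command for one collection.
replace_command() {
  printf 'docker exec -w /var/www/html/sites/default isle-apache-dg drush -u 1 islandora_datastream_replace --source=/mnt/metadata-review/%s/ready-for-datastream-replace --dsid=MODS --namespace=grinnell\n' "$1"
}

# Example: show the command for the social-justice collection.
# replace_command social-justice
```

Printing rather than executing is deliberate here, because a datastream replace changes production objects; once the printed line looks right it can be pasted into the terminal on DGDocker1.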
And that’s a wrap. Until next time, stay safe and wash your hands! 😄