Attention: On 21-May-2020 an optional, but recommended, sixth step was added to this workflow in the form of a new Drush command: islandora_mods_post_processing, an addition to my previous work in islandora_mods_via_twig. See my new post, Islandora MODS Post Processing, for complete details.
The transition to distance learning, social distancing, and more remote work at Grinnell College in the wake of the COVID-19 pandemic may afford GC Libraries an opportunity to do some overdue and necessary metadata cleaning in Digital.Grinnell.
A 5-Step Workflow
This turned out to be a much more difficult undertaking than I imagined, but as of mid-April 2020 I have a 5-step workflow that actually works. This post introduces all five steps but provides details only for Step 3, Editing a MODS TSV File, the portion that metadata editors need to be most aware of. All technical details, as well as Steps 1, 2, 4, and 5, will be addressed in Exporting, Editing, & Replacing MODS Datastreams: Technical Details.
Attention: This document uses a shorthand ./ in place of the frequently referenced //STORAGE/LIBRARY/ALLSTAFF/DG-Metadata-Review-2020-r1/ directory. For example, ./social-justice is equivalent to the Social Justice collection sub-directory at
The five steps are:
1. Export all grinnell:* MODS datastreams using drush islandora_datastream_export. This step, last performed on April 14, 2020, was responsible for creating all of the grinnell_<PID>_MODS.xml exports found in
2. Execute my Map-MODS-to-MASTER Python 3 script on iMac MA8660 to create a mods.tsv file for each collection, along with an associated grinnell_<PID>_MODS.remainder file for each object. The resulting ./<collection-PID>/mods.tsv files are tab-separated-value (.tsv) files, and they are key to this process.
3. Edit the MODS .tsv files. Refer to the dedicated section below for details and guidance.
4. Run drush islandora_mods_via_twig in each ready-for-update collection to generate new .xml MODS datastream files. For a specified collection, this command will find and read the ./<collection-PID>/mods-imvt.tsv file and create one ./<collection-PID>/ready-for-datastream-replace/grinnell_<PID>_MODS.xml file for each object.
5. Run the drush islandora_datastream_replace command once for each collection. This command will process each ./<collection-PID>/ready-for-datastream-replace/grinnell_<PID>_MODS.xml file and replace the corresponding object’s MODS datastream with the contents of the .xml file. The digital_grinnell branch version of the islandora_datastream_replace command also performs an implicit update of the object’s “Title”, a transform of the new MODS to DC (Dublin Core), and a re-indexing of the new metadata in Solr.
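Because each step leaves specific files behind, a collection's progress can be read from its directory. The following is a minimal Python 3 sketch, not part of the actual workflow tooling: the function name collection_status and the idea of inferring progress from which files exist are assumptions made for illustration. Only the file and directory names (mods.tsv, mods-imvt.tsv, ready-for-datastream-replace, grinnell_<PID>_MODS.xml) come from the steps above.

```python
from pathlib import Path


def collection_status(collection_dir):
    """Report which workflow artifacts exist in one collection directory.

    Hypothetical helper: it only inspects file names described in the
    five steps and never runs drush or modifies anything.
    """
    collection_dir = Path(collection_dir)
    ready_dir = collection_dir / "ready-for-datastream-replace"

    # Count the per-object MODS files that Step 4 is described as creating.
    ready_count = 0
    if ready_dir.is_dir():
        ready_count = len(list(ready_dir.glob("grinnell_*_MODS.xml")))

    return {
        # Created by Step 2 (the Map-MODS-to-MASTER script).
        "mods_tsv": (collection_dir / "mods.tsv").is_file(),
        # Created by a metadata editor at the end of Step 3.
        "mods_imvt_tsv": (collection_dir / "mods-imvt.tsv").is_file(),
        # Created by Step 4 and consumed by Step 5.
        "ready_xml_count": ready_count,
    }
```

Calling collection_status on a collection sub-directory returns a small dictionary. A collection with mods.tsv but no mods-imvt.tsv, for instance, would still be waiting on Step 3.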
Editing a mods.tsv File
Creating or editing metadata can be a monumental task, and doing it effectively can demand a wealth of knowledge and experience working with metadata standards and practices. This step in our workflow is easily the most labor-intensive. The goal of this project is largely to present metadata editors with a form, in this case the mods.tsv or tab-separated-value file, to make consistent editing of metadata possible. In addition to the mods.tsv file, the workflow will rely on guidance and conventions that are documented in the Metadata Clean-up tab of the Digital_Grinnell_MODS_Master worksheet.
A metadata editor should focus on only one collection at a time. The suggested practice for working through one collection is as follows:
1. Find the collection’s mods.tsv file using the collection’s persistent identifier, or PID. For example, the target .tsv file for Digital.Grinnell’s “Social Justice” collection, with a PID equal to social-justice, will be found in
2. Copy the mods.tsv file, preferably to your local workstation, and optionally give it a new name, like
3. Open the Metadata Clean-up tab of the Digital_Grinnell_MODS_Master worksheet in a browser so that you have guidance available at all times.
   - Note that if you find yourself repeating very cumbersome changes while you edit, please consider taking notes in the Metadata Clean-up tab, and email firstname.lastname@example.org with any questions or concerns you may have about the process or the guidance.
4. Open your copy of the .tsv file in Excel, Numbers, or any .csv or .tsv capable worksheet editor.
5. Edit the data. We suggest working column by column, one column at a time, where possible. You are likely to find that the values in a single column are similar from row to row, as they should be. You may also find it possible to do large-scale find/replace in a column. For example, many of our records may list a corporate (organization) name as a “~ Supporting host”, but the proper form of that term is “~ supporting host”, in all lowercase, and you might save time by doing a find/replace operation to make all such changes.
6. When you are done editing, be sure to save your work AND export the data back into a new .tsv file specifically named mods-imvt.tsv. Note that “-imvt” stands for islandora_mods_via_twig, the command that will be used in the next step of our workflow.
7. Save a copy of your
8. Email email@example.com to let us know that you have a collection ready for processing, and be sure to provide the collection-PID in your email.
9. Expect a follow-up email from firstname.lastname@example.org in one or two days. After the metadata has been processed we may ask you to review some of the changes to make sure they appear correctly in Digital.Grinnell.
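For editors who are comfortable with a little scripting, the column-wise find/replace and the export to mods-imvt.tsv can also be done outside a worksheet editor. What follows is a minimal Python 3 sketch under stated assumptions, not the workflow's official tooling: the function name replace_in_column is hypothetical, and it assumes the first row of mods.tsv holds column headings. The actual heading names are not listed in this post, so any column name passed in is up to the editor.

```python
import csv


def replace_in_column(src_path, dest_path, column, old, new):
    """Copy a TSV file, replacing `old` with `new` in one named column.

    Hypothetical helper. Assumes the first row contains column headings.
    Returns the number of data rows that were changed.
    """
    with open(src_path, newline="", encoding="utf-8") as src:
        rows = list(csv.reader(src, delimiter="\t"))
    if not rows:
        raise ValueError("the source file is empty")

    # list.index raises ValueError if the heading is not present.
    index = rows[0].index(column)

    changed = 0
    for row in rows[1:]:
        # Skip short rows that have no value in the target column.
        if index < len(row) and old in row[index]:
            row[index] = row[index].replace(old, new)
            changed += 1

    # Write every row, edited or not, to the destination file.
    # Lines are written with "\n" endings (a choice made for this sketch).
    with open(dest_path, "w", newline="", encoding="utf-8") as dest:
        writer = csv.writer(dest, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)
    return changed
```

For the example above, an editor might call replace_in_column("mods.tsv", "mods-imvt.tsv", heading, "~ Supporting host", "~ supporting host"), where heading is whatever the corporate-name column is called in that file. The source file is left untouched, so the original mods.tsv remains available for comparison.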
And that’s a wrap. Until next time, thank you for your attention to our metadata! 😄