PDF
Here There Be WARCs
· ☕ 8 min read · ✍️ Mark A. McFate
The term WARC, an abbreviation of Web ARChive, always reminds me of things like hobbits, elves, dark lords, and, of course, orcs. But this post has nothing to do with those things, so I need to clear my head and press on.

PDF Ingest in Digital.Grinnell
· ☕ 2 min read · ✍️ Mark A. McFate
A set of 21 PDF objects was ingested into Digital.Grinnell’s Faculty Scholarship collection using IMI on 22-July-2019. Unfortunately, none of these PDFs contained OCR (optical character recognition) or “text recognition” data, so none of them generated a valid FULL_TEXT datastream. FULL_TEXT datastreams are required to make PDFs and similar text content searchable and discoverable in Digital.