How to use DROID and command line to harvest metadata
Have you ever discovered a box of floppy discs with no labels? Or labels like MKRESTHQ, written to conform to the old DOS rules of only 8 characters in a file name?
Photo source: Flickr User Blude, https://flic.kr/p/54zsYJ, “floppy disks for breakfast”
Have you ever just tried to OPEN one of the files? Gotten this unhelpful charmer?
This is where DROID and command line come in handy – even if you can’t actively open the files, you can create a listing of their contents and identify the file types.
“DROID stands for Digital Record Object Identification. It’s a free software tool developed by The UK National Archives that will help you to automatically profile a wide range of file formats. For example, it will tell you what versions you have, their age and size, and when they were last changed. It can also provide you with data to help you find duplicates.”
DROID is a Java application, and its launch files needed minor tweaking the first time I tried to run them in a Windows 7 environment. DROID isn’t an out-of-the-box perfect tool, but it is worth the effort.
What else does DROID do?
“DROID scans files, collecting information about them into a profile which can later be explored, filtered, exported and reported on. Millions of files can be profiled, and many different profiles can be reported on at the same time. It will also look inside archival files (such as ‘zip’ files), and examine the files inside them too.
One of the most important functions DROID performs is to identify what format a file is written in, even if the file name extension is wrong or missing. Where possible, identifications are made beyond the broad type, down to the version level. For example, it can tell you that a document is written in a very old version (e.g. Word 6.0), not just that it is a Word document.
DROID can currently identify over 250 file formats, and this number is growing all the time. Updated format signatures are automatically downloadable from the National Archives’ PRONOM service.”
Source: official support and guidance PDF.
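To get a feel for how a tool can identify a format without trusting the file extension, here is a minimal Python sketch of signature ("magic number") sniffing. It is not DROID's actual method: the three signatures are ones I picked by hand for illustration, the function names are made up, and the real PRONOM signatures are far more detailed.

```python
# Illustrative only: a tiny hand-picked table, not the PRONOM registry.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP container"),
    (b"\xffWPC", "WordPerfect document"),
]


def sniff_format(first_bytes: bytes) -> str:
    """Return a rough format guess based on the leading bytes of a file."""
    for magic, label in SIGNATURES:
        if first_bytes.startswith(magic):
            return label
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first few bytes of a file and guess its format."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(8))
```

The point is simply that the answer comes from the bytes inside the file, so a missing or wrong extension doesn't matter.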
For this tutorial, we are going to work in three major steps:
- Copy and Quarantine your materials (ideally, on a dedicated workstation not connected to the IT network)
- Use command line to generate a text file listing the floppy contents
- Use DROID to identify the file types and generate a report
The goal is to have several fresh pieces of administrative and descriptive metadata, so that even if you can’t render the files now, you have saved as much evidence as possible for future sessions.
Why Copy and Quarantine?
When possible, do your metadata and processing work on a surrogate copy. Yes, you can skip this step, but you risk damaging your original and you risk introducing a virus into a controlled workspace. If possible, copy the files to a workstation that isn’t actively connected to the network or internet during your session. One of my SAA Workshop instructors recommended moving the items and waiting 30 days before further ingest (or workflow), to hopefully catch any malware or virus that needed an incubation period. The basics of this tutorial are not dependent on this step, but you should learn more about best practices for working with legacy media.
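If you are comfortable with Python, the copy step can be sketched roughly like this. This is my own assumption about one way to do it, not part of the workshop advice: the function names are hypothetical, and the SHA-256 comparison is an extra check I added so you know the surrogate matches the original.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute a SHA-256 checksum, reading the file in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source_dir: str, quarantine_dir: str) -> list:
    """Copy every file under source_dir; return relative paths whose copy differs."""
    source = Path(source_dir)
    target = Path(quarantine_dir)
    mismatches = []
    for item in sorted(source.rglob("*")):
        if not item.is_file():
            continue  # folders are created as needed below
        relative = item.relative_to(source)
        destination = target / relative
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(item, destination)  # copy2 also tries to keep timestamps
        if sha256_of(item) != sha256_of(destination):
            mismatches.append(str(relative))
    return mismatches
```

An empty list back means every copied file matched its original. Note that this sketch skips empty folders, so it is not a full disk image.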
Generating a List of Floppy Contents
First, create a destination folder where you want to assemble information about the disk contents. If you have an accession system, you can add it to the accession records. In my example, this will be the folder called “Tutorial_1000_002_011”.
Next, in Windows 7: Open command line.
There’s a decent trick, if you are used to using Explorer to navigate to a file. When you get to the folder, SHIFT and RIGHT CLICK on the folder, then select “Open command window here”
Make sure your destination folder is ready. Next you will type a single command:
dir /S > "F:\Tutorial_1000_002_011\floppycontents.txt"
Breaking it down:
The dir lists the files and folders in the current directory
The /S includes the contents of all the subfolders too
The > redirects the output into a file instead of showing it on the screen
The quotes around the F drive location are needed in case there are spaces in the folder or file names
You can name the .txt file anything you want, but here I used floppycontents.txt
Generically, the command looks like this:
dir /S > "DRIVE:\FOLDER\FILENAME.txt"
Congrats! If you don’t get an error message, you did it correctly. Now, go check the contents of the destination folder for the new floppycontents.txt file.
Close the command window. You’re done with this first part of the metadata harvest. And now you have a text file that maps out the directory and sub-directory contents of the disk.
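If you are not on Windows, or you want a listing that is easier to feed into other tools, the same idea can be sketched in Python. This is an assumption on my part rather than part of the workflow above: the function name and the tab-separated layout are made up for illustration, and the output will not look like what dir produces.

```python
import os


def write_listing(root: str, output_path: str) -> int:
    """Write one line per file under root (path, tab, size in bytes).

    Returns the number of files listed.
    """
    count = 0
    with open(output_path, "w", encoding="utf-8") as out:
        # os.walk visits root and every subfolder, like the /S switch
        for folder, _subfolders, filenames in os.walk(root):
            for name in sorted(filenames):
                full_path = os.path.join(folder, name)
                size = os.path.getsize(full_path)
                out.write(f"{full_path}\t{size}\n")
                count += 1
    return count
```

As with the dir command, point the output at your destination folder rather than at the disk itself, otherwise the listing file ends up listing itself.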
Running DROID to Automatically Identify File Format Types (using PRONOM)
You will need to download and unzip the latest release of DROID. Here’s what’s inside:
When I first launched version 6.1.3 I needed to update the droid.bat file to work with the system I was using. You might need to make similar adjustments. Consult the very helpful Google Group.
To open DROID, double-click on the droid.bat file. (If you are running an Apple Mac or Linux, double-click on the file called “droid.sh”. These files can also be run directly from the command line, instead of double-clicking on them through your Graphical User Interface. This tutorial focuses on the Windows GUI, but there are options!)
It has a pretty blank canvas at first, but has created an untitled starter profile:
Click the ADD (+) button and select the entire file folder of interest:
Save the DROID profile. Save it to the metadata destination (accession) folder for the item, not to the quarantine area. For us, that’s the “Tutorial_1000_002_011” folder.
Click the blue START button to begin the analysis. A progress bar will be displayed at the bottom of the window.
Once finished, you will ask it for two things: a REPORT, and an EXPORT of the findings.
Run and save a report:
Then click on EXPORT to save the report in the next pop-up window. Yes, they used the word EXPORT for two different functions. Save it as a PDF in your metadata folder.
The final step in this very basic run is to EXPORT the resulting list of file names and types to a CSV file.
At this point, you will have some very nice files to inspect!
You could start by inspecting the CSV. In this example, the file type saved on our floppy is WordPerfect version 5.1.
How did it know? The PUID: the PRONOM Persistent Unique Identifier. We had x-fmt/394.
You can look up any PUID in the PRONOM registry to learn more about the format.
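If you have a lot of PUIDs to look up, a small helper can turn each one into a link. This is a sketch under two assumptions of mine: that PUIDs look like fmt/NUMBER or x-fmt/NUMBER, and that PRONOM pages live at the address pattern used below. The function name is hypothetical, so test a link in your browser before relying on it.

```python
import re

# Assumed PUID shape: "fmt/40" or "x-fmt/394"
PUID_PATTERN = re.compile(r"^(x-)?fmt/\d+$")


def pronom_url(puid: str) -> str:
    """Build a PRONOM lookup link for a PUID (URL pattern is an assumption)."""
    cleaned = puid.strip()
    if not PUID_PATTERN.match(cleaned):
        raise ValueError(f"does not look like a PUID: {puid!r}")
    return "https://www.nationalarchives.gov.uk/PRONOM/" + cleaned
```

For our floppy, pronom_url("x-fmt/394") gives a link ending in /PRONOM/x-fmt/394.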
You can clean up this CSV and save it as a different file type if needed. This is just the raw first look.
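For a bigger disk, a quick tally of the CSV can tell you what you are dealing with before you open it in a spreadsheet. Here is a minimal Python sketch. I am assuming the export has a header row with a column named PUID, so check your own file and pass a different column name if yours differs. The function name is made up.

```python
import csv
from collections import Counter


def count_by_column(csv_path: str, column: str = "PUID") -> Counter:
    """Count how many rows share each value in the given column."""
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            value = (row.get(column) or "").strip()
            # Rows with no value (folders, unidentified files) are grouped together
            counts[value or "(blank)"] += 1
    return counts
```

Calling count_by_column on your exported file and printing the result of most_common() lists the most frequent identifiers first.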
Next you can inspect the output in DROID itself – it’s a little easier to browse. Plus you can drag the columns around to re-arrange them. Click on the (+) buttons next to the file folders to expand or collapse the folders:
That’s it, you’ve explored a poorly labeled floppy disk! Now you can decide how to proceed: whether you need emulators, or whether you should toss it. And you have a record of how you are making these decisions.