Vint Cerf, universally recognized as an American Internet pioneer, recently warned of a potential digital Dark Age. Underlying this specter is the issue of accessibility, that is, digital storage and retrieval, defined in technological terms rather than in terms of human functionality. The Dark Age he has in mind is one in which all sorts of content sit stored somewhere but cannot be accessed (i.e., opened and manipulated from the files, servers, or computers that hold them). Although Cerf is thinking ahead a thousand years or more, an electronic or digital Dark Age is immediately relevant to publishers and how they create, produce, deliver, store, and access their content. Because computer technology rapidly becomes obsolete, and because obsolescence brings inaccessibility, the prospect of this impending Dark Age affects virtually everyone: individual consumers; students, academics, and researchers; and libraries, organizations, and governments.
We create content using computerized devices such as personal computers, cell phones, tablets, scanners, cameras, and the like. For that hardware to work, it requires an operating system, part of which is code that provides the user with an interface. Creating content then depends on software running within that operating system, such as a word processor, voice recognition or recording software, or any other program that enables the user to create or import content. Once the content is created, we normally save it as a file for storage, access, and manipulation. Then, if we follow best practices, we store the file for archival and security purposes. These pieces of technology operate in concert right now because the content, software, hardware, and infrastructure are all current; but future access won’t be possible unless all the different pieces of technology, or some replacement for them, are available and functional when we attempt to open the file. Accessing content in the future requires that we go to the media where the file is archived, copy the file to a computer, and open the file with software that can read it.
This can be problematic if the hardware cannot read the media, the operating system cannot mount it, or the software cannot read the file. Technology is great when it works, but. . . . Eventually, each type of computer technology will no longer be supported, possibly resulting in inaccessible files. Consider, for example, the storage formats of the recent past, many of which (e.g., the 5.25-inch floppy disk, the zip drive) are practically inaccessible today. Even though we may still possess the media on which we saved a file, we won’t be able to access that file unless we can insert the media into a compatible drive that the operating system can read. Without all the necessary pieces available and working properly, we lose access to the content.
Imagine discovering a time capsule that contains a 5.25-inch floppy disk. Have you actually seen any 5.25-inch floppy disks around lately? Whatever happened to the hundreds of thousands sold per year in the 1990s? How about a 5.25-inch floppy drive? Computers lost that appendage fifteen years ago. And if you had a floppy drive, would your computer have a port to accommodate its cable? The list of hurdles is daunting. Even those who have maintained legacy systems in order to avoid these problems have difficulty due to equipment failures.
The life cycle of our technology forewarns us of an electronic Dark Age. That scary future may arrive sooner than we think, since, on average, computers last about three to four years. Software is maintained longer than hardware, but even so, updates and new releases supersede the older versions, eventually rendering early versions obsolete and inoperable. With the passage of time, it becomes ever more difficult for a software company to maintain backward compatibility because of changes in hardware, in the software, code, or programming languages on which its products depend, and in standards and so forth. In almost every circumstance, programmers must eventually stop supporting earlier versions of software. The forces working against successfully accessing content in legacy files are powerful and become even more formidable with the passage of time.
“We have to ask ourselves, how do we preserve all the bits that we need in order to correctly interpret the digital objects we create?” Cerf asked in the February 2015 edition of Engineering and Technology Magazine. This is precisely the problem that all sectors of the publishing industry face: authors and content creators, publishers, distributors, libraries, consumers, and not least of all those who produce books, journals, websites, and any other form of communication. Preserving is one thing, but accessing and correctly interpreting the content present another challenge altogether. Stockholders may be focused on this quarter’s dividend, but everyone involved in producing content must account for future access to their products, as well as an immediate profit. It is in the best interests of the publishing industry to take a long view and develop and implement strategies to ensure the future accessibility of content.
If the goal were simply to sustain ongoing accessibility to content, one could store materials as text files based on ASCII code. Materials saved in this fashion possess the greatest degree of universal accessibility, but what ASCII gives us in accessibility it takes away in formatting, because it has no structural indicators. Even though we can open such a file, we may not be able to present the content in any meaningful or useful way. To overcome this weakness, we would need to employ an ASCII-based markup language to introduce structure tags. At our disposal right now are XML, HTML, and SGML. XML is a method of marking up content in a regular and extensible fashion (i.e., it can be stretched to describe new kinds of content). It represents structured content as plain, readable text, thus addressing the accessibility issue. A tag set that is independent of any particular application and free of redundancy is necessary, because redundancy introduces ambiguity and errors. Ideally, XML is paired with a document type definition (DTD) for validation, but XML does not require an external method to interpret the markup. What makes XML such an attractive preventive against an electronic Dark Age is (a) that you can control its application and (b) that it allows not only for storage but also for granular access to content in the future. An XML source file can be accessed, interpreted, and used in the future, thus avoiding a Dark Age. HTML, in its various flavors, permits redundant markup and is therefore a less attractive option. SGML, the precursor to both XML and HTML, can fail if its markup is not well formed: because its hierarchy can vary (i.e., nesting rules can be broken), it may be impossible to present the contents of an SGML file in a way that people can read.
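To make the distinction concrete, here is a minimal sketch, using only Python's standard library, of checking that marked-up content is at least well formed; the element names are hypothetical and not drawn from any particular publisher's schema.

import xml.etree.ElementTree as ET

sample = """<?xml version="1.0" encoding="UTF-8"?>
<book>
  <chapter>
    <title>The Coming Dark Age</title>
    <para>Content stored as plain, structured text remains readable.</para>
  </chapter>
</book>"""

try:
    root = ET.fromstring(sample)  # parsing succeeds only if the markup is well formed
    print("Well formed; root element:", root.tag)
except ET.ParseError as err:
    print("Not well formed:", err)

# Validation against a DTD (the "ideally" step above) would typically rely on an
# external tool or library, such as xmllint or lxml; that step is omitted here.

Even a check this simple would flag the broken nesting that makes poorly formed legacy files so difficult to recover.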
When content conforms to an XML schema, it is accessible because it is both portable and interoperable. A file saved as valid and correct XML can be mapped to another XML schema or to an entirely different form of markup, because it strictly adheres to the rules its schema sets forth. The beauty of XML is the fundamental premise on which it is built: that XML will change and will likely be superseded by some other form of markup better suited to future technologies. By presuming its own obsolescence, XML lays the groundwork for being the most efficient and effective means of serving the publishing industry now while helping it sustain accessibility over the long haul. The dreaded Dark Age that Cerf warns us about may come, but XML is a hedge against complete darkness.
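As an illustration of that portability, the following sketch maps one hypothetical XML vocabulary onto another, equally hypothetical, one using Python's standard library; the element names on both sides are invented for the example.

import xml.etree.ElementTree as ET

source = ET.fromstring(
    "<book><chapter><title>One</title><para>First paragraph.</para></chapter></book>"
)

# Build an equivalent tree in a different vocabulary.
target = ET.Element("article")
for chapter in source.findall("chapter"):
    section = ET.SubElement(target, "section")
    heading = ET.SubElement(section, "heading")
    heading.text = chapter.findtext("title")
    for para in chapter.findall("para"):
        p = ET.SubElement(section, "p")
        p.text = para.text

print(ET.tostring(target, encoding="unicode"))
# <article><section><heading>One</heading><p>First paragraph.</p></section></article>

Because the source strictly follows its own rules, the mapping is mechanical; nothing about the content has to be guessed at.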
Granted, the media on which XML may be stored is as susceptible to the pitfalls of a Dark Age as any other technology. However, XML is markup embedded in the text itself, not an ancillary system that future users would need to penetrate in order to reach the content. Formatting content in XML makes that content independent of the devices on which it is created, and being independent of the hardware and software increases its accessibility on a technological level. Another aspect of XML that makes it less prone to go dark is that, in the publishing realm, content and the coding that conveys formatting and structural decisions must be interpolated: someone, or something, must fill in the blanks between the known and the unknown (i.e., estimate or approximate) in a reliable and consistent way. XML improves the odds of doing this efficiently and reliably, so that content created with one piece of software, running within a given operating system on a particular machine, can be opened and displayed properly in another environment. Using XML, we can achieve this today, and in the distant future the same XML could be employed to do so in whatever technological environment we find ourselves in.
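A small sketch suggests what that interpolation looks like in practice: the same hypothetical XML source is filled in with the formatting one particular environment (here, HTML for a browser) expects, while the source itself stays untouched. The element names are, again, invented for the example.

import xml.etree.ElementTree as ET

xml_source = "<chapter><title>Storage</title><para>Save the source, not the output.</para></chapter>"
chapter = ET.fromstring(xml_source)

# Interpolate presentation: decide how each structural element should look in HTML.
html_parts = ["<html><body>"]
html_parts.append("<h1>" + chapter.findtext("title") + "</h1>")
for para in chapter.findall("para"):
    html_parts.append("<p>" + para.text + "</p>")
html_parts.append("</body></html>")

print("".join(html_parts))

# A different environment (print, EPUB, or some future format) would swap in a
# different rendering step; the XML source itself would not change.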
The genius of implementing an XML workflow to capitalize on the power of XML lies not only in its potential future payoff in terms of accessibility but also in the tangible, immediate benefits it brings to a publisher. An XML workflow accommodates the full array of content types publishers require, allowing them to publish whatever they choose, however they wish, in whatever forms they decide are marketable and profitable across the dizzying array of available sales channels. It improves the efficiency of the production process, which in turn infuses other areas of the publishing chain with new efficiencies. Marketing, publicity, sales, royalty, acquisitions, and other departments all benefit from an XML workflow, as do the editorial and production departments. These benefits show up incrementally but measurably as reduced time for each of the tasks involved in producing a book or journal. Less time translates into shorter production schedules and the ability to get products to market much faster than in the past. Less time shows up as lower unit costs on the P&L of any given season’s list. The efficiencies of an XML workflow also pay off in accruing archival XML source files that can be stored, searched (i.e., discovered), accessed, reformatted, and quickly and easily repurposed. These files, and the architecture in which they are stored, change both the discoverability landscape and the revenue possibilities that increased discoverability creates. Aside from these present and tangible benefits, an XML workflow contributes to the greater good of society by promoting communication and the exchange of ideas simply through increased accessibility.
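The searchability claim can be pictured with one more brief sketch: walking a hypothetical archive directory of XML source files and indexing every chapter title it finds, so that content becomes discoverable without opening the files by hand. The directory name and element names are assumptions made for the example.

import xml.etree.ElementTree as ET
from pathlib import Path

archive = Path("xml_archive")  # hypothetical location of the archived source files
index = {}

for xml_file in sorted(archive.glob("*.xml")):
    try:
        tree = ET.parse(xml_file)
    except ET.ParseError:
        continue  # skip any file that is not well formed
    index[xml_file.name] = [el.text for el in tree.iter("title") if el.text]

for filename, titles in index.items():
    print(filename, "->", titles)

The same archive, because it is plain structured text, could just as easily feed a reformatting or repurposing step rather than a search index.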
We ignore the threat of an electronic Dark Age at our own peril. The best preparation for such an epoch, and possibly the best way to prevent it, is, for now, to create and store content as XML. Although XML comes with the usual challenges of implementation, an XML workflow is a proven path to present prosperity and future success as a publisher. XML and an XML workflow are fairly simple concepts. The hard part is imposing on ourselves the discipline to adhere to the rules XML requires, which is a human rather than a technological issue. But as with all human endeavors, this one is worth pursuing because the alternatives are not the least bit attractive.