Metadata

By David Alan Rech of Scribe Inc.

The publishing industry seems to be confused about metadata. Metadata is commonly perceived as ancillary information necessary for marketing and distributing books. Consider ONIX (Online Information Exchange) for Books, for example, which, according to the Book Industry Study Group (BISG), is “the international standard for representing and communicating book industry product information in electronic form.” Publishers provide ONIX data about their titles because it facilitates book sales. From this perspective, it is easy to understand how metadata has come to be considered something that exists outside of a publication. Without thinking about it, we conclude that metadata belongs in marketing or financial systems and that books themselves are separate entities.

Although we may be aware of metadata in a vague way, how many of us actually give it much critical thought? Those responsible for preparing electronic books do; the rest of us should, and here is why: metadata is not only ancillary information but also part of the very fabric of what a book is, how it is produced, and how we communicate about it.

Defining Metadata

It is important to understand the definition and purpose of metadata. Technically speaking, “metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information” (American National Standards Institute [ANSI]). Note the verbs describing the purpose of metadata: describes; explains; locates; and makes it easier to retrieve, use, or manage. Given this definition, metadata encompasses all the information, both external and internal, that we can apply to a book to make it and its contents easier to locate.

In its external manifestation, metadata is ancillary information outside of the publication. ONIX is one example; others include the information required for distribution, marketing catalogs, and publicity, and even for keeping track of a book’s production schedule. Most of us can easily grasp and recognize these forms of metadata.

Metadata in its internal form (i.e., part and parcel of the book itself) may be less obvious to us. Internal metadata is information that supports the book’s structure and design and communicates its essence, meaning, and subject. For example, books are broken into chapters to gather the information about and discussion of a single subject in one place or to break up a longer narrative into shorter segments or episodes. Over the centuries, different methods have been used to indicate the beginning of a new chapter: using a drop cap, starting the text on a new recto, inserting a numeral, or including a description of the material to follow. Page through a book today and you will probably find either chapter numbers alone (e.g., in novels) or numbers with descriptive chapter titles intended to communicate the subject or topic of the chapter accurately and clearly to the reader. While we may think of chapter titles as the content or data of the book, they are actually internal metadata about the book’s content, complemented by design choices that signal to the reader that a new section is beginning. Chapter titles not only help readers know what a book discusses but also help them find where a given topic is discussed within it. After all, a reader usually first sees a book’s chapter titles in the table of contents.

In addition to chapter titles and the table of contents, another example of internal metadata is the index. Clearly, the index is not the book; removing the index would not alter the book’s content. But indexes are included in books to help the reader find discussions, examples, images, topics, and so on throughout the book.

Producing Metadata

Metadata should not be viewed as an afterthought. When we create and produce books with the understanding that metadata is part of the very fabric of a book, we recognize how much it matters how we produce these important data.

Metadata should be considered an integral part of the editorial and production processes. The earlier we pay attention to how metadata will be used, the better we can decide what metadata to introduce and where. If we expand our notion of the editorial process to include thinking about how customers will ultimately find the book and what is in it, then we situate metadata creation at the headwaters of the publishing chain.

Making metadata part of the editorial process requires editorial consistency, which involves editing for grammar, syntax, and metadata (i.e., structuring content so it can be found). Practices that enhance metadata work best when we make consistent editorial choices, when we use consistent structure and tagging, and when we realize that editorial decisions shape metadata.

Creating metadata is only part of the picture, however. Metadata is also part of the production process: it tells us how to move content from one format to another. We can develop automated processes to create books in every format, but it all begins with consistency in the editorial process.

Consistent editorial standards: Good, consistent copyediting is key to keeping readers engaged in our books. In my opinion, there is no substitute for good copyediting. It eliminates distractions, makes a book easier to read and comprehend, and leads readers to engage with the book. The way computer searching works adds to that need. Computer algorithms work by applying consistent rule sets to data. If your content is edited in a consistent fashion, then it is more likely to be discovered by search algorithms. Consistent standards not only make for a good read but also help our materials be discovered.
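To make the point about rule-based searching concrete, here is a minimal Python sketch. It assumes a hypothetical search that applies one fixed rule (a whole-word match that ignores case) and a hypothetical house style sheet that maps spelling variants to a single preferred form. Real search engines are far more sophisticated; this only illustrates why inconsistent editing can hide content from a consistent rule.

```python
import re

# Three passages from a hypothetical manuscript, edited inconsistently.
passages = [
    "Every e-book needs a linked table of contents.",
    "The ebook edition ships with the print release.",
    "Our E-Book catalog grows each season.",
]

# Hypothetical house style sheet: each variant maps to the preferred form.
STYLE_SHEET = {"ebook": "e-book", "e-book": "e-book"}


def search(query: str, texts: list[str]) -> list[int]:
    """Apply one fixed rule: a case-insensitive, whole-word match of the query."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    return [i for i, text in enumerate(texts) if pattern.search(text)]


def apply_style(text: str, style_sheet: dict[str, str]) -> str:
    """Replace every whole-word variant (in any case) with its preferred form."""
    for variant, preferred in style_sheet.items():
        pattern = re.compile(r"\b" + re.escape(variant) + r"\b", re.IGNORECASE)
        text = pattern.sub(preferred, text)
    return text


if __name__ == "__main__":
    print(search("e-book", passages))  # [0, 2]: the "ebook" passage is missed
    edited = [apply_style(p, STYLE_SHEET) for p in passages]
    print(search("e-book", edited))  # [0, 1, 2]
```

The search rule never changes; only the consistency of the text does. Once the style sheet is applied, the same query finds every passage.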

Chapter titles: Titles of chapters should clearly indicate the subject to be discussed. While catchy titles may be fun to read, it is important that they clearly indicate the subject so that searching will locate the material. If a chapter discusses a topic, then the most prominent words associated with that topic should appear in the title.

Heads: Just like titles, heads should be consistent and phrased to increase access. In addition, they should be used frequently (they become part of the metadata), as they help search engines crawl through your data. Heads may also make it easier to navigate an e-book, where fixed page locations cannot be communicated. Heads give a book structure that supports access both by algorithms and by ordinary human means.
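The following minimal Python sketch shows how consistent head levels translate into navigation. It assumes the heads are available as hypothetical (level, title) pairs and nests them into the kind of outline an e-book reader can present in place of page numbers. The anchor scheme (lowercase words joined by hyphens) is an illustrative assumption, not a requirement of any e-book format; the sample heads are this article’s sections and run-in labels.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Entry:
    level: int  # 1 = section title, 2 = subhead, and so on (must be >= 1)
    title: str
    anchor: str  # hypothetical fragment id used as a link target
    children: list[Entry] = field(default_factory=list)


def slugify(title: str) -> str:
    """Hypothetical anchor scheme: lowercase alphanumeric words joined by hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def build_outline(heads: list[tuple[int, str]]) -> list[Entry]:
    """Nest (level, title) pairs into a tree, using a stack of open heads."""
    root = Entry(0, "", "")
    stack = [root]
    for level, title in heads:
        # Close any open head at the same or a deeper level.
        while stack[-1].level >= level:
            stack.pop()
        entry = Entry(level, title, slugify(title))
        stack[-1].children.append(entry)
        stack.append(entry)
    return root.children


def render(entries: list[Entry], depth: int = 0) -> list[str]:
    """Flatten the tree into indented lines, one per head."""
    lines = []
    for entry in entries:
        lines.append("  " * depth + f"{entry.title} (#{entry.anchor})")
        lines.extend(render(entry.children, depth + 1))
    return lines


heads = [
    (1, "Defining Metadata"),
    (1, "Producing Metadata"),
    (2, "Consistent editorial standards"),
    (2, "Chapter titles"),
    (2, "Heads"),
    (2, "Images and captions"),
    (2, "Indexes"),
    (1, "Using Metadata"),
    (2, "Imprint identity"),
    (2, "Content chunking"),
]

if __name__ == "__main__":
    print("\n".join(render(build_outline(heads))))
```

A stack is a natural fit here because each new head closes every open head at its own level or deeper. If head levels are applied inconsistently during editing, the outline comes out wrong in exactly the same way.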

Images and captions: Images should have clear, descriptive captions. If an image is included for pedagogical or referential reasons, then those reasons should be stated in its caption. It helps to refer to each image somewhere in the text (e.g., “see Figure 3.1”). We suggest linking internal mentions of images within e-books and web PDFs.
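As one way to picture that linking, here is a minimal Python sketch that wraps mentions such as “Figure 3.1” in HTML links and then reports any link whose target figure is missing. The id pattern `fig-3-1` and the function names are hypothetical choices for illustration, not part of any e-book specification.

```python
import re

# Matches mentions such as "Figure 3.1" (chapter 3, figure 1).
FIGURE_REF = re.compile(r"\bFigure (\d+)\.(\d+)\b")


def link_figure_refs(html: str) -> str:
    """Wrap each "Figure C.N" mention in a link to a hypothetical id "fig-C-N"."""
    return FIGURE_REF.sub(
        lambda m: f'<a href="#fig-{m.group(1)}-{m.group(2)}">{m.group(0)}</a>',
        html,
    )


def missing_targets(html: str) -> list[str]:
    """List linked figure ids that no element in the document carries."""
    linked = set(re.findall(r'href="#(fig-\d+-\d+)"', html))
    defined = set(re.findall(r'\bid="(fig-\d+-\d+)"', html))
    return sorted(linked - defined)


if __name__ == "__main__":
    chapter = (
        "<p>The contents page appears in Figure 3.1; the index in Figure 3.2.</p>"
        '<figure id="fig-3-1"><img src="contents.png" alt="Contents page"/></figure>'
    )
    linked = link_figure_refs(chapter)
    print(linked)
    print(missing_targets(linked))  # ['fig-3-2']: Figure 3.2 has no target yet
```

Running regular expressions over raw markup is a simplification: applying the function twice would nest links, and a caption that begins “Figure 3.1” would be linked to itself. A production workflow would more likely operate on the parsed document, but the check for missing targets illustrates why consistent figure numbering matters.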

Indexes: Indexes allow us to find our way into a book. If an electronic book includes an index, the index locators should be linked to the text, ideally to the exact phrase in the book (we will describe modern indexing techniques in an upcoming newsletter). It also helps to use consistent index terminology for similar subjects. One way to achieve this is to employ a style sheet for indexing and apply a consistent method of indexing content. Consistent terminology across books not only increases a reader’s ability to find things within a title but also helps identify your publishing house with particular terms (think search engine optimization).

Using Metadata

Editorial, production, marketing, publicity, sales, distribution, finance, and royalties: all these areas require metadata. Keeping in mind that we are talking about “structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource,” it is apparent that we use metadata for many different purposes. The thread connecting all these uses is discoverability: being able to find what we are looking for.

It is instructive to consider how people find books. If we are lucky, people find the books we publish through a personal connection: someone they know recommends a title, or it is on their book club’s reading list. Books are often located through footnotes and bibliographies, lists for further reading, mentions in articles, or publicity. All the information used to locate our books is metadata.

Of course, finding books on the Internet or within a content aggregator, combining electronic books, or having books found within a distribution system all require that metadata be electronic. But as search engines become more refined and the methods to locate books become more sophisticated, we are wise to develop a broader understanding of metadata and how to apply it.

Metadata should no longer be considered information that is appended to content. Whether on the web (still a remote possibility) or within library or other search systems, the content of books is becoming an important way to find them. This is especially the case if books are aggregated or in “searchable” formats (e.g., PDF or ePub) or if a publisher wishes to find materials to combine into new products (e.g., from backlist titles). Search engines use sophisticated algorithms that go beyond traditional metadata (e.g., title, author, subject, e-ISBN) to locate information. They employ expert references and the “natural” condition of the data (as well as other mysterious methods) to determine the most useful sources of information on a topic; the results they return are known as organic search results. If your material naturally satisfies a search and others refer to it, it is more likely to be discovered.

If we want our books to be found, shouldn’t we use every available means, especially those that come directly from the book’s content? This isn’t the time to fully describe how search engines function, but generally speaking, the “natural” condition of the data refers to how it is structured: how the titles, head levels, internal references, and other elements are arranged.

Analogous to creating good metadata in our books is the practice of search engine optimization (SEO). The juggernaut Google points out that “search engine optimization is about putting your site’s best foot forward when it comes to visibility in search engines, but your ultimate consumers are your users, not search engines.” SEO, then, is the method of improving the condition of the data so that consumers can more readily find what we have to offer. Data that use relevant keywords, are well structured, and follow other such rules are more likely to be found. And while SEO strictly applies only to certain search arenas at the moment, those arenas are ever increasing. Best SEO practices and best publishing metadata practices overlap quite a bit.

A book that covers a particular subject should be one of the most natural and important sources of information on that subject. Improving that book’s SEO is an editorial function. As editors, we can do a lot to improve the readability and the discoverability of our books. Two other uses of metadata related to books’ subjects are developing imprint identity and content chunking.

Imprint identity: Employing common nomenclature, data design, and all the other things that identify our books as ours naturally creates metadata. Recognition of the similarities of our titles, references to our publishing house, and other things related to publisher identification all function as metadata. If a house is known for publishing a particular subject, then a book it publishes is more likely to be found through the association of subject and imprint identity.

Content chunking and other methods to group parts of books: If we try to combine publications, either topically into a new book or by associating titles dealing with similar subjects, then it is important that the books share characteristics that make them easier to locate. Similar use of terminology and structuring of content (not to mention the necessity of consistent structures and tagging if content is combined into a single source file) will help consumers find these books.

Attention to metadata can help publishers build better books, make the preparation of e-books easier, lay the groundwork for content aggregation, and help in the sales and distribution of the books we publish. The best way to do this is to properly edit books, define head structures, employ consistent terminology, index books, and make good use of descriptive language for images. Metadata need not be a mystery, and applying good metadata should be no more difficult than properly editing your books.