Use the Digital Hub and regular expressions to review indexes.
This procedure is designed for projects created using ScML styles and the Well-Formed Document Workflow. Resolve potential issues as early as possible. An error that is caught late requires extra work at best. At worst, it goes to print or causes a distributor to suppress the ebook.
References/Prerequisites
Vet or QC the Index
Check that the index is fundamentally suited to the book. Requirements and expectations should be determined prior to creating and checking an index.
- If an index was created in-house, use the Index QC Checklist to review the index for compliance with all requirements.
- If an index was created by an author, review the file to determine if it contains issues that need to be addressed before proceeding. Issues could include missing entries that were expected or obvious formatting problems.
Procedure
1. Export XML from the typeset.
The typesetter exports XML from the typeset and converts the file to the .sam (Scribe Abbreviated Markup) file type in the Digital Hub.
2. Convert the Word .docx for the index to .sam.
Convert the Word .docx file for the index to a .sam file.
Copy the index into the main .sam file and save it. Copy only the index itself, not its <sam> tags, and place it at the end of the main .sam file, before the closing </sam> tag.
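The copy step can also be scripted. The following is a minimal sketch, assuming both files are plain text wrapped in a single pair of <sam> tags. The function name and the sample element names are hypothetical and are not part of the Digital Hub.

```python
import re


def merge_index_into_sam(main_sam: str, index_sam: str) -> str:
    """Place the index body, without its <sam> wrapper, before the closing </sam> tag."""
    # Drop the opening <sam ...> tag and the closing </sam> tag of the index file.
    body = re.sub(r"^\s*<sam\b[^>]*>", "", index_sam, count=1)
    body = re.sub(r"</sam>\s*$", "", body, count=1).strip()

    # Find the last closing </sam> tag in the main file.
    pos = main_sam.rfind("</sam>")
    if pos == -1:
        raise ValueError("main file has no closing </sam> tag")

    # Insert the index body on its own lines just before that tag.
    return main_sam[:pos].rstrip() + "\n" + body + "\n" + main_sam[pos:]


if __name__ == "__main__":
    # Hypothetical sample content; real .sam element names may differ.
    main = "<sam>\n<p>Body text</p>\n</sam>\n"
    index = "<sam>\n<ih>Index</ih>\n</sam>\n"
    print(merge_index_into_sam(main, index))
```

Review the merged file before uploading it, since the sketch does not validate the markup.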
3. Convert the combined .sam file to .scml.
Upload the combined .sam file to the Digital Hub and convert it to .scml (Scribe Markup Language).
This conversion from .sam to .scml automatically links the index entries based on known text patterns. The Digital Hub links the page numbers listed in the index to the page IDs found in the book and links “see” references to the corresponding index entries. For more information, see the Index Linking documentation.
4. Search for unlinked content.
Use the Regular Expressions resource page to search for unlinked content.
Recommended: copy the index into a separate file to run these searches.
Run the searches listed under Index Section (.scml files).
Note: If a formatting error is found, update the .sam file and reprocess it to .scml to confirm that the linking issue is resolved.
( [\d]+)
This search finds a space followed by one or more digits.
In a linked index, each page number is surrounded by <xref> tags, so no page number should have a space directly in front of it.
A number will not link if:
- The formatting is incorrect. This could be due to extra/missing spaces or the use of incorrect punctuation. (Fix the error.)
- The page number does not appear in the book. (Determine the correct page number.)
- The number is not a page number. (No action needed.)
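For anyone who prefers to run this search outside a text editor, here is a minimal sketch in Python that applies the same expression line by line. The function name is hypothetical, and the rid attribute in the sample is only a stand-in for whatever the real .scml link markup looks like.

```python
import re

# Same expression as above: a space followed by one or more digits.
UNLINKED_NUMBER = re.compile(r"( [\d]+)")


def find_unlinked_numbers(index_text: str) -> list[tuple[int, str]]:
    """Return (line number, digits) for every number preceded by a space."""
    hits = []
    for line_no, line in enumerate(index_text.splitlines(), start=1):
        for match in UNLINKED_NUMBER.finditer(line):
            hits.append((line_no, match.group(1).strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical sample; the attribute name is an assumption.
    sample = (
        'Antietam, <xref rid="p12">12</xref>, 45\n'
        'Bull Run, <xref rid="p30">30</xref>\n'
    )
    print(find_unlinked_numbers(sample))  # [(1, '45')]
```

Each hit still needs a manual decision, because the search cannot tell a failed page link from a number that was never meant to be a page number.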
([^A-Za-z])([Ss])ee([^a-z])(.*)
With “match case” turned off, this will find the word “see” and whatever follows.
Run a “find-all” and copy all the results into a new file.
Review the results to confirm that every reference has <xref> tags.
Text will not link if:
- The reference is a general statement like “See also specific battles by name” that does not point to a specific entry. (No action needed.)
- The reference text does not match the text of a main entry. This often happens when the main entry includes a parenthetical while the reference does not. (The <xref> tags will need to be added manually when producing the .scml before processing to ebook formats.)
- The reference (or its corresponding main entry) is misspelled. (Determine the error and resolve.)
- The reference indicates a main entry not found in the index. (Determine the error and resolve.)
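The find-all and review steps can be approximated in a script. The following is a minimal sketch, assuming that a linked reference contains an opening <xref tag somewhere after the word “see” on the same line. The function name and the sample markup are hypothetical.

```python
import re

# Same expression as above; IGNORECASE mirrors running it with "match case" off.
SEE_REFERENCE = re.compile(r"([^A-Za-z])([Ss])ee([^a-z])(.*)", re.IGNORECASE)


def unlinked_see_references(index_text: str) -> list[str]:
    """Return each "see" reference whose matched text contains no <xref tag."""
    results = []
    for match in SEE_REFERENCE.finditer(index_text):
        found = match.group(0)
        if "<xref" not in found:
            results.append(found.strip())
    return results


if __name__ == "__main__":
    # Hypothetical sample; the attribute name is an assumption.
    sample = (
        'Manassas. See <xref rid="bullrun">Bull Run</xref>\n'
        "battles. See also specific battles by name\n"
        "Sharpsburg. See Antietam (battle)\n"
    )
    for reference in unlinked_see_references(sample):
        print(reference)
```

In this sample the second line is a general statement that needs no action, while the third is the kind of reference that has to be checked against the main entries by hand.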
5. Check for other common errors.
In addition to functional linking issues, check the index for the following common issues.
Punctuation
Search for commas and periods on either side of a closing italic tag (</i>). Check that the handling of punctuation in the index is the same as in the body of the book.
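One way to compare the two placements is to count them. The following is a minimal sketch, assuming italics are marked with <i> tags as described above. The two expressions and the function name are illustrative and do not come from the Regular Expressions resource page.

```python
import re

INSIDE = re.compile(r"[.,]</i>")   # comma or period before the closing tag
OUTSIDE = re.compile(r"</i>[.,]")  # comma or period after the closing tag


def italic_punctuation_counts(text: str) -> dict[str, int]:
    """Count commas and periods on each side of closing italic tags."""
    return {
        "inside": len(INSIDE.findall(text)),
        "outside": len(OUTSIDE.findall(text)),
    }


if __name__ == "__main__":
    # Hypothetical sample index lines.
    sample = (
        "<i>Iliad,</i> 12\n"
        "<i>Odyssey</i>, 30\n"
        "<i>Aeneid</i>. See Virgil\n"
    )
    print(italic_punctuation_counts(sample))  # {'inside': 1, 'outside': 2}
```

Running the same count on the body of the book shows which placement the book uses, and the less common placement in the index is then the one to review.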
Alphabetization
Check that the index entries (including subentries) appear in proper alphabetical order.
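A rough first pass at this check can be scripted. The following is a minimal sketch, assuming the main entry headings have already been extracted into a list and that the index uses letter-by-letter alphabetization. Both assumptions, and the function names, are hypothetical; adjust the sort key if the book follows a different convention.

```python
import re


def sort_key(heading: str) -> str:
    """Letter-by-letter key: lowercase, keeping only ASCII letters and digits.

    Accented letters are dropped by this simple key, so entries that
    contain them need a manual check.
    """
    return re.sub(r"[^a-z0-9]", "", heading.lower())


def out_of_order(headings: list[str]) -> list[tuple[str, str]]:
    """Return each adjacent pair whose first heading sorts after the second."""
    return [
        (first, second)
        for first, second in zip(headings, headings[1:])
        if sort_key(first) > sort_key(second)
    ]


if __name__ == "__main__":
    # Hypothetical sample headings.
    headings = ["Antietam", "Bull Run", "battles", "Chancellorsville"]
    print(out_of_order(headings))  # [('Bull Run', 'battles')]
```

The same function can be run on the subentries under a single main entry. A flagged pair is only a prompt for review, since indexes often ignore leading articles or prepositions when sorting.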