
Export Content from InDesign

Follow this procedure to extract content from InDesign. For files typeset using the Well-Formed Document Workflow (WFDW), export to XML and process the files in the Digital Hub. For files created outside of the workflow, export to InDesign Tagged Text (IDTT) for manual conversion.

Export XML from WFDW Typesets

Install or Update Scribe Tools for InDesign.

For files typeset using the Well-Formed Document Workflow, use the Export XML tool.

Vet the InDesign File

Duplicate the InDesign file. Do not perform this procedure on live files. The export procedure will change the InDesign file in fundamental ways. Perform the entire procedure on each InDesign file individually.

Confirm the InDesign file is compliant with WFDW requirements.

  • Use the InDesign Checks to review the document.
  • Note whether the file contains images or embedded footnotes.
  • Compare with the print PDF. There should be no differences caused by reflow (due to missing fonts, for example).
  • Check that all fonts are active.
  • Select Scribe Tools > Styles > Report All Style Issues. Resolve any style construction issues prior to exporting content.
  • Check for conditional text (other than the structure condition).
  • Check that images are not masked.

Export XML with Scribe Tools

Select Scribe Tools > Export XML > Run All - Export XML. Use this option in most circumstances for checking a typeset, outputting XML to process to ScML, or round-tripping to a Microsoft Word document.

In some circumstances, a variation of this full export may be needed. Options in this menu include the following:

  • Run All - Export XML
  • Run All - Export XML without Page Indicators
  • Run All - Export XML (text only)
  • Run All - Export XML (text only) without Page Indicators
  • Run All - Export XML Custom

The custom option lets the user exclude selected steps from the export. It is typically used when troubleshooting an issue or working with a corrupted file.

After starting “Run All,” the process cannot be stopped. As an alternative, each tool can be run individually.

See the Scribe Tools for InDesign documentation for a description of each export tool.

Review Scribe Tool Export Alerts

When the export tool has finished running, a window titled Run All - Export XML Report appears. Use the dropdown menu for further details.

Note: A common issue in this report is ALERT: Insert Page Indicators. The tools cannot add page indicators in certain circumstances. The report lists the pages that were not numbered, so the page IDs can be added as needed when reviewing and preparing the .sam file.

Convert Using the Digital Hub

Upload the XML file to the Digital Hub.

Process the XML file to create a .sam file. Download the .sam file and use it to rearrange text or remove unwanted content as needed.

Note: The Digital Hub does not provide stats or file analysis on files exported from InDesign (with the extension .xml). Convert the XML file to ePub 3 to produce sam, ScML, and ePub files. Each of these files can be used to review stats and alerts.

Review and Adjust the .sam File

Use the Digital Hub and Sublime Text Checks as indicated in the Typesetting documentation to identify any potential errors.

When preparing a file:

  • Confirm all content has come through properly.
  • If any images should be moved to different locations because static print page limitations are no longer a factor, adjust the image callouts at this time.
  • If long descriptions are needed, add them to the .sam file.

Note: Do not comment out any content, such as printer information, at this time. Anything that has been commented out will not be included in the conversion to other formats.

Export IDTT from non-WFDW Typesets

Vet the non-WFDW InDesign File

Duplicate the InDesign file. Do not perform this procedure on live files. The export procedure will change the InDesign file in fundamental ways. Perform the entire procedure on each InDesign file individually.

  • Note whether the file contains images or embedded footnotes.
  • Compare with the print PDF. There should be no differences caused by reflow (due to missing fonts, for example).
  • Check that all fonts are active.
  • Check for conditional text (other than the structure condition).
  • Check that images are not masked.

Export IDTT with Scribe Tools

Select Scribe Tools > Export IDTT > Run All.

In some circumstances, a variation of this full export may be needed. Options in this menu include the following:

  • Run All - Export IDTT
  • Run All - Export IDTT without Page Indicators
  • Run All - Export IDTT Custom

Note: The IDTT export does not anchor or provide any information about images used in the typeset.

After starting “Run All,” the process cannot be stopped. As an alternative, each tool can be run individually.

See the Scribe Tools for InDesign documentation for a description of each export tool.

Convert IDTT to .sam

Merge Files

Merge all IDTT files into a single text file with a .sam extension.

Place content in order.

Note: In some cases, it may be most efficient to place content into approximate locations and then review the content for final order later in the process, after all extraneous tags have been removed.

Remove Metadata

Delete file setup information.

Find: <(ASCII|Version|Define)[^\n]*\n
Replace with: NOTHING

Find: <FILENAME[^\n]*\n
Replace with: NOTHING
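
For batch work, the two searches above can be approximated in a script. The following is a minimal Python sketch, not part of Scribe Tools; the function name remove_setup_lines and the sample text are hypothetical.

```python
import re

# Patterns copied from the two searches above.
SETUP_LINE = re.compile(r"<(ASCII|Version|Define)[^\n]*\n")
FILENAME_LINE = re.compile(r"<FILENAME[^\n]*\n")


def remove_setup_lines(text: str) -> str:
    """Delete file setup lines, mirroring 'Replace with: NOTHING'."""
    text = SETUP_LINE.sub("", text)
    return FILENAME_LINE.sub("", text)


# Hypothetical sample; real IDTT headers vary by export settings.
sample = (
    "<ASCII-WIN>\n"
    "<Version:18><FeatureSet:InDesign-Roman>\n"
    "<FILENAME: chapter01>\n"
    "<ParaStyle:Body>Text\n"
)
print(remove_setup_lines(sample))
```

Neither pattern is anchored to the start of a line, so a match that begins mid-line deletes everything from the tag to the end of that line, exactly as the original searches do.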

Remove Control Characters

Remove all control characters (e.g., ESC, BEL, BS). These will have a shaded background.

Find: [^ -~\n\t]
Replace with: NOTHING
or: SPACE
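
The same search can be sketched in Python as shown here. The function name and sample string are hypothetical. The character class matches everything outside printable ASCII, newline, and tab, so it also matches any literal non-ASCII character; the sketch assumes special characters are still encoded as <0x…> tags at this stage.

```python
import re

# Same character class as the search above.
CONTROL_CHARS = re.compile(r"[^ -~\n\t]")


def strip_control_chars(text: str, replacement: str = "") -> str:
    """Replace each matched character with NOTHING (default) or a SPACE."""
    return CONTROL_CHARS.sub(replacement, text)


# Hypothetical sample containing BEL, ESC, and BS characters.
sample = "Chapter\x07 One\x1b\x08\n"
print(repr(strip_control_chars(sample)))
print(repr(strip_control_chars(sample, " ")))
```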

Named Entities

Replace ampersand, less than, and greater than characters with named entities.

Find: &
Replace with: &amp;

Find: \\<
Replace with: &lt;

Find: \\>
Replace with: &gt;
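
The order of these three searches matters: the ampersand must be replaced first, or the ampersands introduced by &lt; and &gt; would themselves be escaped. A minimal Python sketch follows, assuming (as the searches above imply) that IDTT writes literal less-than and greater-than characters as \< and \>. The function name and sample are hypothetical.

```python
def escape_idtt_literals(text: str) -> str:
    """Apply the three replacements in the order given above."""
    text = text.replace("&", "&amp;")    # must run first
    text = text.replace("\\<", "&lt;")   # backslash-escaped less than
    text = text.replace("\\>", "&gt;")   # backslash-escaped greater than
    return text


# Hypothetical sample; the tag's own angle brackets are left untouched.
sample = "<ParaStyle:Body>AT&T: if a \\< b and b \\> c"
print(escape_idtt_literals(sample))
```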

Line Breaks

Note: Some of the following searches may need to be run again at different stages during conversion.

Add Placeholder Style Name

Add “nostyle” as a placeholder paragraph style name.

Find: (<ParaStyle:)(>)
Replace with: \1nostyle\2

Place Paragraphs on New Lines

Find: ([^\n])(<ParaStyle:)
Replace with: \1\n\2

Remove Empty Lines

Repeat the following until there are no more results:

Find: \n\n
Replace with: \n

Move Closing Tags to the Ends of Lines

Find: \n(<[^>]*:>)
Replace with: \1
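
The four line-break searches can be chained in a script. The following Python sketch is one possible arrangement, with a hypothetical function name and sample. It collapses empty lines with a single \n{2,} search, which has the same effect as repeating the \n\n search until no results remain.

```python
import re


def normalize_line_breaks(text: str) -> str:
    # Add "nostyle" where the paragraph style name is empty.
    text = re.sub(r"(<ParaStyle:)(>)", r"\1nostyle\2", text)
    # Start every paragraph tag on its own line.
    text = re.sub(r"([^\n])(<ParaStyle:)", r"\1\n\2", text)
    # Collapse runs of empty lines into a single line break.
    text = re.sub(r"\n{2,}", "\n", text)
    # Pull closing tags such as <CharStyle:> back onto the previous line.
    text = re.sub(r"\n(<[^>]*:>)", r"\1", text)
    return text


# Hypothetical sample with an unnamed style, a stranded closing tag,
# and extra empty lines.
sample = (
    "<ParaStyle:>Intro<ParaStyle:Body>Some <CharStyle:Italic>text\n"
    "<CharStyle:> here\n\n\n<ParaStyle:Body>Next"
)
print(normalize_line_breaks(sample))
```

Adding the placeholder style name first matters here: an empty <ParaStyle:> would otherwise be treated as a closing tag by the last search and pulled onto the preceding line.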

Remove Unnecessary InDesign Tags

Note: The following searches may be modified if an aspect can be used to determine where an ScML style should be used. For example, “Skew” may be useful for identifying italics, or “TextAlignment” may indicate poetry. Do not delete any tag that may contain vital style information until the appropriate ScML style has been applied.

Remove unnecessary character rendering tags.

Find: <c[^>]*(Leading|Kerning|Tracking|Spacing|Size|Ligatures|OTF|Skew|Language|Baseline)[^>]*>
Replace with: NOTHING

Find: <c(Bouten|Kent(en)|Shatai|Tatech?u|Tsume|Wari(chu)|Hindi|StrokeGradient|NextXChars)([^>]*>)
Replace with: NOTHING
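
As the note above suggests, a keyword such as “Skew” can be withheld from the first search until the matching ScML style has been applied. The Python sketch below shows one way to build that search from a keyword list; the function name, the keep parameter, and the sample tags are hypothetical.

```python
import re

# Keywords taken from the first character rendering search above.
DEFAULT_KEYWORDS = [
    "Leading", "Kerning", "Tracking", "Spacing", "Size",
    "Ligatures", "OTF", "Skew", "Language", "Baseline",
]


def remove_char_rendering(text: str, keep: tuple = ()) -> str:
    """Delete character rendering tags, except those named in keep."""
    keywords = [k for k in DEFAULT_KEYWORDS if k not in keep]
    pattern = re.compile(r"<c[^>]*(" + "|".join(keywords) + r")[^>]*>")
    return pattern.sub("", text)


# Hypothetical sample tags.
sample = "<cSkew:12><cTracking:10>word<cTracking:><cSkew:>"
print(remove_char_rendering(sample))
print(remove_char_rendering(sample, keep=("Skew",)))
```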

Remove unnecessary paragraph rendering tags.

Find: <p[^>]*(Space|TabRuler|KeepwithNext|Auto|Hyphen)[^>]*>
Replace with: NOTHING

Remove unnecessary paragraph styles representing blank lines.

Find: ^<ParaStyle:[^>]*>[ \t]*\n
Replace with: NOTHING

Remove unnecessary hyperlink tags.

Find: <Hyperlink:=(<[^>]*>)*>
Replace with: NOTHING

Remove unnecessary text alignment tags.

Find: <pTextAlignment[^>]*>
Replace with: NOTHING

Convert Characters to Unicode Entities

Search for the following and determine the best “replace” option based on context.

Typesetting Spaces and Manual Breaks

Search for typesetting spaces.

Find: <0x200[0-9A-F]>
Replace with: NOTHING
or: SPACE

Search for soft hyphens.

Find: <0x00AD>
Replace with: NOTHING

Search for manual line breaks.

Find: <0x000A>
Replace with: \n
or: SPACE
or: NOTHING
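
Because the right replacement depends on context, it can help to list each match with a little surrounding text before choosing. A minimal Python sketch for the typesetting space search follows; the function name, the width parameter, and the sample are hypothetical.

```python
import re

# Same pattern as the typesetting space search above.
SPACE_TAG = re.compile(r"<0x200[0-9A-F]>")


def show_contexts(text: str, width: int = 15) -> list:
    """Return each match with up to `width` characters on either side."""
    return [
        text[max(0, m.start() - width): m.end() + width]
        for m in SPACE_TAG.finditer(text)
    ]


# Hypothetical sample with a thin space and a hair space.
sample = "p.<0x2009>12 and A<0x200A>B"
for context in show_contexts(sample, width=3):
    print(context)
```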

Entity Format

Change remaining characters to hexadecimal entity format.

Find: <0(x[A-F0-9]+)>
Replace with: &#\1;

Note: Characters in hexadecimal entity format will be converted to their corresponding Unicode characters when processed to other file formats through the Digital Hub.
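
A minimal Python sketch of the entity format search is shown here, with a hypothetical function name and sample. The leading 0 is dropped and the x is kept, so <0x2014> becomes &#x2014;.

```python
import re


def to_hex_entities(text: str) -> str:
    """Convert <0xHHHH> character tags to &#xHHHH; entities."""
    return re.sub(r"<0(x[A-F0-9]+)>", r"&#\1;", text)


# Hypothetical sample with an e-acute and an em dash character tag.
sample = "caf<0x00E9> <0x2014> done"
print(to_hex_entities(sample))
```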

Convert Character Styles

Construct searches based on what is found in order to apply the appropriate ScML character styles.

Note: The same rendering may be applied to elements that require different ScML styles.

Note: In some cases, the content between two tags may be complex and not fit the regular expressions listed. As needed, consider deleting unnecessary opening and closing tags using two separate searches to first remove the opening tag and then the closing tag.

<cPosition

Find: <cPosition

Example:

Find: <cPosition:Superscript>([A-Za-z\d\- \.\&\#;]+)<cPosition:>
Replace with: <enref>\1</enref>
or: <fnref>\1</fnref>
or: <sup>\1</sup>

<c

Find: <c

Example:

Find: <cTypeface:Italic>([^<]*)<cTypeface:>
Replace with: <i>\1</i>
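
The italic example can be sketched in Python as follows; the function name and samples are hypothetical. The second sample shows the limitation described in the note above: because the search uses [^<]*, a span that contains another tag is left unchanged and needs a separate search.

```python
import re

# Same pattern as the italic example above.
ITALIC = re.compile(r"<cTypeface:Italic>([^<]*)<cTypeface:>")


def convert_italics(text: str) -> str:
    return ITALIC.sub(r"<i>\1</i>", text)


# Hypothetical samples.
simple = "See <cTypeface:Italic>Moby-Dick<cTypeface:> for details."
nested = "<cTypeface:Italic>Moby<0x2011>Dick<cTypeface:>"

print(convert_italics(simple))
print(convert_italics(nested))  # unchanged: the span contains a tag
```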

Page IDs

Convert page IDs to self-closing tags.

Single page IDs:

Find: <CharStyle:page>\{~\?~PG: @([\da-z]+)@\}<CharStyle:>
Replace with: <page id="p\1"/>

Adjacent page IDs:

Find: <CharStyle:page>\{~\?~PG: @([\da-z]+)@\}\{~\?~PG: @([\da-z]+)@\}<CharStyle:>
Replace with: <page id="p\1"/><page id="p\2"/>

Search for any remaining page IDs.

Find: \{~\?~PG:
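
In a script, single and adjacent page IDs can be handled in one pass. The Python sketch below is a variant of the searches above rather than a direct copy: it matches a run of one or more page indicators inside a page character style and expands each one. The function name and sample are hypothetical.

```python
import re

# One or more page indicators wrapped in the "page" character style.
PAGE_RUN = re.compile(
    r"<CharStyle:page>((?:\{~\?~PG: @[\da-z]+@\})+)<CharStyle:>"
)
# A single page indicator.
PAGE_ID = re.compile(r"\{~\?~PG: @([\da-z]+)@\}")


def convert_page_ids(text: str) -> str:
    def expand(match: re.Match) -> str:
        # Turn every indicator in the run into a self-closing tag.
        return PAGE_ID.sub(r'<page id="p\1"/>', match.group(1))

    return PAGE_RUN.sub(expand, text)


# Hypothetical sample with one single and one adjacent pair of page IDs.
sample = (
    "<CharStyle:page>{~?~PG: @12@}<CharStyle:>Text"
    "<CharStyle:page>{~?~PG: @xii@}{~?~PG: @1@}<CharStyle:>"
)
result = convert_page_ids(sample)
print(result)
print("{~?~PG:" in result)  # any remaining page IDs need manual review
```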

<CharStyle:

Find: <CharStyle:

Example:

Find: <CharStyle:Italic>([^<]*)<CharStyle:>
Replace with: <i>\1</i>

<cCase:

Find: <cCase:Small Caps>([^<]*)<cCase:>
Replace with: <sm>\1</sm>

Find: <cCase:All Caps>([^<]*)<cCase:>
Replace with: \U\1
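
The \U replacement syntax is editor-specific and is not supported by Python's re module, so a scripted version needs a replacement function instead. A minimal sketch follows, with a hypothetical function name and sample.

```python
import re


def convert_case_tags(text: str) -> str:
    # Small caps become the <sm> character style.
    text = re.sub(r"<cCase:Small Caps>([^<]*)<cCase:>", r"<sm>\1</sm>", text)
    # All caps: uppercase the content and drop the tags.
    text = re.sub(
        r"<cCase:All Caps>([^<]*)<cCase:>",
        lambda m: m.group(1).upper(),
        text,
    )
    return text


# Hypothetical sample.
sample = (
    "<cCase:All Caps>chapter one<cCase:> by "
    "<cCase:Small Caps>a. n. author<cCase:>"
)
print(convert_case_tags(sample))
```

Uppercasing also alters any entity inside the span (for example, &amp; becomes &AMP;), so spans that contain entities should be checked by hand.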

Remove Stray Closing Character Style Tags

After ScML character styles have been applied, remove any remaining closing character style tags.

Find: <cTypeface:>|<CharStyle:>
Replace with: NOTHING

Convert Paragraph Styles

Construct searches based on what is found in order to apply the appropriate ScML paragraph styles.

Note: At this time, do not scribe spacing variations (f, l, s, or o) unless the existing styles in the file provide a 1-to-1 correspondence. Identify only the structural aspects of the paragraphs. Articulation can be added later, either through the refiner or by enabling the Articulate Spacing Distinctions setting in the Digital Hub when converting the .sam file to .scml.

<Para

Find: <Para

Example:

Find: <ParaStyle:Chapter Title>([^\n]*)$
Replace with: <ct>\1</ct>
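
When a file uses many paragraph styles, a lookup table can apply several conversions in one pass. The Python sketch below assumes a hypothetical mapping: only the Chapter Title to ct pair comes from the example above, and the Body Text entry is a placeholder to be replaced with the styles actually found in the file. Unmapped styles are left in place for manual review.

```python
import re

# InDesign style name -> ScML tag. Only "Chapter Title" is taken from
# the example above; "Body Text" -> "p" is a hypothetical placeholder.
STYLE_MAP = {
    "Chapter Title": "ct",
    "Body Text": "p",
}

PARA = re.compile(r"^<ParaStyle:([^>]*)>([^\n]*)$", re.MULTILINE)


def convert_paragraphs(text: str) -> str:
    def replace(match: re.Match) -> str:
        tag = STYLE_MAP.get(match.group(1))
        if tag is None:
            return match.group(0)  # unmapped style: leave for review
        return f"<{tag}>{match.group(2)}</{tag}>"

    return PARA.sub(replace, text)


# Hypothetical sample.
sample = (
    "<ParaStyle:Chapter Title>Loomings\n"
    "<ParaStyle:Body Text>Call me Ishmael.\n"
    "<ParaStyle:Sidebar>Note"
)
print(convert_paragraphs(sample))
```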

Remaining InDesign Tags

Search for remaining InDesign tags. Replace the tag with the appropriate ScML style or delete it.

Find: <[^>]*:[^>]*>
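
To see what is left at a glance, the remaining tags can be tallied rather than stepped through one at a time. A minimal Python sketch follows, with a hypothetical function name and sample. The search matches any tag containing a colon, so it would also match the xml-stylesheet line once that has been added to the file.

```python
import re
from collections import Counter

# Same pattern as the search above.
INDESIGN_TAG = re.compile(r"<[^>]*:[^>]*>")


def tally_remaining_tags(text: str) -> Counter:
    """Count each distinct remaining InDesign tag."""
    return Counter(INDESIGN_TAG.findall(text))


# Hypothetical sample; <ct> has no colon and is not reported.
sample = "<ct>Title</ct>\n<ParaStyle:Quote>Text <cColor:Red>red<cColor:>"
for tag, count in tally_remaining_tags(sample).most_common():
    print(count, tag)
```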

Images

Place callouts for images in the appropriate locations.

<fig><img src="imagename.jpg" alt="[alt text here]"/></fig>

Note: If a logo image is part of the title page, scribe it as bkpub (or bkpub1, if necessary) rather than fig. If a logo image is part of the copyright page, scribe it as crtf (or a different crt style, if necessary) rather than fig.

See the Short Description Text (Alt Text) documentation for more information about adding alt text, including suggested alt text for standard elements.

Structure Indicators

Place structure indicators.

Example:

<structure>{~?~ST: begin chapter}</structure>

and

<structure>{~?~ST: end chapter}</structure>

sam Tags and Validation

Add sam Tags and DOCTYPE Declaration

Add the following text to the beginning of the file.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="http://www.scribeproduction.com/datafiles/dtd/scml.css" type="text/css"?>
<!DOCTYPE sam PUBLIC "-//Scribe Inc.//DTD sam v1.3.0//EN" "http://scml.scribenet.com/dtds/current/sam.dtd">
<sam>

Add the following text to the end of the file.

</sam>
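
The opening and closing text can also be added by script. The Python sketch below uses the declaration lines exactly as given above; the function name wrap_sam and the check for an existing XML declaration are assumptions made for illustration.

```python
# Header lines copied from the text above.
SAM_HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<?xml-stylesheet href="http://www.scribeproduction.com/datafiles/dtd/scml.css" type="text/css"?>\n'
    '<!DOCTYPE sam PUBLIC "-//Scribe Inc.//DTD sam v1.3.0//EN" '
    '"http://scml.scribenet.com/dtds/current/sam.dtd">\n'
    "<sam>\n"
)


def wrap_sam(body: str) -> str:
    """Add the sam header and closing tag unless already present."""
    if body.lstrip().startswith("<?xml"):
        return body  # assume the file has already been wrapped
    return SAM_HEADER + body.rstrip("\n") + "\n</sam>\n"


# Hypothetical body content.
wrapped = wrap_sam("<ct>Loomings</ct>\n")
print(wrapped)
```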

Validation, Review, and QC

Validate the file.

To validate, set up Sublime Text as indicated here and use the validation options under Build > XML: DTD Validation. Upload the file to the Digital Hub and address any listed errors or alerts.

Note: The Sublime Text Checks work best with Unicode characters in place, rather than hexadecimal entities. To change hexadecimal entities to single Unicode characters, process the .sam file to .docx in the Digital Hub. This is also an opportunity to refine the file.

Review the file using the Scribing QC Checklist and process the file to ScML to perform text checks.

Apply changes in the .docx or .scml file as required. The document type will vary based on the requirements of the project.

  • A Word document should be produced when the next step will be copyediting or author review/revision.
  • An ScML file should be produced when preparing for conversion to ePub or another digital format.