How hidden characters in your Microsoft Word manuscript can mess up your book

What’s hiding in your Microsoft Word manuscript file? It may look fine — but then, when you submit it to the publisher, the results are poor.

If you’re used to using Word to create finished-looking documents, you need to adjust your perspective. You’re now using Word to tell the publisher’s page designers and page layout professionals what you want — and the hidden characters within the file are sending unintended messages.

So let’s look inside your Microsoft Word file for hidden characters.

First, click the “Show/Hide ¶” button in the Home ribbon in Word; alternatively, press Ctrl-Shift-8 on a PC or Cmd-8 on a Mac.

What will you see?

Two kinds of spaces

When you look at a chunk of text, you’ll see all the normal spaces indicated by blue dots in the middle of the line. But you may also see mysterious spaces indicated by blue degree symbols like the ones after “with” and “too” in this sample:

What are those? They’re nonbreaking spaces. They tend to get into files when you copy material from other sources, especially web sites and PDF files, and paste it into Word.

When importing a file from Word into a page layout program like Adobe InDesign, normal spaces are well-behaved and don’t generate unexpected results. Nonbreaking spaces may behave in unexpected ways, for example by refusing to let a line break where the layout needs one. It’s wise to replace them with normal spaces.

This is also a good time to identify places where you’ve got two or more spaces in a row. Two spaces after a period will cause extra spacing in the page layout — search and replace to remove them (and get out of the two-spaces-after-a-sentence habit you learned in typing class in the ’70s or ’80s).
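
If you’re comfortable with a little scripting, the same cleanup can be sketched in Python. This is only a rough sketch, assuming you’ve pulled the manuscript text out of Word as a plain string; the name clean_spaces is made up for this example. Word’s own Find and Replace does the same job (searching for ^s finds nonbreaking spaces).

```python
import re

NBSP = "\u00a0"  # the nonbreaking space, shown in Word as a small degree-like symbol


def clean_spaces(text: str) -> str:
    """Swap nonbreaking spaces for normal ones, then collapse runs of spaces."""
    text = text.replace(NBSP, " ")
    # Two or more normal spaces in a row become a single space.
    return re.sub(r" {2,}", " ", text)


if __name__ == "__main__":
    sample = "It works with\u00a0Word.  And with two spaces after a period."
    print(clean_spaces(sample))
```

Note that this also flattens spaces used for indents or alignment, which is usually what you want before handing the file to a publisher.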

You may also see spaces used for indents or to align text, as shown below:

How will this lay out when imported into a design program? Totally randomly. Using spaces to arrange things on a page marks you as an amateur. Replace them with a Microsoft Word table.

Dots and dashes

Typography has all sorts of dashes: hyphens, en-dashes, and em-dashes (that is, - or – or —). Authors often sprinkle them randomly throughout the manuscript without any care as to which kind of dash they intend to use.

When you want a hyphen (for example, “5-pound sack”), just type the hyphen key (next to the zero). When you want a dash in between words, just type two hyphens; Word will convert them to an em-dash, and even when it doesn’t, page layout programs will make the same conversion. En-dashes are unusual, appearing mainly in ranges (pages 45–47). Again, the page layout program will usually take care of them.
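
To make those conventions concrete, here is a rough Python sketch of the kind of conversion described above. It is not what Word or any layout program actually runs; normalize_dashes is a made-up name, and the rule “a hyphen between two digits is a range” is my simplifying assumption (it would wrongly convert phone numbers and dates).

```python
import re

EM_DASH = "\u2014"
EN_DASH = "\u2013"


def normalize_dashes(text: str) -> str:
    """Turn typed double hyphens into em-dashes and numeric ranges into en-dashes."""
    # Two or more hyphens in a row become one em-dash.
    text = re.sub(r"-{2,}", EM_DASH, text)
    # Assumption: a single hyphen directly between two digits marks a range.
    text = re.sub(r"(?<=\d)-(?=\d)", EN_DASH, text)
    return text


if __name__ == "__main__":
    print(normalize_dashes("See pages 45-47 -- and bring a 5-pound sack."))
```

The hyphen in “5-pound” is left alone because it sits between a digit and a letter, not between two digits.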

If you type a bunch of periods in a row, Word will sometimes automatically convert it to an ellipsis character … 

Even though this looks like three periods, it’s actually one character. If you use more than three periods, I have no idea what will happen . . . and you shouldn’t want to find out.
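
If you’d rather find these before your publisher does, a short script can list every ellipsis character and every run of periods. This is a sketch only, assuming the manuscript text is available as a plain string; find_dot_runs is a hypothetical helper name.

```python
import re

# Matches the single ellipsis character, or two or more periods in a row,
# including spaced-out runs like ". . ."
DOT_RUN = re.compile(r"\u2026|\.(?: ?\.)+")


def find_dot_runs(text: str) -> list[tuple[int, str]]:
    """Return (character offset, matched text) for each ellipsis or run of periods."""
    return [(m.start(), m.group()) for m in DOT_RUN.finditer(text)]


if __name__ == "__main__":
    sample = "Wait.... I have no idea . . . what happens\u2026"
    for offset, found in find_dot_runs(sample):
        print(offset, repr(found))
```

A single period at the end of a sentence is not reported; only runs and the ellipsis character are.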

From a language perspective, you’re better off writing without ellipses or bunches of dots. I also edit most manuscripts to remove about two-thirds of the dashes, replacing most of them with commas and semicolons. This prevents your manuscript from looking like a bunch of paragraphs with rivers of space going through them.

Paragraph marks, tabs, and line breaks

Pay close attention to what’s happening at the beginnings and ends of lines and between paragraphs. You may see blue arrows pointing right (tabs), blue arrows pointing left (hard returns, or line breaks), and paragraph marks, also known as pilcrows.

Page layout programs generally ignore tabs, because there’s no simple way to know what you intended with them. Indented paragraphs in a layout program happen because of a global style setting for the page design. If you like the look of indented paragraphs in your Word manuscript, set a paragraph indent for the “Normal” style in Microsoft Word; don’t type a tab at the beginning of each paragraph.

People type hard returns (Shift-Enter) to force line breaks in paragraphs. But when the pages are laid out, the lines will end in different places, so these hard returns will break lines at odd places. Delete them.

The paragraph breaks are fine, so long as there’s only one per paragraph. If you’re using them (or hard returns) to add extra space between paragraphs, the page layout person will just have to strip them out again — and might easily miss a few. If you want extra space between your paragraphs in Word, just adjust the Normal style to include space after each paragraph.
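
The three problems in this section can be checked mechanically. Here is a minimal Python sketch, assuming the manuscript has already been split into a list of paragraph strings, with "\t" for a tab and "\v" standing in for a Shift-Enter line break. That representation is my assumption; the tool you use to extract text from Word may encode line breaks differently. The name audit_paragraphs is hypothetical.

```python
def audit_paragraphs(paragraphs: list[str]) -> list[tuple[int, str]]:
    """Return (paragraph number, problem) pairs for tabs, line breaks, and empty paragraphs."""
    problems = []
    for number, para in enumerate(paragraphs, start=1):
        # A paragraph with nothing but whitespace is probably there just for spacing.
        if para.strip() == "":
            problems.append((number, "empty paragraph used for spacing"))
            continue
        if para.startswith("\t"):
            problems.append((number, "tab used as paragraph indent"))
        # Assumption: "\v" represents a Shift-Enter line break in the extracted text.
        if "\v" in para:
            problems.append((number, "line break inside paragraph"))
    return problems


if __name__ == "__main__":
    manuscript = ["\tFirst paragraph.", "", "A line broken\vby hand."]
    for number, problem in audit_paragraphs(manuscript):
        print(f"Paragraph {number}: {problem}")
```

The script only reports problems; fixing them is still best done with styles in Word, as described above.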

Use Word’s tables, not tabs

Word has table features; use them.

Here’s what a table looks like with the invisible characters turned on:

When the page layout professional imports this into InDesign, the result is a table, as it should be. None of the fancy stuff you can do with Word tables, like shading or differing line weights, will transfer over, but the basic table structure will. You can also put anything you want in those cells, like bulleted lists or multiple paragraphs; once the table is imported, the page layout person can adjust its content and size to fit the page size and design.

Microsoft Word also supports tabs for tables. Don’t do tables like this. It will create nothing but a huge mess in the page layout.

Beware section breaks, page breaks, and lines (rules)

You might decide to put page breaks into your Word file. This might help you keep paragraphs together, or start chapters on a new page.

Here’s how those look when you reveal the hidden characters:

You can also include section breaks, for example, to switch between one- and two-column layouts:

Since the page breaks in your book’s page layout will be in different places from those in the Word file, these aren’t good ways to signal page breaks. As for the page breaks for the start of new chapters, if you use Word styles to format chapter titles with a page break before, they’ll automatically start on a new page.

If you type several underscores, Word will create a rule (a line) for you.

That little lightning bolt will allow you to turn this feature off, which is a good idea, because you rarely want rules like this unless they’re part of the book’s page design.

Index entries

There’s one more hidden bit of content you might spot in a Microsoft Word file. It looks like this:

That text between the curly braces is an index entry. It doesn’t appear in the actual text and any layout program will ignore it. (I’ll explain how you can use MS Word’s index features for books in a future post.)

Word is fine for authoring, if you use it properly

Lots of authors use Word, and it’s pretty close to a standard for delivery of manuscripts to publishers. But you need to keep in mind that Word has two purposes: to create good-looking text, and to hold what you’ve created and make it easy to share with publishing professionals. If you are using it for the latter purpose, you’ll want to become familiar with what’s going on inside your Word file by occasionally clicking on that paragraph symbol in the ribbon. That’s one key part of what it takes to turn in a clean manuscript that will maximize your chances of getting a good-looking set of book pages at the end of the process.

3 responses to “How hidden characters in your Microsoft Word manuscript can mess up your book”

  1. Wow! This is great information! I’ll start to look for these issues in my own reports and writing. (and yes, I admit to using the double-space after the period)

  2. I have two thoughts here. To start, this all just solidifies my opinion that the default (Mac & Windows!) of paste-with-format should really not be the default. This behavior should require a modifier key, not the other way round like it is now. As an aside, Word is evil for breaking Shift+Command+V to force paste plain-text on macOS! Secondly, I wonder if LaTeX will ever gain popularity with the non-tech crowd? There are now pretty good WYSIWYG editors that generate markup for the writer and even online collaborative tools which do the same. Would this be better or just cause other problems?

