THIS JUST IN
to B or not to b
capitalization and its discontents: why does my
word processor upper-case Zoloft but not paxil?
FORTUNE
Tuesday, August 12, 2003
By Roger Parloff
like many people, I don't use capital letters when I type e-mail. but
when I got a new computer a few months ago, it had Microsoft software that
automatically capitalizes the first letters of some words. (I'm using it
now.) early on, I noticed some oddities. I was writing an e-mail to a
friend about the campaign-financing scandals of the Clinton administration,
and I referred to a very peripheral figure named Pauline konchanalak, whose
last name I inadvertently misspelled. on second reference, I happened, with
equal inadvertence, to spell her name correctly. but this time the surname
popped up as Kanchanalak! Microsoft knew to capitalize Kanchanalak and yet
not konchanalak!
soon I was noticing other peculiarities. for instance, most
over-the-counter drugs were capitalized, like Excedrin and Tylenol, but
prescription drugs were much harder to predict. thus, Claritin is up, but
celebrex is down. Zoloft is up, prozac and paxil down. does Microsoft ask
Pfizer or merck to pay for brand-name recognition? was there an upper-case
shakedown going on?
given names also held surprises. why Karen and Sharon, but not nancy or
mary? Stephen, but not steven?
or consider these shockers: Muhammad, Mohamed, Buddha, and Confucius are
up, but not allah, jesus, or moses!
it got worse. all these policies were unstable over time! capitalization
practices changed even as I experimented with them. in fact, I've had to
manually capitalize several words in this story because they've lost their
capacity to do it themselves. had I worn them out? stephen and zoloft and
microsoft itself no longer perform for me! even viagra is spent.
it occurred to me that many of the anomalies had something to do with
word length. the longer the word, the more likely it was to be capitalized.
four-letter words were always down, as far as I could tell. five-letter
words, on the other hand, seemed to be right on the fulcrum. most were
down—like jesus, moses, and allah—yet there were exceptions, like Xerox and
Karen. maybe there was something special about the letters 'x' and 'k' that
threw such words into a different category. I eagerly tested my new theory,
but with bitterly disappointing results. kafka, kadar, kemal, xhosa, and
hoxha. yet Kodak, Exxon, and Akaka. meanwhile, unaccountably exalted outliers
sprang from my control group: Helen, Miami, Judah.
had microsoft considered the repercussions of meting out all these
preferences and slights? leaving allah down while honoring Exxon—was that
prudent?
I called microsoft and spoke with simon marks, the 30-year-old,
London-born product manager for the microsoft office division. light
streamed in, and order was restored, as marks opened my eyes to the
structure of microsoftian capitalism.
marks was a gentleman too: even as he dashed my pathetically wrongheaded
hypotheses, he bucked up my self-esteem. as I had so keenly picked up, he
explained, the lengths of words were 'absolutely key.' and yet there was
nothing determinative about the number of letters that any word contained.
'let's take a step back,' he suggested. capitalization was just a narrow
aspect of the broader function performed by the spell-checking software, he
explained. when microsoft's spell-check notices a word it doesn't
recognize, it regards it as a possible mistyping. but it does not presume
to automatically correct anything unless it feels very confident that it
knows what was intended. so in most cases, spell-check merely alerts the
reader to an array of possibilities, by underlining the putatively mistyped
word in red. when I type moses, for instance, spell-check puts a red
squiggly line beneath the uncapitalized prophet's name. (how could I have
written a whole piece on this subject and failed to notice the red squiggly
lines?) if I then pursue the matter further in the tools menu, I discover
spell-check's ample grounds for hesitation; for all it knows, I may be
trying to type mosses, moss, Moses, muses, moseys, modes, musses, muss, or
mossy! only when the spell-checker's algorithms develop a much higher
degree of certainty about what I am trying to say would it dare to
'auto-correct' me.
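to make marks's description concrete, here is a minimal sketch in Python of a checker that flags when unsure and corrects only when confident. everything in it is an assumption made for illustration: the toy lexicon, the function names, the similarity measure (the standard library's SequenceMatcher), and both thresholds. it is not how microsoft's software actually works.

```python
from difflib import SequenceMatcher

# Hypothetical toy lexicon; the real one is said to hold about 200,000 words.
LEXICON = ["Moses", "mosses", "moss", "muses", "modes", "mossy",
           "Kanchanalak", "Exxon"]

def similarity(a, b):
    """Case-insensitive similarity between two words, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def check(word, suggest_at=0.8, auto_at=0.95):
    """Return ('ok', word), ('autocorrect', fix), or ('flag', candidates).

    Both thresholds are invented for this sketch.
    """
    if word in LEXICON:
        return ("ok", word)
    # Everything close enough to be a plausible intended word.
    candidates = [w for w in LEXICON if similarity(word, w) >= suggest_at]
    # Correct silently only if there is one candidate and it is a near-exact match.
    if len(candidates) == 1 and similarity(word, candidates[0]) >= auto_at:
        return ("autocorrect", candidates[0])
    # Otherwise just draw the red squiggle and offer the possibilities.
    return ("flag", sorted(candidates))
```

under these assumptions, check("kanchanalak") is corrected to Kanchanalak, since nothing else in the toy lexicon resembles it. check("moses") is only flagged, because mosses, moss, muses, and the rest compete with Moses, and check("konchanalak") is flagged too, because its one candidate is not a close enough match.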
the instability I thought I had observed simply reflected my having
inadvertently toggled off the auto-correct feature for certain words
whenever, in the course of my research, I used the backspace key in a
certain manner. (even marks wasn't sure why the auto-correct wasn't
resuming for those words when I rebooted, as it was supposed to.)
as for the miraculous recognition of Kanchanalak, marks explained that
microsoft is always updating the spell-check lexicon to keep up with words
that are in common current usage. Kanchanalak had been in the news in about
2000, when the lexicon for my 2002 version of microsoft word was being
compiled. the surname might not be recognized by earlier lexicons—or even later
ones. though the lexicon is continually revised, it is not continually
expanded. rather, it is maintained at a ceiling of about 200,000 words. if
it becomes too inclusive—recognizing obscure words rather than interpreting
them as likely mistypings—it becomes less useful for the majority of users.
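the idea of a lexicon that is revised without growing can be sketched the same way. the snippet below assumes each word carries some count of current usage and keeps only the most-used words up to the ceiling. the function name, the counts, and the ranking rule are all hypothetical. marks did not describe how the list is actually chosen.

```python
def rebuild_lexicon(usage_counts, ceiling):
    """Keep the `ceiling` most-used words; ties are broken alphabetically."""
    ranked = sorted(usage_counts, key=lambda w: (-usage_counts[w], w))
    return set(ranked[:ceiling])

# Made-up counts, purely for illustration.
counts = {"Tylenol": 90, "Zoloft": 40, "Kanchanalak": 5, "celebrex": 3}
lexicon = rebuild_lexicon(counts, ceiling=3)
```

with these invented numbers, the least-used word falls below the ceiling and out of the lexicon, so the checker would treat it as a possible mistyping rather than a name to capitalize.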
similarly, the seemingly willy-nilly capitalization of drug brand names
was determined by the popularity of those brands at the time my lexicon was
being compiled, together with the usual issues posed by the resemblance of
the brand name to other possibly intended words. microsoft certainly
doesn't ask companies to pay for capitalization, marks noted, taking no
offense.
and so it was that in the space of about ten minutes, marks righted my
orthographically toppling world. overarching, benevolent algorithms brought
harmony and meaning to it all.
except, of course, the part about how to toggle the auto-correct back on
for stephen and zoloft and microsoft. but marks said he'd have a tech guy
get back to me on that.