Language Compression History

Wouldn’t it be better if a word like better could be rendered as compactly as betr? After all, there are only four strategic sounds in the word. Reducing better to betr is a 33% reduction in word length. Just imagine how many more words we could squeeze onto a page of print or into an electronic document! Considering the alarming increase in ink and paper (not to mention overburdened digital repositories) required by all the extra and unnecessary letters and syllables, an enormous savings could be enjoyed by the world’s economy from merely streamlining this ubiquitous communicant.

Newspapers could be briefer, e-mails shorter, paperback novels thinner, and billboards smaller. The cost savings in paper alone could easily offset the national debt to the tune of $20 billion a year and improve the global economy by at least twice that. Hey, we’re in the information age. It’s time for our language to come up to speed, to complement technology, not hobble it. Any effort to improve thru-put by data compression should be aggressively pursued. The Holy Grail of Gl5 (pronounced Glish) orthography is not simply a consistent, predictable, or phonetic (shouldn’t that be fonetic?) rendering of printed language, but true readable text compression.

Many reformers have littered the political and educational landscape with well-intended schemes for the reform of English spelling. No one denies the shameful state of affairs, nor doubts the benefits that would blossom in the wake of a predictable, consistent and truly phonetic orthography (which the noble English tongue so sadly lacks). But motives buoyed by altruistic aims to facilitate our children’s education or assist the foreign language student will never be sufficient to offset the immense inertia of tradition and an entrenched status quo that balks at such efforts simply because ‘it just doesn’t look right.’

Mark Twain recognized this long ago when beseeching the Associated Press to promote the 1906 attempt at Simplified Spelling. "And we shall be rid of phthisis and phthisic and pneumonia and pneumatics, and diphtheria and pterodactyl, and all those other insane words which no man addicted to the simple Christian life can try to spell and not lose some of the bloom of his piety in the demoralizing attempt."

Spelling conventions for current English words are sorely weakened by the sheer multiplicity of orthographic alternatives. Etymology plays a role as well. English has adopted so many words from other languages (often retaining much of their original spelling) that rules are strewn with exceptions. Remnants of foreign influence linger on as silent or redundant letters and superfluous syllables, useless vestiges that serve no other purpose than to bloat our tongue and render it a hopeless mass of variation out of control. Or, as Mark Twain said, "grotesque to the eye and revolting to the soul."


The need to provide English with some semblance of organization and consistency should be considered for more than merely data compression benefits. Truly phonetic spelling would reduce the time and cost imposed on educational programs worldwide. And consistency should involve more than merely spelling. Eliminating grammatical irregularities would not only facilitate the learning process but also move English toward an even less inflected language and reduce regional discrimination.

Start with consonantal variation. English sports two ways to denote the soft ‘S’ (S, C) and ‘F’ (F, PH) sounds, three alternative renderings for ‘J’ (J, G, DG) and at least five methods of creating a hard ‘C’ (C, K, CK, CH, Q). Many consonants are at one time or another rendered silent (and thus redundant). Take, for instance, the GH in through, the S in island, the B in lamb, the P and S in corps. And the state of vowel usage is really quite dismal. We still tolerate bus, busy, womb, and women.

There is much to applaud in the vast and colorful English vocabulary. English has accommodated with amazing verbal adroitness thousands of additional words: borrowed, adopted and outright stolen from the rest of the world’s linguistic community. The ever evolving resourcefulness of the language has invested it with a wealth of terminology and beauty of expression that serves equally well the demands of technology and the passions of poetry. But there is, at least in some quarters, a need for a much slimmer version of the language. With the arrival of the Information Age a new, more pressing rationale has emerged for entertaining, once again, that ancient dream of spelling reform.

Spelling Pioneers

There have been numerous attempts to reform the English tongue. The language’s archaic orthography, which has remained virtually unchanged since the 17th century, has drawn the most fire. The following information comes courtesy of Cornell Kimball and his fine web page, drawing mostly from Ken Ives' "Written Dialects and Spelling Reform" (1979) and Abraham Tauber's "Spelling Reform in the United States."

Benjamin Franklin, Noah Webster, Theodore Roosevelt, and Andrew Carnegie have all championed the cause.

Noah Webster’s 1806 dictionary had the most profound effect, establishing the significant spelling differences that exist to this day between British and American orthography. Webster is responsible for rendering ‘gaol’ as our current ‘jail’, removing the ‘u’ from colour and honour, and reversing the ‘re’ in centre and theatre. He wanted to remove the silent ‘e’ from such words as ‘give’, and use the double ‘ee’ digraph for all long ‘e’ sounding words such as ‘read’ and ‘leave’. Sadly, purists of the time objected too strongly to allow his efforts to prevail.

In 1876, the American Philological Association promoted the use of ar, catalog, definit, gard, giv, hav, infinit, liv, tho, thru, and wisht. That same year the International Convention for the Amendment of English Orthography was held in Philadelphia, during the Centennial Exposition. Later this organization evolved into the Spelling Reform Association. A burst of reform associations emerged over the next few years, among them the British Spelling Reform Association, the American Philological Association, and the National Education Association. Additional nominations for improved spelling now included altho, thruout, thoro, thoroly, thorofare, program, prolog, pedagog, and decalog.

The Simplified Spelling Board was founded in the U.S. in 1906 and its sister, the Simplified Spelling Society, appeared two years later in the U.K. One of the American founding members was Andrew Carnegie, who pumped in more than $250,000 to promote the cause.

U.S. President Theodore Roosevelt ordered the Government Printing Office on August 27, 1906 to use the Simplified Spelling Board's 300 or so proposed spellings. Teddy chose the date strategically because the U.S. Congress happened to be in recess. But the order was revoked when Congress reconvened that fall, by a vote of 142 to 24.

In January 1934 the Chicago Tribune inaugurated what it called a "practical test of spelling reform" by applying its own list of 80 respelled words, which included advertisment, catalog, agast, ameba, burocrat, crum, missil, subpena, bazar, hemloc, herse, intern, rime, sherif, staf, glamor, harth, iland, jaz, tarif, trafic. The list was introduced over a series of editorials, which finally reported that "short spelling wins votes of readers 3 to 1." The editors chided dictionary makers for not daring to pioneer the effort. But within five years their list had shriveled by half. A few new recruits were added, such as the previous favorites tho, altho, thru, thoro and a series of 'ph' alternatives such as autograf, telegraf, philosofy, photograf, sofomore. Over the decades more and more words were dropped until only "thru" and "tho" remained in 1975, when they, too, were abandoned.

Today the only remaining vestige of the Spelling Reform Association and the Simplified Spelling Board is an organization called the American Literacy Council. Its concern is now directed toward the teaching of reading and writing as well as spelling reform.

In 1948 linguists Daniel Jones and Harold Orton proposed their New Spelling system. With the aim of making English spelling more phonetic, the resulting changes rendered the appearance of written English uncomfortably foreign. For example:

   “Dhe langgwej wood be impruuvd bie dhe adopshon of nue speling for wurds” 

This was an attempt at consistent phonetic spelling: ‘dh’ for voiced ‘th’, ‘sh’ instead of ‘ti’, ‘ie’ to indicate the long ‘i’ sound. But why is ‘uu’ used for long ‘u’ in impruuvd but rendered ‘ue’ in nue? And no attempt was made to keep ‘o’ consistent. It appears as short ‘o’ and short ‘u’ in adopshon and long ‘o’ in for. Then there is the usage of ‘e’: long in be but short in langgwej and speling. Spelling implications aside, the result did nothing to shrink the size of written communication.

More spelling systems

Spanglish uses letter doubling to change stress and vowel usage. It is summarized at http://www.unifon.org/alfa-saxon.html

Founded in 1978, the Better Education thru Simplified Spelling organization favors simply the use of tho, thru, and possibly hav as an initial effort to buck established convention. The second wave against orthographic inertia would focus on legitimizing the popular lite and nite.

Australians proposed introducing a series of limited changes beginning in 1984. Their initial nominees were: hed, fotograf, caut, cof, and (once again) giv. These illustrated two basic principles: use of consistent symbols of phonemes, and elimination of silent letters. Except for caut (why not cot?) and fotograf (everyone already uses foto), the improvement in word length was nearly optimal.

This effort was followed by a more comprehensive scheme known as Cut Spelling, which appeared in 1992. It continues to be supported by the SSS, the Simplified Spelling Society (www.les.aston.ac.uk/sss).

Cut Spelng:Esy readng for continuity

One first notices that one can imediatly read CS quite esily without even noing th rules of th systm. Since most words ar unchanjed and few letrs substituted, one has th impression of norml ritn english with a lot of od slips, rathr than of a totaly new riting systm. The esential cor of words, the leters that identify them, is rarely afectd, so that ther is a hy levl of compatibility between th old and new spelngs. This is esential for the gradul introduction of any spelng reform, as ther must be no risk of a brekdown of ritn comunication between th jenrations educated in th old and th new systms. CS represents not a radicl upheval, but rather a streamlining, a trimng away of many of those featurs of traditionl english spelng wich dislocate th smooth opration of th alfabetic principl of regulr sound-symbl corespondnce.

Several concepts are exciting: remove all double consonants (like letrs, esential, spelng). Well, almost all. Notice impression. 

Cut Spelng does delete most unnecessary vowels (as in th, norml, esily, systm, ritn, rathr, levl, cor), and applies consonants fairly consistently (jenrations, unchanjed, alfabetic). But this, too, is a disappointment. The rules are still applied inconsistently. In one occurrence ‘letters’ appears as letrs, and in another as leters. Probably an oversight, but ‘the’ is rendered the in two places instead of the preferred th. And why isn’t ‘-ing’ (as in streamlining) always treated as efficiently as it is in trimng and spelng? We are treated to a very nice compact afectd, but are still left with unchanjed, substituted, educated, rather.

We still have to deal with the uncertainty of how to spell with the letter ‘e’.

sometimes it is short (impression, them, identify, afectd, levl, spelng),
sometimes long (esily, imediatly, few, new, be, upheval),
sometimes used with ‘a’ (stream, read, featurs),
sometimes doubled (between),
sometimes silent (one, quite, rules, since, those).

CS advocates promote its space saving ‘advantajs’ in that it is “som 10% shortr than traditionl spelng. This has sevrl importnt advantajs. To begin with, it saves time and trubl for evryone involvd in producing ritn text, from scoolchildren to publishrs, from novlists to advrtisers, from secretris to grafic desynrs.”

Why is this CS example so timid in achieving even more impressive space reduction by applying its rules more consistently? If we can benefit by the concise versions of sevrl and importnt why not producng, scoolchildrn, and advrtisrs? And if we can use ‘y’ to replace ‘ig’ in desynrs why not capitalize on similar savings for words like tym?
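The size of these savings is easy to check by counting characters. The following is a minimal sketch, not part of Cut Spelng itself: the function name percent_saved is hypothetical, and the traditional-spelling sentence is our own respelling of the CS sample quoted above, so treat the comparison as illustrative only.

```python
def percent_saved(original: str, respelled: str) -> float:
    """Return the reduction in character count as a percentage of the original."""
    if not original:
        raise ValueError("original text must not be empty")
    return (len(original) - len(respelled)) / len(original) * 100


# Traditional spelling: our own respelling of the CS sample (an assumption).
traditional = (
    "This has several important advantages. To begin with, it saves time "
    "and trouble for everyone involved in producing written text, from "
    "schoolchildren to publishers, from novelists to advertisers, from "
    "secretaries to graphic designers."
)

# The Cut Spelng sample as quoted in the text.
cut_spelng = (
    "This has sevrl importnt advantajs. To begin with, it saves time "
    "and trubl for evryone involvd in producing ritn text, from "
    "scoolchildren to publishrs, from novlists to advrtisers, from "
    "secretris to grafic desynrs."
)

if __name__ == "__main__":
    print(f"better -> betr: {percent_saved('better', 'betr'):.1f}% shorter")
    print(f"CS sample:      {percent_saved(traditional, cut_spelng):.1f}% shorter")
```

The same function can be pointed at any pair of texts, for instance to see how much further producng, scoolchildrn, and advrtisrs would push the figure.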

It is understandable why so much of English’s inconsistency is retained by CS. The emphasis on keeping the appearance of English for the sake of bridging future generations was probably irresistible. But CS generally scores high on quick acceptance and introduces the concept that alternate spellings are not flaws of education because, as W.C. Fields once observed, “I have no respect for anyone who can spell a word only one way.”

Cut Spelng proposes four basic rules:

Eliminate silent letters

For example: have, through, who, yacht, herb/honest, psychology, pneumonia, would, debt, scene, treasure, friend, people, build, etc.

Eliminate unstressed vowels before l,m,n,r,d

For example: exampl, chapl, centr, entr, randm, persistnt, curtn, fashn, litl, watrd, bedd, submitd, edbl.

Eliminate double consonants

For example: letr, betr, ritn, mitn, hapn, clapn, omitd, travld.

Eliminate redundant variations

Replace   gh, ph   with f     ruf, cof, laf, fon, graf, fiziks
          g, dg    with j     ej, juj, jem, jeni, jorj, ajust, aj
          igh      with y     sy, syt, hy, hyt, thy, ryt, lyt, myt

Other than these, it’s just as capricious as English in predicting just how a word will be spelled.
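To make the mechanical part of these rules concrete, here is a rough sketch of the purely letter-based substitutions as ordered regular-expression rewrites. It is an illustration only, not the official Cut Spelng procedure: the names CUT_RULES and cut are hypothetical, and the rules that depend on pronunciation (silent letters, unstressed vowels, soft ‘g’, and ‘gh’ sounded as ‘f’) are deliberately left out because spelling alone cannot decide them.

```python
import re

# Ordered (pattern, replacement) pairs, applied one after another.
# Only rules that can be decided from the letters alone are included.
CUT_RULES = [
    (re.compile(r"igh"), "y"),                      # sight -> syt
    (re.compile(r"ph"), "f"),                       # graph -> graf
    (re.compile(r"([b-df-hj-np-tv-z])\1"), r"\1"),  # omitted -> omited
]


def cut(word: str) -> str:
    """Apply the letter-based substitutions to a single word."""
    word = word.lower()
    for pattern, replacement in CUT_RULES:
        word = pattern.sub(replacement, word)
    return word


if __name__ == "__main__":
    for sample in ["sight", "high", "right", "graph", "letter"]:
        print(sample, "->", cut(sample))
```

Note that this sketch turns letter into leter rather than letr: dropping the unstressed vowel needs knowledge of stress, which is exactly where the system's consistency becomes hard to automate.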

Gl5 would add to these concepts vowel consistency and economy thru an even more radical elimination of superfluous or non-critical syllables, something along the lines of Dutton’s Speedwords.


In the late 1940s Reginald J. G. Dutton created a constructed language called Speedwords as a candidate for an international auxiliary language derived primarily from English. Over the years, however, it has found greater utility as a stenographic system, since it uses only normal Roman alphabetic letters with no unique symbols.

Speedwords was designed around Zipf's Law, an observation that frequently-used words tend to be short words. 

Dutton claimed that his Speedwords were "logically and methodically built up from Professor Ernest Horn's remarkable analysis of the frequency of occurrence of all words. The Iowa University philologist and his staff examined and tabulated 15,000,000 running words of all classes of written and printed matter. The very-high-frequency words tabulated by Professor Horn are expressed in Dutton Speedwords by single alphabetic letters standing alone. The next highest in his order of frequency are allotted two-letter speedwords, and so on..."
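The principle Dutton describes, shortest codes for the most frequent words, can be sketched in a few lines. This is only an illustration under stated assumptions: the function names short_codes and allot_codes are hypothetical, the codes are handed out in plain alphabetical order (a, b, ..., z, aa, ab, ...), and the result bears no relation to Dutton's actual Speedwords vocabulary, which was chosen by hand.

```python
from collections import Counter
from itertools import product
from string import ascii_lowercase


def short_codes():
    """Yield a, b, ..., z, aa, ab, ... in order of increasing length."""
    length = 1
    while True:
        for letters in product(ascii_lowercase, repeat=length):
            yield "".join(letters)
        length += 1


def allot_codes(text: str) -> dict:
    """Map each word to a code so that more frequent words get shorter codes."""
    counts = Counter(text.lower().split())
    # most_common() sorts by descending count; ties keep first-seen order.
    ranked = [word for word, _ in counts.most_common()]
    # zip stops when the ranked word list is exhausted.
    return dict(zip(ranked, short_codes()))


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    for word, code in allot_codes(sample).items():
        print(f"{code:>2}  {word}")
```

With a real frequency list in place of the toy sentence, the first 26 words would receive single letters and the next 676 two-letter codes, which is the shape of the scheme the quotation outlines.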

Dutton further intended that a carefully chosen small vocabulary of basic concepts (or as he called them, "semantic primitives") could be compounded to express virtually any idea within semantic space. After a study of Roget’s original 1000 thesaurus categories Dutton surmised that "Only 493 one-, two-, or three-letter word-roots have to be memorised." Thus, Speedwords attempted to achieve two goals: make the most common morphemes as brief as possible, and cover all of semantic space with the fewest possible morphemes.

An illustration of Speedwords’ compact expression capabilities is shown here:

E 3 le ir f v = (There) are three letters here for you.

               Be 3 letters hir for vous.

Dutton drew from other languages than English, as illustrated here by the French vous. His system also imposed its own (often arbitrary and highly idiomatic) grammar upon the user. One interesting aspect of this was his ambitious use of single-letter affixes. Nearly every letter was assigned some modifying quality.

-a     unfavorable     pro = promise,  proa = threaten (unfavorable promise)
-b     possibility     kre = believe,  kreb = credible (maybe believe)
-c     collective      on = man,       onc = community (collection of people)
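The affix mechanism amounts to a small lookup: a root plus a one-letter modifier. The sketch below uses only the three roots and three affixes shown above; the names AFFIXES, ROOTS, and gloss are hypothetical, and the output is a literal gloss of the parts, not Dutton's own definition of the compound.

```python
# Affixes and roots taken from the examples above; nothing else is assumed.
AFFIXES = {"a": "unfavorable", "b": "possibility", "c": "collective"}
ROOTS = {"pro": "promise", "kre": "believe", "on": "man"}


def gloss(word: str) -> str:
    """Return a literal gloss for a bare root or a root plus one-letter affix."""
    if word in ROOTS:
        return ROOTS[word]
    root, suffix = word[:-1], word[-1:]
    if root in ROOTS and suffix in AFFIXES:
        return f"{AFFIXES[suffix]} + {ROOTS[root]}"
    raise KeyError(f"unknown speedword: {word!r}")


if __name__ == "__main__":
    for word in ["pro", "proa", "kreb", "onc"]:
        print(word, "=", gloss(word))
```

The reader still has to make the idiomatic leap from "unfavorable + promise" to "threaten", which is part of why the system feels alien to English.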

Single Letter Words (both upper and lower case) were also employed:

a      at, to, toward
b      but
c      this, these {French ce}
d      of, from {French de}

Speedwords quickly begins to appear largely alien to standard English. It is, in many respects, a divergent language with new word forms and structural mechanisms. It does, however, demonstrate several significant compression techniques that can be adapted to a more immediately readable format.


GL5, in its primary goal of attaining the highest possible text compression ratios, shares several aspects of the Speedwords model. Single letter words are also applied to full advantage producing similar looking sentences of extreme brevity:

           izpm Bdtym   =   It is past my bedtime.

GL5 diverges from Speedwords, however, in a few significant areas:
•       Only English roots are used
•       Root abbreviation, though aggressive, is controlled to avoid ambiguity
•       Phonetic orthography is consistently applied
•       No radical changes from standard English grammar are proposed
•       Standard symbols are used to supplement the limited English character set


To maximize data compression, GL5 incorporates several techniques that yield innate reductions in word length and increases in thru-put:
•       minimal grapheme-to-phoneme ratios,
•       syllable contractions and frank omissions,
•       single letter word extensions, and
•       space character "elimination"

GL5 Breviations