Archive for the ‘semantics and linguistics’ Category

Linguistic and Typographical Conventions

Tuesday, August 11th, 2009

My own grammatical and typographical conventions have fascinated me for some time. When writing anything, from the most trivial to the most significant, I am dogmatically and painfully conscious of every sentence, every word, every thought, and every comma I determine (joking, joking!) choose to use. Sometimes, when I deliberately revise my conventions, I rewrite pieces of ancient, obsolete text of mine, regardless of the shame and self‐pity I feel for doing so.

Although this post’s presence indicates otherwise, I wrote this for myself: reviewing my conventions will hopefully discourage (and hopefully hinder) me from changing my conventions so often. Its presence here is of secondary concern, but hopefully it will provoke discussion among readers—at least the non‐verbal, internal kind.

When writing, I do…

Capitalization

  • Capitalize honorifics.
  • Capitalize possessives and pronouns of veneration.
  • Capitalize doctrines, philosophies, and Platonic ideals.
  • Capitalize names of genera.
  • Capitalize names of regions that are proper nouns.
  • Not capitalize directions.
  • Not capitalize names of species.
  • Not capitalize names of regions that are not proper nouns.

Acronyms and abbreviations

  • Not use periods or spaces to delimit letters in acronyms or initialisms.
  • In non-web mediums, include the acronym form in parentheses when introducing an acronym or initialism.
  • In web-based text, use <acronym> tags to introduce acronyms and initialisms at every occurrence.
  • Not use apostrophes in non-possessive plural acronyms.
  • Generally avoid abbreviations. When using an abbreviation, I always append a full stop.

Quotations

    • Use block quotations for quotations that exceed three lines.
    • Indent block quotations with one tabstop.
    • Not surround block quotations with quotation marks.
    • Introduce block quotations with colons, not horizontal bars (U+2015).
    • Use square brackets for editorial marks.
    • Use ‘[sic]’ to indicate mistakes that are reproduced verbatim.
    • Replace confusing text with editorial enclosed in square brackets.
    • Replace unintelligible speech with ellipses enclosed in square brackets.
    • Alter the typography where necessary to assist readability and consistency.
    • Alternate between double quotation marks and single quotation marks for nested quotations.
    • Enclose punctuation within quotation marks only if it is part of the quotation (“the British way”).

    Parentheses

    • Avoid nested parentheses.
    • For nested parentheses, use spaces between consecutive opening parentheses and consecutive closing parentheses.
    • Enclose punctuation within parentheses only if it is part of the parenthesized idea.

    Commas

    • Use commas wherever grammatically required.
    • Not use commas wherever stylistically appealing.
    • Not splice with commas.
    • Use serial commas except where their use would cause ambiguity. For example, I would use ‘He showed his friends, John, and Mary’ but ‘He showed his friend, John and Mary’.

    Semicolons

    • Use semicolons to separate full sentences that continue a single thought.
    • Use semicolons as a lower-precedence delimiter for lists.
    • Avoid using multiple semicolons in a single sentence.

    Colons

    • Use colons to introduce lists.
    • Use colons to explain, prove, or refine the preceding thought.
    • Avoid using multiple colons in a single sentence.

    Hyphens

    • Use hyphens to join prefixes and suffixes with a word.
    • In digital mediums, always use U+2010 instead of the deprecated, ambiguous U+002D.
    • Use hyphens to distinguish between homographs.
    • Use hyphens to form compound modifiers, except where the first operand is an -ly adverb.
    • Use hanging hyphens on both prefixes and suffixes.
    • In handwritten text, use hyphens to indicate that words are continued on following lines.

    En dashes

    • Use non‐spaced en dashes to indicate (closed) ranges.
    • Use non‐spaced en dashes to indicate relationships between two things.
    • Use en dashes with single appended spaces as a replacement for hyphens when one operand includes a space. For example, I would use “pro– San Diego doctrine”. Does anyone have a better alternative?
    • Avoid using en dashes in compound modifiers when a comma would suffice.

    Em dashes

    • Use non‐spaced em dashes to enclose parenthetical but equally important information.
    • Use non‐spaced em dashes to provide stylistic and tone-providing sharp breaks in flow.

    Slashes

    • Prefer other punctuation over slashes.
    • In informal writing, use ‘and/or’ where applicable to remove ambiguity.

    Punctuation

    • In informal writing, use exclamation marks, question marks, and interrobangs.
    • In informal writing, use parenthesized exclamation marks and question marks in the middle of sentences.

    Figures

    • Present numbers as words only when they can be pronounced in fewer than four syllables and written as a single word.
    • Not provide information to distinguish between the long and short systems.
    • Prefer using ‘USD’ or other unambiguous symbols of currency units over the generic ‘$’ and ‘¤’ except when previous text makes the meaning of ‘$’ or ‘¤’ obvious.
    • Always prefer SI and other prescribed standards over other systems.
    • Use scientific notation (using superscript forms) for large numbers and Knuth’s up-arrow notation for extremely large, known numbers when precision is required.
    • Abbreviate units in scientific and informal writing.
    • Generally use SI prefixes but may present figure forms in scientific notation as alternatives to extremely high and uncommon prefixes.
    • Use ISO 8601 for dates.
    • In digital mediums, use figure dashes (U+2012) instead of hyphens in text that contains numbers.
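
The last two items in this list can be combined in a short sketch. It assumes Python's standard datetime module and uses the date of this post as sample input; the variable names are illustrative only, and the figure dash is taken to be U+2012, as listed under Miscellaneous.

```python
from datetime import date

FIGURE_DASH = "\u2012"   # FIGURE DASH
HYPHEN_MINUS = "\u002d"  # the ASCII character that isoformat() emits

# Sample input: the date of this post.
posted = date(2009, 8, 11)

# ISO 8601 calendar date, as produced by the standard library.
iso_date = posted.isoformat()

# Typeset variant that swaps the ASCII separators for figure dashes.
typeset_date = iso_date.replace(HYPHEN_MINUS, FIGURE_DASH)

print(iso_date)
print(typeset_date)
```

The typeset form is meant for display only; anything that parses dates would still expect the ASCII separators.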

    Formatting

    • In digital mediums, italicize programming keywords, computer commands, hash sums, foreign text, and other non-English text. When referring to the text as text, I do not use single quotes (I normally would).
    • In digital mediums, use italicized text for emphasis.
    • In informal text in digital mediums, use bold text for serious emphasis.
    • In digital mediums, not underline text.
    • In hand-written text, underline titles of creative works.

    Other grammar

    • Start sentences with ‘But’ and ‘And’ when doing so enhances readability.
    • Not confuse ‘which’ and ‘that’.
    • Always use commas for non-essential clauses and never use commas for essential clauses.
    • Use who and whom correctly.
    • Use data as plural and datum as singular, radius as singular and radii as plural, cactus as singular and cacti as plural, and so on.
    • Choose not to split infinitives except in some circumstances.
    • End sentences with prepositions where doing so is clearer.
    • Not use ‘they’ as a replacement for ‘he or she’.
    • Prefer comparative and superlative adverbs and adjectives over ‘more’ and ‘most’.
    • When using comparative adverbs and adjectives, include both operands of the comparison.

    Spelling

    • Use -ize over -ise, -yze over -yse, -er over -re, and -xion over -ction (“the American way” ¹).
    • Now use -or over -our.
    • Prefer standard -ed endings for irregular verbs (“the American way”).

    Miscellaneous

    • Enclose words and other characters in single quotes to refer to the word itself or characters themselves.
    • In digital mediums, always use correct Unicode character identities if possible, including for en dashes (U+2013), em dashes (U+2014), figure dashes (U+2012), swung dashes (U+2053), solidi (U+2044 ²), ellipses (U+2026), single quotation marks (U+2018 and U+2019), double quotation marks (U+201C and U+201D), prime marks (U+2032, U+2033, U+2034, and U+2057), and minus signs (U+2212).
    • When referring to the plurality of a single character (as text), italicize the character and append ‘-s’. For example, I would write “multiple a-s”.
    • Use contractions only in some informal text.
    • Use IPA where helpful.
    • Not invent words and other dittononsensibilities.
    • Not use second-person in cases where third-person is clearer.
    • Use Harvard‐style citation.
    • Use footnotes as supplemental notes for non‐normative information.

    ¹ With regard to ‐ize versus ‐ise and ‐yze versus ‐yse, and contrary to common belief, the “American way” predates the “British way”, and the “British way” only became the “British way” during the last century. French influence and disdain for American conventions are two potential explanations for the switch.
    ² Unicode incorrectly assigned the solidus to be “FRACTION SLASH U+2044” and the slash to be “SOLIDUS U+002F”.
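
The character names behind the code points listed above, including the two in footnote ², can be checked with a minimal sketch. It assumes Python's standard unicodedata module; the selection of code points and the variable names are illustrative only.

```python
import unicodedata

# Code points mentioned in this post: the hyphens, the dashes,
# the two slashes from footnote 2, the ellipsis, and the minus sign.
CODE_POINTS = [
    0x002D, 0x2010, 0x2012, 0x2013, 0x2014,
    0x002F, 0x2044, 0x2026, 0x2212,
]

# Map each code point to its official Unicode character name.
names = {cp: unicodedata.name(chr(cp)) for cp in CODE_POINTS}

for cp, name in names.items():
    print(f"U+{cp:04X}  {name}")
```

The output shows the naming that footnote ² objects to: U+002F is named SOLIDUS and U+2044 is named FRACTION SLASH.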

A Kilobyte is 1,000 Bytes

Friday, November 14th, 2008

I become annoyed whenever I hear people complain about hard drive manufacturers’ alleged greed and misuse of units. They are not misusing units; they are correct. Let us examine:

kilometre = 1000 metres
kilobyte = 1000 bytes

The confusion is propagated by JEDEC’s memory standard 100B.01, which attempts either to redefine the SI, and indeed metric, prefixes in base two (after over 400 years) or to add a second definition using the same prefixes. It defines the following ¹ ²:

Digital Storage Capacity    Serial Transfer Rates
kilo: 1024^1                kilo: 1000^1
mega: 1024^2                mega: 1000^2
giga: 1024^3                giga: 1000^3

I think that is moronic. It persuaded most consumers that hard drive manufacturers warp the definitions with greedy intentions. IEC 60027 has been an international standard since 1998 and is more rational:

Metric            IEC
kilo:  1000^1     kibi: 1024^1
mega:  1000^2     mebi: 1024^2
giga:  1000^3     gibi: 1024^3
tera:  1000^4     tebi: 1024^4
peta:  1000^5     pebi: 1024^5
exa:   1000^6     exbi: 1024^6
zetta: 1000^7     zebi: 1024^7
yotta: 1000^8     yobi: 1024^8

It is that simple. JEDEC intends to change a standard to fit a misconception, trying to reduce confusion by increasing ambiguity and appealing to gullibility. SI is older, as is IEC 60027; more importantly, they do not contradict each other; and most importantly, they make sense. I favour IEC 60027.
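
The gap between the two prefix families can be worked out with a minimal Python sketch. The 500-gigabyte drive is a hypothetical example, and the function names are illustrative rather than taken from any standard.

```python
# Prefix names in ascending order, matching the Metric and IEC columns.
SI_PREFIXES = ("kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta")
IEC_PREFIXES = ("kibi", "mebi", "gibi", "tebi", "pebi", "exbi", "zebi", "yobi")


def multiplier(prefix: str) -> int:
    """Return how many base units one prefixed unit represents."""
    if prefix in SI_PREFIXES:
        return 1000 ** (SI_PREFIXES.index(prefix) + 1)
    if prefix in IEC_PREFIXES:
        return 1024 ** (IEC_PREFIXES.index(prefix) + 1)
    raise ValueError(f"unknown prefix: {prefix!r}")


def convert(amount: float, from_prefix: str, to_prefix: str) -> float:
    """Convert an amount between any two prefixes via the base unit."""
    return amount * multiplier(from_prefix) / multiplier(to_prefix)


# Hypothetical drive advertised as 500 gigabytes.
drive_in_gibibytes = convert(500, "giga", "gibi")
print(f"500 GB = {drive_in_gibibytes:.2f} GiB")  # prints "500 GB = 465.66 GiB"
```

Both figures describe the same number of bytes; only the unit differs, which is why a correctly labelled drive appears smaller in software that counts in powers of 1024.
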
¹ Although I used the expanded prefixes for contrast and simplicity, note that JEDEC 100B.01 discourages that and prefers using 80MB, 24GB, etc.
² I wanted to read more about JEDEC 100B.01, but the definition costs 71 USD. See the publication list.

Phone Number Formatting

Wednesday, October 22nd, 2008

Recently, I have been noticing the use of periods as group separators in phone numbers in the United States. I do not know the reason for this development, but I list four ideas here:

  1. Supporters think it is a standard outside of the US. It is not. US-ens frequently do not adhere to standards, but if periods became popular because they think it is a standard yet continue to use inches and feet, then I am lost for words.
  2. Supporters think it stands out more. Such changes, with a stylistic basis and disregard for semantics, are common in advertisements (note the -ize instead of -ise; the latter is not a standard, either). If advertisers started the period revolution, then others might have followed. I have also heard people insist it is “prettier”. (Note that the period is outside the quote because it belongs there.)
  3. Supporters are trying to prevent bots from obtaining their numbers and calling them with advertisements—at least until the bots’ authors become aware of the changing (non-)convention.
  4. Supporters are avoiding line breaks. Hyphen-minuses (the legacy hyphen characters from ASCII) are breakable characters, so using them in phone numbers would allow typographical nightmares like:
    1-800-
    746-663
    Non-breaking equivalent characters were added to Unicode to address such issues; use U+2011 for hyphens and U+00A0 for spaces.
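
The substitution described in the fourth idea can be sketched in a few lines of Python. The function name and the sample number (from the fictional 555 range) are hypothetical.

```python
NON_BREAKING_HYPHEN = "\u2011"  # NON-BREAKING HYPHEN
NO_BREAK_SPACE = "\u00a0"       # NO-BREAK SPACE

# Translation table: hyphen-minus and ordinary space map to their
# non-breaking equivalents; every other character passes through.
_UNBREAKABLE = str.maketrans({"-": NON_BREAKING_HYPHEN, " ": NO_BREAK_SPACE})


def make_unbreakable(phone_number: str) -> str:
    """Return the number with separators a renderer should not break at."""
    return phone_number.translate(_UNBREAKABLE)


# Hypothetical number used only as sample input.
print(make_unbreakable("1-800-555-0100"))
```

The result looks the same on screen, but a conforming text renderer should keep the whole number on one line.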

I write my phone numbers as a fully qualified, space-separated sequence of digits (+011 209 858 xxx xxx) and recommend that because it is closer to a standard than any other format I have seen.