Please comment on this entry; this is something I put a lot of thought into!
I apologize in advance to the non-programmers in the group, who just want to read English. I'm not going to talk much about unix or perl or the other actual programming languages here. What I'm looking to discuss is communications, linguistics, and human perception.
Before I get started: I've seen several humorous emails like this circulating over the years, where a person suggests lingual changes and starts using each one as soon as it is mentioned, deteriorating the language of the essay toward its satirical end. I am suggesting lingual changes and using them live, although I'm using them from the start of this entry. I hope my result is not as unreadable in the end as some of those. If it is, you all have my apologies.
I find when writing these entries that I tend to tailor my writing for the web and more specifically Livejournal.
Most of you probably do not know that all my posts run through an awesome little filter called Markdown, which takes email-like formatting and turns it into HTML. Things like __this__ become this. Links are very often automatic. It makes most things refreshingly easy. Occasionally, though, when writing something like poetry, I bypass Markdown and write the HTML myself, because HTML ignores carriage returns. It's on a case-by-case basis, too. Sometimes I use <i>, sometimes I just use the asterisks. Sometimes Markdown gets it wrong: in a previous post about an Apache module, I typed mod_access_rbl and it came out as modaccessrbl. I had to go back and manually correct the LJ entry on that one.
I should also mention that I haven't found a good word processor for the unix shell, one where ctrl-b, ctrl-i, and ctrl-u act as you'd expect them to, or where there are formatting menus or options. Even though I'm writing this in a LiveJournal client, it's really a plain text editor; most of the formatting is left either to Markdown or to me writing the raw HTML in the entries.
The point in all this is that I am already not writing in what would be considered standard written English, as one would write on a piece of paper.
If I write a postcard to a friend, I don't put <br /> in the closing. There are, whether using Markdown's syntax, or my own, already formatting sigils in place, designed to influence how the reader will ultimately read it. It is a programming language: there are input and output and these are not the same.
I tend to write English as though I'm writing code. I put in asides, in-jokes, parentheticals: the tendency is to dump everything I'm thinking on a particular subject, and this only results in more editing later. My essays suffer "feature creep". There are often things I would mention in conversation to make a subject more clear, and in conversation, it would probably become more apparent if the person understood concepts without overexplanation, or wanted the humorous asides, or was not someone I perceived to have a sense of humor. After all, if the point of communication is to share one's thoughts, why not share all of them?
It was the goal in this entry: to leave nothing out, leave nothing on the cutting room floor, so to speak. Of course, as mentioned in the previous entry, this obscures readability.
Writing on LiveJournal is a broadcast medium. Some of you are technical, some not. Some of you get my jokes, some not. Some of you are familiar with a concept; for others, it requires explaining, and I cannot always be sure which of you have been reading for how long, so I need to make backreferences to older entries and link them. At work, there is the same problem. Some of my coworkers heard me go on about an idea for redoing a project, some not. But in an email or an LJ post, the same dataset is sent to the same people. The only other option is to write two emails, one of which will likely fully include the content of the other. The simple reason is: most people don't read as they would talk or listen. Most people aren't set to bypass and parse the tags that start with X or Y.
I've also come up with a technical, rather than linguistic, problem on LJ: there's no way to target segments of an entry to a particular class of people, like those reading via a particular filter or those in a specific friends group.
These problems are related, and I came up with something very cool, that solves them better than LJ: Dynamic CSS.
(as an aside: for people who do not understand the power and flexibility behind what stylesheets can do for you, please go visit the web site http://csszengarden.com right now. They give you a single, static, valid HTML page, and show you the different possibilities that can be done just by altering the stylesheet. In all cases, the HTML is untouched.)
Imagine if you will, that you defined () as a tag in HTML, just like <> are. Lingually, parens are pretty close to what a tag is in HTML. Except instead of telling the browser how to format and display it, you're telling your target audience whether or not to read it.
Once you formalize the parens as a tag, correctly written entries could show only the raw text to those who wanted it, or the parentheticals and asides, all by simply altering the "stylesheet". Consider that you could use (XXX:) as your tag. Which means when talking about some things that need expansion (for example: this, that, and the other thing), that's a tag that could be shown. However, when talking about more tangential topics (funny story: I once wrote a whole linguistics entry on a tangent), such things could optionally be omitted by the parser/displayer. It's already somewhat in use. (For example: I'm sure other people have written with this "tag".)
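To make the "stylesheet" idea concrete, here is a minimal sketch in Python. It is only one guess at how such a displayer might work: the names TAG and render, and the idea of a stylesheet as a plain dictionary of tag names to show/hide flags, are my own assumptions for illustration. It handles only simple, un-nested tags.

```python
import re

# Hypothetical tag pattern: optional leading whitespace, "(", a tag name with
# no parens or colons in it, ":", then a body with no nested parens, ")".
TAG = re.compile(r"\s*\((?P<tag>[^():]+):\s*(?P<body>[^()]*)\)")

def render(text, stylesheet):
    """Show or hide each (tag: body) aside according to the stylesheet."""
    def decide(match):
        # Tags missing from the stylesheet are shown by default.
        if stylesheet.get(match.group("tag").lower(), True):
            return match.group(0)   # shown: keep the aside untouched
        return ""                   # hidden: drop the aside and its leading space
    return TAG.sub(decide, text)

sample = ("Some things need expansion (for example: this and that), "
          "and some do not (funny story: I wrote a tangent).")

# This reader keeps examples but hides funny stories.
print(render(sample, {"for example": True, "funny story": False}))
# Some things need expansion (for example: this and that), and some do not.
```

The text itself never changes; only the dictionary handed to render does, which is the same separation a CSS stylesheet gives an HTML page.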
Your audience could choose which format they're most comfortable reading. You get the benefit of knowing that your thoughts are out there, and if your audience understands you and tends to follow your particular mode of thinking, they may be able to safely turn off some tags (like: (e.g.:) or (i.e.:)) but may want to leave others on (such as: (humor:) or (tangent:)). It makes me a stronger writer because it forces me to identify such behaviors, but at the same time, lets me continue writing to a level of technicality I want. (for example: I could define tags like (technical:) and (reallytechnical:)). All my writing would go through an additional filter that checked for unrecognized tags, and would allow me to add them to the stylesheet, or alias them to others (For example: (e.g.:) may be aliased to (for example:) or (example:)).
One more semantic is that in writing such a parser, it makes sense to define a tag to include the space that precedes its opening paren, so that if a tag were omitted by the parser, sentences would still end with a period directly following a word (such as: this one).
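The extra filter for unrecognized tags and aliases might look something like the following Python sketch. The alias table, the set of known tags, and the function names are all hypothetical; a real version would presumably load them from wherever the stylesheet lives.

```python
import re

# Matches only the opening of a tag: "(" followed by a name and a colon.
TAG = re.compile(r"\((?P<tag>[^():]+):")

# Hypothetical alias table: several spellings map to one canonical tag.
ALIASES = {"e.g.": "example", "for example": "example", "such as": "example"}
# Hypothetical set of tags the stylesheet already knows about.
KNOWN = {"example", "humor", "tangent", "technical", "aside"}

def canonical(tag):
    """Normalize case and whitespace, then resolve any alias."""
    name = tag.strip().lower()
    return ALIASES.get(name, name)

def unrecognized_tags(text):
    """Return tags that are neither known nor aliased, in order of first use."""
    seen = []
    for match in TAG.finditer(text):
        name = canonical(match.group("tag"))
        if name not in KNOWN and name not in seen:
            seen.append(name)
    return seen

draft = "Parens help (e.g.: this one) and amuse (humor: ha) or digress (ranting: grr)."
print(unrecognized_tags(draft))
# ['ranting']
```

Anything the filter reports could then be added to the known set or aliased to an existing tag before the entry is posted.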
Nested tags would not be parsed. Parens without a colon-tag would be passed-through as-is, although might optionally trigger a warning in the parser. Perhaps this would require "real" parens to be written as "(:whatever)" to indicate a null-tag, but probably not. After all, there are cases where you need normal parens. Either when quoting original source that contains them, or in situations like: Sen. George Johnson (D, Kansas), or typing out equations.
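These rules are easier to get right with a small scanner than with a single regular expression. Below is a rough Python sketch under my own reading of them: only top-level paren groups are examined, anything nested inside a group is left alone, a group with no colon-tag passes through but is reported as a warning, and a removed tag takes its preceding whitespace with it. The function names are hypothetical, and the null-tag idea is not implemented.

```python
import re

# Hypothetical rule: a tag name is the text between "(" and the first ":",
# and may not itself contain parens.
TAG_HEAD = re.compile(r"\(([^():]+):")

def top_level_groups(text):
    """Yield (start, end) for each balanced top-level paren group."""
    depth, start = 0, None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                yield start, i + 1

def strip_tags(text):
    """Remove top-level tagged groups; pass plain parens through with a warning."""
    out, warnings, pos = [], [], 0
    for start, end in top_level_groups(text):
        group = text[start:end]
        if TAG_HEAD.match(group):
            # Drop the whitespace before the tag too, so punctuation closes up.
            out.append(text[pos:start].rstrip())
        else:
            warnings.append(group)
            out.append(text[pos:end])
        pos = end
    out.append(text[pos:])
    return "".join(out), warnings

line = ("Sen. George Johnson (D, Kansas) spoke (aside: at length "
        "(really: hours)) today (such as: this one).")
print(strip_tags(line))
# ('Sen. George Johnson (D, Kansas) spoke today.', ['(D, Kansas)'])
```

One known weakness of this sketch: an ordinary parenthetical that happens to contain a colon would be mistaken for a tag, which is exactly the case a null-tag would exist to solve.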
The goal of this is that the language would still parse validly, and be completely readable and syntactically valid, either with or without the tags.
To be completely geeky, I should be able to take this whole entry, put it in a text editor, and find-and-replace the perl regular expression "\s\x28.*:.*?\x29" with "nothing", and it should all make sense. It may need a little more logic than that to handle nested tags, though. Note carefully the \x28 and \x29 are the parens, in hex. Otherwise the expression itself would not survive the substitution. But because I put them like that, anyone else here can try it with a smart enough text editor that understands wildcards, too. (aside: I should probably invite calmingshagoth to come up with a better regex, or just tell me outright it's not possible.)
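For anyone who would rather try it in a script than in a text editor, here is a small Python sketch. Python's re module accepts the same expression as written. The second pattern, TIGHTER, is my own guess at an improvement and not part of the original idea: it refuses to let a tag name or body cross another paren. The sample sentences are made up for the test.

```python
import re

# The expression from the text, unchanged (hex escapes stand in for the parens).
ORIGINAL = r"\s\x28.*:.*?\x29"
# A tighter guess: name and body may not contain parens, name may not contain a colon.
TIGHTER = r"\s\x28[^\x28\x29:]+:[^\x28\x29]*\x29"

simple = "Sentences still end cleanly (such as: this one)."
mixed = "Sen. Johnson (D, Kansas) spoke (aside: at length) today."

print(re.sub(ORIGINAL, "", simple))
# Sentences still end cleanly.

# The greedy ".*" runs from the first paren to the last colon on the line,
# so the ordinary parenthetical gets swallowed along with the tag.
print(re.sub(ORIGINAL, "", mixed))
# Sen. Johnson today.

print(re.sub(TIGHTER, "", mixed))
# Sen. Johnson (D, Kansas) spoke today.
```

Neither pattern copes with nested tags; the tighter one simply leaves the outer tag of a nested pair in place, because regular expressions of this kind cannot count matching parens.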
The tags may make concepts harder or easier to grasp, but they let me write to multiple audiences as well. Like all good standards, this one is used to write its own description: this entry is actually written in it. (aside: the author of Markdown wrote their documentation in Markdown's own format.)
For those who understand the C programming language, this is entirely like a lingual #ifdef. Most programmers consider them to be somewhat "dirty", but very often they are the only option when writing or adapting code that needs to be understood by a wide variety of systems of varying age and standard. (humor: Gee, look where I work!)
As stated before, LiveJournal suffers an issue like this. There's no easy way to address people reading via a specific filter, via a specific page (for example: reading an entry directly on its own page as opposed to your recent page or even their friends page), or a person who is on certain lists.
Livejournal could certainly benefit from this. "I am pregnant. I don't want to discuss the sex of it publicly. (friends: It's a boy!!!)". Livejournal has already caused users to learn a subset of HTML that only works on Livejournal, and even then somewhat inconsistently. (ranting: Why is it that LiveJournal can't just send me an email when I post an entry with "irreparable markup"? LJ's behavior is to spew out the RAW html to my friends list and say "owner must fix". Wouldn't it make more sense to put broken entries behind a cut, or set them private?). On livejournal specifically, you have certain things defined: you know who is reading, people are usually logged in, and they could have the ability to set preferences for which of a person's lingual tags they want to read.
Imagine for example the following text on LJ: "I was in san francisco this weekend (friends:at the folsom street fair!), and it was awesome. I met up with my friend Jeff (interest=furry:aatheus) after that." One sentence. Four distinct targets. This is more complex because it requires the LJ framework to work, and isn't readable in standard English, but that's the point. I don't write <lj-cut> when I run out of space on a postcard! I don't use "@jacel: thanks for the compliment" on LJ. They are different media, and can afford to have different syntax. At the very least, such a concept could make people smarter about saying things that could get them into trouble. (aside: anyone remember Jag's ban from Anthrocon and the whole room 909 problem?)
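Here is a rough Python sketch of how a site might render that sentence for each reader. Everything in it is an assumption made for illustration: the Viewer class, the tag spellings, and the decision to show a permitted aside as an ordinary parenthetical. Nothing here reflects how LiveJournal actually works.

```python
import re
from dataclasses import dataclass, field

TAG = re.compile(r"(?P<lead>\s*)\((?P<tag>[^():]+):\s*(?P<body>[^()]*)\)")

@dataclass
class Viewer:
    """Hypothetical reader profile the site already knows about."""
    is_friend: bool = False
    interests: set = field(default_factory=set)

def audience_allows(tag, viewer):
    """Return True/False for audience tags, or None for any other tag."""
    name = tag.strip().lower()
    if name == "friends":
        return viewer.is_friend
    if name.startswith("interest="):
        return name.split("=", 1)[1] in viewer.interests
    return None

def render_for(text, viewer):
    """Render the entry as this particular viewer should see it."""
    def decide(match):
        verdict = audience_allows(match.group("tag"), viewer)
        if verdict is None:
            return match.group(0)   # not an audience tag: leave untouched
        if verdict:
            # Permitted: show the aside as a plain parenthetical.
            return match.group("lead") + "(" + match.group("body") + ")"
        return ""                   # not permitted: remove aside and its space
    return TAG.sub(decide, text)

post = ("I was in san francisco this weekend (friends:at the folsom street fair!), "
        "and it was awesome. I met up with my friend Jeff (interest=furry:aatheus) "
        "after that.")

print(render_for(post, Viewer()))
# I was in san francisco this weekend, and it was awesome. I met up with my friend Jeff after that.
print(render_for(post, Viewer(is_friend=True, interests={"furry"})))
# I was in san francisco this weekend (at the folsom street fair!), and it was awesome. I met up with my friend Jeff (aatheus) after that.
```

The two tags are decided independently, so the one sentence yields four renderings depending on whether the reader is a friend, shares the interest, both, or neither.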
In conclusion, English is a fluid language. It evolves over time. Unlike HTML or CSS or C, there's no standards body to determine how it's to be used, parsed, and worked. I don't know if it will ever evolve to the point where people will write in this syntax constantly, or where people will be able to constantly read and expect this raw syntax. However, in the case of Markdown, it evolved from a convention people were already using: the way we had adapted a plain text medium to carry formatting information. And while Markdown is relatively new, that concept, within email, dates back long before HTML was invented, let alone possible in an email. (aside: You can blame Microsoft for that one).
If I ever get around to writing my own blogging software, there's one more feature I would add that is complementary to this, but that is to be discussed in a different entry, as I'm trying to strongly embrace the one-concept-per-text rule, both for the sake of brevity and to give each concept the attention it deserves. After all, if I believe in these ideas, I owe them that. (sarcastic: And yes, this has all been all one concept.)
One last thought, added after I posted this: I somewhat-intentionally overused these concepts here as an illustrative point. I doubt in regular writing that I would do so, but part of my goal for this was to allow me to integrate stream-of-consciousness writing. Even with the tag format, there's still a balance.
Once again, comments extremely welcome.