gushi: (Default)

Please comment on this entry, this is something I put a lot of thought into!

I apologize in advance to the non-programmers in the group, who just want to read English. I'm not going to talk much about unix or perl or the other actual programming languages here. What I'm looking to discuss is communications, linguistics, and human perception.

Before I get started, I've seen several humorous emails circulating like these over the years, where a person suggests lingual changes and starts using them as soon as mentioned, deteriorating the language in the essay to their satirical end. I am suggesting lingual changes and using them live, although I'm using them from the start of this entry. I hope to all of you that my result is not as unreadable in the end as some of these. If it is, you all have my apologies.

I find when writing these entries that I tend to tailor my writing for the web and more specifically Livejournal.

Most of you probably do not know that all my posts run through an awesome little filter called Markdown, which takes email-like formatting, and turns it into HTML. Things like __this__ become this. Links are very often automatic. It makes things refreshingly easy for most things. Occasionally, though, when writing something like poetry, I bypass Markdown and write the HTML myself, because HTML ignores carriage-returns. It's a case by case basis, too. Sometimes I use <i>, sometimes I just use the asterisks. Sometimes markdown gets it wrong, like when talking about the apache module: modaccessrbl in a previous post, when I typed mod_access_rbl. I had to go back manually and correct the lj entry on that one.

I should also mention that I haven't found a good word processor for the unix shell, where ctrl-b, ctrl-i, ctrl-u act as you'd expect them to, or there are formatting menus or options. Even though I'm writing this in a livejournal client, it's really a plain text editor; most of the formatting is either left to Markdown, or to me writing the raw html in the entries.

The point in all this, is that I am already not writing, in what would be considered standard written English, as one would write on a piece of paper.

If I write a postcard to a friend, I don't put <br /> in the closing. There are, whether using Markdown's syntax, or my own, already formatting sigils in place, designed to influence how the reader will ultimately read it. It is a programming language: there are input and output and these are not the same.

I tend to write English as though I'm writing code. I put in asides, in jokes, parentheticals: the tendency is to dump everything I'm thinking on a particular subject, and this only results in more editing later. My essays suffer "feature creep". There are often things I would mention in conversation, to make a subject more clear, and in conversation, it would probably become more apparent if the person understood concepts without overexplanation, or wanted the humorous asides, or was not someone I perceived to have a sense of humor. After all, if the point of communication is to share one's thoughts, why not share all of them?

It was the goal in this entry: to leave nothing out, leave nothing on the cutting room floor, so to speak. Of course, as mentioned in the previous entry, this obscures readability.

Writing on Livejournal is a broadcast medium. Some of you are technical, some not. Some of you get my jokes, some not. Some of you are familiar with a concept, for others, it requires explaining, and I cannot always be sure which of you have been reading for how long, so I need to make backreferences to older entries and link them. At work, there is the same problem. Some of my coworkers heard me go on about an idea for redoing a project, some not. But in an email or an LJ post, the same dataset is sent to the same people. The only other option is to write two emails, one of which will likely fully include the content of the other. The simple reason is: most people don't read as they would talk or listen. Most people aren't set to bypass and parse the tags that start with X or Y.

I've also come up with a technical, rather than linguistic, problem on LJ: there's no way to target segments of an entry to a particular class of people. Like those reading via a particular filter, or those in a specific friendsgroup.

These problems are related, and I came up with something very cool, that solves them better than LJ: Dynamic CSS.

(as an aside: for people who do not understand the power and flexibility behind what stylesheets can do for you, please go visit the web site http://csszengarden.com right now. They give you a single, static, valid HTML page, and show you the different possibilities that can be done just by altering the stylesheet. In all cases, the HTML is untouched.)

Imagine if you will, that you defined () as a tag in HTML, just like <> are. Lingually, parens are pretty close to what a tag is in HTML. Except instead of telling the browser how to format and display it, you're telling your target audience whether or not to read it.

Once you formalize the parens as a tag, correctly written entries could show only the raw text to those who wanted it, or the parentheticals and asides, all by simply altering the "stylesheet". Consider that you could use (XXX:) as your tag. Which means when talking about some things that need expansion (for example: this, that, and the other thing), that's a tag that could be shown. However, when talking about more tangential topics (funny story: I once wrote a whole linguistics entry on a tangent), such things could optionally be omitted by the parser/displayer. It's already in somewhat-use. (For example: I'm sure other people have written with this "tag".)

Your audience could choose which format they're most comfortable reading. You get the benefit of knowing that your thoughts are out there, and if your audience understands you, tends to follow your particular mode of thinking, they may be able to safely turn off tags (like: (e.g.:) or (i.e.:)) but may want to leave others on. (Such as: (humor:) or (tangent:).) It makes me a stronger writer because it forces me to identify such behaviors, but at the same time, lets me continue writing to a level of technicality I want. (for example: I could define tags like (technical:) and (reallytechnical:)). All my writing would go through an additional filter that checked for unrecognized tags, and would allow me to add them to the stylesheet, or alias them to others (For example: (e.g.:) may be aliased to (for example:) or (example:)).

One more semantic is that in writing such a parser, it makes sense to define a tag to include the trailing space before a paren, so that if a tag were omitted by the parser, sentences would still end with a period directly following a word (such as: this one).

Nested tags would not be parsed. Parens without a colon-tag would be passed-through as-is, although might optionally trigger a warning in the parser. Perhaps this would require "real" parens to be written as "(:whatever)" to indicate a null-tag, but probably not. After all, there are cases where you need normal parens. Either when quoting original source that contains them, or in situations like: Sen. George Johnson (D, Kansas), or typing out equations.
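
A minimal sketch of the parser described above, in Python. Everything named here (`TAG_RE`, `ALIASES`, `render`, the `hidden` and `known` sets) is an assumption made for illustration, not an existing tool. It swallows the whitespace in front of a hidden tag, passes colon-less parens through untouched, and reports tags it does not recognize. One simplification: where tags are nested, it only ever sees the innermost one.

```python
import re

# One non-nested "(tag: body)" plus the whitespace in front of it, so that
# dropping the aside leaves the period sitting directly after the word.
TAG_RE = re.compile(r"\s*\(([^():]+):[^()]*\)")

# Hypothetical alias table: several spellings map onto one canonical tag.
ALIASES = {"e.g.": "example", "for example": "example", "such as": "example"}


def render(text, hidden, known):
    """Return (rendered_text, unrecognized_tags) for one reader's preferences."""
    unrecognized = set()

    def handle(match):
        tag = match.group(1).strip().lower()
        tag = ALIASES.get(tag, tag)
        if tag not in known:
            unrecognized.add(tag)   # report it, but leave the text untouched
            return match.group(0)
        if tag in hidden:
            return ""               # drop the aside and its leading whitespace
        return match.group(0)

    return TAG_RE.sub(handle, text), unrecognized
```

Calling `render(entry, hidden={"example"}, known={"example", "humor"})` would hide every (for example:), (e.g.:) and (such as:) aside while leaving (humor:) asides and ordinary parens such as (D, Kansas) alone.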

The goal of this is that the language would still parse completely validly, and be completely readable and syntactically valid either with or without the tags.

To be completely geeky, I should be able to take this whole entry, put it in a text editor, and find-and-replace the perl regular expression "\s\x28.*:.*?\x29" with "nothing", and it should all make sense. It may need a little more logic than that to handle nested tags, though. Note carefully the \x28 and \x29 are the parens, in hex. Otherwise the expression itself would not survive the substitution. But because I put them like that, anyone else here can try it with a smart enough text editor that understands wildcards, too. (aside: I should probably invite calmingshagoth to come up with a better regex, or just tell me outright it's not possible.)
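
For anyone who wants to try the find-and-replace outside a text editor, here is a small Python sketch. `ORIGINAL` is the expression quoted above. `TIGHTENED` is an assumed variant, not the author's, that forbids parens inside the tag name and the body so that each match stays within a single aside. The sample sentence is made up for the demonstration.

```python
import re

# The expression from the entry, kept for comparison. Its first ".*" is
# greedy, so with two asides on one line it runs from the first paren to
# the last colon and deletes the text in between.
ORIGINAL = re.compile(r"\s\x28.*:.*?\x29")

# Assumed tightening: no parens allowed in the tag name or in the body.
TIGHTENED = re.compile(r"\s\x28[^\x28\x29:]*:[^\x28\x29]*\x29")


def strip_tags(text, pattern=TIGHTENED):
    """Delete every tagged parenthetical together with its leading whitespace."""
    return pattern.sub("", text)


sample = "I met Jeff (aside: an old friend) downtown (humor: of course)."
print(strip_tags(sample))            # I met Jeff downtown.
print(strip_tags(sample, ORIGINAL))  # I met Jeff.
```

The tightened pattern still only removes the innermost tag where tags are nested, so truly nested asides would need a second pass or a real parser.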

They may make concepts harder or easier to grasp, but they let me write to multiple audiences as well. Like all standards, this entry is actually written in it. (aside: the author of Markdown wrote their documentation in Markdown's own format.)

For those who understand the C programming language, this is entirely like a lingual #ifdef. Most programmers consider them to be somewhat "dirty", but very often they are the only option when writing or adapting code that needs to be understood by a wide variety of systems of varying age and standard. (humor: Gee, look where I work!)

As stated before, LiveJournal suffers an issue like this. There's no easy way to address people reading via a specific filter, via a specific page (for example: reading an entry directly on its own page as opposed to your recent page or even their friends page), or a person who meets certain lists.

Livejournal could certainly benefit from this. "I am pregnant. I don't want to discuss the sex of it publicly. (friends: It's a boy!!!)". Livejournal has already caused users to learn a subset of HTML that only works on Livejournal, and even then somewhat inconsistently. (ranting: Why is it that LiveJournal can't just send me an email when I post an entry with "irreparable markup"? LJ's behavior is to spew out the RAW html to my friends list and say "owner must fix". Wouldn't it make more sense to put broken entries behind a cut, or set them private?). On livejournal specifically, you have certain things defined: you know who is reading, people are usually logged in, and they could have the ability to set preferences for which of a person's lingual tags they want to read.

Imagine for example the following text on LJ: "I was in san francisco this weekend (friends:at the folsom street fair!), and it was awesome. I met up with my friend Jeff (interest=furry:aatheus) after that." One sentence. Four distinct targets. This is more complex because it requires the LJ framework to work, and isn't readable in standard English, but that's the point. I don't write <lj-cut> when I run out of space on a postcard! I don't use "@jacel: thanks for the compliment" on LJ. They are different media, and can afford to have different syntax. At the very least, such a concept could make people smarter about saying things that could get them into trouble. (aside: anyone remember Jag's ban from Anthrocon and the whole room 909 problem?)
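
The targeting idea can be sketched in a few lines of Python. This is only an illustration under assumptions: `render_for`, `TARGET_RE`, and the way a reader's groups and interests are passed in are all hypothetical, and nothing like this exists in LiveJournal itself. A target is either a bare group name such as friends or a key=value test such as interest=furry, as in the example sentence.

```python
import re

# "(target:body)" with optional leading whitespace. The target is a bare
# group name ("friends") or a key=value test ("interest=furry").
TARGET_RE = re.compile(r"(\s*)\(([a-z]+(?:=[a-z]+)?):([^()]*)\)")


def render_for(text, reader_groups, reader_interests, all_groups=("friends",)):
    """Show each targeted aside only to readers who match its target."""

    def handle(match):
        lead, target, body = match.groups()
        key, _, value = target.partition("=")
        if key == "interest" and value:
            visible = value in reader_interests
        elif key in all_groups:
            visible = key in reader_groups
        else:
            return match.group(0)  # not an audience target; leave untouched
        # Matching readers see the body as an ordinary parenthetical.
        return f"{lead}({body.strip()})" if visible else ""

    return TARGET_RE.sub(handle, text)


entry = ("I was in san francisco this weekend (friends:at the folsom street "
         "fair!), and it was awesome. I met up with my friend Jeff "
         "(interest=furry:aatheus) after that.")
print(render_for(entry, set(), set()))
print(render_for(entry, {"friends"}, {"furry"}))
```

The first call prints what an anonymous reader would see, the second what a friend with a matching interest would see. The other two of the four audiences fall out of passing only one of the two sets.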

In conclusion, English is a fluid language. It evolves over time. Unlike HTML or CSS or C, there's no standards body to determine how it's to be used, parsed, and worked. I don't know if it will ever evolve to the point where people will write in this syntax constantly, or where people will be able to constantly read and expect this raw syntax. However, in the case of Markdown, it evolved from a pragma people were already using: the way we had adapted a plain text medium to carry formatting information. And while Markdown is relatively new, that concept, within email, dates back long before HTML was invented, let alone was possible in an email. (aside: You can blame Microsoft for that one).

If I ever get around to writing my own blogging software, there's one more feature I would put that is complementary to this, but that is to be discussed in a different entry, as I'm trying to strongly embrace the one-concept-per-text rule, both for the sake of brevity, as well as to give each concept the attention it deserves. After all, if I believe in these ideas, I owe them that. (sarcastic: And yes, this has all been all one concept.)

One last thought, added after I posted this: I somewhat-intentionally overused these concepts here as an illustrative point. I doubt in regular writing that I would do so, but part of my goal for this was to allow me to integrate stream-of-consciousness writing. Even with the tag format, there's still a balance.

Once again, comments extremely welcome.

gushi: (Default)

Language is an important thing to me.

I have a few language problems in my writing style, and I feel they reflect flaws in my thinking style.

Specifically, I tend to interlude heavily in standard writing. I tend to insert references, or jump to asides. I tend to use commasplices, even where correctly done, in sentences that don't need them. For example, a "correctly done" commasplice is where the segment between two commas can be removed and the sentence will still make sense, such as 'even where correctly done' in the previous sentence.

I think more quickly than I can write, and I struggle to get my ideas down on paper.

I tend in email communication to make messages far longer than most people's "tl;dr" filter (too long; didn't read).

I've done a few things to combat this, especially at work:

1) At my manager's insistence, I've turned on the pine option "do not send flowed text", which means basically that my email goes out hard-wrapped wherever the composer wraps it. Apparently I am the only person whose email goes all the way across the screen, otherwise. Now, in my brain, that would mean "your window is the wrong size, then", but he's the boss. And pointing out that my mailer, in doing so, was complying with RFC 3676 didn't seem to help. The net result of this is that LESS of one of my emails fits on a screen, which makes the next steps more of a challenge.

2) I started using my screen length, which varies depending on what system I'm on, as the delimiter for if a message is too long. If it goes beyond screen-length, I seriously consider scrapping it and starting over. This isn't a hard and fast rule, but I've discovered some things about people:

  • People are more likely to read something through if they see your signature as soon as they start reading.

  • People aren't in the same frame of mind I am, not thinking the same thoughts along the same lines: if I've forced them to scroll down, this means what I've previously said is off-their-screen, and will be when replying as well.

  • People only tend to grasp one concept per email. If I sent a 10-screen email about various projects I want to do, breaking them all down, it's less likely to get read than if I had sent ten emails, and cautiously timed them throughout the day.

  • The length of your message is inversely proportional to the number of completely-relevant responses you will get (i.e. responses which address all points you have made). I think I can prove this by the number of responses I will get to this entry, versus me making a post where I simply say something short and meme-like, like "Gushi can't enjoy his sandwich". Now, in the corporate world, it's more important. I write messages because I need people to understand why I'm about to do something, why I need to take a server offline, why I need to spend money.

  • The last rule above is less true on my blog. I love knowing people read it, and I love feedback, but I fully expect that the people who will understand everything I write are a subset of the total readership: i.e. on the blog it's more about "get your thoughts out" and less about "make people want to read your message".

3) I worked quite hard, at least in corporate mail, to reduce or eliminate a few overused standards I love:

  • Ellipses. I love these things, but I'm trying to mentally...train myself...to hear William Shatner...when I read them. I tend to use ellipses when I'm unsure of a concept, or when a concept is...not quite right...almost like another problem.

  • Parenthetical asides and other things like footnotes. They basically are the universal symbol for getting off topic for a short while. (All of Family Guy's humor is based on this concept. See?)

  • The em-dash. I don't abuse this as much as I used to. I tend to use it more abusively in fiction -- where I'm trying to describe the stream of consciousness inside a character's head. I guess this means I tend to think in em-dashes. It makes sense.

  • Emoticons and humor. I'm a geek in a company full of geeks. I tend to be laid back, but I need to try and communicate more seriously. I suppose a part of me feels this is necessary as I'm in a new-ish situation at a new job, and I perceive a lot of people as a bit uptight, and don't know them well. I guess in a widely-geographically-distributed company I'm trying to impart the same level of relaxation and gregariousness I'd show in person, but the analog is less than perfect, and I feel it might make me seem less than professional.

Like the "length" rule above, I tend to think the last one is less true on my blog as well. An emoticon can mean the difference between lawsuit-angry and bofh-angry. But being more aware of it in general is not a bad thing.

I find becoming conscious of the above helps me be aware of it. I'm not trying to stop using them entirely, just to realize that if I'm using them, I'm losing the message. I mean, they have their place -- all punctuation does. (Doesn't it? I'm not sure...) Sorry, unavoidable.

4) Syntax checking. As I tend to write in a technical sense, and in a harried fashion, I notice a lot of times where I'll do something like:

We need to check for this syntax (like we did on that other thing, which is important all the time (except in case X)

See above? It's the desired format. It gets the information across, and yes the parentheses are necessary. But just like in programming, it fails to parse because there is no second closing bracket. I tend to miss this and endquotes all the time. It annoys me. And there's no good open-source "readability checker" I can filter my mail outbound through.

Ironically, most text editors let me do this for writing code, let me find mismatched or misbalanced brackets, it's just not built into my email client. And above, where should the closing paren be? After the word 'thing', or should it be a double-paren after 'X)'? Only I know, so sending mail out without it is sloppy. And it bothers me quite a lot.

I'm a technical writer, and I try to treat my audience as technical. While I may talk about nontechnical matters like emotions in this journal, I maintain a technical tone. And let's face it, the emotions and nuances of the human brain are infinitely more complex than simple things like computers.

Writing technical is a lot like writing lawyerese. Very often you have to detail several examples of things, and more often than not, some of those examples will have things in common that others do not. The semantic differences between words like "MAY" and "SHOULD" and "MUST" are critical in the world I live in. It involves detailing problems most people don't see, and predicting standards that will be used long after you're gone.

Writing is also an arduous process for me. It tends to be a brainstorming long-write process, then getting out ideas and de-duplicating things. I'll often mention the same idea two or three times, then edit and refine them down, moving whole paragraphs and sentences around. As a quick example, the list above was not written in order at all; #2 was written last.

It means cutting concepts that I think are worth saying but ultimately irrelevant. Above, when talking about there being no unix-based readability checker, I wanted to talk about how I'd see the ideal use of such a thing to be in a spam filter. But it dilutes the topic, and that's bad. For that one, I can mention it here. But on others, dropping those ideas hurts, since I may not know if or when I'll remember to write up a whole separate entry about how cool that would be, and a lot of ideas have merit. Especially when talking about improving an existing system: often you lose scope and want to change the system to make it better, rather than working a single problem. This is hard for a lot of people.

It's definitely not helped by the fact that my work and my life are interrupt driven. In mid-paragraph I might need to get up to handle a "fire", and come back 45 minutes later, and experience a need to reorient myself, which I often don't do as well as I should. Caffeine also makes it worse for me, it makes me more focused on making a post/letter LONGER and more-tangented. My boss is rather famous for saying "I'll explain it because I've had too much coffee".

I'm working on it, slowly. It's not easy. I'm hoping the techniques I've detailed here give insight to anyone else who reads into what goes on in my head, and into what it takes for me to do this. I had someone today say that I was very worth reading, which is awesome. (Thanks jacel).

I've been told by several people I should write a tech-blog, but what's the point there? This is me. This is who I am. I am a human who is technical. I suppose the logic of splitting my blogs if I decide to is best saved for another post as well.

Now, if you'd like to talk about tangents: this started off as a post in my other blog, where I share intimate things about my relationship-life. Within two paragraphs, I was off the original topic and talking about writing standards. Since there's a reasonably good chance that everyone who reads that blog reads this one, I'll probably consider this read-first type material.
