1/16/2010

Talking with machines

On the Twitter microsyntax front, Project EPIC is working out an emergency syntax for Haiti rescue communication efforts.

It's interesting to me because the syntax itself is not new: it just uses hashtags. What is new is that it uses hashtags as pre-set message components, as befits the key elements of a relayed emergency message.

From the link above:

Our team and collaborators are proposing a Tweet-friendly hashtag-based syntax to help direct Twitter communications for more efficient data extraction for those communicating about the Haiti earthquake disaster. Use only requires modifications of Tweet messages to make information pieces that refer to #location, #status, #needs, #damage and several other elements of emergency communications more machine parsable.

EXAMPLE1: #haiti #imok #name John Doe #loc Mirebalais Shelter #status minor injuries

EXAMPLE2: #haiti #need #transport #loc Jacmel #num 10 #info medical volunteers looking for big boat to transport to PAP

EXAMPLE3: #haiti #need #translator #contact @pierrecote

EXAMPLE4: #haiti #ruok #name Camelia Siquineau #loc Hotel Montana

EXAMPLE5: #haiti #ruok #name Raymonde Lafrotune #loc Delmas 3, Rue Menelas #1

EXAMPLE6: #haiti #offering #volunteers #translators #loc Florida #contact @FranceGlobal


PRIMARY TAG
#need
#offering
#imok
#ruok
#damage
#injured
#road
.....

SECONDARY TAG
Need/Offering Descriptor Tags
#food
#water
#fuel
#medical or #med
#shelter
#transport
#volunteers... can shorten to #vols
#translator
#status
#financial or #money
#information or #info
#supplies [list specific supplies needed]
.....

Data tags
#name [name]
#loc [location]
#num [amount or capacity]
#contact [email, phone, link, other]
#photo [link to photo]
#source [source of info]
#status [status]
.....

End Tag
#info [other information]

Overall order is not as important as tag-descriptor connection.


In a time of crisis, it makes sense to not quibble about whether slashes, backslashes, other symbols, or certain pre-set abbreviations make the most sense. And so, they've actually put together something quite sensible: they've basically converted a Twitter post into a DB data record, delimited by hashtags. Any firehose-sniffing program should be able to pick out and synthesize the relevant information from this list of tags.

They don't have any parsing programs showcased on the site, but they are live, tweeting with this syntax (@epiccolorado), and it shouldn't be too hard (for someone other than me) to build one pretty quickly.
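To give a sense of how little it might take, here is a minimal sketch of such a parser in Python. It is not Project EPIC's code. The function name `parse_epic`, the record layout, and the rule that a bare tag is a descriptor while a tag followed by text is a data field are all assumptions made for illustration, based only on the tag lists and examples quoted above.

```python
import re

# Primary tags copied from the list above (the original list is open-ended).
PRIMARY_TAGS = {"need", "offering", "imok", "ruok", "damage", "injured", "road"}

# A tag is '#' followed by a letter, so the "#1" in a street address stays text.
TAG = re.compile(r"#([A-Za-z]\w*)")


def parse_epic(tweet, event="haiti"):
    """Turn one EPIC-style tweet into a dict-shaped record (hypothetical layout)."""
    record = {"event": None, "primary": [], "descriptors": [], "data": {}}
    matches = list(TAG.finditer(tweet))
    for i, match in enumerate(matches):
        tag = match.group(1).lower()
        # Free text runs from the end of this tag to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(tweet)
        value = tweet[match.end():end].strip()
        if tag == event:
            record["event"] = tag
            continue
        if tag in PRIMARY_TAGS:
            record["primary"].append(tag)
        elif not value:
            # Assumption: a bare, non-primary tag is a need/offering descriptor.
            record["descriptors"].append(tag)
        if value:
            # Assumption: any tag followed by text is a data field.
            record["data"][tag] = value
    return record


record = parse_epic("#haiti #need #transport #loc Jacmel #num 10 "
                    "#info medical volunteers looking for big boat to transport to PAP")
print(record)
```

Run on EXAMPLE2, this yields the primary tag "need", the descriptor "transport", and data fields for loc, num, and info. Because each tag's text is read up to the next tag, the order of the tags does not matter, which fits the note that overall order is less important than the tag-descriptor connection.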

There are some things I really like about this.

- It's simple. It takes a convention people already know, and re-uses it.

- It's basically making a simple little code book. One could print out the list of commonly-used tags on an index card, and in only a few seconds put together a message readable to the network of people looking for this format.

- It is identifying a basic sentence structure, on a level up from "twitter syntax". This is new for Twitter semiotics. If you look at a commonly used syntax, such as the re-tweet, you will see a variety of different amalgamations of the syntax. Some put the "RT" first, or last, or some are now using "via" rather than "RT". Some RT only the last person in the RT chain, some put the first, or some put all. None of this matters, of course, because the message is still getting across. But with this EPIC format, the order of the tags matters, and yet is still a bit flexible. It leads with the identifier, "#haiti", and then continues in a line of primary, secondary, data, and then additional information tags to shape the message in an understandable way. The simplest way of forming a regular sentence is with [Subject] -> [Verb]. Then, you can expand that to [Subject] -> [Verb] -> [Object]. And then, [Subject] -> [Verb] -> [Object] -> [Adjective]. And then, [Subject] -> [Verb] -> [Adverb] -> [Object] -> [Adjective]. You get the idea. The position changes depending on what language you use, but our system of language is basically a database, assigning values to these different data types in a particular record, and then parsing the record in conjunction with other records. This EPIC format is doing that with the basic information types for crucial rescue information.

- It's readable by humans as well as machines. Anyone looking at a tweet in this format could tell what it means. In this way, it fits into the main trending flow of #haiti tweets, but also can be pulled out from the noise. It is an ingenious, although simple, middle ground between an incomprehensible DB record and a common language sentence. This is where I see the microsyntax on Twitter heading... some common, comprehensible ground between XML script and common language punctuation. It is an understandable written language, but syntaxed to be capable of being metadata.

It will be interesting to see how well this works in Haiti, but thinking ahead to the next disaster, they should print up laminated index cards with these tags on them, and syntax examples on the other side. They can air drop them, or distribute them with Twitterized cell phones. The beauty is that anyone can contribute to the information collection, using whatever means happens to work: cell phone, SMS, Internet, Twitter app, or even potentially voice. Add geotagging to the metadata, and you are getting near instant, localized, specifically formatted information from the ground. It should be pretty easy to go back and rank the tweets coming in, as DB reports are verified, bumping up users who provide good information. Any responder on the ground could easily be linked into the overall real-time awareness DB, without having to transfer on phone and radio, or waiting for confirmed contact. Report is made, and then the responder can go about his/her work.
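The ranking idea could be sketched along these lines in Python. Everything here is hypothetical: the function name `rank_reporters`, the input shape (pairs of user and verification result), and the plus-one/minus-one scoring are assumptions chosen to illustrate the idea, not anything Project EPIC has described.

```python
from collections import Counter


def rank_reporters(reports):
    """Rank users by how their reports held up once checked.

    reports: iterable of (user, verified) pairs, where verified is
    True (confirmed), False (found wrong), or None (not yet checked).
    """
    scores = Counter()
    for user, verified in reports:
        if verified is True:
            scores[user] += 1
        elif verified is False:
            scores[user] -= 1
        else:
            scores[user] += 0  # unchecked: list the user without changing the score
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


reports = [("@a", True), ("@b", False), ("@a", True), ("@c", None), ("@b", True)]
print(rank_reporters(reports))
```

A real system would presumably weight recency, location, and the kind of report as well; the point is only that once tweets are records, this sort of bookkeeping is a few lines.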

Just wait until this sort of thing goes audible. Ten codes, the codes police and dispatch use over the radio, are currently being phased out all across the country because they are not unified, and sometimes cause confusion in hectic situations. But these are merely translations. One ten code stands for something else. What if they were syntactical codes, to let a computer or human listening know what sort of information was being read over the air? What if we started using a vocal "click" to denote a hashtag, so the next spoken word would be known as an indexable primary or secondary tag, giving additional meaning to the data spoken next? It would be "plain speech", but plain speech imbued with metadata for easy compilation into DB-style records. With voice-to-text capture on the radio feed, there could be one open channel, with everyone speaking at once. The computer would capture the speech, complete with hashtags, and publish it to a readable timeline on the screen. The radio metadata (the unit's number is already included silently in the broadcast in current technology) would allow the dispatch or the particular units to follow the timeline of only particular units, say, involved on that particular response. You could listen to the open feed for instant vocal communication, or you could filter the feed to particular data tags.
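As a rough illustration of that filtering step, here is a small Python sketch. It assumes the voice-to-text system already delivers each transmission as a (unit ID, transcript) pair, with the spoken marker transcribed as the word "click". The function names, the unit IDs, and the sample feed are all made up for the example.

```python
import re


def clicks_to_hashtags(transcript, marker="click"):
    """Rewrite 'click loc Main Street' as '#loc Main Street'."""
    pattern = re.compile(r"\b" + re.escape(marker) + r"\s+(\w+)", re.IGNORECASE)
    return pattern.sub(r"#\1", transcript)


def filter_feed(feed, units=None, tag=None):
    """Keep transmissions from the given units and/or carrying the given tag.

    feed: list of (unit_id, transcript) pairs in broadcast order.
    """
    selected = []
    for unit_id, transcript in feed:
        text = clicks_to_hashtags(transcript)
        if units is not None and unit_id not in units:
            continue
        if tag is not None and "#" + tag not in text.lower().split():
            continue
        selected.append((unit_id, text))
    return selected


# Hypothetical open-channel feed, as a speech-to-text system might emit it.
feed = [
    ("E12", "click need click medical click loc Main and 5th"),
    ("P7", "click road click status blocked click loc Highway 9"),
    ("E12", "click imok"),
]
print(filter_feed(feed, units={"E12"}))  # timeline for one unit
print(filter_feed(feed, tag="loc"))      # every transmission with a location
```

The same open feed serves both uses: the unfiltered list is the live channel, and each call to `filter_feed` is one of the narrowed timelines described above.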

Language has a great potential for cyborgization. Cybernetics is an extension of our logical thought processes, so there is no reason why we can't extend our computerized tools by interfacing our age-old communication techniques with our new technology. Speak the future.
