A lot of these are just getting off the ground. But you're welcome to poke about.

Putting sound in your webpages: the new <audio> tag

A lot of websites about language include sound files so visitors can hear words. Often, when the visitor clicks the sound file, they are taken to a separate page in their browser with a single "player" to play the sound. This distracts from the flow of the page. There's a better way.

Unicode Combining Character Tester

This is a small one-page web application for testing how various fonts handle multiple combining characters. It was created during a class on orthography taught by Keren Rice and Gwendolyn Hyslop at InField2010.
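To get a feel for what "multiple combining characters" means before looking at fonts, here is a minimal Python sketch that lists the code points inside a stacked sequence. The helper name `describe` and the sample string are my own illustration, not part of the tester itself.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name, combining class) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"), unicodedata.combining(ch))
        for ch in text
    ]


# One base letter followed by two combining marks:
# COMBINING TILDE (U+0303) and COMBINING ACUTE ACCENT (U+0301).
stacked = "a\u0303\u0301"

for code_point, name, combining_class in describe(stacked):
    print(code_point, name, combining_class)
```

A combining class of 0 marks a base character; anything else is a mark that attaches to the preceding base. Whether the two marks stack neatly on screen is up to the font, which is the thing the tester lets you compare.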

Fuzzy Matching

A few experiments in fuzzy lookup with minimal code.

An Introduction to Fuzzy Matching

Fuzzy matching is the common name for finding non-exact results when searching a textual database. The technique underlies familiar features such as spellchecking and search-engine suggestions, and it can be very useful for linguists working with linguistic databases. This brief introduction covers some basics of fuzzy matching and walks through a simple implementation. For instance, with fuzzy matching, a query for _Nicolai_ might match _Nicolaï_. Further reading: [1] http://en.wikipedia.org/wiki/Fuzzy_matching
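As a rough illustration of the _Nicolai_ example, here is a minimal sketch using only Python's standard library. It is not necessarily the implementation the introduction describes: it assumes that stripping diacritics and then comparing with `difflib` is good enough, and the names `fold`, `fuzzy_lookup`, and the 0.8 cutoff are my own choices.

```python
import difflib
import unicodedata


def fold(word):
    """Lowercase a word and strip combining marks, so that 'ï' becomes 'i'."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fuzzy_lookup(query, words, cutoff=0.8):
    """Return the words whose folded form is similar to the folded query."""
    folded = {}
    for word in words:
        # Several spellings may fold to the same key, so keep a list per key.
        folded.setdefault(fold(word), []).append(word)
    close = difflib.get_close_matches(fold(query), list(folded), n=5, cutoff=cutoff)
    # get_close_matches returns the best-scoring keys first.
    return [word for key in close for word in folded[key]]


names = ["Nicolaï", "Nikolai", "Nicole", "Natalia"]
print(fuzzy_lookup("Nicolai", names))
```

Lowering the cutoff lets in more distant spellings at the cost of more noise, so the right value depends on the database.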

Glossing: Experiments in web-based linguistic glossing

The Leipzig Glossing Rules are a practical characterization of traditional linguistic glossing technique. I'm working on tools to edit glosses in this format within the web browser.
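The core of the format is word-by-word alignment, with the same number of hyphens in each word and its gloss. Here is a minimal Python sketch of that idea in plain text. The function name `align_gloss` and the Spanish sample are my own illustration, and it assumes a monospaced display where one character takes one column.

```python
def align_gloss(words, glosses):
    """Return two lines of text with each word left-aligned above its gloss."""
    if len(words) != len(glosses):
        raise ValueError("each word needs exactly one gloss")
    for word, gloss in zip(words, glosses):
        # Morpheme boundaries must line up: same hyphen count in word and gloss.
        if word.count("-") != gloss.count("-"):
            raise ValueError(f"hyphen mismatch: {word!r} vs {gloss!r}")
    # Each column is as wide as the longer of the word and its gloss.
    widths = [max(len(w), len(g)) for w, g in zip(words, glosses)]
    top = "  ".join(w.ljust(n) for w, n in zip(words, widths)).rstrip()
    bottom = "  ".join(g.ljust(n) for g, n in zip(glosses, widths)).rstrip()
    return top + "\n" + bottom


print(align_gloss(["gato-s", "duerm-en"], ["cat-PL", "sleep-3PL"]))
```

In a browser the same alignment would more likely come from markup and CSS than from padding with spaces, but the validation step carries over unchanged.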

Rendering Linguistics in HTML

Which format to use for exchanging linguistics scholarship has been a matter of contention for years. How should interlinear glosses be rendered: as tables in word processing files? But which sort of file? .doc? .docx? .rtf? .pdf? One of the various Open Office formats? I'd like to suggest that the most future-proof format is HTML5. This project contains some experiments that show how HTML5 can be used to create documents that look very much like a traditional linguistics paper.

Love: Emulating Print Dictionaries on the Web

Herein, some experiments in marking up entries from various bilingual dictionaries as HTML. After encoding several examples, I've concluded that definition lists (<dl> elements) work best for bilingual information. My reasoning: if you turn off the CSS in your browser, the examples marked up as definition lists still look much more reasonable.
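To make the definition-list idea concrete, here is a minimal Python sketch that turns headword and translation pairs into a <dl>. The function name `entries_to_dl` and the Spanish-English sample entries are my own illustration; real dictionary entries carry far more structure than this.

```python
from html import escape


def entries_to_dl(entries):
    """Render (headword, [translations]) pairs as an HTML definition list."""
    lines = ["<dl>"]
    for headword, translations in entries:
        # The headword is the term being defined.
        lines.append(f"  <dt>{escape(headword)}</dt>")
        for translation in translations:
            # Each translation is one definition of that term.
            lines.append(f"  <dd>{escape(translation)}</dd>")
    lines.append("</dl>")
    return "\n".join(lines)


sample_entries = [("perro", ["dog"]), ("gato", ["cat", "jack (tool)"])]
print(entries_to_dl(sample_entries))
```

With no stylesheet at all, a browser still shows each translation indented under its headword, which is the behavior that makes this markup degrade gracefully.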

Hunting for Minimal Pairs

If you have a big list of words, how can you find good minimal pairs? Like this...
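One possible approach, sketched in Python: treat two words as a minimal pair when they have the same length and differ in exactly one position. This is my own simplification and not necessarily the method the article uses. It compares characters, so it assumes a transcription where one character stands for one segment.

```python
from collections import defaultdict


def minimal_pairs(words):
    """Return sorted pairs of equal-length words differing in exactly one position."""
    buckets = defaultdict(set)
    for word in set(words):
        for i in range(len(word)):
            # Key on the word with position i left out; words that share a key
            # are identical everywhere except (possibly) at position i.
            buckets[(i, word[:i], word[i + 1:])].add(word)
    pairs = set()
    for group in buckets.values():
        members = sorted(group)
        for index, first in enumerate(members):
            for second in members[index + 1:]:
                pairs.add((first, second))
    return sorted(pairs)


words = ["pat", "bat", "pad", "pit", "spat"]
print(minimal_pairs(words))
```

Bucketing avoids comparing every word with every other word, which matters once the list really is big.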

Notebook: A web-based audio transcription tool for linguists

Linguists have traditionally used field notebooks to record their elicitations with speakers. It is a noble tradition, but the approach has drawbacks: above all, the text is not searchable, and there is no convenient way to link the transcriptions to recordings. We propose a web-based tool for transcribing audio recordings.

Picnic: An image collection and annotation tool

Flickr.com offers a programmatic interface to its vast array of tagged images. Picnic is a tool for collecting images under a free license from Flickr for use in language documentation and language learning. The code is hosted on GitHub.

In Search of Reduplication

Suppose you are interested in studying reduplication, and suppose you have a big list of words in the language. How can you automatically collect many words which contain reduplication from the list of words? This article will tell you how.
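As a first pass, a regular expression with a backreference will catch any stretch of two or more characters that is immediately repeated. The Python sketch below is my own rough version and may differ from what the article does; the pattern, the names `REDUP` and `find_reduplication`, and the sample words are all assumptions for illustration.

```python
import re

# Two or more word characters, captured, then the same characters again.
REDUP = re.compile(r"(\w{2,})\1")


def find_reduplication(words):
    """Return (word, repeated part) for each word with an immediate repeat."""
    hits = []
    for word in words:
        match = REDUP.search(word.lower())
        if match:
            hits.append((word, match.group(1)))
    return hits


words = ["wikiwiki", "banana", "mahimahi", "tree", "murmur"]
for word, repeated in find_reduplication(words):
    print(word, repeated)
```

The pattern knows nothing about morphology, so it also flags accidental repeats such as the "an" in "banana". Treat the output as a list of candidates to check by hand.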

From text files to sentences

Suppose you have a bunch of .txt files filled with prose. How can you turn that prose into a list of individual sentences? You can start by reading this little tutorial.
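For a taste of what is involved, here is a minimal Python sketch that splits on whitespace following a period, question mark, or exclamation point. It is a naive rule of my own, not necessarily the tutorial's method, and it will wrongly split after abbreviations such as "Dr." The function names and the folder argument are hypothetical.

```python
import re
from pathlib import Path

# Whitespace that comes right after sentence-final punctuation.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Split prose into sentences at whitespace following '.', '!' or '?'."""
    flattened = " ".join(text.split())  # collapse line breaks and extra spaces
    return [sentence for sentence in SENTENCE_END.split(flattened) if sentence]


def sentences_from_folder(folder):
    """Collect sentences from every .txt file in a folder, in filename order."""
    sentences = []
    for path in sorted(Path(folder).glob("*.txt")):
        sentences.extend(split_sentences(path.read_text(encoding="utf-8")))
    return sentences


sample = "Linguists love data. Do you?\nSo do we!"
print(split_sentences(sample))
```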

SQL for Linguists

Wals.info is an amazing web-based resource that catalogs the linguistic features of the languages of the world. Anyone interested in language or linguistics should check it out. Interestingly, the project also lets users download a SQL dump of its database. Herein, we take a look at some basic SQL, using that database to keep things interesting.
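To show the flavor of query involved, here is a minimal sketch using Python's built-in sqlite3 module. The two tables, `language` and `word_order`, are hypothetical stand-ins that I made up for illustration. They are not the schema of the actual dump, and the four rows are just a tiny hand-entered sample.

```python
import sqlite3

# An in-memory database with a made-up schema (NOT the real dump's schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE language (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE word_order (
        language_id INTEGER REFERENCES language(id),
        value TEXT
    );
""")
conn.executemany(
    "INSERT INTO language VALUES (?, ?)",
    [(1, "English"), (2, "Japanese"), (3, "Turkish"), (4, "Welsh")],
)
conn.executemany(
    "INSERT INTO word_order VALUES (?, ?)",
    [(1, "SVO"), (2, "SOV"), (3, "SOV"), (4, "VSO")],
)

# How many languages in the sample have each basic word order?
query = """
    SELECT word_order.value, COUNT(*) AS languages
    FROM language
    JOIN word_order ON word_order.language_id = language.id
    GROUP BY word_order.value
    ORDER BY languages DESC, word_order.value
"""
rows = conn.execute(query).fetchall()
for value, count in rows:
    print(value, count)
```

The JOIN plus GROUP BY pattern is the workhorse for this kind of question: link each language to a feature value, then count languages per value.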

How to do Stuff with Text and Python

Python is a great language for working on linguistic tasks. It's readable and reasonably concise. It's also popular, which means there are a lot of existing libraries you can tap into to help you with your project. Here's a super short tutorial on a simple linguistic task: how can we extract sentences from a text file?

## Philosophical Preliminaries

(Just kidding.) A good way to think of programming is how you would learn to be a wizard. When Hermione tells you that there's a spell to

## Read the file

JSKBD: JavaScript input interface

Problem: There is a need for a browser-based method for inputting text in non-standard writing systems. Proposed solution: The tool will be implemented as a jQuery plugin that can be included in any web page, and it will specify a format for defining new keyboard layouts (preferably by using an existing standard).

Putting video in your webpages: the new <video> tag

In which I attempt to use the <video> tag.

Web Basics for Linguists

A little HTML can go a long way toward bringing your work in linguistics to the web. Herein I'll be collecting some basic documentation on how to produce simple web content for linguists.