XML for Music


FIG. 1: A one-note song encoded in MusicXML.
FIG. 2: AMuseTec's MuseBook Score relies on MusicXML for its combination of music-performance recognition and its automatic page-turning capability.
FIG. 3: This XML-based Pitch Range Analysis tool was developed by Recordare.

Desktop musicians have an embarrassment of riches when it comes to software tools for making music. The proliferation of applications and plug-ins should allow you to piece together a suite of tools tailor-made for a variety of creative and commercial projects. Reality, however, doesn't always work out quite so neatly, even with the great power and versatility of MIDI.

The problem is that MIDI was designed to be a mechanism for sending and receiving musical event information, especially the explicit, quantifiable attributes of a performance. It is less than optimal for the richer set of attributes that many applications, such as notation and analysis programs, depend on. As a result, it's often difficult to get software from different vendors to play together nicely.

For example, a musician might select a best-of-breed sequencer to capture the nuances of a performance from a MIDI-equipped instrument. She wants to expand the arrangement, so she imports the MIDI file into her favorite notation program and manually fleshes out the score. So far, so good.

Next, she needs to send the file to her writing partner, but he has a different notation program. Because MIDI doesn't have any sense of beams, stems, ties, or other staples of music notation, it can't be used to move the score from one application to another. An adapter must be used to convert the proprietary format of the original program to that of the second program. Each time the collaborators exchange the file, this translation occurs and most likely introduces errors with each pass.

The day of the session arrives, and the final score must again be converted for the studio's software, and some parts must be translated back into MIDI for the sequencers. When the session players arrive, several are toting digital music displays. With a sigh, our songwriter fires up a Web browser and starts hunting for a plug-in that will let her properly communicate with everyone. The moral of the story? Without a sufficiently rich standard for representing music, each application needs a specific translator for every other program that it wants to talk to.

Several attempts to address this unfortunate situation have been made, chief among which are the Notation Interchange File Format (NIFF) and the Standard Music Description Language (SMDL). NIFF was completed in 1995 and was designed specifically as a music-notation interchange standard. Even though it has been adopted by several notation and music-scanning programs, its graphical approach to representing music information limits its usefulness for performance applications. While SMDL has reached the status of official standard (ISO/IEC Draft International Standard 10743), it suffers from the opposite extreme of trying to be all things to all people.

The ISO standard defines SMDL as “an architecture for the representation of music information, either alone, or in conjunction with text, graphics, or other information needed for publishing or business purposes. Multimedia time-sequencing information is also supported.” In short, SMDL set out to become an all-encompassing standard for representing all music-related information. While SMDL has largely succeeded in its ambitious goal, its approach is so abstract, unwieldy, and even obscure (a note is called a cantus event) that a commercial implementation has yet to emerge after 15 years of availability.


This problem is not unique to music applications. Demand for interoperability among products from competing vendors has been the bane of the computer industry since its birth. Fortunately, a solution has arisen from the world of Internet applications: the Extensible Markup Language (XML). XML has emerged as the accepted standard for data representation and interchange among commercial software packages. Readers of EM are probably familiar with XML as it relates to Web pages (see “Mark My Words” in the November 2001 issue). But readers may not realize that XML has capabilities that reach far beyond just souping up home pages.

XML is a markup language for defining markup languages. It allows you to define a grammar for describing the data associated with any group or class of computer applications. When two programs want to share data, they exchange a highly structured text document that conforms to the syntax rules of XML and the grammatical rules that the two programs have agreed upon.

XML lets you define your grammar in a separate document called a Document Type Definition (DTD) or a schema (a schema is itself written in XML). By following XML's rules for creating a valid DTD, you don't need to worry about how your grammar will be interpreted by a program. There are standard tools for that, which frees you to focus on how to organize and represent the data you want to share. This balance of flexibility and consistency is the core strength of XML as a vehicle for data interchange.
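As a rough illustration of those "standard tools," here is a minimal Python sketch that uses the standard library's xml.etree.ElementTree parser to test whether a document obeys XML's syntax rules. The helper name is_well_formed and the tiny <song> documents are invented for this example. Note that this parser checks only well-formedness; checking a document against a DTD would need a validating parser, which is not shown here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML (syntax only, no DTD validation)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical documents, made up for this sketch.
good = "<song><title>Example</title></song>"
bad = "<song><title>Example</song></title>"  # tags closed in the wrong order

print(is_well_formed(good), is_well_formed(bad))
```

The point is that neither program in an exchange has to write its own syntax checker; a stock parser accepts or rejects the document before any music-specific logic runs.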

XML is particularly well suited to the representation of music. First, there are many ways to represent a musical structure ranging from a single, indivisible monolith to a finely grained, ordered collection of notes and inflections. Most music has internal structure that falls somewhere between those two extremes. The different levels of structure imply hierarchy, and that is how XML structures information. In addition, XML lets you identify and isolate various types of musical components.

This object-oriented approach to music representation allows you to define how a certain operation behaves depending on the nature of the musical element it is applied to. It also ensures that a given type of musical element is treated consistently regardless of where it occurs in a piece of music or even across completely unrelated works. Most importantly, XML enables you to move beyond the structure of music notation to represent its semantics.


To demonstrate this concept, look at the example in Fig. 1. It shows a one-note song in 4/4 time represented in MusicXML. MusicXML is one current implementation of XML for music produced by Recordare (reh-cor-DAH-ray). The first four lines of the example consist of header information that shows the file is an XML document. The lines also show what the XML version number is and which character set is used. Most importantly, they specify which DTD, or grammar, should be used to validate the document (here, one called “score-partwise”) and where it can be found (www.musicxml.org/dtds/partwise.dtd).

The rest of an XML document is composed primarily of elements indicated by angle-bracketed tags such as <part-list> and </part-list>. Similar to HTML, this open- and close-tag pair indicates the beginning and end of a particular element (in this case, the part list). It also implies that elements may be nested within one another according to rules that are spelled out in the DTD.

In the MusicXML example, each element is made up of child elements that in turn consist of their own children, and so on. An <attributes> element defines the information that is needed to interpret the rest of the marked-up song. That includes things such as key and time signatures and the base time unit as divisions of a quarter note. This is where semantic markup and the strength of XML really come into play.
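To make the nesting concrete, here is a minimal Python sketch that parses a one-note song and prints its element hierarchy. The SONG string is a reconstruction based on the description in the text, not a copy of Fig. 1. The header lines (XML declaration and DTD reference) are left out, and the part name "Music" and the helper outline are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Reconstructed one-note song in 4/4; header and DOCTYPE omitted on purpose.
SONG = """<score-partwise>
  <part-list>
    <score-part id="P1"><part-name>Music</part-name></score-part>
  </part-list>
  <part id="P1">
    <measure number="1">
      <attributes>
        <divisions>1</divisions>
        <key><fifths>0</fifths></key>
        <time><beats>4</beats><beat-type>4</beat-type></time>
      </attributes>
      <note>
        <pitch><step>C</step><octave>4</octave></pitch>
        <duration>4</duration>
        <type>whole</type>
      </note>
    </measure>
  </part>
</score-partwise>"""


def outline(element, depth=0):
    """Return one indented line per element, mirroring the nesting."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(SONG))
print("\n".join(tree_lines))
```

The printed outline shows the part containing a measure, and the measure containing both the attributes block and the note, which is the hierarchy a DTD spells out.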

As mentioned, MIDI has no concept of a note; it just has Note On and Note Off events. Rests don't exist but are inferred from the empty spaces between Note Off and Note On. This leads to ambiguities that can destroy the fidelity of a notation program. Likewise, MIDI makes no distinction between enharmonic equivalents; the F-sharp and G-flat above middle C are both MIDI Note Number 66. That leaves a notation program to make its best guess, often with mixed results. XML solves this problem by letting you explicitly indicate what you are encoding. A note (and all of its components) is described as follows:


This fragment defines a single note in the XML-encoded song. The note has a pitch that is defined by a <step> of C in the current key signature (defined elsewhere in the example) and an <octave> of 4 indicating that the note is middle C. The note value may be altered by the key signature or by an <alter> tag indicating a sharp or a flat. Every note also has a duration that is based on divisions of a quarter note. The previous example, in 4/4 time, does not divide a quarter note at all, so our whole note has a <duration> of 4 (4 quarter notes). Just to make sure that it's notated correctly, we give it a <type> of “whole.”
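The note described here can be sketched in a few lines of Python. The NOTE string is a reconstruction from the description above (a step of C, an octave of 4, a duration of 4, and a type of "whole"), not the article's original listing. The midi_number helper is an invention for this example; it uses the common convention that middle C (C4) is MIDI note 60 to show the enharmonic ambiguity mentioned earlier.

```python
import xml.etree.ElementTree as ET

# Reconstructed from the prose description; treat it as an approximation.
NOTE = """<note>
  <pitch><step>C</step><octave>4</octave></pitch>
  <duration>4</duration>
  <type>whole</type>
</note>"""

# Semitones above C for each natural note name.
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def midi_number(step, octave, alter=0):
    """Map a spelled pitch to a MIDI note number (middle C, C4, is 60)."""
    return 12 * (octave + 1) + SEMITONES[step] + alter


note = ET.fromstring(NOTE)
step = note.findtext("pitch/step")
octave = int(note.findtext("pitch/octave"))
alter = int(note.findtext("pitch/alter", default="0"))  # absent means natural

middle_c = midi_number(step, octave, alter)
f_sharp = midi_number("F", 4, alter=1)
g_flat = midi_number("G", 4, alter=-1)
print(middle_c, f_sharp, g_flat)
```

Converting to a MIDI number throws the spelling away: F-sharp and G-flat collapse to the same value, while the XML keeps the step and the alteration as separate facts.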

Including separate tags for duration and note type may seem overly verbose. At a surface level, the entire example may seem bloated, but that's part of the price paid for flexibility. For example, the way a song is notated may be very different from how it is performed. You may want to score a series of notes as straight eighth notes but want to give the notes more of a laid-back feel when performed. You could accomplish that by tagging the notes with a <type> of “eighth” and then playing with the <duration> value.
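Here is a small Python sketch of that idea, under stated assumptions: the quarter note is divided into three ticks, the tempo is 120 beats per minute, and a pair of notes that are both notated as eighths get durations of 2 and 1 ticks to produce a triplet-style lilt. The numbers and the helper name seconds are invented for illustration.

```python
from fractions import Fraction


def seconds(duration, divisions, bpm):
    """Length in seconds of `duration` ticks, given ticks per quarter note."""
    return Fraction(duration, divisions) * Fraction(60, bpm)


DIVISIONS = 3  # hypothetical: three ticks per quarter note
BPM = 120      # hypothetical tempo

# (notated type, performed duration in ticks); both notes look like eighths.
notes = [("eighth", 2), ("eighth", 1)]
lengths = [seconds(ticks, DIVISIONS, BPM) for _, ticks in notes]
print(lengths, sum(lengths))
```

The notation program reads only the type and draws two ordinary eighth notes, while a playback program reads the durations and holds the first note twice as long as the second. The pair still adds up to one quarter note.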

The potential verbosity of an XML score is alleviated somewhat by the use of attributes. Attributes can modify or extend the nature of any given element. For example, a well-placed attribute can help you know exactly where you are in a given work. The <measure number="1"> tag, where “number” is the attribute, indicates that the chunk of XML we are about to look at describes a measure, and it happens to be the first one in the song.
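A short Python sketch shows how a program might use the number attribute to find its place in a score. The two-measure PART string is made up for this example and is stripped down to the bare minimum; the lookups use the standard library's ElementTree, which supports simple attribute tests in its path syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-measure part, trimmed to the essentials.
PART = """<part id="P1">
  <measure number="1"><note><type>whole</type></note></measure>
  <measure number="2"><note><type>half</type></note></measure>
</part>"""

part = ET.fromstring(PART)

# Attribute values are read with get(); they always come back as strings.
numbers = [measure.get("number") for measure in part.findall("measure")]

# Jump straight to a measure by its attribute value.
second = part.find("measure[@number='2']")
second_type = second.findtext("note/type")
print(numbers, second_type)
```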

This emphasizes one of the goals of XML encoding and how it bypasses the pitfalls of some past efforts: it is intended to be readable. The comprehensive tagging makes an XML-encoded score much more comprehensible to the average person than a shorter document crammed with acronyms and cryptic indicators. If disk space is an issue, compression can reduce XML files to the size of MIDI files.
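The effect of compression on repetitive markup is easy to try. This Python sketch builds an artificial measure of 64 identical notes and compresses it with the standard zlib module. The test data is invented, and the sketch makes no attempt to compare the result with an actual MIDI file; it only shows that verbose, repeated tags shrink considerably.

```python
import zlib

# Artificial test data: one note repeated 64 times inside a measure.
NOTE = (
    "<note><pitch><step>C</step><octave>4</octave></pitch>"
    "<duration>4</duration><type>whole</type></note>"
)
raw = ("<measure>" + NOTE * 64 + "</measure>").encode("utf-8")

packed = zlib.compress(raw, 9)       # 9 is the highest compression level
restored = zlib.decompress(packed)   # lossless: the original bytes come back

print(len(raw), len(packed))
```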

Another benefit of XML is that it's free, something that might well ensure its adoption. All of the currently defined music DTDs and schemas are available under royalty-free licenses. As a result, they can be incorporated into existing and emerging products without paying a dime to their authors, and that's very appealing to software developers.


For XML to be truly useful as a music representation and an interchange mechanism, a standard must emerge. The ability to specify which DTD a program wants to see would seem to make that unnecessary, but in fact it makes the situation worse. Defining music-related DTDs seems to be the rage among both music and markup enthusiasts. A quick search of the Internet identifies more than 20 music-related DTDs, along with assorted schemas. Most are purely academic exercises or hobbyist tinkering, but a few are gaining momentum toward commercial viability.

Without significant consolidation of the field, each application must map its own capabilities to each markup scheme (in the form of a DTD) that comes its way. That eliminates most of the benefits of adopting XML in the first place. Fortunately, some leaders have started to emerge; the most ambitious is the Music Markup Language (MML) being developed by Jacques Steyn (www.musicmarkup.info).

MML is more of a comprehensive framework for music markup than a single DTD. Steyn has identified 12 modules for structuring and representing music; the Time and Frequency modules form the essential core. All music objects and events are represented within these two modules, with additional modules being layered in as necessary. According to Steyn, MML is “the only XML-based attempt to describe a very large scope of the domain of music.”

While this approach runs the risk of turning MML into the next SMDL (too complex to be of any practical use), Steyn seems to be adopting more of a philosophical goal than striving for a product implementation. His intention is to frame and structure the discussion of music markup in order to develop a standard within the music community rather than within corporate labs. His efforts seem to be bearing fruit.

The Music Encoding Initiative (MEI), led by Perry Roland of the University of Virginia, is emerging as a contender for an XML standard (www.people.virginia.edu/~pdr4h/mei). The MEI is a stepchild of the Text Encoding Initiative (TEI), a mature standards effort revolving around the encoding and transmission of text. Like the TEI, the Music Encoding Initiative is a noncommercial endeavor, which Roland says eliminates “the pressure to rush to shipment; instead we can slow down and develop a DTD that not only works right now but has some longevity.”

Roland describes the MEI as being in the “testing and tweaking” phase. The MEI strives to meet the needs of a broad range of music applications while focusing on a smaller set of features than the full MML framework. It also avoids the obscurities of SMDL. (In the MEI, a note is called a note.) In addition, the MEI takes a more scholarly bent than many of its rivals by supporting features most commonly associated with historical or critical editions of music, such as commentary analysis.

While the MEI is “way past the proof-of-concept phase” in Roland's words, it has yet to move from the academic realm into a commercial product. MusicXML from Recordare (www.recordare.com) is one XML-for-music implementation that has made the leap.

MusicXML is without question the most mature effort so far in using XML to encode music. Like the MEI, MusicXML has its roots in academia as a direct descendant of MuseData and Humdrum. Company founder Michael Good took the core functionality of those venerable programs and built an XML implementation, choosing to support essential core features for music encoding rather than comprehensive coverage. That focus has resulted in an XML implementation that has found its way into more than a dozen commercial applications, including MakeMusic Finale.

Most recently, Recordare released Dolet for Sibelius Inc.'s Sibelius. The software plug-in exports Sibelius 2.1 (and later) files to MusicXML, serving as a sort of “Universal Translator” between Sibelius and any application that understands MusicXML. With the Dolet plug-in, programs like Finale, Igor Engraver, and Sibelius, which prior to MusicXML had been isolated by their proprietary formats, can exchange files much more accurately.


MusicXML is also finding its way into hardware applications, such as the AMuseTec MuseBook Score (see Fig. 2). MusicXML is currently at version 0.8. Good plans to have it at the 1.0 level by the end of the year. At that point, Recordare intends to submit the DTD to OASIS (Organization for the Advancement of Structured Information Standards) for consideration to become a formally recognized XML standard for music.


Each of the leading implementations of XML for music has focused on notation and scoring, but that's a small sample of the possible applications for XML and music. Several applications taking advantage of XML for other music-related purposes are starting to emerge, including several music-analysis tools (see Fig. 3). Other developers, MML's Steyn among them, are working on ways to “sequence” a piece of music with no software beyond a simple text editor, such as Microsoft's Notepad.
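To give a flavor of what a music-analysis tool can do with explicitly tagged notes, here is a minimal Python sketch that finds the lowest and highest pitches in a measure. It is not Recordare's Pitch Range Analysis tool, whose workings are not described here. The MEASURE data and the helper pitch_numbers are invented, and pitches are compared by converting them to MIDI-style numbers, with middle C (C4) as 60.

```python
import xml.etree.ElementTree as ET

# Hypothetical measure: E4, B-flat 3, a rest, and F-sharp 5.
MEASURE = """<measure number="1">
  <note><pitch><step>E</step><octave>4</octave></pitch></note>
  <note><pitch><step>B</step><alter>-1</alter><octave>3</octave></pitch></note>
  <note><rest/></note>
  <note><pitch><step>F</step><alter>1</alter><octave>5</octave></pitch></note>
</measure>"""

SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def pitch_numbers(measure):
    """Yield a MIDI-style number for every pitched note; rests are skipped."""
    for pitch in measure.iter("pitch"):
        step = pitch.findtext("step")
        octave = int(pitch.findtext("octave"))
        alter = int(pitch.findtext("alter", default="0"))
        yield 12 * (octave + 1) + SEMITONES[step] + alter


numbers = list(pitch_numbers(ET.fromstring(MEASURE)))
low, high = min(numbers), max(numbers)
print(low, high)
```

Because rests are tagged as rests instead of being inferred from silence, the analysis can simply ignore them.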


One potential drawback, viewed as a strength by developers, is that all of the currently available DTDs restrict themselves to Common Western Music Notation (CWMN or just CMN), with some also including tablature. CMN focuses on European music from the period between 1700 and 1950, which glosses over a large part of the world's music and its history. This is largely due to the dangers of trying to be all things to all people. Even with this limitation, XML holds remarkable promise for the representation, interchange, and analysis of music.

Darin Stewart is a Chapman Stick player in the Portland, Oregon, area. He is also Director of Research Information Systems for Oregon Health and Science University (stewarda@ohsu.edu).