Let’s talk about metadata

When we talk about digital music, we use “metadata” to mean the artist, title, and other information that comes with (or is attached to) a recording.

I’ve spent a big chunk of this week on several projects involving classical metadata. Since “the right way to tag classical music” has been an ongoing theme for my entire career, I thought I’d share a few lessons I’ve learned.


1) There is no perfect system

We live in an age where I’ll regularly rely on atomic clocks flying through space at 8,700mph to remind me where I parked my car (this is how GPS works), so you’d think we could build a database to coherently store music, right?

Well, we can, but there are limits to categorisation. When you set about designing a database to store music, you start with the obvious things: you’ll need a list of people, and a list of pieces of music, and perhaps a list of albums or tracks. Then you start joining them up. You tell the database what people did on each recording, so the computer knows who was the composer and who was the conductor. It all starts to look easy, so you decide to store a bit more information. You categorise pieces, add some dates, maybe biographies and sleeve notes.
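
To make that starting point concrete, here’s a minimal sketch in Python of the “obvious” design: people, works, and recordings joined by roles. The class names, the role strings and the conductor’s name are all placeholders I’ve made up for illustration; this isn’t the schema of any real database.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Person:
    name: str


@dataclass(frozen=True)
class Work:
    title: str


@dataclass
class Recording:
    work: Work
    # Each credit joins a person to this recording with a role (hypothetical strings).
    credits: list[tuple[Person, str]] = field(default_factory=list)

    def people_with_role(self, role: str) -> list[str]:
        """Return the names of everyone credited with the given role."""
        return [person.name for person, credited_as in self.credits if credited_as == role]


beethoven = Person("Ludwig van Beethoven")
conductor = Person("Example Conductor")  # placeholder, not a real credit
fifth = Work("Symphony No. 5 in C minor, Op. 67")

recording = Recording(fifth, [(beethoven, "composer"), (conductor, "conductor")])
print(recording.people_with_role("composer"))
print(recording.people_with_role("conductor"))
```

It really does look easy at this stage, which is exactly the trap.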

Then you start to stumble upon exceptions:

  • Walter Carlos became Wendy Carlos halfway through her career. Are they different people in the database, or do you build a system for people who change their names? Either way, you can handle Cat Stevens’ switch to Yusuf Islam, but you might struggle with the period when Prince changed his name to an unpronounceable symbol, or with an artist who uses different names for different styles of music.
  • Stravinsky revised The Firebird several times. These aren’t quite different pieces of music, but nor are they the same. Do we build a system to handle versions of the same thing?
  • How much input can an arranger have before they’re effectively the composer? Where do you credit the author of a cadenza? Can the rules you apply to classical music also work with jazz?
  • The movements of a Mozart symphony are all easy to identify, but Puccini operas are through-composed, and different recordings chop them up in different places. How do you handle that when identifying recordings of the same aria? What about excerpts from operas that have had final cadences added? Are they arrangements? Who arranged them?
  • Overtures are pretty straightforward. We know they go on the beginning of operas. And plays. Except sometimes they don’t. Do we call concert overtures something different?
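
As a rough illustration of the first exception, here’s one way a system might treat a change of name: one record per person, with a set of other names they’ve been credited under. The `Artist` class and `find_artist` helper are hypothetical, and deciding which name counts as “canonical” is itself a judgment call. Note that it does nothing for an unpronounceable symbol, or for an artist who keeps separate names for separate styles.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Artist:
    canonical_name: str
    aliases: set[str] = field(default_factory=set)

    def matches(self, name: str) -> bool:
        """True if the name is the canonical name or any alias, ignoring case."""
        wanted = name.casefold()
        known = {self.canonical_name.casefold()} | {a.casefold() for a in self.aliases}
        return wanted in known


def find_artist(name: str, artists: list[Artist]) -> Optional[Artist]:
    """Return the first artist known by this name, or None."""
    return next((artist for artist in artists if artist.matches(name)), None)


artists = [
    Artist("Wendy Carlos", {"Walter Carlos"}),
    Artist("Yusuf Islam", {"Cat Stevens"}),
]

found = find_artist("cat stevens", artists)
print(found.canonical_name if found else "not found")
```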

Anybody who has ever interviewed a band has heard a phrase like “It’s not easy to categorise our music”. Musicians almost seem to conspire against us here: any time a boundary appears between two distinct traditions, it’s only a matter of time before somebody comes along to knock it down. I swear they do it on purpose.


2) The better a system fits, the more complex it is

There are some pretty complicated taxonomies for classical music. Every so often, a well-intentioned and sometimes well-funded utopian scheme emerges to build a database of every song ever written. Many fall at the first question: “what’s a song?”

If your system allows for all the exceptions above, and the hundreds more that will crop up when you add ten million tracks to it, it’ll be pretty complicated.

That’s fine if it’s self-contained, like Naxos Music Library. We control the data coming in (we enter it ourselves, and it’s a big job) and we control the software that displays it (we wrote that too). What happens, though, when you don’t control it all?


3) The more complex a system is, the less interoperable it is

Once you let your data out into the big wide world, you can’t control how other people will use it. No matter how beautifully tagged your tracks are in your software, they might look terrible in somebody else’s. The way we tag MP3 and AAC files for downloads would allow us to add extra fields for “conductor” and “instrument” and “soloist” and “catalogue number”, but what would be the point? We have no idea how, or even if, they might be displayed on a customer’s computer, because we don’t write the software that does that. We could write software that did that, but it would be expensive, and wouldn’t necessarily work with the other music our customers own.

Devising a data scheme that provides detail but is robust to omissions is a major challenge, and all workable solutions inevitably include a certain degree of duplication.
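
Here’s a small sketch of what “robust to omissions” can mean in practice. The detailed fields (`soloist`, `conductor`) are hypothetical names, and the same information is duplicated in the plain `artist` field, so software that has never heard of the extra fields still has something sensible to show.

```python
def describe_performers(tags: dict[str, str]) -> str:
    """Prefer the detailed fields; fall back to the basic field every player reads."""
    soloist = tags.get("soloist")
    conductor = tags.get("conductor")
    if soloist and conductor:
        return f"{soloist} (soloist), {conductor} (conductor)"
    return tags.get("artist", "Unknown artist")


# The same track, tagged richly and tagged with only the basic field.
rich = {
    "artist": "A. Soloist, B. Conductor",
    "soloist": "A. Soloist",
    "conductor": "B. Conductor",
}
basic = {"artist": "A. Soloist, B. Conductor"}

print(describe_performers(rich))
print(describe_performers(basic))
```

The duplication is the price of the fallback: the names appear twice in the rich version, and somebody has to keep the two copies in step.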


4) It’s tempting to fiddle with the fields, but this is risky

Several people have told me they’ve found the solution to classical tagging in digital collections, and then gone on to explain some variation on “use the album field for the work title, use the artist field for the composer and use the composer field for the artist”. A complex version of this scheme is explained in detail here.

If you apply this system to your entire collection, you’ll notice several things.

  • It takes a really long time to do it
  • It works really nicely on old iPods
  • It makes a total mess of the menus on a new iPod
  • It doesn’t play nicely with the non-classical stuff in your collection
  • You have to edit every single bit of data on every single track you add to your library

That’s fine for a small collection, but it quickly becomes unwieldy. It’s also a hack: you’re using it for something it wasn’t designed for. Sometimes, that’s fine. In my office, I use binder clips to turn the edge of the suspended ceiling into a picture rail. This is pretty low-risk, because they don’t bring out a new version of the suspended ceiling every few months, and even if they did, it wouldn’t take me long to think of a new way to hang my pictures.

What about all this data? What if iTunes or Windows Media Player or your smartphone gets an update that improves the way classical data is displayed? When they finally fix this, the software’s going to need all the data in the right fields. Then you’ll have to enter it all again.


5) Here’s how most people do it, and why

So what should you do? Well, the future-proof method is to use the fields for what they’re basically intended for, in a way that’s consistent with the way the same data is handled by the people who invested the most in it. iTunes isn’t going to re-tag 15 million tracks just to update their user interface, so at this point, we have an established convention.

Name/Track Title/Song

This is for all the information about the piece of music (except who wrote it). When we deliver something to iTunes, they demand:

Work Title, Catalog Number: Movement Number. Movement Title

So that’s:

Symphony No. 5 in C minor, Op. 67: I. Allegro con brio

It’s long, but it’s all there. One day, somebody is going to realise that everything before the colon is the work title, and present it that way wherever the tracks are listed together. That colon is the future of your data.
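
To show why that colon matters, here’s a minimal sketch of building a title in this format and then taking it apart again. The function names are my own, and splitting at the first colon followed by a space is an assumption: it would go wrong for any work whose own title contains a colon.

```python
from typing import Optional


def format_title(work: str, catalog: str, movement_number: str, movement_title: str) -> str:
    """Build 'Work Title, Catalog Number: Movement Number. Movement Title'."""
    return f"{work}, {catalog}: {movement_number}. {movement_title}"


def split_title(title: str) -> tuple[str, Optional[str]]:
    """Split at the first ': ' into (work, movement); movement is None if there is no colon."""
    work, separator, movement = title.partition(": ")
    if not separator:
        return title, None
    return work, movement


title = format_title("Symphony No. 5 in C minor", "Op. 67", "I", "Allegro con brio")
print(title)
print(split_title(title))
```

Any application that did this could list the work title once and the movements underneath it, without anybody re-tagging anything.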


Artist

This is for the performers. When we put it in a database, we use different fields for the artists. It would be nice if we could tag all the artists independently on downloads, too, but that’s not how any mainstream application reads the data, so we put them in a comma-separated list:

Takako Nishizaki, Stephen Gunzenhauser & Capella Istropolitana

There isn’t an established convention on ordering, so this is really up to you. I suggest putting the most important one first, since it’s unlikely you’ll be able to see the full list all the time when you’re browsing your library.
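
If you keep the performers as separate names anywhere (a spreadsheet, a database, your own script), joining them into this form is simple. This is only a sketch of the “commas, then an ampersand before the last name” pattern in the example above; `join_artists` is a made-up helper, and the order of the list is whatever you decide is most important first.

```python
def join_artists(names: list[str]) -> str:
    """Join names as 'A, B & C', skipping any blank entries."""
    cleaned = [name.strip() for name in names if name.strip()]
    if len(cleaned) <= 1:
        return "".join(cleaned)
    return ", ".join(cleaned[:-1]) + " & " + cleaned[-1]


print(join_artists(["Takako Nishizaki", "Stephen Gunzenhauser", "Capella Istropolitana"]))
```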

Composer

iTunes and Windows Media Player both support a composer field, so we use that. In iTunes, you can ensure the composer field is displayed in your library by going to View > View Options and checking the “Composer” box.

The harder question is how to write the name. Here are some of the many options I’ve seen:

  • Bach
  • Bach, J.S.
  • Bach, JS
  • Bach, Johann Sebastian
  • Bach, Johann S
  • BACH
  • J.S. Bach
  • Johann Sebastian Bach

None of these is perfect. We identify some composers by their last name (Haydn), some by their initials and last name (J.S. Bach) and some by their full names (Philip Glass). Sometimes all this isn’t enough, and (as with the Strausses) we have to number them as well. Some names are transliterated differently in different countries, so two albums from different places might not even agree on spellings. It’s all maddening.

So what do you do?

We use the most common form of their full name, and that’s what most stores do as well. Anything else is going to drive you crazy in the long run, especially if you try to browse your collection by composer.
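
If you want to tidy an existing collection, the job amounts to a lookup table from every variant you’ve got to the one form you’ve chosen. Here’s a minimal sketch using the variants listed above. The table and the `normalise_composer` function are hypothetical, and mapping a bare “Bach” to Johann Sebastian is an assumption that will be wrong for his sons.

```python
# Variants seen in the wild, lower-cased, mapped to the chosen full name.
CANONICAL_COMPOSERS = {
    "bach": "Johann Sebastian Bach",  # assumption: a bare surname means J.S.
    "bach, j.s.": "Johann Sebastian Bach",
    "bach, js": "Johann Sebastian Bach",
    "bach, johann sebastian": "Johann Sebastian Bach",
    "bach, johann s": "Johann Sebastian Bach",
    "j.s. bach": "Johann Sebastian Bach",
    "johann sebastian bach": "Johann Sebastian Bach",
}


def normalise_composer(raw: str) -> str:
    """Return the chosen full name for a known variant, else the tidied input."""
    tidy = " ".join(raw.split())  # collapse stray whitespace
    return CANONICAL_COMPOSERS.get(tidy.casefold(), tidy)


for variant in ["BACH", "Bach, J.S.", "  J.S.  Bach ", "Philip Glass"]:
    print(repr(variant), "->", normalise_composer(variant))
```

Anything the table doesn’t recognise is passed through untouched, so a wrong guess never overwrites a name you haven’t looked at yet.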

Album Title

The album title is important because it generally contains the composer names, which are otherwise not visible on most mobile devices. Since most modern jukebox applications will also show you the album art, the combination of the two should give you a pretty clear idea what you’re listening to, regardless of the other data you can see.

If you tag your music this way, you will be able to see everything you need. Your main complaint with most interfaces will be that they don’t show you enough of each field. If the entire classical music community should get together and ask for something from the tech world, it should be this: “Make sure we can always see the composers*, and give us support for long fields. We can figure everything else out ourselves”.


6) While imperfect, it’s still easier to find music on a computer than in a record store

If you’ve followed the steps in (5) above, then you should always be able to work out what you’re listening to, and it should always be possible to find the recording you have in mind.

The trick here, though, is to rely on search, not browse.

In a closed, controlled environment like the Naxos Music Library database, we can be sure we always called composers the same thing, and we consistently applied naming conventions. That means we can give you a nifty browse interface where you can pick the names of artists, composers and genres from a list.

Even with fairly nice data, it just doesn’t work this well once the music has arrived in your library. The artists are all joined-up, and you can rarely sort the track titles by anything useful. Any inconsistencies in composers’ names make it a total mess.

While a lot of people are working hard to find a widely-accessible solution to this problem, I suggest that, in the meantime, we don’t allow our enjoyment of digital music to depend on them finding an answer any time soon.

When I want to listen to something on my computer, I only ever use search. Even the simple “and-contains” search in iTunes* allows you to make quite specific requests very quickly. While this can be a bit clunky when you’re looking for something among the store’s millions of tracks, it’s an extremely effective way to find things in your own collection, even if it is very large. Type “Mozart” and you’ve got a list of everything with Mozart somewhere in it. Add “Symphony” and you’ve got everything with Mozart AND Symphony somewhere in it. This could include the odd overture, but it has excluded 99% of the irrelevant music, and given you a manageable list. Add the name of an artist, and you’re right there, at the recording you were looking for.
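
For the curious, an “and-contains” search is about as simple as search gets. This sketch is my own illustration of the idea, not how iTunes actually implements it: every word you type must appear somewhere in the track’s fields, in any order and any case. The performer names are placeholders.

```python
def search(tracks: list[dict[str, str]], query: str) -> list[dict[str, str]]:
    """Return the tracks whose combined fields contain every word in the query."""
    terms = query.casefold().split()
    matches = []
    for track in tracks:
        haystack = " ".join(track.values()).casefold()
        if all(term in haystack for term in terms):
            matches.append(track)
    return matches


library = [
    {"name": "Symphony No. 41 in C major, K. 551: I. Allegro vivace",
     "composer": "Wolfgang Amadeus Mozart",
     "artist": "Example Chamber Players"},
    {"name": "Le nozze di Figaro, K. 492: Overture",
     "composer": "Wolfgang Amadeus Mozart",
     "artist": "Example Symphony Orchestra"},
    {"name": "Symphony No. 5 in C minor, Op. 67: I. Allegro con brio",
     "composer": "Ludwig van Beethoven",
     "artist": "Example Symphony Orchestra"},
]

for query in ["mozart", "mozart symphony", "mozart symphony chamber"]:
    print(query, "->", [track["name"] for track in search(library, query)])
```

In this toy library, “mozart symphony” still catches the overture, because the word “Symphony” appears in the orchestra’s name. That’s the odd overture I mentioned, and one more word gets rid of it.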

I have a fairly large CD collection at home, shelved alphabetically (by composer and artist). It’s about the size of a small specialist record shop. I’ve taken great care to be systematic about shelving music, because I know that, otherwise, I may never find it again. On my computer, meanwhile, I’ve been sloppy with data. The music just arrived too fast for me to spend much time on it. Still, there is no way I could find a CD on the shelf in the time I can find it on my computer.

This makes me wonder: when people complain that metadata is a serious barrier to downloading classical music, what, exactly, are they comparing it to?

* Spotify doesn’t show you the composers. Just try to find a specific classical recording on Spotify, and you’ll quickly see how maddening this is: the content is all there. You just can’t sort through it. Spotify is, though, a relatively young company, and I think they’ll probably fix this in time. If you want a really good classical streaming experience, you might prefer to use Naxos Music Library or Classics Online.



  • Joe Shelby says:

    I primarily use iTunes but also have written my own manager to work in additional fields. I use the composer field, leave the soloists in the comment field. The artist is “Conductor Full Name; Orchestra/Ensemble”. The album title is the album title (which may or may not include the composer, usually doesn’t if a variety). The composer field is *just* the last name (with identifying additions if needed, e.g., the aforementioned Strausses – Richard is the only one without a first initial). The conductor is replicated in the conductor field as just the last name (again with add-ons if needed).

    In addition (not mentioned here) I use the “grouping” field to associate pieces together – e.g., each movement of Beethoven’s 5th is “Symphony 5”. If there’s an additional subtitle, it goes here in single quotes, “Symphony 3 ‘Eroica’”. This way my software can assemble m3u playlists grouping things together for a single-click to trigger playing the whole set in order (the playlist title ends up “Composer (last name) – Grouping (the work) – Conductor (last name).m3u”).

    • Andy Doe says:

      Ah yes. Grouping. I didn’t mention this because (1) it’s iTunes-specific and (2) it’s not very widely used, but it can be very helpful. Where the iTunes store has the data for works, they deliver this in the grouping field, but it’s also used for other types of grouping – if you buy “The Complete Depeche Mode”, I think the individual albums are grouped in this way.

      If you use iTunes, you should have an automatic playlist called “Classical Music” which utilises groupings to let you browse all the recordings of a single piece. Try adding the same “grouping” to multiple recordings of the same work, and see what this looks like. If you had all the data, it could be rather handy.

  • Joe Shelby says:

    Where things start to get weird is in the genre. I don’t like just tagging everything “Classical” as it wrongly assumes that Telemann belongs side-by-side with Takemitsu.

    Similarly, with Naxos’s large selection of (mostly excellent) 20th century American works, there’s a need for me to separate early-mid 20th century tonal/Romantic American (Ives/Piston/Copland) from the later works that are more atonal or post-serialism. My musical mood isn’t always for a particular composer so much as for a particular era and style…but then as noted above, such a division leaves someone like William Schuman in an odd spot as he straddles both sides of such a dividing line. (Side note: I love the Schwarz cycle you’ve released over the last few years.)

    Usually in this case I just pick one side or the other and adapt.