What is metadata?

This is something I wrote for KBOO Community Radio volunteer hosts and programmers, as part of the AAPB NDSR program, to help them understand how metadata is used in KBOO’s digital preservation and archiving work.

Erin mentioned that the KBOO community has questions about what metadata is. Simply put:

Metadata is a set of data that describes and gives information about other data.

Metadata is information. That cup on the table? It’s red, it’s ceramic, it belongs to Alex, and it was bought last week but made long before that; when, we aren’t sure. These are all things that provide information about the cup. People who enter and manage metadata document the most important pieces of information about an item, depending on who they expect will be looking for or learning about it. At KBOO, as at other libraries and archives, metadata is entered and structured in a specific way so that it is both human-readable and machine-readable.

One example of human-readable metadata is a notes field that combines all the descriptive information we just gathered about that cup: “This cup is red, made of ceramic, it belongs to Alex, it was bought last week but was made long before that, when we aren’t sure.”

A machine can’t understand the contents of that notes field. So how does metadata become machine-readable? The first step is to follow a metadata schema. In information science, different metadata schemas have been developed, each suited to certain kinds of data. A schema gives definitions and meanings to metadata fields. At KBOO, the PBCore metadata schema is very useful. It was developed specifically for organizing and managing public broadcasting data elements, and it makes sense for time-based media like audio and video. People managing information about print books don’t need metadata fields for duration or generations. People managing audio metadata, on the other hand, want to document the duration of the content and whether it is the edited version or the broadcast version with promos; PBCore includes fields for both.
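To make the difference concrete, here is a small Python sketch (an illustration only, not something KBOO actually runs; the field names are hypothetical and are not PBCore elements). It stores the cup’s description twice: once as a free-text note and once as named fields. Only the second form lets a program reliably answer a question like “which items belong to Alex?”

```python
# Hypothetical field names chosen for this example; they are not PBCore elements.
notes = ("This cup is red, made of ceramic, it belongs to Alex, "
         "it was bought last week but was made long before that, "
         "when we aren't sure.")  # readable by people, opaque to a program

cup = {
    "color": "red",
    "material": "ceramic",
    "owner": "Alex",
    "date_acquired": "last week",
    "date_made": None,  # unknown, so the field is left empty
}
glass = {
    "color": "blue",
    "material": "glass",
    "owner": "Sam",
    "date_acquired": None,
    "date_made": None,
}
collection = [cup, glass]

def find_by(records, field, value):
    """Return every record whose named field equals value."""
    return [r for r in records if r.get(field) == value]

print(find_by(collection, "owner", "Alex"))                     # just the red cup
print(len([r for r in collection if r["date_made"] is None]))   # 2 items lack a creation date
```

A schema does the same job at a larger scale: it fixes which fields exist and what each one means, so that every record in a collection can be searched and compared the same way.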


Image by Stephen J. Miller; I added PBCore to the mix. His web page on metadata resources is also excellent: https://people.uwm.edu/mll/metadata-resources/

PBCore is a national metadata standard. Because institutions enter content according to the same definitions and rules, their data can be shared. PBCore data held in an XML document can be exchanged across many fields and disciplines, and it is both human- and machine-readable. When XML isn’t used, many organizations use spreadsheets, which can be transformed or edited to work with various systems.
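For the curious, here is roughly what a PBCore XML record looks like, built with Python’s standard `xml.etree.ElementTree` module. This is a simplified sketch, not a complete or schema-validated PBCore record: the element names (`pbcoreDescriptionDocument`, `pbcoreAssetDate`, `pbcoreIdentifier`, `pbcoreTitle`) come from PBCore, but the identifier, source, titles, and date values are made up, and the namespace URI should be checked against the PBCore version you actually target.

```python
import xml.etree.ElementTree as ET

# PBCore 2.x namespace URI (verify against the schema version you use).
PBCORE_NS = "http://www.pbcore.org/PBCore/PBCoreNamespace.html"
ET.register_namespace("", PBCORE_NS)  # write it as the default namespace

def q(tag):
    """Qualify a tag name with the PBCore namespace."""
    return f"{{{PBCORE_NS}}}{tag}"

def build_record(identifier, titles, date, date_type):
    """Build a simplified PBCore description document.

    titles is a list of (title_type, text) pairs, because pbcoreTitle is
    repeatable. This sketch omits many elements a real record needs.
    """
    root = ET.Element(q("pbcoreDescriptionDocument"))
    ET.SubElement(root, q("pbcoreAssetDate"), dateType=date_type).text = date
    # "KBOO" as the identifier source is a hypothetical value.
    ET.SubElement(root, q("pbcoreIdentifier"), source="KBOO").text = identifier
    for title_type, text in titles:
        ET.SubElement(root, q("pbcoreTitle"), titleType=title_type).text = text
    return root

record = build_record(
    "kboo-0001",  # hypothetical identifier
    [("Series", "Example Series"), ("Episode", "Example Episode")],
    "1998-04-01",
    "Broadcast",
)
print(ET.tostring(record, encoding="unicode"))
```

The printed XML is plain text a person can read, and because every value sits inside a named, defined element, a program can read it too.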

It is KBOO’s intent, at the end of the NDSR program, to upload a first batch of audio content and metadata to the American Archive of Public Broadcasting (AAPB). The AAPB’s metadata management system is a complex hierarchical database, and data must be formatted in a specific way. When KBOO’s audio metadata fields are formatted using the PBCore schema, many records can be uploaded into the AAPB’s system at once in a single CSV file. This takes advantage of the machine-readable definitions in the schema. If the metadata were not machine-readable, a person would have to enter it manually: each value, for every field, for every record. Using standards and metadata schemas lets an archivist leave the heavy lifting to computers.
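Here is a minimal sketch of that heavy lifting, using Python’s built-in `csv` module to write many records into one CSV file in a single step. The column names and values below are hypothetical; the AAPB provides its own ingest template, and its real column names and order would be used instead.

```python
import csv
import io

# Hypothetical column headers, not the AAPB's actual template.
COLUMNS = ["identifier", "title", "date", "date_type", "format", "rights"]

# Made-up example records; in practice these would come from the catalog.
records = [
    {"identifier": "kboo-0001", "title": "Example Episode A",
     "date": "1998-04-01", "date_type": "Broadcast",
     "format": "Open reel audio tape", "rights": "Example rights statement"},
    {"identifier": "kboo-0002", "title": "Example Episode B",
     "date": "1998-04-08", "date_type": "Broadcast",
     "format": "Audiocassette", "rights": "Example rights statement"},
]

def to_csv(rows, columns):
    """Write a header row, then one line per record, and return the text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

print(to_csv(records, COLUMNS))
```

Whether the batch has two records or two thousand, the script does the same work; nobody has to type each value into a web form.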

Here are some examples of PBCore metadata fields:

pbcoreTitle is a name or label relevant to the asset.

Best practices: An asset may have many types of titles, such as a series title, episode title, segment title, or project title; therefore, the element is repeatable. Usage: required, repeatable.

Sensible and understandable, right? Here’s another:

essenceTrackSamplingRate measures how often data is sampled when the audio portion of an instantiation is digitized. For a digital audio signal, the sampling rate is measured in kilohertz and is an indicator of the perceived playback quality of the media item (the higher the sampling rate, the greater the fidelity). Usage: optional, not repeatable.
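The “repeatable” and “not repeatable” labels are rules a computer can check. The Python sketch below is a simplification: it flattens PBCore’s nesting (in real PBCore, essenceTrackSamplingRate sits inside an essence track within an instantiation) and treats a record as a mapping from element name to the list of values entered. It flags any non-repeatable element that was given more than one value.

```python
# Repeatability rules for the two elements quoted above (True = repeatable).
REPEATABLE = {
    "pbcoreTitle": True,
    "essenceTrackSamplingRate": False,
}

def repeat_violations(record, rules=REPEATABLE):
    """Return element names that have several values but are not repeatable.

    record maps an element name to the list of values entered for it.
    Elements missing from rules are treated as repeatable in this sketch.
    """
    return [name for name, values in record.items()
            if not rules.get(name, True) and len(values) > 1]

# Hypothetical record: two titles are fine, two sampling rates are not.
example = {
    "pbcoreTitle": ["Example Series", "Example Episode"],
    "essenceTrackSamplingRate": ["44.1", "96"],
}
print(repeat_violations(example))  # ['essenceTrackSamplingRate']
```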

If you are a KBOO program host, you have encountered sample rate without knowing it. The autoarchive mp3 file that magically shows up on your episode page is encoded at a sample rate of 44.1 kHz. Audio archives keep track of the quality and type of the digital files they have collected. Guidelines for digitizing open-reel tape call for a 96 kHz sample rate at a bit depth of 24 bits.
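Sample rate and bit depth also determine how much storage a preservation file takes up. For uncompressed PCM audio (the kind stored in a WAV file), the data rate is sample rate × bit depth × number of channels, divided by 8 to convert bits to bytes. The sketch below assumes stereo, and the 44.1 kHz/16-bit line is a CD-style comparison, not the autoarchive mp3: mp3s are compressed, so their size depends on the encoder’s bitrate rather than on this formula.

```python
def pcm_bytes_per_second(sample_rate_hz, bit_depth, channels):
    """Data rate of uncompressed PCM audio, in bytes per second."""
    return sample_rate_hz * bit_depth * channels // 8

ONE_HOUR = 60 * 60  # seconds

# Open-reel digitization target from the guidelines above (stereo assumed).
archival = pcm_bytes_per_second(96_000, 24, 2)   # 576,000 bytes per second
# For comparison: CD-style 44.1 kHz / 16-bit stereo PCM.
cd_style = pcm_bytes_per_second(44_100, 16, 2)   # 176,400 bytes per second

print(f"96 kHz/24-bit stereo:   {archival * ONE_HOUR / 1e9:.2f} GB per hour")  # 2.07
print(f"44.1 kHz/16-bit stereo: {cd_style * ONE_HOUR / 1e9:.2f} GB per hour")  # 0.64
```

Higher fidelity has a cost: an hour of tape digitized at 96 kHz/24-bit takes a bit more than three times the space of the same hour at CD-style settings, which is one reason archives record these values in their metadata.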

Right now I’m keeping 43 fields of information for each audio item. A handful of these fields will become defunct once their data is reviewed and any unique metadata is moved into more appropriate fields. Erin and I decided to require a minimum of six fields for physical items: unique identifier, title, date, date type, format, and rights statement. With this minimum amount of information, KBOO can know what an item is and what it can do with it. All metadata fields are important, but making all of them required would slow down cataloging, because some information is unknown or takes a long time to research. Examples of non-required fields include publisher, subject, and contributor names. Thirteen fields relate to the digital object once it is created. The AAPB’s system requires 13 metadata fields to be filled.
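A required-field minimum like this is easy for a computer to enforce. Here is a small Python sketch that lists which of the six required fields a physical-item record is still missing. The exact spelling of the field names and the sample record are assumptions for illustration, not KBOO’s actual catalog.

```python
# KBOO's six required fields for physical items, as listed above.
# Spelling the field names this way is an assumption for illustration.
REQUIRED_PHYSICAL = [
    "unique identifier", "title", "date", "date type", "format", "rights statement",
]

def missing_required(record, required=REQUIRED_PHYSICAL):
    """Return the required fields that are absent or blank in a catalog record.

    Optional fields (publisher, subject, contributors...) are simply ignored.
    """
    return [field for field in required
            if not str(record.get(field) or "").strip()]

# A hypothetical, partly cataloged tape.
item = {
    "unique identifier": "kboo-0002",
    "title": "Example Episode",
    "date": "",                 # not yet researched
    "format": "audiocassette",
    "publisher": "KBOO",        # optional field, not checked
}
print(missing_required(item))  # ['date', 'date type', 'rights statement']
```

Optional fields can stay empty without blocking the record, which matches the reasoning above: cataloging keeps moving while unknown details wait for research.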

There is so much more to discuss about metadata. If you have any questions, you can email them to me at selena@kboo.org or tweet them at @selena_sjsu.
