I’m interested in learning about different applications of ResourceSpace for audiovisual digital preservation and collection management and wanted to explore PBCore XML data exports. Creating PBCore XML is possible in ResourceSpace, but it is dependent on each installation’s metadata field definitions and data model. Out of the box, ResourceSpace allows mapping of fields to Dublin Core fields only.
There was talk on an old thread on the ResourceSpace Google Group about the possibility of offering PBCore templates, or sets of predefined PBCore metadata fields, because none exist currently. I did not create KBOO’s archive management database with all possible PBCore metadata fields; instead, it was important for me to allow KBOO to enter information in a streamlined, simplified format without all fields open for editing. I can imagine that a template would restrict users to entering data a certain way, and may not offer the best flexibility for various organizations.
Data created in ResourceSpace is flat, so it exports to CSV in a nice, readable way, but any hierarchical relationships (e.g. PBCore asset instantiation, essence track, and child fields) need to be defined with the metadata mapping and XML export file.
I learned some important things when building off of code from the function “update_xml_metadump”:
- “Order by” metadata field order matters. It’s easier to reuse this function if the order of metadata fields follows the PBCore element order and hierarchy.
- Entering/storing dates formatted as YYYY-MM-DD makes things easier. In ResourceSpace, I defined the date fields as text and put in tooltip notes for users to always enter dates as YYYY-MM-DD. I also defined a value filter. A value filter allows data entered and stored as YYYY-MM-DD to display in different ways, such as MM/DD/YYYY.
- It is important to finalize the use of all ResourceSpace tools (such as Exiftool, ffmpeg, staticsync) because this may affect use, display, and order of metadata fields.
- I was incredibly challenged to figure out the structure of data in the database and how the original function loops through it, in order to loop appropriately and put the data in a hierarchical structure. My end result comes from “might” and not necessarily “right”, meaning someone with more advanced knowledge of ResourceSpace could probably make the PHP file cleaner. I ended up creating a separate function each time I needed a special hierarchical set of data, i.e. one function for the asset data, one function for the physical instantiation, one function for the preservation instantiation, etc. Each function is called based on an expected required data field. For example, the preservation instantiation for loop will only run if a preservation filename exists.
- Overall, if you know what you’re looking at, you’ll notice that my solution is not scalable “as is” but hopefully this information provides ideas and tips on how to get your own PBCore XML export going in ResourceSpace.
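To illustrate the "one function per hierarchical set, called only when its required field exists" idea, here is a minimal sketch. It is written in Ruby with the REXML library rather than the PHP used inside ResourceSpace, so it is not the actual export file. The flat field names (`asset_date`, `physical_id`, `preservation_filename`, and so on), the `source` attribute value, and the sample values are all hypothetical, and the output loosely follows PBCore element names without being validated against the PBCore schema.

```ruby
require "rexml/document"

# True when a flat field holds a usable (non-blank) value.
def filled?(value)
  !value.nil? && !value.strip.empty?
end

# Asset-level elements, written in a fixed order that mirrors the field order.
def add_asset(root, record)
  root.add_element("pbcoreAssetDate").text = record["asset_date"]
  root.add_element("pbcoreIdentifier", "source" => "hypothetical").text = record["identifier"]
  root.add_element("pbcoreTitle").text = record["title"]
end

# One instantiation block. The format element name is passed in so the
# physical and preservation cases can share the same shape.
def add_instantiation(root, identifier, format_element, format_value)
  inst = root.add_element("pbcoreInstantiation")
  inst.add_element("instantiationIdentifier", "source" => "hypothetical").text = identifier
  inst.add_element(format_element).text = format_value
end

# Turn one flat record (a hash of field name => value) into nested XML.
def build_pbcore(record)
  doc = REXML::Document.new
  root = doc.add_element("pbcoreDescriptionDocument")
  add_asset(root, record)
  # Each instantiation is only written when its required field exists.
  if filled?(record["physical_id"])
    add_instantiation(root, record["physical_id"],
                      "instantiationPhysical", record["physical_format"])
  end
  if filled?(record["preservation_filename"])
    add_instantiation(root, record["preservation_filename"],
                      "instantiationDigital", record["preservation_format"])
  end
  doc
end

# Made-up sample record, shaped like one row of a flat CSV export.
record = {
  "asset_date" => "1986-04-12",
  "identifier" => "example-0001",
  "title" => "Example Program",
  "physical_id" => "example-0001-reel",
  "physical_format" => "1/4 inch audio tape",
  "preservation_filename" => "example-0001_pres.wav",
  "preservation_format" => "audio/x-wav"
}

puts build_pbcore(record)
```

If the `preservation_filename` value is blank, the second instantiation block is simply left out, which is the behavior described in the list above.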
The work done:
2. Created a new PHP file based on the ResourceSpace function “update_xml_metadump”, which exports XML data in their default template and also offers renaming tags mapped to Dublin Core tags.
3. Created a new PHP file to call the new PBCore XML metadump function, based on the existing /pages/tools/update_xml_metadump.php file.
4. Ran the update file. XML files are exported to the /filestore directory.
Things are always in flux at KBOO, many times in order to improve the station. The digital program production practices change with newer software, hardware, or workflows to meet the constantly evolving standards and support KBOO’s radio programming into the future. For that reason, this brief diagram of current digital program production practices of live programs (i.e. how radio programming moves around, gets broadcast and archived) reflects the flow as of December 2016 with known changes coming down the road.
What goes into digitizing audio content?
Some of KBOO’s archival audio content sits on 1/4″ audio tape. The majority of the tape’s substrate is polyester base, with a few acetate base tapes. Both of these types of tape have their own challenges for preservation. Polyester base tape typically can present a “sticky-shed” syndrome that affects playback and quality of tape. It is reversible, but affects longevity of the tape. Acetate base tape can present “vinegar syndrome” which is a sign of substrate deterioration. Digital preservation is a method of preservation that requires its own care and handling of newly created materials. Files made of bits and bytes need to be managed to ensure that the file’s bit and byte structure remains fixed and is playable with audio software.
Audiovisual digitization vendors provide a service with specific details. I developed an RFP (request for proposals) for vendors that is specific to the digital preservation needs of KBOO for 1/4″ open reel audio items in the archive collection. It was created using an RFP template from AV Preserve.
An RFP, or an RFI (request for information), is a chance for an audiovisual archive to ensure that what it wants will be included in the service at a particular cost, and allows the organization to budget appropriately. You will notice that our RFP includes sections on care and handling, appropriate storage at the vendor facility, documentation of metadata, sample rate and bit depth of audio files created, and a request for no sound processing, among other details. Digital preservation differs from the creation of a produced sound file. KBOO wishes to preserve the best quality of analog audio content as it is represented on the tape. So, if the audio was originally recorded with background noise or varying audio levels, these are kept in the preservation file. Often, keeping these details in offers contextual information about how the audio was created. For listening ease, lower quality mp3 files are created from the preservation wav file and an organization can determine how much editing it wishes to perform for these proxy or derivative files.
Sending reels out for specific and detailed work of digitization is only one step in KBOO’s workflow for open reel audio digital preservation. In brief, the workflow is presented below (two images). You’ll notice that KBOO is continuously working on improving the information documented about audio items and managing it in an archive access system, and that it is a group effort! A collective wisdom of many KBOO volunteers is necessary to add more information to sparsely documented audio tapes.
At the end of the workflow, KBOO’s radio content will have moved from the physical substrate of the tape to a streaming file format that is accessible to researchers and patrons.
I am an archivist and I help libraries, archives, and museums increase access to their cultural records by building solutions with my digital technologies knowledge, creativity from the performing arts, and experience from my Master of Library and Information Sciences career path.
I studied Animal Sciences at the University of Illinois, worked in a science research lab and then a science education non-profit where I first applied my web and database skills. After 10+ years of professional dancing, I returned to graduate school to obtain my Master of Library and Information Sciences degree with a focus in archives management. I am currently an American Archive of Public Broadcasting National Digital Stewardship Residency (AAPB NDSR) fellow at KBOO Community Radio in Portland, Oregon.
I enjoy supporting dance and performing arts scholastic initiatives with my MLIS and archives knowledge. Please feel free to email me directly.
Early in our NDSR cohort chats, each resident agreed that our host site’s current metadata collecting and organizing practices could be improved. Many of us sought to find a database tool that could help our staff manage their data effectively.
KBOO is like many other organizations that keep their archives information in multiple spreadsheets. Tapes that were digitized by a vendor were sent back with magnificent details on the digitization process, yet these details were not consolidated with the inventory. The original inventory did not document which items had digital files, or where the files were stored. File location information was not in any spreadsheet and had to be determined by asking staff. If anything happened to the preservation master files, there would have been no way to restore them or get them back, and perhaps nobody would have noticed. This is obviously a tragic example that nobody wants. It demonstrates what could happen if an archive doesn’t know how to maintain digital files.
Audiovisual archivist and technologist Dave Rice reminded me that the important work is for an archive to protect its media and metadata, with or without a database system. Spreadsheets are perfectly acceptable since repositories of a range of sizes need solutions in a range of sizes. At KBOO, the spreadsheet system was not efficient. To control versioning, Excel doesn’t allow more than one person to edit the file at the same time, so new information would be created by volunteers and not folded into the master spreadsheet. The master record wasn’t kept up to date, and the proliferation of Excel sheets would have continued. Yes, Excel sheets work, but KBOO’s use of them was not working. Google Sheets and Excel Online didn’t handle KBOO’s large single spreadsheet very well. Could an easy-to-use database encourage staff to control and protect its metadata? I kept searching for a system that could make things better.
Kara Van Malssen, Partner & Senior Consultant at AVPreserve, gave our first NDSR webinar (it was excellent!). In a follow-up, I asked for some advice in my database search and she suggested that I prepare for comparing database systems by creating business requirements, functional requirements, and use cases. This information is necessary whenever an organization is thinking of engaging a tech developer or vendor. Small repositories often don’t have the time or archival perspective to ask useful questions about how they want their data to be stored and accessed. Some developers/vendors work with the organizations (at cost) to determine what the needs are. I suggested that KBOO could determine its needs in-house; it would take time, but no additional cost (contact me if you’re curious to see them).
So, after documenting the requirements, the question was “Is there something that does it all?”
My perspective is that it is possible to do almost everything with technology. So the answer is yes, multiple people and companies could develop the system. However, the requirements are only one thing an organization has to document. KBOO also needed to document the availability of financial resources (up-front and ongoing) and commitment of staff time and knowledge (up-front and ongoing) required for ongoing maintenance of a system.
This is where the database comparison work began. I decided to research and compare several open-source and non-open source systems that support audiovisual records and archival metadata: systems as column headers and KBOO requirements as row headers (contact me if you’re curious to see it). I also gathered notes about up-front and ongoing costs and reached out to staff at institutions to get comments about their experience installing and maintaining their chosen system—this relates directly to up-front and ongoing costs. For open-source I looked at the activity level of the user base in finding solutions.
KBOO’s immediate needs are for managing its audio metadata in-house, preparing records for public searching, and opening up the records so that multiple staff and volunteers can see what we have and fix records with correct or additional information. I proposed ResourceSpace (on a Bitnami stack) installed on a network server for in-house use for several reasons that fit KBOO’s needs:
* Records can be imported and exported to/from csv
* Different security/access levels for administrator and volunteer/staff data entry
* More than one person can view and edit at a time
* Batch upload of media can associate files with existing records
* Database fields can be defined based on my PBCore data model
* Installation and set up is easy to understand
* Great documentation and very active user group
* More reasons (contact me if you’re curious)
Time will tell if KBOO will use and maintain the system into the future. Exporting back to csv actually is the most important from my perspective—the data is still flat. I will only be at KBOO until the end of May for the NDSR program, and KBOO does not have an archivist. The set-up of ResourceSpace is uncomplicated enough to set up with clear directions (which I’m writing), and if anything happens, KBOO can always turn the data back into the Excel format it is familiar with. ResourceSpace can be used for other potential archiving needs as determined by KBOO and a future archivist, but for now I’m keeping it simple to meet their current requirements.
So can ResourceSpace do it all? Well, it can do the specific things KBOO needs it for. ResourceSpace is still just a tool that a human uses. As such, it is a tool that a human needs to understand and take care of. I said earlier that almost anything can be done with technology. Human work can’t be replaced with computers. But—human work can be simplified with technology and computers. I hope this tool encourages work to be done more easily by many people.
As part of the AAPB NDSR fellowship, we residents are afforded personal professional development time to identify and learn or practice skills that will help us in our careers. This is a benefit that I truly appreciate!
I had heard of the many programming and scripting languages like Perl, Python, Ruby, and bash but really didn’t know of examples that showed their power, or applicability. In the past month, I’ve learned that knowing the basics of programming will be essential in data curation, digital humanities, and all around data transformation and management. Librarians and archivists are aware of the vast amount of data: why not utilize programming to automate data handling? Data transformation is useful for mapping and preparing data between different systems, or just getting data in a clearer, easier to read format than what you are presented with.
At the Collections as Data Symposium, Harriett Green presented an update on the HTRC Digging Deeper, Reaching Further Project [presentation slides]. This project provides tools for librarians by creating a curriculum that includes programming in Python. This past spring, the workshops were piloted at Indiana University and University of Illinois–my alma mater :0). I can’t wait until the curriculum becomes more widely available so more librarians and archivists know the power of programming!
But even before the symposium, my NDSR cohort had been chatting about the amounts of data we need to wrangle, manage, and clean up in our different host sites. Lorena and Ashley had a Twitter conversation on Ruby that I caught wind of. Because of my current project at KBOO, I was interested in webscraping to collect data presented on HTML pages. Webscraping can be achieved by both Python and Ruby. My arbitrary decision to learn Ruby over Python is probably based on the gentle, satisfying sound of the name. I was told that Python is the necessary language for natural language processing. But since my needs were focused on parsing html, and a separate interest in learning how Ruby on Rails functioned, I went with Ruby. Webscraping requires an understanding of the command line and HTML.
- I installed Ruby on my Mac with Homebrew.
- I installed Nokogiri.
- I followed a few tutorials before I realized I had to read more about the fundamentals of the language.
I started with this online tutorial, but there are many other readings to get started in Ruby. Learning the fundamentals of the Ruby language included hands-on programming and following basic examples. After learning the basics of the language, it was time for me to start thinking in the logic of Ruby to compose my script.
As a newbie Ruby programmer, I learned that there is a lot I don’t know, there are better and more sophisticated ways to program if I know more, but I can get results now while learning along the way. For example, another way data sets can be manipulated in Ruby is by creating a hash of values. I decided to keep going with the array in my example.
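To make the array-versus-hash choice concrete, here is a small sketch with made-up values (not scraped data). An array keeps each piece of information by position, while a hash labels each piece with a key; the key names used here are only illustrative.

```ruby
# One radio program stored as an array: you have to remember
# that position 2 holds the hosts.
program_as_array = ["Example Program", "Current", ["Host A", "Host B"]]

# The same program stored as a hash: each value is looked up by name.
program_as_hash = {
  name: "Example Program",
  status: "Current",
  hosts: ["Host A", "Host B"]
}

puts program_as_array[2].inspect     # hosts, found by position
puts program_as_hash[:hosts].inspect # hosts, found by key
```

Either structure can be written out as a spreadsheet row; the hash mostly makes the script easier to read when there are many columns.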
So, what did I want to do? There is a set of program data across multiple html pages that I would like to look at in one spreadsheet. The abstract of my Ruby script in colloquial terms is something like this:
- Go to http://kboo.fm/program and collect all the links to radio programs.
- Open each radio program html page from that collection and pull out the program name, program status, short description, summary, hosts, and topics.
- Export each radio program’s data as a spreadsheet row next to its url, with each piece of information in a separate column with a column header.
The script takes about a minute to run through all 151 rows, and I’m not sure if that’s the appropriate amount of time for it to take. I also read that when webscraping, one should space out the server requests or the server may blacklist you; there are ethics to webscraping. I also noticed that I could clean up the array within array data: the host names, topics, and genres still have surrounding brackets around the array.
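Both of those clean-up ideas can be sketched with Ruby's standard library. This is a minimal sketch, not the script described above: the page fetching and HTML parsing are left out, and the sample rows, the example.org URLs, the column names, and the helper names (`flatten_cell`, `programs_to_csv`, `each_with_delay`) are all hypothetical. Joining each inner array into one string before writing the row is one way to avoid the surrounding brackets, and a short `sleep` between requests is one way to space them out.

```ruby
require "csv"

# Turn an inner array such as ["Host A", "Host B"] into "Host A; Host B"
# so the spreadsheet cell has no surrounding brackets.
def flatten_cell(value)
  value.is_a?(Array) ? value.join("; ") : value.to_s
end

# Build CSV text with a header row and one row per program.
def programs_to_csv(programs, headers = %w[url name status hosts topics])
  CSV.generate do |csv|
    csv << headers
    programs.each do |program|
      csv << headers.map { |key| flatten_cell(program[key.to_sym]) }
    end
  end
end

# Visit each URL in turn, pausing between requests to be polite to the server.
def each_with_delay(urls, delay_seconds: 1)
  urls.each_with_index do |url, index|
    sleep(delay_seconds) if index.positive?
    yield url
  end
end

# Made-up rows standing in for scraped program pages.
programs = [
  { url: "http://example.org/program/one", name: "Example One", status: "Current",
    hosts: ["Host A", "Host B"], topics: ["Music", "News"] },
  { url: "http://example.org/program/two", name: "Example Two", status: "Retired",
    hosts: ["Host C"], topics: [] }
]

puts programs_to_csv(programs)

# The block is where a real script would fetch and parse each page.
each_with_delay(programs.map { |program| program[:url] }, delay_seconds: 0) do |url|
  puts "would fetch #{url}"
end
```

With a one-second pause, 151 pages would take at least two and a half minutes, so the delay is a trade-off between speed and courtesy.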
It took me a while to learn each part of this, and I also used parts of other people’s scripts similar to my need. It also showed me that it takes a lot of trial and error. However, it also showed me that I could work with the logic and figure it out!
There is a lot of data on web pages, and webscraping basics with a programming language like Ruby can help retrieve items of interest and order them into nice csv files, or transform and manipulate them in various ways. Next, I think I can create a similar script that lists all of a program’s episodes with pertinent metadata that can be mapped to a PBCore data model, i.e. episode title, air date, episode description, and audio file location.
Please share any comments you have!
I haven’t been to a live performance in a very long time, and it was apparent when I entered Royce Hall. Seeing the stage and house set up with lights, various instruments and a round raised platform on stage, I took a long drink in from my surroundings, thoroughly letting my soul be quenched. It’s a cliche phrase, but it’s completely true. The same way one may receive a sense of solemn ritual by entering a church, I too have a sensitivity to the theater–I have come to know that I may experience something innovative or profound. Even when the world around me may make the thrill of movies or feats of technology tools commonplace, I still hold my moment with world-class talented performers in high regard as a one-time, fleeting affair.