I help libraries, archives, and museums manage and use their cultural records by building solutions that draw on my experience in digital technologies and the performing arts, and on my Master of Library and Information Science degree.

Prior to my archives career, I applied my Animal Sciences degree in a science research lab and at a science education nonprofit, took on web and database development contracts, and performed in theaters around the world as a professional dancer. For current information, please see https://www.linkedin.com/in/selenachau/

I value holistic wellness, individual empowerment, creativity, and adventure. Please feel free to email me directly.

KBOO’s Edit-a-Thon and Reunion: Sustainable archiving through outreach in a community radio station

KBOO Community Radio has nearly 50 years of broadcast audio representing news and perspectives less commonly heard in mainstream media. Its collection was built through individuals’ personal responsibility and action: older recordings exist because they were captured from the on-air broadcast by KBOO’s longtime news and public affairs director, or by board operators during shows. When I work with the older materials, I have questions that my contemporaries cannot answer but that are necessary to document the content, date it, and know who created it. I wanted to get volunteers in a room so that we could find answers to these questions!


Photo courtesy of Sylvia Podwika

KBOO is a community radio station, which means volunteers run the station, with long-time volunteer hosts producing shows, training new volunteers, and serving on station committees. However, there was no existing cadre of archives volunteers. I developed tools and workflows for digital preservation and recruited two MLIS students to help me get the project up and running, but there is still no guarantee that this work will continue. The sustainability of archiving projects requires visibility and understanding in order to maintain support in staffing, resources, and technical infrastructure for continued work. To increase the visibility and understanding of current archives work, I combined archives outreach with volunteer engagement and organized KBOO’s Archives Database Edit-a-Thon and Reunion event. The goal was to have past KBOO hosts come back and share their knowledge of KBOO news and public affairs programming while volunteers who value documentation (librarians and archivists) helped get these details into our archives management database.


Photo courtesy of Sylvia Podwika

I planned two structured activities, and I also knew we wanted to ask specific attendees targeted questions about content they were familiar with. The first activity was a fill-in-the-blanks task: decipher the shorthand and handwritten notes on an audio cassette to complete specific fields in the archive database, such as title, description, date, date type, contributor, and notes written on the casing. Volunteers could also photograph the casing for the database, or listen to the audio. The second activity was listening to mystery content, e.g. a speaker at an event with no speaker name, location, or date recorded. Some of these puzzles seem almost impossible, but they are useful for showing people why certain metadata fields matter for search and discovery. And if someone recognizes a voice, the mystery can be solved; it takes the right combination of people with unique knowledge to identify unique items. Our database supports “proposed changes,” so people didn’t have to be highly trained in metadata requirements. I gave basic instructions, knowing that any proposed change could be reviewed in post-event editing. The information received, once reviewed, becomes part of the record of our station’s content.

Results? It worked! Success came from:

  • Clearly defining outcomes and expectations
  • Staying flexible for people to work on their own, at their own pace, or with help
  • KBOO being KBOO: there is already a spirit of volunteerism in this community
  • Having KBOO people from different decades available to provide insight: hosts from the 1960s, 1970s, 1980s, and 1990s through the current day
  • Having dedicated helpers
  • Donuts and coffee, pizza and seltzer water


  • People had fun!
  • Mysteries were solved!
  • There were reunion moments: hugs and photos
  • Everyone learned more about what KBOO is doing with their news and public affairs audio collection and contributed in a hands-on way
  • Individuals asked how they could get more involved with archives work

An archives event like this is replicable and valuable for other public broadcasting stations and archives. Each station will have to define outcomes and create a pathway to success for itself and its participants based on how its archive is set up. At our Edit-a-Thon, volunteers responded to making tangible archives contributions and to seeing the real day-to-day work of archiving.

Lessons learned
Tech preparation was important. Our database lives on our network server, inaccessible from the internet, and our largest event space has no computers in it. We bought a wifi access point and had our IT contractor set up a temporary secure wireless network so that participants could work with the database for the day. It was important to collect RSVPs noting who could bring a laptop, and to have extra laptops on hand. Even so, we didn’t have enough computer spaces and had to start showing people the work on computers in the hallways, which took away from the intended group feel of the event.


We referenced old listener guides, too

When promoting the event, it was difficult to describe an edit-a-thon, which is not surprising considering that the work I’m doing falls outside what most people think or know about archives. I had flyers, blurbs in newsletters, and on-air promos, but people still contacted me for clarification. The short, engaging description of the event could be better.

People didn’t just come. They were asked multiple times by email and in person, called directly, and may have felt an obligation to come. Recruiting the right people was the most important part of this event and took the most time. I worked with KBOO’s volunteer coordinator and only contacted people who had left on good terms. KBOO’s program director also suggested people who would hold a great amount of knowledge, such as former station managers. I also targeted individuals whose content was in the database but missing important information. Although the event was appropriate for the general public, participants self-selected based on their existing interest in archives: I asked for help posting to Northwest and Portland-area archives calendars. While I was only expecting library/archivist types and former KBOO staff and volunteers, a radio listener came in as well. We had 21 RSVP’d participants for the six-hour event, which we broke into two shifts: 10am-1pm and 1pm-4pm. Three of us were dedicated helpers/planners. Some people stayed for more than one shift, and additional people stopped by out of curiosity; I was able to talk with them about archives and invite them to buddy up with participants doing the work.


What are CFH and MAK? Some mysteries solved

We didn’t schedule in breaks. Breaks would definitely be a requirement in future event plans so facilitators could be reminded to take a moment. It took a lot of energy to be “on” for over six hours. In our debrief, we talked about the benefit of planned share-out sessions for the entire group in future events, where we collectively would take a break to share some fun discoveries.

There were definitely one-time participants as well, but KBOO now has new archives volunteers (and new-old volunteers) eager to sustain archives work.

Archives work is not widely known, yet the sustainability of archiving projects requires visibility and understanding in order to maintain support in staffing, resources, and technical infrastructure for continued work.


Moving Beyond the Allegory of the Lone Digital Archivist (& my day of Windows scripting at KBOO)

The “lone arranger” was a term I learned in my library science degree program, and I accepted it. I visualized hard-working, problem-solving solo archivists in small-staff situations, challenged with organizing, preserving, and providing access to growing volumes of historically and culturally relevant materials for researchers. As much as the archives profession is about facilitating a deep relationship between researchers and records, this term described professionals, myself among them, working alone and with known limitations. This reality has encouraged archivists without a team to band together and be creative about forming networks of professional support. The Society of American Archivists (SAA) has organized support for lone arrangers since 1999, and now has a full-fledged Roundtable where these professionals can meet and discuss their challenges. Similarly, support for the lone digital archivist was the topic of a presentation I heard at the 2017 Code4Lib conference at UCLA by Elvia Arroyo-Ramirez, Kelly Bolding, and Faith Charlton of Princeton University.

Managing the digital record is a challenge that requires more attention, knowledge sharing, and training in the profession. At Code4Lib, digital archivists talked about how archivists on their teams did not know how to process born-digital works: a challenge, but more than that, unacceptable in this day and age. It was pointed out that our degree programs didn’t support digital archiving the way they supported processing archival manuscripts and other ephemera. The NDSR program aims to close the gap on digital archiving and preservation, and the SAA has a Digital Archives Specialist credential program, but technology training in libraries and archives shouldn’t be limited to the few who are motivated to seek it out. Many archivist jobs will be in small or medium-sized organizations, and we argued that processing born-digital works should always be considered part of archival responsibilities. Again, this was a conversation among proponents of digital archives work, and I recognize that it excludes many other thoughts and perspectives. The discussion would be more fruitful if it included individuals who feel blocked from learning to process born-digital records, and if it focused on how to break down those barriers.

Code4Lib sessions (http://bit.ly/d-team-values, http://scottwhyoung.com/talks/participatory-design-code4lib-2017/) reinforced values of the library and archives profession, namely advocacy and empowering users. No matter how specialized an archival process is, digital or not, there is always a need to be able to talk about the work to people who know very little about archiving, whether they are stakeholders, potential funders, community members, or new team members. Advocacy is usually associated with external relations, but is an approach that can be taken when introducing colleagues to technology skills within our library and archives teams. Many sessions at Code4Lib were highly technical, yet the conversation always circled back to helping the users and staying in touch with humanity. When I say highly technical, I do not mean “scary.” Another session reminded us that technology can often cause anxiety, and can be misinterpreted as something that can solve all problems. When we talk to people, we should let them know what technology can do, and what it can’t do. The reality is that technology knowledge is attainable and shouldn’t be feared. It cannot solve all work challenges but having a new skill set and understanding of technology can help us reach some solutions. It can be a holistic process as well. The framing of challenges is a human-defined model, and finding ways to meet the challenges will also be human driven. People will always brainstorm their best solutions with the tools and knowledge they have available to them—so let’s add digital archiving and preservation tools and knowledge to the mix.

And the Windows scripting part?

I was originally going to write about my checksum validation process on Windows, without Python, and then I went to Code4Lib, which was inspiring and thought-provoking. In the distributed cohort model, I am a lone archivist if you frame your perspective around my host organization. But I primarily draw my knowledge from my awesome cohort members and the growing professional network I’ve connected with on Twitter (Who knew? Not me.). So in this expanded view, I am not a lone archivist. When I was challenged to validate a large number of checksums without the ability to install new programs on my work computer, I asked my colleagues for help. Below is my abridged process, showing how I was helped through an unknown process to a workable solution using not only my ideas, but my colleagues’ ideas too. Or scroll all the way down for “Just the solution.”

KBOO recently received files back from a vendor who digitized some of our open-reel content. Hooray! Like any good post-digitization work, ours had to start with verification of the files, and that meant validating checksum hash values. Follow me on my journey through my day of PowerShell and the Windows command line.

Our deliverables included a preservation wav, a mezzanine wav, and an mp3 access file, plus related jpgs of the items, an xml file, and an md5 sidecar for each audio file. The audio filenames followed our filenaming convention, designated in advance, and files related to a physical item were in a folder with the same naming convention.

md5deep can verify file hashes by comparing two reports created with the program, but I had to make some changes to the format of the checksum data before the reports could be compared.

Can md5deep run recursively through folders? Yes, and it can recursively compare everything in a directory (and subdirectories) against a manifest.

Can md5deep selectively run on just .wav files? Not that I know of, so I’ll ask some people.

Twitter & Slack inquiry: Hey, do you have a batch process that runs on designated files recursively?

Response: You’d have to employ some additional software or commands like [some unix example]

@Nkrabben: Windows or Unix? How about Checksumthing?

Me: Windows, and I can’t install new programs, including Python at the moment

@private_zero: hey! I’ve done something similar but not on Windows. But, try this Powershell script that combines all sidecar files into one text file. And by the way, remember to sort the lines in the file so they match the sort of the other file you’re comparing it to.

Me: Awesome! With adjustments for my particular situation, it works like a charm. Can PowerShell scripts be given a clickable icon to run easily, like Windows batch files, in my work setup where I can’t install new things?

Answer: Don’t know… [Update: create a file with extension .ps1 and call that file from a .bat file]

@kieranjol: Hey! If you run this md5deep command it should run just on wav files.

Me: Hm, tried it but doesn’t seem like md5deep is set up to run with that combination of Windows parameters.

@private_zero: I tried running a command, seems like md5deep works recursively but not picking out just wav files. Additional filter needed.
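For readers who can run Python (which I couldn't install on my work computer that day), the filter we were all hunting for, hashing only the preservation wav files recursively while skipping the mezzanine copies, is just a few lines. This is a sketch rather than the process I actually used, and the folder and file names are made up.

```python
# Sketch: recursively hash only preservation .wav files, skipping the
# _mezz.wav mezzanine copies. Illustrative only; not the actual workflow.
import hashlib
from pathlib import Path

def hash_wavs(root):
    """Return {filename: md5 hex digest} for preservation .wav files under root."""
    hashes = {}
    for path in sorted(Path(root).rglob("*.wav")):
        if path.name.endswith("_mezz.wav"):
            continue  # skip mezzanine copies
        md5 = hashlib.md5()
        with open(path, "rb") as f:
            # read in 1 MB chunks so large audio files don't fill memory
            for chunk in iter(lambda: f.read(1 << 20), b""):
                md5.update(chunk)
        hashes[path.name] = md5.hexdigest()
    return hashes
```

The same skip-the-mezzanine logic is what the long `-exclude` lists in the PowerShell one-liner below accomplish.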

My afternoon of PowerShell and the command line: research on FC (file compare), sort, and ways to remove characters in text files (the vendor had an asterisk in front of every file name in their sidecar files that needed to be removed to match the output of an md5deep report).

??? moments:

It turns out PowerShell’s UTF8 output includes a byte-order mark (BOM), unlike the plain ASCII/UTF-8 output of the md5deep text files. This needed to be resolved before comparing files.

The md5deep output I created listed bare filenames rather than paths, but that left space characters at the ends of lines! These needed to be stripped out before comparing files.

I tried to perform the same function as the PowerShell script in the Windows command line but kept hitting walls, so I went ahead with my solution mixing PowerShell and command-line commands.
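Those two gotchas, plus the sorting, amount to a normalize-then-compare step. Here is a rough Python rendering of that logic for clarity (again, Python wasn't an option on my machine, and the file names here are hypothetical):

```python
# Sketch of the normalization needed before the final comparison:
# strip a UTF-8 BOM, trim trailing spaces, drop the vendor's asterisks,
# and sort the lines so both manifests are in the same order.
def normalize_manifest(path):
    """Read a checksum manifest and return its sorted, cleaned lines."""
    # the utf-8-sig codec silently drops a leading BOM if one is present
    with open(path, encoding="utf-8-sig") as f:
        lines = [line.rstrip().replace("*", "") for line in f if line.strip()]
    return sorted(lines)

def manifests_match(mine, vendors):
    """True if both manifests contain the same hash/filename lines."""
    return normalize_manifest(mine) == normalize_manifest(vendors)
```

This is the role the `fc` comparison plays at the end of the batch file below, once both text files have been massaged into the same shape.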

After I got six individual commands to run, I combined the PowerShell ones and the Windows command-line ones, and here is my process for validating checksums:

Just the solution:

It’s messy, yes, and there are better and cleaner ways to do this! I recently learned about this shell scripting guide that advocates for versioning, code reviews, continuous integration, static code analysis, and testing of shell scripts. https://dev.to/thiht/shell-scripts-matter

Create one big list of md5 hashes from vendor’s individual sidecar files using Powershell
–only include the preservation wav md5 sidecar files, look for them recursively through the directory structure, then sort them alphabetically. The combined file is named mediapreserve_20170302.txt. Remove the asterisk (vendor formatting) so that the text file matches the format of an md5deep output file. After removing asterisk, the vendor md5 hash values will be in the vendormd5edited.txt file.

open powershell

nav to new temp folder with vendor files

dir .\* -exclude *_mezz.wav.md5,*.xml,*.mp3, *.mp3.md5,*.wav,*_mezz.wav,*.jpg,*.txt,*.bat -rec | gc | out-file -Encoding ASCII .\vendormd5.txt; Get-ChildItem -Recurse A:\mediapreserve_20170302 -Exclude *_mezz.wav.md5,*.xml,*.mp3, *.mp3.md5,*.wav.md5,*_mezz.wav,*.jpg,*.bat,*.txt | where { !$_.PSisContainer } | Sort-Object name | Select FullName | ft -hidetableheaders | Out-File -Encoding "UTF8" A:\mediapreserve_20170302\mediapreserve_20170302.txt; (Get-Content A:\mediapreserve_20170302\vendormd5.txt) | ForEach-Object { $_ -replace '\*' } | set-content -encoding ascii A:\mediapreserve_20170302\vendormd5edited.txt

Create my md5 hashes to compare to vendor’s
–run md5deep on txt list of wav files from inside the temp folder using Windows command line (Will take a long time to hash multiple wav files)

“A:\md5deep-4.3\md5deep.exe” -ebf mediapreserve_20170302.txt >> md5.txt

Within my new md5 value list text file, sort my md5 hashes alphabetically and trim the end space characters to match the format of the vendor checksum file. Then, compare my text file containing hashes with the file containing vendor hashes.
–I put in pauses to make sure the previous commands completed, and so I could follow the order of commands.

run combined-commands.bat batch file (which includes):

sort md5.txt /+34 /o md5sorteddata.txt

timeout /t 2 /nobreak

@echo off > md5sorteddata_1.txt & setlocal EnableDelayedExpansion
for /f "tokens=1* delims=]" %%a in ('find /N /V "" ^<md5sorteddata.txt') do (
    set "str=%%b"
    for /l %%i in (1,1,100) do if "!str:~-1!"==" " set "str=!str:~0,-1!"
    <nul >>md5sorteddata_1.txt set /p "l=!str!"
    >>md5sorteddata_1.txt echo.
)

timeout /t 5 /nobreak

fc /c A:\mediapreserve_20170302\vendormd5edited.txt A:\mediapreserve_20170302\md5sorteddata_1.txt

The two files are the same, so every line matches, and therefore all checksums match. We’ve verified the integrity and authenticity of the files transferred to our server from the vendor.

Before & after XML to PBCore in ResourceSpace

I’m interested in learning about different applications of ResourceSpace for audiovisual digital preservation and collection management and wanted to explore PBCore XML data exports. Creating PBCore XML is possible in ResourceSpace, but it is dependent on each installation’s metadata field definitions and data model. Out of the box, ResourceSpace allows mapping of fields to Dublin Core fields only.


Before: default XML file created in ResourceSpace


After: PBCore XML formatting for data fields

There was talk on an old thread on the ResourceSpace Google Group about the possibility of offering PBCore templates, or sets of predefined PBCore metadata fields, since none exists currently. I did not create KBOO’s archive management database with all possible PBCore metadata fields; instead, it was important to me to let KBOO enter information in a streamlined, simplified format without all fields open for editing. I can imagine that a template would restrict users to entering data a certain way, and may not offer the best flexibility for various organizations.

Data created in ResourceSpace is flat, so it exports to CSV in a nice, readable way, but any hierarchical relationships (i.e. PBCore asset instantiation; essence track and child fields) need to be defined in the metadata mapping and the XML export file.

I learned some important things when building off of code from the function “update_xml_metadump”:

  • “Order by” metadata field order matters. It’s easier to reuse this function if the order of metadata fields follows the PBCore element order and hierarchy.
  • Entering/storing dates formatted as YYYY-MM-DD makes things easier. In ResourceSpace, I defined the date fields as text and put in tooltip notes for users to always enter dates as YYYY-MM-DD. I also defined a value filter. A value filter allows data entered and stored as YYYY-MM-DD to display in different ways, such as MM/DD/YYYY.
  • It is important to finalize the use of all ResourceSpace tools (such as Exiftool, ffmpeg, staticsync) because this may affect use, display, and order of metadata fields.
  • Figuring out the structure of data in the database, and how the original function loops through it, was incredibly challenging; I needed that understanding in order to loop appropriately and put the data in a hierarchical structure. My end result comes from “might” and not necessarily “right,” meaning someone with more advanced knowledge of ResourceSpace could probably make the php file cleaner. I ended up creating a separate function for each special hierarchical set of data I needed: one function for the asset data, one for the physical instantiation, one for the preservation instantiation, and so on. Each function is called based on an expected required data field. For example, the preservation instantiation loop will only run if a preservation filename exists.
  • Overall, if you know what you’re looking at, you’ll notice that my solution is not scalable “as is” but hopefully this information provides ideas and tips on how to get your own PBCore XML export going in ResourceSpace.
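As a rough illustration of the flat-to-hierarchical idea above, here is a Python sketch (not the actual PHP, and not KBOO's real field names, which are hypothetical here) showing a flat record becoming nested PBCore elements, with the preservation instantiation emitted only when its trigger field exists:

```python
# Sketch: turn a flat ResourceSpace-style record into nested PBCore XML.
# Field names ("title", "preservation_filename", etc.) are illustrative.
import xml.etree.ElementTree as ET

def record_to_pbcore(record):
    root = ET.Element("pbcoreDescriptionDocument")
    ET.SubElement(root, "pbcoreTitle").text = record.get("title", "")
    ET.SubElement(root, "pbcoreDescription").text = record.get("description", "")
    # one block per expected "trigger" field, mirroring the
    # one-function-per-instantiation approach described above
    if record.get("preservation_filename"):
        inst = ET.SubElement(root, "pbcoreInstantiation")
        ET.SubElement(inst, "instantiationIdentifier").text = record["preservation_filename"]
        ET.SubElement(inst, "instantiationDigital").text = record.get(
            "preservation_format", "audio/x-wav")
    return ET.tostring(root, encoding="unicode")
```

The key point is the conditional: the hierarchy is driven by which trigger fields a record actually has, rather than by a fixed template.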

The work done:

1. Reviewed metadata field information and defined metadata mapping definitions in the config.php file


2. Created a new php file based on the ResourceSpace function “update_xml_metadump”, which exports XML data in the default template and also offers renaming tags mapped to Dublin Core tags.

3. Created a new php file to call the new pbcore xml metadump function, based on the existing /pages/tools/update_xml_metadump.php file

4. Ran the update file. XML files are exported to the /filestore directory.

Current digital program production practices at KBOO

Things are always in flux at KBOO, often in order to improve the station. Digital program production practices change with newer software, hardware, or workflows to meet constantly evolving standards and support KBOO’s radio programming into the future. For that reason, this brief diagram of current digital program production practices for live programs (i.e. how radio programming moves around, gets broadcast, and gets archived) reflects the flow as of December 2016, with known changes coming down the road.

Click to view document


From substrate to streaming: Audio archives at KBOO

What goes into digitizing audio content?

Some of KBOO’s archival audio content sits on 1/4″ audio tape. The majority of the tapes have a polyester base, with a few acetate-base tapes. Both types of tape present their own challenges for preservation. Polyester-base tape can develop “sticky-shed” syndrome, which affects playback and the quality of the tape; it is reversible, but it affects the tape’s longevity. Acetate-base tape can develop “vinegar syndrome,” a sign of substrate deterioration. Digital preservation is a method of preservation that requires its own care and handling of the newly created materials: files made of bits and bytes need to be managed to ensure that the file’s bit and byte structure remains fixed and the file remains playable with audio software.

Audiovisual digitization vendors provide a service with specific details. I developed an RFP (request for proposals) for vendors that is specific to the digital preservation needs of KBOO for the 1/4″ open reel audio items in the archive collection. It was created using an RFP template from AVPreserve.

KBOO RFP for open reel digitization

An RFP, or an RFI (request for information), is a chance for an audiovisual archive to ensure that what it wants will be included in the service at a particular cost, and it allows the organization to budget appropriately. You will notice that our RFP includes sections on care and handling, appropriate storage at the vendor facility, documentation of metadata, sample rate and bit depth of the audio files created, and a request for no sound processing, among other details. Digital preservation differs from the creation of a produced sound file: KBOO wishes to preserve the best quality of the analog audio content as it is represented on the tape. So if the audio was originally recorded with background noise or varying audio levels, these are kept in the preservation file. Often, keeping these details in offers contextual information about how the audio was created. For listening ease, lower-quality mp3 files are created from the preservation wav file, and an organization can determine how much editing it wishes to perform on these proxy or derivative files.

Sending reels out for the specific and detailed work of digitization is only one step in KBOO’s workflow for open reel audio digital preservation. In brief, the workflow is presented below (two images). You’ll notice that KBOO is continuously working on improving the information documented about audio items and managing it in an archive access system, and that it is a group effort! The collective wisdom of many KBOO volunteers is necessary to add more information to sparsely documented audio tapes.

At the end of the workflow, KBOO’s radio content will have moved from the physical substrate of the tape to a streaming file format that is accessible to researchers and patrons.