
Innovation and Audiovisual Preservation

I was nominated for AMIA’s Alan Stark award and received it from Reto Kromer at a ceremony during the AMIA conference. It was a privilege to be acknowledged with an award that focuses on innovation within the field. This was right after the 2016 election and not long after the death of Gene Wilder. Here’s what I ended up saying at the ceremony.

Thank you, Reto, for the kind words. I’m very thankful to my friends and collaborators, and to AMIA for considering me. This has already been the most emotional AMIA for me, and receiving the Alan Stark award and your encouragement makes it even more so. I’d like to speak briefly on the topic of AMIA as both a catalyst for innovation and dependent on it.

I first came to AMIA in 2002 as a hopeful film preservationist. I attended the Selznick School and learned every aspect of film preservation comprehensively, from optical to chemical and mechanical to perceptual. My first archival job, at Democracy Now, had no film, only early digital recordings. Applying a Selznick School education to a Democracy Now archive required some serious innovation. My Selznick experience taught me what it felt like to be in control of a collection, but at work I knew I was struggling for any control at all. From nitrate to MiniDV we have similar objectives but very different tools, and it is quite reasonable to find that the innovation we require needs improvement or does not yet exist at all.

Often archivists working on new technological challenges must quickly adopt the tools of related communities, but any Mac user knows the pain of trying to maintain those tools once that related community has moved on. I acknowledge that archivists have to grab onto what works to get the job done, but we can be more strongly empowered by creating, contributing to, and supporting tools for ourselves. Just as we need to open our decks and projectors to understand, tinker with, and fix them, and just as we need to run our hands along a film print on a bench or open a cassette, we have a similar need with the new digital equivalents. Whether analog or digital, we should support our own hackers.

Bringing innovation into one’s work can produce meaningful personal accomplishments, which become still more meaningful when those innovations are solutions shared with others. The most impactful innovations, though, may be those that also act as a building block or foothold for others to build upon or learn from. For innovation within a community, being a contributor or supporter can have a bigger impact than being a lone pioneer.

AMIA has grown rapidly, and I do not consider this simply an expansion of what we already were, but an ever-changing community with new voices and skills to welcome. However, sometimes it feels as if we’re several simultaneous conferences. In one room we’ll lament a digital dark age while in the next we’ll be better exploiting the opportunities of digital preservation. In one room we’ll discuss how to make diverse collections available online simply, while another room demonstrates the distinct power of unique presentation forms, from cinematic to mobile.

I’m grateful that this year’s AMIA, through the good work of our volunteers and instigators, seeks to innovate the form, governance, and environment of our field in order to promote opportunity, showcase more voices, and meld our ideas and visions. Recent AMIA conferences had me return to work with a focus on innovating within the archive; this AMIA, however, has strongly motivated me to innovate from the archive outward.

The field of archiving can contain its share of discouragement to innovation. Trying to improve our existing opportunities can be met with resistance, both from those who consider further research and innovation unnecessary in an area that has already seen a pioneer or found a conceptual best practice, and from those who consider innovation credible only when it comes from a so-called expert. Additionally, organizing our communities under uncooperative terms such as ‘analog vs digital’ limits our ability to work stronger together. Best practices can be better, gatekeepers can be surpassed, and together we can innovate for ourselves.

In the 35mm print of Willy Wonka and the Chocolate Factory, Wonka is demonstrating an invention of lickable fruity wallpaper. In the variable-area audio track of the print, Veruca Salt ridicules Wonka, claiming that no one has ever heard of a snozzberry. Willy Wonka changes his tone and quotes Arthur O’Shaughnessy to assert: “We are the music makers, and we are the dreamers of dreams.” To the extent we can put ourselves into action as curators, educators, activists, and archivists, please make music and please dream dreams. Thanks again, AMIA, peace.

#amia15: The Next 25

Here I’m posting my talk from the 2015 AMIA Conference Opening Plenary. This is intended to be read as fast as possible while in a state of severe anxiety. Thanks #amia15 and good luck #amia40.
Dave

Hi everyone,

Thank you. I’m Dave Rice, an archivist at City University of New York. Last week I got an email from Caroline asking if I would speak to you about AMIA. I don’t know very much at all about baby AMIA or about what sort of archivist gatherings led to the conception of baby AMIA. But I can speak about the preteen AMIA that I met in 2001, try to discern what’s been happening here since, and speak about hopes and fears for the future of AMIA. This is actually my 13th AMIA conference, so at some point later today I will have participated in half of the AMIA conferences.

Before 2001 no one told me about any of this AMIA stuff. I had a vague awareness that archives did exist, but this was comparable to my awareness that black holes exist. I had no reason to doubt that either black holes or archives were real, but I also had no awareness of their direct consequence to my life or community. As I explored an interest in silent film I started to ponder such questions as “Why is it that there are 20 different versions of the Chaplin Mutual shorts on home video but so many other films aren’t available in any form? And why does so much silent film on home video look terrible, while occasionally it doesn’t?” Although I couldn’t directly observe audiovisual archivists, I began noticing enough observational evidence to form a hypothesis that they do somehow exist, and I wanted to attempt to communicate with one.

I had a home video of the 1925 film “Grass: A Nation’s Battle for Life” from Milestone Films, about the migration of a nomadic society, so in March 2001 I sent an email from dericed@hotmail.com to MileFilms@aol.com. Presumably it was something like: “Dear MileFilms@aol.com, I am 22 years old. Please tell me the truth, are there audiovisual archivists?”

Dennis Doros kindly replied with acknowledgement that audiovisual archivists are real and provided suggestions which significantly impacted my professional development, including “Look into joining the Association of Moving Image Archivists (AMIA) as they are a very good group. I believe the website is www.amianet.org If not that, do a search and I’m sure you’ll find it.”

That year I started following the AMIA listserv, which taught me things like how to try to get the last word in a technical argument and how to unsubscribe from a listserv. I had many questions but found the listserv too intimidating to participate in directly (it still is). Instead I would email individuals off-list with my dumb questions and began to find community in their usually kind and patient responses.

In 2002 I received a small scholarship from the Kodak Student Filmmaker Program that made it possible to attend my first AMIA conference. Attending that first one was eye-opening, as it demonstrated how professionally diverse AMIA is: similar objectives but significant diversity in the perception of and response to the challenges that affect the field. AMIA is not only diverse in specialty, but also diverse in circumstance. Many attendees are here at the expense of their employer and others have taken days off of work to pay their own way. I have been encouraged by the increased diversity of voices. Today at the conference, note that every panel (excepting one solo presentation) includes a woman presenter.

Within AMIA over the last 25 and next 25 years I think some things will be consistent. Our field will always include elements of despair and paranoia, as there’s either nitrate that won’t wait or the digital dark age. We face decay and loss on all formats, though with different rates and burdens. We’ll continue to struggle with hope as the archival metrics of acquisition, conservation, preservation and decay collide.

A future AMIA will continue to evolve with the needs of the formats we seek to sustain and use. Already I suspect that a majority of those here do not and will not have the opportunity to work as film archivists. A future AMIA will find film, videotape, analog, and tape-based digital formats to be less predominant as newer archivists, employing new tactics, join us.

There are concerns that AMIA’s expertise in film and analog formats is lessening, but I suggest an additional concern: that our expertise in digital materials has been slow to expand. A future AMIA will certainly be a more technologically diverse AMIA.

A recent AMIA thread focused on an article that was critical of the role of open source technology in LTO tape storage, something an increasing number of us directly depend on. I was very encouraged by John Tariot’s comments in this thread: “I would point to the work of many in AMIA, primarily our younger members, who have wrested away control of digitizing, cataloging, storing, and sharing media from closed, proprietary and expensive systems which, only a few short years ago, were our only options. Hacktivist and open-source culture has made a big change in our field, and should give you reason to hope!”

Personally, talking about AMIA’s present or future is a challenging experience for me. I am happy to come to AMIA and present research or technical findings, but for me and for many here the topic of AMIA is very personal. I was not a part of the early formation of AMIA, but AMIA has been essential to my own professional formation. Having worked within AMIA for 13 years, I can no longer claim imposter syndrome, but I can freely admit that I advocate for change and inclusion within our organization and agree that open-source culture has been a dramatic, and personally welcome, change to our field.

I am gratified that we are not a format-specific community; not an association of film archivists nor optical media conservators, but an association focused on the archival challenges of the ever-evolving concept of the moving image in whatever forms it may be carried. We are not the Association of Time Capsule managers; preservation will be increasingly active in a future AMIA.

For the future AMIA I see the balance of conservation vs. migration tipping so that preservation is a much more active and involved endeavor. Film’s conservational powers make it feasible to maintain a collection over a long period of time by paying the electric bill. Digital storage cannot compete with film’s conservational abilities and does not seem to try; instead, for digital storage most recommend an ongoing series of frequent migrations. Persuading a professional field that is tasked with stabilizing our moving image heritage to move from the stability of film to perilous digital carriers is somewhat like asking a kingdom to leave its walled city and take up life as nomads: to leave Castle Film to wander and struggle from tent to tent, from hard drive to server to data tape.

We have new powers with digital data that we did not have with analog media: migration is scalable many-to-one rather than one-to-one, and more menial work can be automated as the humans move on to new challenges.

I certainly don’t mean to understate the advantages in scale and authenticity that digital moving image archiving can provide, but as archivists we should not play John Henry. We should not deplete our own personal resources to our own professional death in order to compete with and resist the opportunities of digital preservation.

AMIA includes decades of refined experience and expertise in many aspects of analog moving image preservation, but in some digital areas we are either the newcomers or the late arrivals. We have many tools and tricks for restoring film and analog video, but our toolkit for doing the same with digital media is still in development. Whether unspooling a film through your hands on the bench or deciphering the hexadecimal payloads of a QuickTime header, I anticipate that a future AMIA will find that the shared objectives and concerns of our digital and analog formats may be resolved through unique means that are increasingly perceived as parallels.

While studying film preservation I was taught that film decays gracefully but that its digital counterparts decay unnoticed up until the point where the data is no longer usable and nothing can be done. Increasingly we know that this is not true; there are ways to apply the same preservation tactics to both data and film. The preservation of the moving image needs AMIA’s expertise, innovation, and community in preservation and access, no matter what the format of the moving image.

Finally, some news: many archivists and projects have been exploring and testing the use of the open formats of lossless FFV1 and the Matroska container in a preservation context. In fact, Indiana University recently announced the selection of FFV1 and Matroska for their large-scale digitization efforts. Many archivists have contributed to the evolution of these open formats, and the European Commission-funded PREFORMA project laid the groundwork to propose standardization of these formats through an open standards organization, the Internet Engineering Task Force. At 8am this morning the Internet Engineering Steering Group approved the charter, timeline, and project proposal for an IETF Working Group called CELLAR, standing for Codec Encoding for LossLess Archiving and Realtime transmission. FFV1 and Matroska will be standardized, and change control will move from FFmpeg and Matroska to an open standards body.

To close, looking at the future AMIA, I see there are more ways to participate and collaborate on our challenges, there is more technological and personal diversity within the group, with more inclusion, sympathy, and impact amongst us. There are new opportunities for archivists to participate directly in creating solutions rather than adopting them awkwardly from other communities. We have a lot to be hopeful for in a future AMIA. Thanks so much.

256 Shades of Grey

I’m currently working with BAVC on a project called qctools which is developing software to analyze the results of analog audiovisual digitization efforts.

One challenge is producing a software-based waveform display that accurately depicts the luminosity data of digital video. A waveform monitor is useful when digitizing video to ensure that the brightness and contrast of the video signal are set properly before digitizing. If they are not, the waveform provides a means of measuring luminosity so that adjustments can be made with a signal processor, such as those often available on a timebase corrector.

A Tektronix WFM 300A

While working with a draft waveform display that I had arranged via ffmpeg’s histogram filter I realized that my initial presentation was inaccurate. In order to start testing I needed a video that showed all possible shades of gray that an 8 bit video might have (two to the power of 8 is 256). I was then going to use this video as a control to put through various other software- and hardware-based waveform displays to make some measurements, but producing an accurate video of the 256 shades was difficult.

I eventually figured out a way. I wrote the values 0x00 to 0xFF in hexadecimal and inserted a 0x80 as every other byte. The 0x80 bytes are the neutral chroma value, so the picture carries no color. I then copied that raw data into a QuickTime container as raw uyvy422 video (2vuy) to make this result.
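To make the byte layout concrete, here is a minimal Python sketch. It is not the exact file used here. It leaves out the separator stripes, and the 256x16 frame size and output filename are assumptions for illustration. The sketch builds one raw 2vuy frame in which each luma value from 0x00 to 0xFF is preceded by a neutral 0x80 chroma byte.

```python
# Sketch: one raw 8-bit 4:2:2 UYVY ('2vuy') frame containing every gray level.
# Assumptions: 256x16 frame, no separator stripes, hypothetical output file name.

WIDTH = 256            # one pixel per 8-bit luma value
HEIGHT = 16            # arbitrary height chosen for illustration
NEUTRAL_CHROMA = 0x80  # Cb/Cr value meaning "no color"


def gray_ramp_uyvy(height: int = HEIGHT) -> bytes:
    """Return raw UYVY bytes: each row is Y = 0x00..0xFF with chroma fixed at 0x80."""
    row = bytearray()
    for luma in range(WIDTH):
        # UYVY stores Cb Y0 Cr Y1, so chroma and luma bytes alternate
        row += bytes((NEUTRAL_CHROMA, luma))
    return bytes(row) * height


if __name__ == "__main__":
    frame = gray_ramp_uyvy()
    with open("256_shades_256x16.uyvy", "wb") as out:
        out.write(frame)
    print(len(frame), "bytes written")
```

The raw file has no header, so it still needs a container. With ffmpeg's rawvideo demuxer, that might look like `ffmpeg -f rawvideo -pix_fmt uyvy422 -s 256x16 -i 256_shades_256x16.uyvy -c:v copy 256_shades.mov`. Treat that command as an untested assumption, and check that the result is tagged 2vuy.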

256 shades of gray, separated into sections of 4 and 16

This video is a one-frame-long, 8-bit, 4:2:2 uncompressed video. It contains the absolute darkest and lightest pixels possible in an 8-bit video and all possible 8-bit graytones in between. The graytones are separated by thick white or black stripes every 16 shades and thin white or black stripes every 4 shades.

In a waveform monitor such as Final Cut’s waveform display below, the result should be a diagonal line with dotted lines at the top and bottom that show the highest and lowest shades of grey allowed.

Putting the 256_shades file in Final Cut's waveform shows that Final Cut does not plot values from 0-7.5 IRE but does plot the rest all the way up to the 110 IRE limit.

However, Final Cut’s waveform display does not plot the lowest graytone values: columns 1-16 are not displayed. Dividing the graytone shade number by approximately 2.33 gives the IRE value. So the range from 0 to 7.5 IRE is not plotted in this display; it is all crushed together at 7.5 IRE.
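The divide-by-2.33 rule of thumb can be checked with a few lines of Python. This is a sketch of that approximation only. It maps 8-bit shade 0 to 0 IRE and shade 255 to about 110 IRE, and it is not a formula taken from a broadcast standard.

```python
# Sketch of this post's rule of thumb: IRE ~= 8-bit shade / 2.33.
# A linear approximation (0 -> 0 IRE, 255 -> ~110 IRE), not a standard formula.

IRE_DIVISOR = 2.33


def shade_to_ire(shade: int) -> float:
    """Approximate IRE level for an 8-bit luma value (0-255)."""
    return shade / IRE_DIVISOR


def shades_below(ire_limit: float) -> list[int]:
    """8-bit shades whose approximate IRE level falls below ire_limit."""
    return [s for s in range(256) if shade_to_ire(s) < ire_limit]


if __name__ == "__main__":
    crushed = shades_below(7.5)
    print(f"shades {crushed[0]}-{crushed[-1]} sit below 7.5 IRE")
    print(f"shade 255 is about {shade_to_ire(255):.1f} IRE")
```

With this divisor, shades 0 through 17 fall below 7.5 IRE, which roughly matches the low columns that Final Cut crushes together. Shade 255 lands near 109.4 IRE.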

And here is the same video displayed through ffmpeg’s histogram filter in waveform mode. A few other filtering options are added to draw guidelines marking values outside of broadcast range: 0-7.5 IRE in blue and 100-110 IRE in red.

256_shades file in ffmpeg's histogram filter showing the full range of 0-110 IRE (boundary lines mark broadcast range at 7.5 and 100 IRE)

In qctools all 256 shades of gray are plotted appropriately. They form a diagonal line going from one corner of the image to the other, with the white and black spacing columns shown as a half-line of dots and dashes at the very first and very last lines of the display.
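Here is a minimal Python sketch of what a software waveform display computes. It is an illustration, not how qctools or ffmpeg implement it. For each pixel column it counts how many pixels have each luma value, and a monitor then draws brighter dots where the counts are higher. The frame in the demo is a hypothetical 256x4 gray ramp without stripes. Every column holds exactly one value, so the nonzero counts fall on the diagonal line described above.

```python
# Minimal sketch of a luma waveform: counts[y][x] = number of pixels in
# column x whose 8-bit luma equals y. Not the qctools or ffmpeg implementation.


def luma_rows_from_uyvy(frame: bytes, width: int) -> list[list[int]]:
    """Split a raw UYVY frame into rows of luma values (Y bytes sit at odd offsets)."""
    stride = width * 2
    return [list(frame[start + 1:start + stride:2])
            for start in range(0, len(frame), stride)]


def waveform(rows: list[list[int]], levels: int = 256) -> list[list[int]]:
    """Tally luma values per column; row index = luma level, column index = x."""
    width = len(rows[0])
    counts = [[0] * width for _ in range(levels)]
    for row in rows:
        for x, luma in enumerate(row):
            counts[luma][x] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical 256x4 gray ramp: neutral chroma (0x80) alternating with luma 0..255
    ramp = bytes(b for luma in range(256) for b in (0x80, luma)) * 4
    counts = waveform(luma_rows_from_uyvy(ramp, width=256))
    print("all counts on the diagonal:", all(counts[v][v] == 4 for v in range(256)))
```

A real display also flips the vertical axis so that brighter values appear higher up, and it scales the counts into dot brightness.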

Follow the qctools project at http://bavc.org/qctools for more information.

FLAC in the archives

The first time I heard about FLAC was from a co-worker in the early days of my first full-time audiovisual archivist gig. I was trying to start digitization projects and figure out preservation practices. He was working in a half-IT, half-broadcast-engineer capacity and was happy to support archival work where he could help. We were discussing audio preservation and digitization of 1/4″ audio reels, and he remarked that FLAC was really an ideal choice for this type of work. I hadn’t heard much about FLAC before. But from the ARSC and AMIA listservs, I knew that when an archivist selects a digital audio format, Broadcast Wave Format (BWF) was the only legitimate choice. We went on to debate preservation objectives and the advantages and disadvantages of one format versus the other. Broadcast Wave was the “best practice” in digital audio archiving, but by the end of the conversation I was questioning why I was defending it.

My colleague clarified that the choice between FLAC and BWF was not about audio quality, since FLAC is a lossless audio encoding. A FLAC encoding and a BWF encoding of an audio signal (at the same specifications) will decode back to the same audio signal, but the FLAC file was much smaller (about a third the size of the uncompressed audio). He added that FLAC is an open format well supported by free software. During this conversation I was imagining the shock and disbelief that various archival communities might express on learning that a n00b archivist was being lured toward the lossless audio codecs of Free Software. For BWF I didn’t have much of a defense; it was a well-respected standard across the audio archiving community, but at that point I didn’t know why. I feebly tried a BWF defense by pointing out that a BWF file, being larger than a FLAC, may be more resilient. My reasoning was that a little bit of corruption would do more damage to the compact FLAC than to the vast BWF file.

Following this conversation I searched archival listservs for references to FLAC and didn’t find much, though I did find references to FLAC in archival environments at http://wiki.etree.org and on band sites. This research also led me to the communities that develop FLAC and related applications. Around that time their work was especially productive, as noted in their change log. All this left me confused, as if FLAC and BWF played the same singular role in two parallel archival universes.

For the time being, I would digitize analog audio to BWF and sleep well. There were large amounts of audio cassette transfers, CD ripping, and reel-to-reel work, and we worked to keep the decks running day after day to achieve our preservation goals. As the data piled up, digital storage became an increasingly complicated issue. We were simply creating audio data faster than our digital storage was expanding. As storage stresses grew, FLAC looked more and more tempting. Finally, in 2007, FLAC 1.2.1 added an option called --keep-foreign-metadata. With it, I could make a FLAC file from a BWF that losslessly compressed the audio and also keep all of the BWF’s non-audio data (descriptive information, embedded dates, bext chunks, cart chunks, etc.). Basically, this update meant that one could compress a BWF to a FLAC file and then decompress that FLAC back to the original BWF file, bit for bit. Knowing that I could completely undo the FLAC decision at any time with these new options, I finally went FLAC. Using the FLAC utilities and tools such as X Lossless Decoder, I compressed all the BWF files to FLAC, recovering substantial amounts of digital storage. This involved a lot of initial testing and workflow tinkering to make sure that the FLAC compression was fully reversible (it was). I was happy to finally make the preservation-standard switch and invest in learning FLAC inside and out.

[ technical interlude ]

If you wish to convert WAVE files to FLAC files in a preservation context, here is how I recommend you do it. First, use the official FLAC utility, or a GUI that gives you access to the same options, so that the options mentioned below are available. The following is a list of FLAC utility options that I found relevant:

--best
We can wait for the most beneficial result. The --best option will prioritize file size reduction over encoding speed.

--keep-foreign-metadata
For WAVE files or AIFF files this option will cause the resulting FLAC to store all non-audio chunks of data that may be in the source file. Ideally this option should be used during all FLAC encoding and decoding to ensure metadata survives all procedures.

--preserve-modtime
Optional, but I found this handy. This option applies some of the timestamps of the source file to the output, whether going from WAV->FLAC or FLAC->WAV.

--verify
Verify! Digital preservation is always an environment of paranoia. This option makes the utility do extra work to confirm the result: while encoding, it decodes the output in parallel and compares it to the original audio.

--delete-input-file
If everything else is successful, this will delete the source file once the FLAC is complete. If there is an error (including a verification error), the input file is left in place.

In addition to these options, I recommend logging the stdout, stderr, and original command along with the resulting output file.
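One way to capture that log is the minimal Python sketch below. The JSON Lines layout, the `run_and_log` helper and the log file name are assumptions for illustration, not part of the flac tools. It runs a command, then appends the command, its exit code, stdout and stderr to a log file.

```python
# Sketch: run a command (e.g. flac) and append the command, exit code, stdout
# and stderr to a log kept next to the output. Hypothetical helper and log name.
import json
import shutil
import subprocess
import sys
from datetime import datetime, timezone


def run_and_log(cmd: list[str], log_path: str) -> int:
    """Run cmd, record everything about the run in log_path, return the exit code."""
    started = datetime.now(timezone.utc).isoformat()
    result = subprocess.run(cmd, capture_output=True, text=True)
    entry = {
        "started_utc": started,
        "command": cmd,
        "returncode": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return result.returncode


if __name__ == "__main__":
    flac_cmd = ["flac", "--best", "--keep-foreign-metadata", "--preserve-modtime",
                "--verify", "--delete-input-file", "audiohere.wav"]
    if shutil.which("flac"):
        print("flac exited with", run_and_log(flac_cmd, "audiohere.flac.log.jsonl"))
    else:
        print("flac is not on PATH; nothing run", file=sys.stderr)
```

Appending one JSON object per run keeps a history when a file is encoded, decoded and re-encoded over time.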

Putting this all together, the command would be: flac --best --keep-foreign-metadata --preserve-modtime --verify --delete-input-file audiohere.wav

When running this command the file audiohere.wav will soon disappear and be replaced by a much smaller file called audiohere.flac. To reverse the process add the --decode option: flac --decode --keep-foreign-metadata --preserve-modtime --verify --delete-input-file audiohere.flac and then you get the wav file back.
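The round trip can be confirmed as bit-for-bit with a minimal Python sketch like the one below. It hashes the original WAV before encoding and the decoded WAV afterwards. Because --delete-input-file removes the source, the first hash has to be taken before running flac. The helper names are hypothetical.

```python
# Sketch: confirm a WAV -> FLAC -> WAV round trip by comparing MD5 hashes of
# the original file (hashed before encoding) and the decoded file.
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through MD5 so large audio files need little memory."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def round_trip_ok(original_md5: str, decoded_path: str) -> bool:
    """True when the decoded file is byte-identical to the original."""
    return file_md5(decoded_path) == original_md5
```

For example, call `before = file_md5("audiohere.wav")` before encoding, then `round_trip_ok(before, "audiohere.wav")` after decoding. The hash covers only file content; timestamps carried over by --preserve-modtime are not part of the comparison.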

[/ technical interlude ]

The file size advantages led to benefits in other types of processing. FLAC files could be uploaded to the Internet Archive in a third of the time a WAV file took, and we could move more audio data from DATs or CDs to LTO storage.

A few years later I realized another bonus of FLAC as an audio preservation format, one that fits digital preservation especially well: its strong built-in fixity features. Each FLAC file stores an MD5 checksum of the unencoded audio samples in its header (the STREAMINFO block). So a specific audio recording could be encoded to many different FLAC files that differ from each other: one FLAC may be encoded for speed, another for size, another with extra metadata. Yet each FLAC file would contain the same checksum, which represents the source audio data. This is often called the FLAC fingerprint. etree.org has some great resources on the FLAC fingerprint at http://wiki.etree.org/?page=FlacFingerprint. The fingerprint gives every FLAC file a built-in checksum, so any FLAC file can be tested for the integrity of its encoded data. If a FLAC file is truncated through partial download, corrupted, or manipulated in a way that affects the audio data, it can be identified as invalid or problematic without needing an external checksum file.
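The fingerprint is an MD5 over the audio samples rather than over the file, which is why different FLACs of one recording can share it. Here is a minimal Python sketch of the same idea for a WAV file. It assumes 16- or 24-bit PCM, whose data chunk holds signed little-endian interleaved samples, the layout FLAC hashes. For such a file the result should match what `metaflac --show-md5sum` reports for a FLAC encoded from it, but treat that as an expectation to test rather than a guarantee. 8-bit WAV stores unsigned samples and will not match. The sine-writing helper and file names are illustrative.

```python
# Sketch: an audio-only MD5 (the idea behind the FLAC fingerprint) for a 16- or
# 24-bit PCM WAV. Assumption: the data chunk holds signed little-endian
# interleaved samples, the same bytes FLAC feeds to MD5.
import hashlib
import math
import struct
import wave


def audio_md5(wav_path: str) -> str:
    """MD5 of the sample data only; the RIFF header and other chunks are ignored."""
    md5 = hashlib.md5()
    with wave.open(wav_path, "rb") as w:
        if w.getsampwidth() == 1:
            # 8-bit WAV samples are unsigned, so they would not match FLAC's hash
            raise ValueError("8-bit WAV is not supported by this sketch")
        while True:
            frames = w.readframes(4096)
            if not frames:
                break
            md5.update(frames)
    return md5.hexdigest()


def write_sine_wav(path: str, seconds: float = 1.0, rate: int = 44100) -> bytes:
    """Illustrative helper: write a mono 16-bit 440 Hz sine and return its samples."""
    n = int(seconds * rate)
    samples = b"".join(
        struct.pack("<h", int(16000 * math.sin(2 * math.pi * 440 * i / rate)))
        for i in range(n)
    )
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(samples)
    return samples
```

Unlike a whole-file checksum, this value ignores headers and other chunks, so rewriting a file’s metadata leaves it unchanged.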

Deeper within the FLAC file, audio samples are grouped into audio frames. Each frame carries its own CRC checksums: a CRC-8 of the frame header and a CRC-16 of the whole frame. If a FLAC file suffers from bit rot or other corruption, a FLAC decoder such as ffmpeg’s can report precisely where the problem is. This reporting lets an archivist locate and resolve the problem more efficiently.
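For the curious, here is a minimal Python sketch of the arithmetic behind those per-frame checks. It follows the FLAC format description: CRC-8 uses polynomial x^8+x^2+x+1 and CRC-16 uses x^16+x^15+x^2+1, both initialized to 0. It does not parse real FLAC frames. The 64-byte "frame" in the demo is a hypothetical stand-in, and the demo shows that flipping a single bit changes the checksum.

```python
# Sketch of the two checksums FLAC stores per audio frame (per the FLAC format
# description): CRC-8 (poly 0x07, init 0) over the frame header and CRC-16
# (poly 0x8005, init 0) over the whole frame. Arithmetic only; no frame parsing.


def crc8(data: bytes) -> int:
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # shift left; if a 1 fell off the top, fold in the polynomial
            crc = ((crc << 1) ^ 0x07) if crc & 0x80 else (crc << 1)
            crc &= 0xFF
    return crc


def crc16(data: bytes) -> int:
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8005) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


if __name__ == "__main__":
    frame = bytes(range(64))   # hypothetical stand-in for one encoded frame
    stored = crc16(frame)
    damaged = bytearray(frame)
    damaged[20] ^= 0x01        # flip a single bit, like the hex-editor test in this post
    print("intact frame matches:", crc16(frame) == stored)
    print("damaged frame matches:", crc16(bytes(damaged)) == stored)
```

Because every frame has its own CRC, a decoder can say which frame failed rather than only that the file changed.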

To show how this works I’ll make a small 5-second FLAC file of a sine wave with ffmpeg like this: ffmpeg -f lavfi -i sine -t 5 sinewav.flac. Then in a hex editor I’ll change just one bit, the smallest possible corruption. To test the file I can use the test feature in the flac utility like: flac --test sinewav.flac which gives:
sinewav.flac: *** Got error code 2:FLAC__STREAM_DECODER_ERROR_STATUS_FRAME_CRC_MISMATCH
sinewav.flac: ERROR while decoding data
state = FLAC__STREAM_DECODER_READ_FRAME

But this error isn’t very clear. The test shows that a CRC checksum stored within the FLAC file failed validation, so there was some change after encoding, but the report doesn’t show where. FFmpeg does this a little better. If I decode the FLAC file with FFmpeg like: ffmpeg -loglevel error -i sinewav.flac -f null - then I get more specific news.

FFmpeg reporting a crcerror from a corrupted FLAC file.

PTS stands for presentation timestamp. The value 82,944 here refers to the sample where the damaged frame starts. Since the sample rate of sinewav.flac is 44,100 Hz, I can divide 82,944 by 44,100 to get 1.88 seconds, which is where I can find the problem. Here is the corresponding area as shown by a waveform image in Audacity.

Audacity showing a corrupted flac file.
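The PTS-to-time arithmetic above can be wrapped in a small Python sketch that formats the result as a timecode for jumping to the spot in an audio editor. It assumes, consistent with the calculation above, that the PTS ffmpeg reports for this FLAC stream counts samples, so the time base is 1/sample_rate.

```python
# Sketch: turn the PTS ffmpeg reports for a FLAC error into a time position.
# Assumption: for this audio stream the PTS counts samples (time base 1/sample_rate).


def pts_to_seconds(pts: int, sample_rate: int) -> float:
    return pts / sample_rate


def as_timecode(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


if __name__ == "__main__":
    seconds = pts_to_seconds(82_944, 44_100)
    print(f"{seconds:.2f} s -> {as_timecode(seconds)}")  # 1.88 s -> 00:00:01.881
```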

Because a FLAC file contains an MD5 checksum of all the audio data and CRC checksums for each frame of encoded audio, it is possible to discover with fair precision which areas are affected by corruption. A WAV file has no such feature. It needs an external checksum to allow any integrity testing, and even then the checksum cannot pinpoint corruption to any particular area.

Moving into different archival projects I’m certainly quicker to consider FLAC a significant option in digital audio preservation. “Best practices” in archiving might not necessarily be the best use of current technology. Best practices require ongoing re-evaluation and improvements and I’d rather refer to them as “good-enough-for-now practices”. At least for me, FLAC is good enough for now.