256 Shades of Grey

I’m currently working with BAVC on a project called qctools which is developing software to analyze the results of analog audiovisual digitization efforts.

One challenge is producing a software-based waveform display that accurately depicts the luminosity data of digital video. A waveform monitor is useful when digitizing video to ensure that the brightness and contrast of the video signal are set properly before digitizing. If they are not, the waveform provides a means of measuring luminosity so that adjustments can be made with a signal processor, such as those often available on a timebase corrector.

A Tektronix WFM 300A


While working with a draft waveform display that I had arranged via ffmpeg’s histogram filter I realized that my initial presentation was inaccurate. In order to start testing I needed a video that showed all possible shades of gray that an 8 bit video might have (two to the power of 8 is 256). I was then going to use this video as a control to put through various other software- and hardware-based waveform displays to make some measurements, but producing an accurate video of the 256 shades was difficult.

I eventually figured out a way to write values in hexadecimal from 0x00 to 0xFF, insert a 0x80 as every other byte, and then copy that raw data into a QuickTime container as raw uyvy422 video (2vuy) to make this result.
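The byte layout described above can be sketched in Python. This is only a minimal sketch, not the script used to make the original file: the function name and the 256x64 frame size are assumptions, and the white and black separator stripes are left out.

```python
def gray_ramp_uyvy422(height=64):
    """Build one raw uyvy422 frame, 256 pixels wide, one column per gray shade.

    uyvy422 stores bytes in the order U Y V Y, so every other byte is chroma.
    Setting every chroma byte to 0x80 keeps the picture colorless.
    The default height of 64 rows is an arbitrary choice for illustration.
    """
    row = bytearray()
    for luma in range(0x00, 0x100):
        row.append(0x80)   # chroma byte (U or V), neutral
        row.append(luma)   # luma byte, 0x00 through 0xFF
    return bytes(row) * height


frame = gray_ramp_uyvy422()
print(len(frame))  # 256 pixels * 2 bytes per pixel * 64 rows
```

Written to a file, data like this can be read as raw video as long as the pixel format and frame size are stated explicitly, since raw data carries no header of its own.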

256 shades of gray, separated into sections of 4 and 16


This video is a one-frame-long, 8-bit 4:2:2 uncompressed video that contains the absolute darkest and lightest pixels possible in an 8-bit video and all possible 8-bit graytones in between, separated by thick white or black stripes every 16 shades and thin white or black stripes every 4 shades.

In a waveform monitor such as Final Cut’s waveform display below, the result should be a diagonal line with dotted lines at the top and bottom that show the highest and lowest shades of grey allowed.

Putting the 256_shades file in Final Cut's waveform shows that Final Cut does not plot values from 0-7.5 IRE but does plot the rest all the way up to the 110 IRE limit.


However, Final Cut’s waveform display does not plot the lowest graytone values. Columns 1-16 are not displayed. Dividing the graytone shade number by approximately 2.33 gives the IRE value, so the values from 0-7.5 IRE are not plotted separately in this display but are all crushed together at 7.5 IRE.
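As a rough worked example of that conversion, here is a small Python sketch. The function name is hypothetical and the divisor is only the approximation given above, so the results are approximate too: shade 16 lands a little under 7 IRE rather than exactly on 7.5.

```python
def shade_to_ire(shade, divisor=2.33):
    """Approximate the IRE value of an 8-bit gray shade.

    The divisor of 2.33 is the rough figure from the text, not an exact constant.
    """
    if not 0 <= shade <= 255:
        raise ValueError("shade must be between 0 and 255")
    return shade / divisor


for shade in (0, 16, 235, 255):
    print(shade, round(shade_to_ire(shade), 1))
```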

And here is the same video displayed through ffmpeg’s histogram filter in waveform mode. A few other filtering options are added to the display to give guidelines to show values that are outside of broadcast range, from 0-7.5 IRE in blue and 100-110 IRE in red.

256_shades file in ffmpeg's histogram filter showing the full range of 0-110 IRE (boundary lines mark broadcast range at 7.5 and 100 IRE)


In qctools all 256 shades of gray are plotted appropriately, showing a diagonal line going from one corner of the image to the other, with the white and black spacing columns shown as a half-line of dots and dashes at the very first and very last line of video.

Follow the qctools project at http://bavc.org/qctools for more information.

FLAC in the archives

The first time I heard about FLAC was from a co-worker in the early days of my first full-time audiovisual archivist gig. I was trying to start digitization projects and figure out preservation practices. He was working in a half-IT and half-broadcast-engineer capacity and was happy to support archival work where he could help. We were discussing audio preservation and digitization of 1/4″ audio reels and he remarked on how FLAC was really an ideal choice for this type of work. I hadn’t heard much about FLAC before but, based on the listservs of ARSC and AMIA, I knew that when an archivist is asked to select a digital audio format, Broadcast Wave Format (BWF) was really the only legitimate choice. We went on to debate preservation objectives and the advantages and disadvantages of one format versus the other. Broadcast Wave was the “best practice” in digital audio archiving, but by the end of the conversation I was questioning why I was defending it.

My colleague clarified that the choice between FLAC and BWF was not about audio quality since FLAC is a lossless audio encoding. A FLAC encoding of an audio signal and a BWF encoding of an audio signal (at the same specifications) will decode back to the same audio signal, but the FLAC file was much smaller (about a third the size of the uncompressed audio). He clarified that FLAC is an open format well supported by free software. During this conversation I was imagining the shock and disbelief that might emanate from various archival communities on learning that a n00b archivist was being lured towards the lossless audio codecs of Free Software. For BWF I didn’t have much of a defense; it was a well-respected standard across the audio archiving community, but at that point I didn’t know why. I feebly tried a BWF defense by pointing out that, because the BWF file is larger than the FLAC, it may be more resilient, since a little bit of corruption would have a more damaging effect on the compact FLAC than on the vast BWF file.

Following this conversation I searched archival listservs for references to FLAC and didn’t find much, though I did find references to FLAC in archival environments at http://wiki.etree.org and on band sites. This research also led me to the communities that develop FLAC and related applications. Around that time their work was especially productive, as noted in their change log. All this left me confused, as if FLAC and BWF played the same singular role in two parallel archival community universes.

For the time, I would digitize analog audio to BWF and sleep well. There was a large amount of audio cassette transfers, CD ripping, and reel-to-reel work, and we worked to keep the decks running day after day to achieve our preservation goals. As the data piled up, digital storage became an increasingly complicated issue. The rate at which audio data was being created was simply larger than the rate of digital storage expansion. As storage stresses began to grow, FLAC looked more and more tempting. Finally in 2007 FLAC 1.2.1 added an option called --keep-foreign-metadata, which meant that not only could I make a FLAC file from a BWF that losslessly compressed the audio, but I could also keep all of the non-audio data of the BWF as well (descriptive information, embedded dates, bext chunks, cart chunks, etc). Basically this update meant that one could compress a BWF to a FLAC file and then uncompress that FLAC back to the original BWF file, bit for bit. Knowing that I could completely undo the FLAC decision at any time with these new options, I finally went FLAC. Using the FLAC utilities and tools such as X Lossless Decoder I compressed all the BWF files to FLAC, recovering substantial amounts of digital storage. This process involved a lot of initial testing and workflow tinkering to make sure that the FLAC compression was a fully reversible process. It was, and I was happy to finally make the preservation-standard switch and invest in learning FLAC inside and out.

[ technical interlude ]

If you wish to convert WAVE files to FLAC files in a preservation context, here is how I recommend you do it. Firstly, use the official FLAC utility to get the options mentioned below, or a GUI that gives you access to these options. The following is a list of FLAC utility options that I found relevant:

--best: We can wait for the most beneficial result. The --best option will prioritize file size reduction rather than encoding speed.

--keep-foreign-metadata: For WAVE files or AIFF files this option will cause the resulting FLAC to store all non-audio chunks of data that may be in the source file. Ideally this option should be used during all FLAC encoding and decoding to ensure metadata survives all procedures.

--preserve-modtime: Optional, but I found this handy. This option applies some of the timestamps of the source file to the output, whether going from WAV->FLAC or FLAC->WAV.

--verify: Verify! Digital preservation is always an environment of paranoia. This option will cause the utility to do extra work to make sure that the resulting file is valid.

--delete-input-file: If everything else is successful this will delete the source file when the FLAC is completed.

In addition to these options I recommend logging the stdout, stderr, and original command along with the resulting output file.

Putting this all together the command would be: flac --best --keep-foreign-metadata --preserve-modtime --verify --delete-input-file audiohere.wav

When running this command the file audiohere.wav will soon disappear and be replaced by a much smaller file called audiohere.flac. To reverse the process add the --decode option: flac --decode --keep-foreign-metadata --preserve-modtime --verify --delete-input-file audiohere.flac and then you get the wav file back.
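The logging recommendation above could be handled with a small wrapper. Here is one possible sketch in Python, assuming the flac utility is installed and on the path. The function names and the ".log" file naming are hypothetical choices for illustration, not part of FLAC.

```python
import subprocess

# The options discussed above, in the same order as the command in the text.
FLAC_OPTIONS = [
    "--best",
    "--keep-foreign-metadata",
    "--preserve-modtime",
    "--verify",
    "--delete-input-file",
]


def build_flac_command(path, decode=False):
    """Return the flac command from the text as an argument list."""
    command = ["flac"]
    if decode:
        command.append("--decode")
    return command + FLAC_OPTIONS + [path]


def run_and_log(path, decode=False):
    """Run flac and write the command, exit status, stdout, and stderr to a log.

    The log is written to path + ".log" (a hypothetical naming choice).
    """
    command = build_flac_command(path, decode)
    result = subprocess.run(command, capture_output=True, text=True)
    with open(path + ".log", "w") as log:
        log.write("command: " + " ".join(command) + "\n")
        log.write("exit status: " + str(result.returncode) + "\n")
        log.write("stdout:\n" + result.stdout)
        log.write("stderr:\n" + result.stderr)
    return result.returncode


print(" ".join(build_flac_command("audiohere.wav")))
```

Calling run_and_log("audiohere.wav") would then encode the file and leave audiohere.wav.log beside the new FLAC, and run_and_log("audiohere.flac", decode=True) would do the same for the reverse trip.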

[/ technical interlude ]

The file size advantages led to benefits in other types of processing. FLAC files could be uploaded to the Internet Archive in a third of the time needed for a wav file, and we could move more audio data from DATs or CDs to LTO storage.

A few years later I realized another bonus of FLAC as an audio preservation file format that seems fitting within digital preservation: its strong fixity integration. Each FLAC file contains in its header an md5 checksum of the audio data as it was before encoding. With this feature a specific audio recording could be encoded to many different FLAC files which may differ (one FLAC may be encoded for speed, another for size, another containing extra metadata) but each FLAC file would contain the same checksum, which represents the source audio data. This is often called the FLAC fingerprint. etree.org has some great resources on the FLAC fingerprint at http://wiki.etree.org/?page=FlacFingerprint. The fingerprint gives all FLAC files a built-in checksum and thus any FLAC file can be tested as to the integrity of its encoded data. If a FLAC file is truncated through partial download, corrupted, or manipulated in a way that would affect the audio data, then the FLAC file can be identified as invalid or problematic without needing an external checksum file.
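To show where the fingerprint lives, here is a minimal Python sketch that reads it straight out of the header. It relies on the FLAC layout of a 4-byte "fLaC" marker, a 4-byte metadata block header, and a 34-byte STREAMINFO block whose last 16 bytes are the md5. The function name is hypothetical, and the sketch assumes the data starts directly with the FLAC marker (a file with something else prepended, such as an ID3 tag, would be rejected).

```python
def flac_fingerprint(data):
    """Return the md5 stored in a FLAC stream's STREAMINFO block, as hex."""
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    block_type = data[4] & 0x7F  # low 7 bits; the top bit flags the last metadata block
    block_length = int.from_bytes(data[5:8], "big")
    if block_type != 0 or block_length != 34:
        raise ValueError("STREAMINFO block not found where expected")
    streaminfo = data[8:8 + 34]
    if len(streaminfo) != 34:
        raise ValueError("header is truncated")
    return streaminfo[18:34].hex()


# A synthetic 42-byte header, used here instead of a real file.
header = b"fLaC" + bytes([0x80, 0x00, 0x00, 34]) + bytes(18) + bytes(range(16))
print(flac_fingerprint(header))
```

For a real file, passing the first 42 bytes is enough. Two FLAC encodings of the same source audio should report the same value even if the files themselves differ.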

Deeper within the FLAC file audio samples are grouped into audio frames which themselves are checksummed with a crc value. If a FLAC file suffers from bit rot or other corruption then a FLAC decoder such as ffmpeg’s can report on precisely where the problem is. This reporting allows an archivist a more efficient ability to resolve the problem.

To show how this works I’ll make a small 5 second FLAC file of a sine wave with ffmpeg like this: ffmpeg -f lavfi -i sine -t 5 sinewav.flac. Then in a hex editor I’ll change just one bit, the smallest possible corruption. To test the file I can use the test feature in the flac utility like: flac --test sinewav.flac which gives:
sinewav.flac: ERROR while decoding data

but this error isn’t very clear. The test shows that a crc checksum stored within the FLAC file failed validation, so there was some change after encoding, but the report doesn’t show where. FFmpeg does this a little better. If I decode the FLAC file with FFmpeg like: ffmpeg -loglevel error -i sinewav.flac -f null - then I get more specific news.

FFmpeg reporting a crcerror from a corrupted FLAC file.


PTS stands for presentation timestamp. The value 82,944 here refers to the sample where the problem starts. Since the sample rate of sinewav.flac is 44,100 then I can divide 82,944/44,100 to get 1.88 seconds which is where I can find the problem. Here is the corresponding area as shown by a waveform image in Audacity.
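The same arithmetic as a short Python sketch, which is handy when there are many errors to locate. The function names are hypothetical, and the sketch assumes the timestamp counts samples, as it does in this example.

```python
def pts_to_seconds(pts, sample_rate):
    """Convert a timestamp counted in samples to seconds."""
    return pts / sample_rate


def format_timestamp(seconds):
    """Format seconds as MM:SS.mmm for finding the spot in an audio editor."""
    minutes, remainder = divmod(seconds, 60)
    return f"{int(minutes):02d}:{remainder:06.3f}"


seconds = pts_to_seconds(82944, 44100)
print(round(seconds, 2), format_timestamp(seconds))
```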

Audacity showing a corrupted flac file.


Because a FLAC file contains an md5 checksum of all the audio data and crc checksums for each frame of encoded audio, it is possible to discover with fairly accurate precision what areas are affected by corruption. A wav file doesn’t have such a feature: it would require an external checksum to allow for any integrity testing and would not provide a way to pinpoint corruption to any particular area.

Moving into different archival projects I’m certainly quicker to consider FLAC a significant option in digital audio preservation. “Best practices” in archiving might not necessarily be the best use of current technology. Best practices require ongoing re-evaluation and improvements and I’d rather refer to them as “good-enough-for-now practices”. At least for me, FLAC is good enough for now.

X-Face for Video

Recently ffmpeg added encoding support for X-Face. An x-face image is a square image, 48 pixels high and 48 pixels wide, composed only of black and white values. A 2,304 pixel image cannot contain much detail, but the image was intended to accompany an email as a tiny visual depiction of the sender. Here’s a gallery of jpegs created from xface data to give an idea of what x-faces looked like.

I wanted to try to use the xface encoder to produce a low quality moving image. One issue was that there doesn’t seem to be an audiovisual container that supports xface encoded data, so xface in AVI or xface in QuickTime wasn’t happening. Instead I found I could simulate an xface video experience by transcoding to xface and back to a more normal video format. I used these commands:

Take a video called input.mp4 and export xface bitmap files, one per frame.
ffmpeg -i input.mp4 -c:v xface -s 48x48 xface-%05d.bmp

Export the audio from the input.mp4 file.
ffmpeg -i input.mp4 audio.wav

Reading the xfaces with the audio to make a video file (probably better ways to do this).
ffmpeg -c:v xface -f image2 -i xface-%05d.bmp -f nut - | ffmpeg -i - -i audio.wav -map 0 -map 1 -c:v rawvideo -s 48x48 -pix_fmt monow -c:v ffv1 -c:a pcm_s16le xface-48x48.mov

I then had a file that showed what xface looks like in video form, but if I increased the size of the video from 48×48 in my QuickTime player then the video would blur and deteriorate at larger sizes.


Pixel art gets mushy

The xface image is a crisp 48×48 pixels, but when scaled up to 480×480 for easier viewing the black and white pixels became rounded and fuzzy. Scaling raster images from one size to another can be tremendously lossy, and in this example I really noticed the effect of my xface pixel art turning to mush as I increased the width and height. Finding a fix led me to ffmpeg’s documentation on scaling video. From here I could scale my images with the neighbor+full_chroma_inp option, which preserves the blocky look of the small pixels while scaling the image to a larger size.

ffmpeg -i xface-48x48.mov -c:v ffv1 -sws_flags neighbor+full_chroma_inp -vf scale=480:480,pad=720:480:120:0 -c:a libfaac xface.mov
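To illustrate why nearest-neighbor scaling keeps the blocks crisp, here is a minimal Python sketch of the idea for integer scale factors. It is an illustration of the technique only, not ffmpeg's scaler, and the function name is hypothetical.

```python
def scale_nearest(pixels, factor):
    """Enlarge a 2D grid of pixel values by an integer factor.

    Each source pixel is copied into a factor-by-factor block, so no
    in-between values are invented and hard edges stay hard.
    """
    scaled = []
    for row in pixels:
        wide_row = [value for value in row for _ in range(factor)]
        scaled.extend(list(wide_row) for _ in range(factor))
    return scaled


tiny = [[0, 1],
        [1, 0]]
for row in scale_nearest(tiny, 2):
    print(row)
```

A smoothing scaler would instead compute new values between neighboring pixels, which is what turns two-tone pixel art into gray mush.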

Once I figured out this process I searched for a video that would still visually represent its content even at a 48×48 frame size. This was a lot harder than I would have assumed, and most of the video ended up as indecipherable black and white blocks. Eventually I found a Sanka coffee commercial that seemed to work. The first 10 seconds are hard to make out, but the movements of the spoon, coffee cup, and coffee pitcher are all identifiable. So if you’ve ever wondered what xface video might look like, here’s a sample.

And here’s the original from the Prelinger Collection

Display video difference with ffmpeg’s overlay filter

I was seeking a method to show the difference between two videos and found this method using some of the recent features of ffmpeg. This process could be useful to illustrate how lossy particular encoding settings are to a video source. An original digital video and a lossless encoding of it should show no difference; whereas, a high-quality lossy encoding (like an h264 encoding at 1000 kilobits per second) should show visual differences compared to the original. The less efficient the codec, the lower the bitrate, or the more mangled the transcoding process, the greater the difference will be between the pixel values of the original video and the derived encoding.

Here’s what I used:
ffmpeg -y -i fileA.mov -i fileB.mov -filter_complex '[1:v]format=yuva444p,lut=c3=128,negate[video2withAlpha],[0:v][video2withAlpha]overlay[out]' -map [out] fileA-B.mov

To break this command down into a narrative, there are two file inputs, fileA.mov and fileB.mov, to compare. The second input (fileB.mov) is converted to the yuva444p pixel format (YUV 4:4:4 with an alpha channel), the ‘lut’ filter (aka lookup-table filter) sets the alpha channel to 50% (the ‘128’ is half of 2^8, the number of values available at the 8-bit depth of the pixel format), and then the video is negated (all values are inverted). In other words one video is made half-transparent, changed to its negative image, and overlaid on the other video so that all similarities cancel out and leave only the differences. I know there are a few flaws in this process since, depending on the source, this may invoke a colorspace or chroma subsampling change that may cause loss beyond what exists between the two inputs (but close enough for a quick demonstration). This process is also intended to compare two files that have a matching presentation: the same number of frames and the same presentation times for all frames.
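The cancelling-out can be checked with a little arithmetic on a single 8-bit value. The following Python sketch is a simplified model of the blend, assuming a plain alpha mix of the negated second value over the first. It ignores ffmpeg's exact rounding, and the function name is hypothetical.

```python
def difference_pixel(a, b, alpha=128):
    """Blend the negative of b over a for one 8-bit component.

    a is the value from the first video, b the value from the second.
    alpha=128 is the roughly 50% transparency set by the lut filter.
    """
    negated_b = 255 - b
    return (negated_b * alpha + a * (255 - alpha)) / 255


print(difference_pixel(100, 100))  # identical values
print(difference_pixel(200, 100))  # first video brighter than the second
print(difference_pixel(100, 200))  # first video darker than the second
```

In this model any pair of identical values lands between 127 and 128, which is why matching areas show as middle gray and only the differences stand out.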

Here is an example of the output. This first one depicts the differences between an mpeg2 file (found here) and an mpeg1 derivative of it. Closer to middle gray indicates no visual loss in the encoding, but deviations from middle gray show how much was lost (unfortunately YouTube’s further encoding of the demonstration embedded here flattens the results a bit).

Here’s another version of the output, this time comparing the same mpeg2 file with a prores derivative of it. Here it is very difficult to discern any data loss since nearly the whole frame is middle gray. There is still some deviation (prores is not a lossless codec) but the loss is substantially less than with mpeg1.

Here’s another example with different material. In this case an archivist was digitizing the same tape on two different digitization stations and noticed that, although the waveforms and vectorscopes on each station showed the same levels, the results appeared to be slightly different. I took digitized color bar videos from each of the two stations and processed them through yuvdiag to make videos of the waveform and vectorscope output, and then used the comparison process outlined above to illustrate the differences that should have been illustrated by the original waveform monitor and vectorscope.

The results showed that, although the vectorscope and waveform on each of the two digitization stations showed the same data during the original digitization, at least one of them was inaccurate. By digitizing the same color bars through both stations and analyzing the resulting video with yuvdiag we could see the discrepancy between the chroma and luma settings and calibrate appropriately.

Reconsidering Checksums published in IASA Journal

Last month the IASA Journal published an article I wrote on error detection and fixity issues. While IASA agreed to publish the article under an open license, in this case CC-BY-ND, the journal does not (yet) have an open access policy.

The article discusses two different approaches used in the application of checksums for audiovisual data: embedded checksum data used to audit transmission (MPEG CRCs, FLAC Fingerprints, and DV parity data) and external whole-file checksums (more typical of digital preservation environments). In the article I outline how the effectiveness of a whole-file checksum does not scale well for audiovisual data and make proposals on how formats such as ffmpeg’s framemd5 can enable more granular and efficient checksums for audiovisual data.
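The difference in granularity can be sketched in a few lines of Python. This is an illustration of the idea only, not ffmpeg’s framemd5 format: it hashes fixed-size chunks of raw bytes as stand-ins for frames, and the function names and the 1024-byte frame size are assumptions.

```python
import hashlib


def whole_file_md5(data):
    """One checksum for everything: it reports that something changed, not where."""
    return hashlib.md5(data).hexdigest()


def per_frame_md5(data, frame_size):
    """One checksum per fixed-size chunk, standing in for one per frame."""
    return [
        hashlib.md5(data[offset:offset + frame_size]).hexdigest()
        for offset in range(0, len(data), frame_size)
    ]


def changed_frames(before, after):
    """Return the indexes of frames whose checksums no longer match."""
    return [i for i, (x, y) in enumerate(zip(before, after)) if x != y]


frame_size = 1024
original = bytes(range(256)) * 40  # ten pretend frames of 1024 bytes each
damaged = bytearray(original)
damaged[5000] ^= 0x01              # flip a single bit
damaged = bytes(damaged)

print(whole_file_md5(original) == whole_file_md5(damaged))
print(changed_frames(per_frame_md5(original, frame_size),
                     per_frame_md5(damaged, frame_size)))
```

The whole-file comparison only says that the two versions differ, while the per-frame list points to the one frame that holds the flipped bit.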

terminal output of ffmpeg evaluating framemd5 for an input file

The article may be found in IASA Journal Number 39 (login required) or re-posted on this blog here.

learn on film | work on data

This blog will (mostly|probably) be about moving image archiving, audiovisual processing, and use of open source software for archival objectives. I may also throw in posts about silent film, independent and grassroots media, and technology manipulation. This is only the first post so all this is a prediction, but at the moment most of my writing veers towards these topics eventually.

At work I recently started acquiring film rewinds, splicers, and various tools for archival work with film. I’ve enjoyed getting reacquainted with these tools since I haven’t had much opportunity to use them since graduating from the L. Jeffrey Selznick School of Film Preservation in 2004. These tools remind me of how different my archival education is from my later archival experience and how, although film archivists and digital media archivists share goals, their tools and skill requirements are practically in alternative universes. Back in school I would crank through film from one reel to another with my fingers resting on the edge of the print so that I could examine the frames, splices, and sprockets by sight and by feel to assess the structural and visual condition of the print. The film benches were stocked with tools for fixes such as edge-repair tape, brushes, trichlorethylene, and other machines and bottles. Beyond the tools, film archiving work was packed with tricks of the trade and experience from experiments. In film preservation there was a variety of challenges along with decades of rich technical expertise and creative solutions.

After graduating I became the archivist of a digital media collection (no film) at Democracy Now! Although I understood the aims of audiovisual archiving and rewards of the work, I started my first archival job with little experience in the means or methods necessary to preserve a digital collection. There seemed a particular gap of knowledge within the media archiving community about preservation strategy for non-file-based digital media, such as the DAT and DV tapes that I worked with. With film materials, the archivist could interact with the content through two methods. Firstly, the archivist could project the film and see what it looks like and how it renders itself to an audience. Secondly, an archivist could handle a film print on a bench and spool through a print manually to feel for edge damage, detect physical abnormalities, and make repairs. It is hard to imagine a film archivist working with only one of these methods for media handling; both are essential. However, once out of school and into a digital archive I could initially only use that first type of handling by viewing how a media file ‘projected’ through QuickTime or VLC. The bits inside, codec structure, and architectural patterns of digital media were lost to me. For me, going from a film-based education to responsibilities over a digital collection illuminated a gap in the technical handling skills within the moving image archiving profession. Although I understood the chemistry and structure of sprockets and film frames, I had no practical knowledge of DIF blocks and GOP headers or how to handle them, which seemed particularly important when digital media could be non-standard, broken, or obscure.

A further obstacle was that, at least at the time of my education, many film archivists were slow to respond to the incoming predominance of digital representations of media and how this affected preservation. In one lesson we would learn to use specialist methods and tools to bring a deteriorated nitrate print to a healthy enough state for duplication; however, in another lesson we were taught that digital media will deteriorate in a manner where suddenly the media is completely unworkable and beyond the possibility of repair. From this I tried to imagine the opposite of this situation, such as an unskilled projectionist blindly forcing a fragile film print through a projector, the edge damage of the initial frames causing the projector’s teeth to further damage the film, and the projector failing to present the media. Unplayable and broken digital media may be fixed just as an unplayable film print may be fixed. In both cases success may be limited or unfeasible, but the creative restoration efforts that are possible within a film archive do have a parallel in the digital realm, even though there has been little overlap between these two types of expertise.

One of the distinctions of the Selznick School of Film Preservation is that, although the examples were primarily with film, it teaches well the means and reasons for an archivist to have a high degree of technical control over the collections that they manage. In my later work I would struggle to gain the same understanding and control over digital media collections that I had been taught with film collections. With the current educational options for moving image archivists it can still be a challenge to break beyond n00b status to grasp the inner workings of digital media formats, codecs, decoders, and muxers. For me, the best enablers for this have been the use of, and collaboration with, open source projects such as ffmpeg, cdparanoia, dvgrab, mediainfo and others. I hope to highlight some of these applications and their relevance to media archivists in upcoming posts.