Auto-Tune, autocorrelation, and seismic analysis

As Auto-Tune is making its way into everything nowadays, public awareness of the process is rising. The recent Time article about Auto-Tune and its creator is a good read, but it oversimplifies the principles behind the algorithm.

In the article, Andy Hildebrand’s background in seismic analysis is presented as the key to his later work with pitch correction. Hildebrand apparently used autocorrelation for seismic mapping, and the article credits that experience for his later success with Auto-Tune:

He was debating the next chapter of his life at a dinner party when a guest challenged him to invent a box that would allow her to sing in tune. After he tinkered with autocorrelation for a few months, Auto-Tune was born in late 1996.

What the article fails to mention is that autocorrelation has been used for pitch detection since at least the 1970s. Rabiner and Schafer’s book “Digital Processing of Speech Signals” describes the process in detail, and it was published in 1978. Eventide pitch shifters used autocorrelation starting in the 1980s to perform their splicing detection and pitch correction.
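For readers who have not seen it, here is a minimal sketch of textbook autocorrelation pitch detection in Python. It is not taken from Auto-Tune, the Eventide units, or Rabiner and Schafer's book; the function name, the fmin/fmax search range, and the "pick the largest peak" rule are my own simplifying assumptions.

```python
import math

def autocorrelation_pitch(samples, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency (Hz) of one frame, or None if silent.

    Hypothetical helper for illustration: brute-force O(N * lags) search.
    """
    min_lag = int(sample_rate / fmax)   # shortest period considered
    max_lag = int(sample_rate / fmin)   # longest period considered
    n = len(samples)
    best_lag, best_value = 0, 0.0
    for lag in range(min_lag, min(max_lag, n - 1) + 1):
        # Correlate the frame with a copy of itself delayed by `lag` samples.
        value = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    # The lag with the strongest self-similarity is taken as the period.
    return sample_rate / best_lag if best_lag else None
```

Fed a 1024-sample frame of a 200 Hz sine at 8 kHz, this picks a lag of 40 samples, i.e. 200 Hz. Real voices are much messier than a sine, which is why practical detectors add normalization, peak interpolation, and tracking logic on top of this.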

EDIT, June 2014: The next few paragraphs are just flat out WRONG. I’ll put the corrections in after these paragraphs, but will leave the following paragraphs unaltered for the historical record:

In addition, the basic pitch shifting method used by Hildebrand was described by Keith Lent in a Computer Music Journal article in 1989. The idea is to chop up the vocal signal into small windowed grains, where each grain holds a single period of the input signal, and then to spit the grains out at a rate corresponding to the new pitch. This can be viewed as a form of pitch synchronous granular synthesis, where both the grains and the grain rate are determined by analysis of an incoming signal. Lent’s technique has been used by most of the formant preserving pitch shifters, including algorithms from IVL, Digitech and Eventide. The technique was independently developed by France Telecom, and is often referred to as PSOLA (Pitch Synchronous Overlap and Add).
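To make the grain idea concrete, here is a minimal Python sketch of pitch-synchronous overlap-add. It is an illustration under strong assumptions, not Lent's published algorithm or any product's code: the period is assumed to be already known, constant, and a whole number of samples, and I use Hann windows two periods long (a common choice in PSOLA descriptions) so that overlapping grains sum back to the input at a ratio of 1.0. The function name is hypothetical.

```python
import math

def grain_pitch_shift(samples, period, ratio):
    """Re-space windowed grains to change pitch while keeping duration.

    Assumes `period` is a known, constant integer number of samples.
    """
    n = len(samples)
    grain_len = 2 * period
    # Hann window spanning two input periods.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / grain_len)
              for i in range(grain_len)]
    out = [0.0] * n
    out_spacing = period / ratio   # new distance between grain starts
    pos = 0.0
    while pos + grain_len <= n:
        start_out = int(round(pos))
        # Use the input grain whose pitch mark is nearest the output time,
        # so grains are repeated or skipped and duration is preserved.
        start_in = int(round(pos / period)) * period
        if start_in + grain_len > n:
            break
        for i in range(grain_len):
            out[start_out + i] += samples[start_in + i] * window[i]
        pos += out_spacing
    return out
```

Because the grains themselves are never resampled, the spectral envelope (the formants) stays put while the grain rate sets the new pitch. The sketch also shows why the period estimate matters so much: every grain boundary is derived from it.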

The key to Hildebrant’s innovation is how he combined Lent’s pitch shifting method with the robust pitch detection that autocorrelation provides. Lent’s original paper used a simple time-domain method for determining the input periodicity, which resulted in audible distortion for certain input signals. I have written Lent-style pitch shifters before, and the pitch detection algorithm is critical in avoiding octave jumps, unnaturally hoarse voices, or metallic sibilants. My code had all of those problems, although my boss at the time was able to fix many of the issues. Hildebrant’s patent describes how he uses sample rate reduction and some clever mathematical tricks to create a robust pitch detector that runs much faster than standard autocorrelation.

So, if you are watching The Backyardigans, and the overly pitch corrected vocals drive you crazy, don’t just blame Andy Hildebrant – blame Keith Lent.

UPDATE, JUNE 2014: I received an email from Andy Hildebrand, letting me know that I was incorrect about the Keith Lent / PSOLA algorithm being used in Auto-Tune:

“No, I don’t use the Lent algorithm: way too imprecise. You are close on the detection algorithm. The math in the patent is absolutely precise to what I do. But that math is used continuously to track pitch as well. I always know exactly what the pitch is. I run a simple rate converter from that point and when I have to repeat a cycle (going sharper) or delete a cycle (going flatter) I can because I know exactly what the period is at every instant.

FYI see our web site for guitar applications.

Best regards
Andy”
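As a rough illustration of what the email describes (a rate converter, plus repeating or deleting a whole cycle when needed), here is a toy Python sketch. This is my own guess at the general shape of such a scheme, not Auto-Tune's implementation: the function name is hypothetical, the period is assumed known, constant, and an integer number of samples, and the rate converter is plain linear interpolation.

```python
def rate_convert_with_cycle_edits(samples, period, ratio):
    """Resample by `ratio`, jumping the read pointer by one period as needed.

    Assumes `period` is a known, constant integer number of samples.
    """
    n = len(samples)
    out = []
    read = 0.0   # fractional read position in the input
    for t in range(n):
        if read - t >= period or read >= n - 1:
            # Reading faster than real time (going sharper):
            # step back one period, which repeats a cycle.
            read -= period
        elif t - read >= period:
            # Reading slower than real time (going flatter):
            # step forward one period, which deletes a cycle.
            read += period
        i = int(read)
        frac = read - i
        nxt = samples[min(i + 1, n - 1)]
        # Simple rate converter: linear interpolation between neighbors.
        out.append((1.0 - frac) * samples[i] + frac * nxt)
        read += ratio
    return out
```

The jumps are inaudible only if `period` really is the period at that instant; a jump of exactly one cycle lands on the same phase of the waveform. That is consistent with the email's emphasis on knowing the period exactly at all times.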

So, PSOLA is NOT used in Auto-Tune. And, it turns out I was spelling Andy Hildebrand’s name incorrectly. I must have been confusing the spelling with the Brothers Hildebrandt, which says as much about my own nerdy childhood as it does about my poor proofreading skills.

My apologies to Andy Hildebrand for the errors in the original article, and I thank him for writing in with the corrections.

I stand by my dislike of The Backyardigans’ vocals, however. That is some annoying music right there.
