Shimmer: Modulation, auto-correlation, and decorrelation

In my previous post, I discussed the Eno/Lanois shimmer sound, and how it is based around a pitch shifter and a digital reverb placed in a global feedback loop. It is worth exploring what is going on in this signal chain at the micro level, and how a fairly simple signal routing can create such a complex sound.

The AMS pitch shifter used by Eno and Lanois had a de-glitching board in its architecture, to find the ideal points for splicing together the time-scaled waveform chunks. This presumably worked in a similar manner to the H949 de-glitching card, in that autocorrelation was used to find the most similar segments of the waveform, and the delay time of one of the channels was adjusted for an ideal splice. It is also possible that the autocorrelation would trigger a new splice, such that the rate between splices was a function of the periodicity of the input signal.

Auto-correlation works well for determining splicing points, assuming that the input signal has a certain degree of correlation. A single sustained guitar note, for example, can have a high auto-correlation factor after the initial attack. But what happens when the signal to be shifted has a very low auto-correlation factor? Such a signal is said to be decorrelated; that is, the auto-correlation or cross-correlation is said to be greatly reduced compared to the original signal.
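To make the difference concrete, here is a minimal sketch in plain Python that compares the normalized autocorrelation peak of a steady tone with that of white noise. The function name `autocorr_peak`, the lag range, and the test signals are my own choices for illustration, not anything taken from the AMS or Eventide hardware.

```python
import math
import random

def autocorr_peak(x, min_lag, max_lag):
    """Return (lag, value) of the largest normalized autocorrelation in a lag range."""
    energy = sum(s * s for s in x)
    best_lag, best_val = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself delayed by `lag` samples.
        acc = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        val = acc / energy if energy else 0.0
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag, best_val

random.seed(1)
N = 2000
tone = [math.sin(2 * math.pi * n / 100) for n in range(N)]   # period of 100 samples
noise = [random.uniform(-1.0, 1.0) for _ in range(N)]        # no periodicity at all

tone_lag, tone_peak = autocorr_peak(tone, 20, 400)
noise_lag, noise_peak = autocorr_peak(noise, 20, 400)
print("tone:  lag", tone_lag, "peak", round(tone_peak, 3))
print("noise: lag", noise_lag, "peak", round(noise_peak, 3))
```

The tone produces a strong peak at its period, which is exactly the kind of landmark a splicer can lock onto. The noise produces only a weak peak at an essentially arbitrary lag.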

In the audio world, decorrelation often refers to randomization of the phases of the signal while preserving the frequencies, or to a time-varying process to slightly shift the frequencies of a signal to prevent feedback. Both of these processes are present, to a large extent, within time varying reverbs such as the Lexicon 224 and EMT250 used by Eno and Lanois.

The Lexicon 224 Concert Hall algorithm is made up of a number of allpass delays, which preserve the input frequencies while completely scrambling the phase response. In addition, the Concert Hall algorithm uses time varying delays inside of the recursive delay network, which increase the perceived modal density of the reverb, and also impart a beautiful chorusing to the reverb decay. This lushness from time-varying delay lines is very prominent in 1980s Eno/Lanois productions – in addition to the Concert Hall algorithm and EMT250, they made use of the multi-voice chorus algorithms in the Lexicon units, as well as the Symphonic preset in the Yamaha SPX-90.
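The "same frequencies, scrambled phase" property can be checked numerically. The sketch below evaluates the frequency response of a generic Schroeder-style allpass delay, y[n] = -g*x[n] + x[n-D] + g*y[n-D]. This is a textbook structure that I am assuming for illustration; the gain and delay values are hypothetical, and this is not the actual Lexicon 224 topology.

```python
import cmath
import math

def allpass_response(g, delay, omega):
    """Frequency response of y[n] = -g*x[n] + x[n-D] + g*y[n-D] at omega (radians/sample)."""
    z_d = cmath.exp(-1j * omega * delay)     # z^-D evaluated on the unit circle
    return (-g + z_d) / (1.0 - g * z_d)

g, delay = 0.6, 113                          # hypothetical gain and delay length
freqs = [k * math.pi / 64 for k in range(1, 64)]
responses = [allpass_response(g, delay, w) for w in freqs]

magnitudes = [abs(h) for h in responses]     # gain at each frequency
phases = [cmath.phase(h) for h in responses] # phase shift at each frequency

print("gain range: ", min(magnitudes), max(magnitudes))
print("phase range:", min(phases), max(phases))
```

The gain stays at 1 for every frequency, while the phase shift swings widely from one frequency to the next. Chain several of these with different delay lengths and the output waveform looks nothing like the input, even though the magnitude spectrum is unchanged.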

So, what happens when a pitch shifter that uses auto-correlation to find the ideal splicing points is put into a feedback loop with a reverb that is highly decorrelated and time-varying? The answer: chaos. The pitch shifter will NOT be able to find ideal splicing points, as the phase of the reverb output is continually being scrambled.

The pitch shifter HAS to splice, whether or not it is a perfect situation, so it will pick the best possible match, but this will probably be a fairly random location each time. The result will be random delays for each new splicing point, or random sizing of the grain windows, depending on how the auto-correlation is used within the pitch shifter. This randomization will cause the sidebands of the input signal to be spread out, such that an individual sinusoid would be turned into a band of frequencies centered around the original (that has been shifted up by an octave).

Add in the additional octaves produced by the feedback, the random sideband spread caused by the modulation within the reverb, and harmonics that are created by analog nonlinearities in the feedback path, and the result is a HUGE amount of sonic complexity generated from a simple system. Put a sine wave into this type of feedback system, and the output can approach near orchestral levels of thickness.

In this light, it is interesting to think about Eno’s use of the DX7 around this time. The DX7 can produce chaotic sounds through the use of cascaded FM, but it can also produce gentle, minimalist textures through the use of parallel operators (sine oscillators). A simple DX7 patch with several parallel sine oscillators and a low FM index may produce a fairly boring sound on its own, but would create an enormous yet controllable sound when fed into a complex feedback loop of digital processing.

Coming up: more on the topic of generating complexity through simple systems with feedback applied to them, both from a technical and creative perspective.

Pitch Shifting: The H949, and “de-glitching”

In 1977, Eventide released the H949 Harmonizer:

The H949 built upon the harmonizing features of the H910, and added more memory (for longer delays), randomized delay, reversed delays, flanging, and a micropitch mode for small pitch shift intervals. However, from a DSP developer’s perspective, the most interesting feature was a new circuit board, the LU618 or “ALG-3” board, that was an option for earlier H949s and was added as a standard mode to later units.

A somewhat technical review of the situation:

  • In the H910 and H949 pitch shift modes, information is being read into delay memory, and being read out at faster or slower rates, to change the pitch of the signal. Reading out of a delay line at a different rate than the data is written will quickly create a situation where the delay line runs out of samples to read.
  • In a modern delay line based around a circular buffer, if the read tap is moving through the buffer at a different rate than the write pointer, it will soon run into the write pointer, either by catching up to it or by being overtaken by it. Resetting the read tap to a different point avoids the issue of running out of memory or running into the write pointer, but this causes an audible popping sound as the read tap jumps instantaneously to some random point in the delay.
  • Pitch shifters deal with this artifact by fading the value of the read tap down to zero before making this jump, and then fading the volume back up again after the jump. In a 2-tap pitch shifter like the H910 and H949, the volume change can be viewed as a crossfade between the 2 read taps. This is directly analogous to what happens in the rotary head tape pitch shifters, as a given read head rotates away from the tape.
  • However, this crossfading is not without its problems. If the crossfading happens over too long of a time, the result is a metallic coloration of the sound, as the 2 read taps have a constant relative distance from each other that results in comb filtering. Having the crossfading take place over a shorter interval helps to reduce the comb filtering, but results in an audible “glitch,” as the phase differences between the 2 read taps cause cancellations in the frequency response that are heard as a volume drop during the crossfading period. This can be heard as a “stuttering” artifact in the pitch shifted sound.
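The points above can be sketched as a bare-bones two-tap delay line pitch shifter. This is my own minimal illustration in plain Python, not the H910 or H949 design: the function name `two_tap_shift`, the triangular crossfade, the continuous (rather than short) crossfade, and the 1024-sample window are all assumptions.

```python
import math

def two_tap_shift(x, ratio, window=1024):
    """Pitch shift x by `ratio` using two crossfaded read taps on one delay line."""
    out = []
    phase = 0.0
    # Sweeping the delay at this rate makes each tap read at `ratio` samples per sample.
    step = (1.0 - ratio) / window
    last = len(x) - 1
    for n in range(len(x)):
        y = 0.0
        for tap in (0.0, 0.5):                       # second tap sits half a window away
            frac = (phase + tap) % 1.0
            gain = 1.0 - abs(2.0 * frac - 1.0)       # triangular fade, silent at the jump
            pos = n - frac * window                  # fractional read position
            if pos >= 0.0:
                i = int(pos)
                a = pos - i
                nxt = x[i + 1] if i < last else x[i]
                y += gain * ((1.0 - a) * x[i] + a * nxt)   # linear interpolation
        out.append(y)
        phase = (phase + step) % 1.0                 # the wrap is the read tap "jump"
    return out

# A tone whose period (64 samples) divides evenly into the half window (512 samples).
tone = [math.sin(2 * math.pi * n / 64) for n in range(4096)]
shifted = two_tap_shift(tone, 2.0)
```

I picked the test tone on purpose: its period divides evenly into the spacing between the taps, so the two taps are always in phase and the output is a clean sine one octave up. Change the period to 50 samples and the taps no longer line up during the crossfade, which produces the amplitude dips described above.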

The LU618 / ALG-3 board on the H949 works on eliminating this “glitch” artifact through a clever trick called autocorrelation. As described in an Eventide patent by Anthony Agnello, the ALG-3 board looks at the 2 delayed signals, and compares them to see where they share the most similarities – not just zero crossings, but true phase similarities. The H949 then calculates a delay offset, such that the new segment that is to be faded in is in phase alignment (or as close to phase alignment as possible) with the segment that is being faded out during the crossfade time. If the ALG-3 has calculated the delay offset correctly, the 2 segments that are being crossfaded between will be almost identical, which will result in the fewest cancellations in the frequency and amplitude response. Voila, glitch-free pitch shifting!
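Here is a rough sketch of that kind of search in plain Python. It is not a reconstruction of the ALG-3 hardware or of the patent: the function `best_splice_offset`, its parameters, and the use of a normalized cross-correlation score are my own assumptions about one reasonable way to pick the offset.

```python
import math

def best_splice_offset(signal, out_pos, in_pos, seg_len, search):
    """Find the offset (0..search) from in_pos whose segment best matches the outgoing one."""
    ref = signal[out_pos:out_pos + seg_len]          # segment being faded out
    ref_energy = sum(s * s for s in ref)
    best_off, best_score = 0, -2.0
    for off in range(search + 1):
        cand = signal[in_pos + off:in_pos + off + seg_len]   # candidate to fade in
        dot = sum(a * b for a, b in zip(ref, cand))
        norm = math.sqrt(ref_energy * sum(s * s for s in cand))
        score = dot / norm if norm else 0.0          # 1.0 means identical shape
        if score > best_score:
            best_off, best_score = off, score
    return best_off, best_score

tone = [math.sin(2 * math.pi * n / 50) for n in range(1000)]   # period of 50 samples
offset, score = best_splice_offset(tone, out_pos=500, in_pos=130, seg_len=64, search=60)
# offset is 20: sample 150 sits exactly 7 periods before sample 500, so the phases match.
print(offset, round(score, 6))
```

With a periodic input, the search lands on an offset that is a whole number of periods away from the outgoing segment, and the score is essentially 1.0. The crossfade then blends two nearly identical waveforms.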

If only it were so easy. The H949 “de-glitcher,” and the de-glitching mode used in most time-domain pitch shifters that followed the H949, work well with signals that are as close to periodic as possible – i.e. a single monophonic musical line. Periodic signals have a high degree of autocorrelation, so the de-glitching hardware can usually find excellent splicing points. Voice can be de-glitched fairly well, as can a monophonic guitar line. Once polyphonic signals (i.e. chords) enter the picture, it becomes harder and harder to find similar points to splice together. Noisy signals, like drums, will have almost no similar splice points (i.e. a very low autocorrelation value). In such a case, the de-glitcher will find the most similar points to splice together, but there is no guarantee that they will be in any way similar, so the result is more likely to have amplitude glitches.

Next week, we will discuss the various pitch shifting schemes and how they relate to the generation of the Eno/Lanois “shimmer” sound.

Auto-Tune, autocorrelation, and seismic analysis

As Auto-Tune is making its way into everything nowadays, public awareness of the process is rising. The recent Time article about Auto-Tune and its creator is a good read, but it oversimplifies the principles behind the algorithm.

In the article, Andy Hildebrand’s background in seismic analysis, where he apparently used autocorrelation for seismic mapping, is presented as the key to his later success with pitch correction:

He was debating the next chapter of his life at a dinner party when a guest challenged him to invent a box that would allow her to sing in tune. After he tinkered with autocorrelation for a few months, Auto-Tune was born in late 1996.

What the article fails to mention is that autocorrelation has been used for pitch detection since at least the 1970s. Rabiner and Schafer’s book “Digital Processing of Speech Signals” describes the process in detail, and it was published in 1978. Eventide pitch shifters used autocorrelation starting in the 1980s, to perform their splicing detection and pitch correction.
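For anyone who has not seen it, the basic idea fits in a few lines. The following is a bare-bones sketch in plain Python of autocorrelation pitch detection, written for illustration only. It is not the formulation from the Rabiner and Schafer book, and it is not what Auto-Tune or the Eventide units do; the function name `detect_pitch` and the frequency limits are my own assumptions.

```python
import math

def detect_pitch(x, sample_rate, f_min=60.0, f_max=1000.0):
    """Estimate the fundamental in Hz from the lag of the largest autocorrelation value."""
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Fewer terms are summed at longer lags, which biases the peak toward
        # the shortest matching period rather than one of its multiples.
        val = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag

sr = 8000
# A 200 Hz test tone with two harmonics, standing in for a sung note.
voice = [math.sin(2 * math.pi * 200 * n / sr)
         + 0.5 * math.sin(2 * math.pi * 400 * n / sr)
         + 0.25 * math.sin(2 * math.pi * 600 * n / sr)
         for n in range(2048)]
pitch = detect_pitch(voice, sr)
print(pitch)
```

A real detector needs far more care than this. It has to interpolate between integer lags, reject unvoiced or noisy input, and guard against picking a multiple of the true period, which is where octave errors come from.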

EDIT, June 2014: The next few paragraphs are just flat out WRONG. I’ll put the corrections in after these paragraphs, but will leave the following paragraphs unaltered for the historical record:

In addition, the basic pitch shifting method used by Hildebrand was described by Keith Lent in a Computer Music Journal article in 1989. The idea is to chop up the vocal signal into small windowed grains, where each grain holds a single period of the input signal, and then to spit the grains out at a rate corresponding to the new pitch. This can be viewed as a form of pitch synchonous granular synthesis, where both the grains and the grain rate are determined by analysis of an incoming signal. Lent’s technique has been used by most of the formant preserving pitch shifters, including algorithms from IVL, Digitech and Eventide. The technique was independently developed by France Telecom, and is often referred to as PSOLA (Pitch Synchronous Overlap and Add).

The key to Hildebrant’s innovation is how he combined Lent’s pitch shifting method with the robust pitch detection that autocorrelation provides. Lent’s original paper used a simple time-domain method for determining the input periodicity, which resulted in audible distortion for certain input signals. I have written Lent-style pitch shifters before, and the pitch detection algorithm is critical in avoiding octave jumps, unnaturally hoarse voices, or metallic syllabants. My code had all of those problems, although my boss at the time was able to fix many of the issues. Hildebrant’s patent describes how he uses sample rate reduction and some clever mathematical tricks to create a robust pitch detector that runs much faster than standard autocorrelation.

So, if you are watching The Backyardigans, and the overly pitch corrected vocals drive you crazy, don’t just blame Andy Hildebrant – blame Keith Lent.

UPDATE, JUNE 2014: I received an email from Andy Hildebrand, letting me know that I was incorrect about the Keith Lent / PSOLA algorithm being used in Auto-Tune:

“No, I don’t use the Lent algorithm: way too imprecise. You are close on the detection algorithm. The math in the patent is absolutely precise to what I do. But that math is used continuously to track pitch as well. I always know exactly what the pitch is. I run a simple rate converter from that point and when I have to repeat a cycle (going sharper) or delete a cycle (going flatter) I can because I know exactly what the period is at every instant.

FYI see our web site for guitar applications.

Best regards
Andy”

So, PSOLA is NOT used in Auto-Tune. And, it turns out I was spelling Andy Hildebrand’s name incorrectly. I must have been confusing the spelling with the Brothers Hildebrandt, which says as much about my own nerdy childhood as it does about my poor proofreading skills.

My apologies to Andy Hildebrand for the errors in the original article, and I thank him for writing in with the corrections.

I stand by my dislike of The Backyardigans’ vocals, however. That is some annoying music right there.