Experiment: Segmentation and alignment of the Goldberg Variations.

Experiment: Segmentation and Time Alignment of (some) Goldberg Variations.

Rémi Mignot

Presentation of the tests

This page presents an experiment on the segmentation and audio-to-audio time alignment of the Goldberg Variations. The Goldberg Variations of J.S. Bach are a series of 32 pieces of music, originally played on harpsichord and now commonly played on piano. Many recordings exist, e.g. Glenn Gould's famous recordings, and in some of them repeats are played (see e.g. https://github.com/petal2020/petal_bach_goldberg-variations ). Using the recording of Cédric Pescia, which contains no repeats, as the reference, we try here to detect the repeats in the other recordings with an automatic segmentation method. The detected segments are then aligned in time. The tests are based on the method of “Invariant Audio Prints for Music Indexing and Alignment”, see https://anasynth.papers.ircam.fr/2024/CBMI/ .

For some variations (the Aria and variations 1, 3, 5, 7, 18, 25 and 26), we obtained recordings by the following musicians:

Cédric Pescia (2004, for piano) (see youtube playlist),
this recording is used as the reference, because we know that it does not contain any repeats.
Glenn Gould (1955, for piano) (see youtube playlist),
Glenn Gould (1981, for piano) (see youtube playlist),
Wilhelm Kempff (1969, for piano) (see youtube playlist),
Lang Lang (2020, for piano, studio recording) (see youtube playlist),
Andreas Staier (2009, for harpsichord) (see youtube playlist),
Wanda Landowska (1933, for harpsichord) (see youtube playlist).

For each variation, Pescia's recording is used as the reference, named Ref, and the recording of each other musician, named Test, is analysed in two steps:

The signal of Test is segmented using Ref as the reference signal.
Each detected segment of Ref is realigned in time with the corresponding segment of Test.

Presented sounds and figures

For each test, one stereo audio file is created, with two channels:

Left channel: Full unchanged track of the tested recording (Test), by a musician other than Cédric Pescia.
Right channel: Detected segments of Pescia's recording, realigned in time with the corresponding segments of the tested recording.
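
As an illustration of this channel layout, here is a minimal sketch in plain Python. It is not the code used for the experiment: the function name build_stereo, the representation of audio as lists of float samples, and the (start index, samples) description of a realigned segment are all assumptions made for the example.

```python
def build_stereo(test_samples, realigned_segments):
    """Return a list of (left, right) sample pairs.

    test_samples: samples of the tested recording (left channel, unchanged).
    realigned_segments: hypothetical list of (start_index, samples) pairs,
        one per detected segment of the reference, already time-stretched
        and positioned on the time axis of the tested recording.
    """
    right = [0.0] * len(test_samples)  # silence where no segment was detected
    for start, samples in realigned_segments:
        for offset, value in enumerate(samples):
            index = start + offset
            if 0 <= index < len(right):  # ignore samples falling outside the track
                right[index] = value
    return list(zip(test_samples, right))


# Toy signals: six samples of Test, two realigned reference segments.
test_samples = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
segments = [(0, [1.0, 1.0]), (3, [2.0, 2.0, 2.0, 2.0])]
stereo = build_stereo(test_samples, segments)
```

In this toy example the second segment is longer than the remaining part of the track, so it is truncated at the end of the tested recording.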

So, the recording of Cédric Pescia can be heard several times, each time synchronised with the (time-varying) tempo of the other recording. For example, the tempi of the Aria played by Glenn Gould differ greatly between the two versions (1955: fast, 1981: slow).

Remark: this transformation is based on a phase vocoder with a time-varying stretching factor. In some cases, with large stretching factors, the sound quality of the transformed segment is degraded.
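
To make the notion of a time-varying stretching factor concrete, the sketch below derives one factor per interval from an alignment path. It assumes the path is given as (time in Test, time in Ref) pairs in seconds and is strictly increasing in both coordinates. The function name local_stretch_factors and the example path are hypothetical, and the phase vocoder itself is not shown.

```python
def local_stretch_factors(path):
    """Return one stretching factor per interval of the alignment path.

    path: list of (t_test, t_ref) pairs in seconds.
    Each factor is (duration in Test) / (duration in Ref): a value above 1
    means the reference segment must be lengthened (slowed down) to follow
    the tested recording, a value below 1 means it must be shortened.
    """
    factors = []
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        if x1 <= x0 or y1 <= y0:
            raise ValueError("alignment path must be strictly increasing")
        factors.append((x1 - x0) / (y1 - y0))
    return factors


# Hypothetical path: Test is first slower than Ref, then equal, then faster.
path = [(0.0, 0.0), (2.0, 1.0), (3.0, 2.0), (3.5, 3.0)]
factors = local_stretch_factors(path)  # [2.0, 1.0, 0.5]
```

Factors far from 1 correspond to the cases mentioned in the remark, where the quality of the transformed sound may suffer.
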
The following tables present, for each tested variation, previews of the original recordings and of the realigned recordings.

Listen to these previews to evaluate the accuracy of the time alignment estimation proposed in this work.

When the cursor is over a realignment audio preview, two figures appear below the table. The axes are:

x-axis: time of the tested recording (in seconds in the left figure, and in frame indices in the right figure),
y-axis: time of the reference recording of Cédric Pescia.

The two figures are:

Left figure: Detected segments and time alignment.
Right figure: "Local distance matrix", i.e. the Hamming distance between the binary audio codes (200 bits) used for indexing, segmentation and alignment.
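
As a rough illustration of such a local distance matrix, the sketch below computes pairwise Hamming distances between two sequences of 200-bit codes stored as Python integers. The codes here are random placeholders, not real audio prints, and the function names are chosen for this example only.

```python
import random

N_BITS = 200  # code length stated in the text


def hamming_distance(code_a, code_b):
    """Number of differing bits between two binary codes stored as integers."""
    return bin(code_a ^ code_b).count("1")


def local_distance_matrix(ref_codes, test_codes):
    """Rows follow the reference frames (y-axis), columns the tested frames (x-axis)."""
    return [[hamming_distance(r, t) for t in test_codes] for r in ref_codes]


# Placeholder data: four random reference codes, and a test sequence that
# plays the same four frames twice (a repeat) with one bit flipped per frame.
rng = random.Random(0)
ref_codes = [rng.getrandbits(N_BITS) for _ in range(4)]
test_codes = [code ^ 1 for code in ref_codes] * 2

D = local_distance_matrix(ref_codes, test_codes)
```

With this placeholder data, the matching frames form two diagonal lines of very small distances (one per pass through the section), which is the kind of pattern a repeat produces in the right figure.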

Preliminary comments on the results

As a general conclusion, the results presented below show that the segmentation is mostly correct and the time alignment is reasonably accurate.

A few tested variations are not well segmented, especially those of Wanda Landowska, for three reasons: first, the instrument differs from the reference instrument (harpsichord vs. piano); second, the recording is old (1933) and noisy; and third, some repeats are played differently (repeat pattern ABA', see https://github.com/petal2020/petal_bach_goldberg-variations/blob/main/Goldberg%20Variations_Repeats.tsv).

Nevertheless, Andreas Staier's recording, also on harpsichord, is well segmented and aligned. Moreover, the tuning used for this recording is different from the tuning of Pescia's recording. Thanks to the invariance properties of the audio codes used (invariance to pitch shifting and timbre change), the method succeeds in this case.