Welcome to the companion web site of our paper “Invariant Audio Prints for Music Indexing and Alignment”. This page provides illustrations of the experiments.
Cite the paper:
Rémi Mignot, Geoffroy Peeters
"Invariant Audio Prints for Music Indexing and Alignment"
21st Int. Conf. on Content-based Multimedia Indexing (CBMI)
Reykjavik, Iceland, September, 2024.
Here is an illustration of the results of the experiment "Audio Indexing and Segmentation of Medleys" of Sec. IV.A. A music medley has been created using excerpts of 6 pop songs, each lasting between 15 and 30 s. Depending on the configuration, each excerpt is transposed by 0, ±1, ±2 or ±4 half-tones, and time-stretched by 0, ±33 or ±66 cents. For example, a stretch of 33 cents accelerates the time by a factor of 1.26, and a stretch of 66 cents by a factor of 1.58. Each configuration is tested first without audio degradations, then with audio degradations: filtering, noise addition, 40 kbps MP3 coding, and distortion (see the paper).
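The page does not define the "cents" unit used for time stretching. The two quoted factors (1.26 and 1.58) are consistent with a factor of 2^(c/100), i.e. 100 cents doubling the speed, which differs from the musical cent (1/1200 of an octave). The following minimal sketch makes that assumption explicit; the function names are illustrative and do not come from the paper.

```python
import math

def stretch_factor(cents: float) -> float:
    """Speed factor for a time stretch given in 'cents'.

    ASSUMPTION: 100 cents correspond to a factor of 2, which matches the
    two examples quoted on this page (33 -> 1.26, 66 -> 1.58).
    """
    return 2.0 ** (cents / 100.0)

def factor_to_cents(factor: float) -> float:
    """Inverse of stretch_factor, under the same assumption."""
    return 100.0 * math.log2(factor)

def pitch_ratio(half_tones: float) -> float:
    """Frequency ratio of a transposition in equal-tempered half-tones."""
    return 2.0 ** (half_tones / 12.0)

if __name__ == "__main__":
    for c in (-66, -33, 0, 33, 66):
        print(f"{c:+d} cents -> factor {stretch_factor(c):.2f}")
    for h in (-4, -2, -1, 0, 1, 2, 4):
        print(f"{h:+d} half-tones -> frequency ratio {pitch_ratio(h):.3f}")
```

Under this reading, negative values slow the excerpt down by the reciprocal factor.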
The objective of this experiment is to find the original excerpts of the reference songs in a catalog of approximately 40,000 songs, together with their time positions in the reference songs and the stretching factor used. Note that the pitch shifting factor is not estimated. This experiment aims to demonstrate the robustness of the developed audio prints to different transformations: pitch shifting, time stretching and other degradations (noise addition, distortion, filtering, ...). Finally, using the time alignment produced by the method, we can also estimate the time stretching factor in order to realign the reference excerpts to the modified medley.
Below, for each configuration, 2 music previews are available:
Remark: when the mouse cursor is over a cell of the table, the figure on the right displays the result for the corresponding configuration. The colored rectangles represent the true songs on the bottom half and the detected songs on the top half; the white spaces in the upper part are time ranges where no song is detected. The black (resp. red) lines plot the time mapping between the medley time (x-axis) and the time of the true (resp. detected) reference songs (y-axis). The slope of the black lines indicates the time stretching of the segments.
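To illustrate how a slope in such a time mapping relates to a stretching factor, here is a minimal sketch that fits a line through matched time points of one segment. It is not the estimation method of the paper: the least-squares fit, the function name and the sample points are all assumptions made for illustration, as is the convention that an acceleration by a factor a means a seconds of reference time per second of medley time.

```python
def fit_slope(medley_times, reference_times):
    """Least-squares slope of reference time (y) against medley time (x)."""
    n = len(medley_times)
    if n < 2 or n != len(reference_times):
        raise ValueError("need at least two matched time points")
    mean_x = sum(medley_times) / n
    mean_y = sum(reference_times) / n
    sxx = sum((x - mean_x) ** 2 for x in medley_times)
    if sxx == 0:
        raise ValueError("medley times must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(medley_times, reference_times))
    return sxy / sxx

# Hypothetical matched points for one segment: the excerpt starts at 42 s
# in the reference song and is played 1.26 times faster in the medley.
medley = [10.0, 11.0, 12.0, 13.0]
reference = [42.0 + 1.26 * (t - 10.0) for t in medley]

slope = fit_slope(medley, reference)
print(f"estimated stretching factor: {slope:.2f}")
```

With noisy matches, a robust fit (for example a median of pairwise slopes) may be preferable to plain least squares.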
Remark: this is only an illustration for one medley. The results reported in the paper are averaged over 730 different medleys. Recall that the audio prints of the catalog are computed on the original songs, without transformation.
[Two interactive tables: one cell per configuration, with rows for pitch shifting (half-tones) and columns for time stretching (cents).]
This section illustrates the results of the experiment "Audio-to-Audio Alignment" of Sec. IV.B. An original MIDI file has been modified in order to continuously change the tempo of the song. By comparing the synthesized audio signals of the original song and the modified song, this experiment aims to estimate the time mapping between the two signals.
To also evaluate the robustness to other modifications, the pitch has been changed by shifting the MIDI note numbers. For the second set of tests, the instruments were also randomly changed, and the drum track was removed.
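As a rough illustration of these MIDI-level modifications, the sketch below transposes note numbers and optionally removes drum notes. It uses a plain note tuple rather than a MIDI library. The `Note` type, the `transpose` function, the choice to drop notes that leave the 0..127 range, and the use of the General MIDI percussion channel to identify drums are all assumptions; the page does not say how the authors handled these cases.

```python
from typing import List, NamedTuple

class Note(NamedTuple):
    channel: int     # 0-based MIDI channel
    pitch: int       # MIDI note number, 0..127
    start: float     # onset time in seconds
    duration: float  # length in seconds

# General MIDI reserves channel 10 (index 9 when counting from 0) for percussion.
DRUM_CHANNEL = 9

def transpose(notes: List[Note], half_tones: int,
              drop_drums: bool = False) -> List[Note]:
    """Shift note numbers by `half_tones`; optionally remove drum notes."""
    out = []
    for note in notes:
        if note.channel == DRUM_CHANNEL:
            # Drum note numbers select instruments, not pitches, so they are
            # kept unchanged here (or removed when drop_drums is True).
            if not drop_drums:
                out.append(note)
            continue
        new_pitch = note.pitch + half_tones
        if 0 <= new_pitch <= 127:  # assumption: out-of-range notes are dropped
            out.append(note._replace(pitch=new_pitch))
    return out

notes = [Note(0, 60, 0.0, 1.0), Note(9, 36, 0.0, 0.5), Note(1, 126, 1.0, 1.0)]
shifted = transpose(notes, 2, drop_drums=True)
print(shifted)
```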
The configurations are:
Below, for each configuration, 2 music previews are available:
Remark: when the mouse cursor is over a cell of the table, the figure on the right displays the result for the corresponding configuration. The blue curve of the upper figure shows the time error of the estimated time mapping. The second figure compares the true time stretching factor (varying in time) with the estimated time stretching factor. Finally, the third figure displays the local distance matrix based on the Hamming distance of the Audio Prints.
Remark: this is only an illustration for one MIDI song. The results reported in the paper are averaged over 238 different MIDI songs.
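A local distance matrix based on the Hamming distance can be sketched as follows. The representation of each Audio Print as an integer bit vector, the 8-bit length and the sample values are assumptions made for illustration; the actual print format is described in the paper.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two non-negative integers differ."""
    return bin(a ^ b).count("1")

def local_distance_matrix(prints_a, prints_b):
    """D[i][j] is the Hamming distance between prints_a[i] and prints_b[j]."""
    return [[hamming(a, b) for b in prints_b] for a in prints_a]

# Hypothetical 8-bit prints, one per analysis frame of each signal.
original = [0b10110010, 0b01101100, 0b11110000]
modified = [0b10110011, 0b01101100, 0b11010000, 0b00001111]

D = local_distance_matrix(original, modified)
for row in D:
    print(row)
```

Low values along a path through such a matrix indicate frames of the two signals that are likely to correspond.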
[Two interactive tables, each with a preview of the original music: one cell per configuration, with rows for pitch shifting (half-tones) and columns for time stretching (cents).]
Here is an additional experiment (not presented in the paper) on a real-world example. This experiment aims to align in time the original recording of "Little Wing" by The Jimi Hendrix Experience with a cover played by Corey Heuvel on an acoustic guitar. After the analysis of the original recording and of the cover, the original recording is realigned in time to the cover, and added to the cover video.
For the first realigned video, the left channel contains the cover (unchanged) and the right channel contains the realigned original recording of Jimi Hendrix. For the second realigned video, only the realigned original recording is played.
Note that whereas the original song has a tempo of approximately 70 BPM, the average tempo of the cover is almost 60 BPM and varies slightly over time. Additionally, some transitions between the introduction, the verses and the solo are longer in the cover. For example, listen to the transition between the second verse and the solo at 1:42 of the cover: the time of the original recording is strongly dilated in order to be synchronised before and after the transition.
Original: Jimi Hendrix - Little Wing | Cover: Corey Heuvel - Little Wing (acoustic)
---|---
Alignment (left: cover, right: aligned original) | Alignment (only aligned original)