Sunday, September 12, 2010

AP186 Act11 - Playing notes by image processing

Just as the title suggests, we will be playing some notes by using image processing. A simple musical piece was selected in which only one note is present in every column. This is another activity where we have to integrate the skills and lessons learned from past activities.

My selected piece was the nursery rhyme "Mary had a little lamb" (right hand only, no bass). Special thanks to Troy Gonzales for composing a copy for me. :)

music piece used for this activity

First, I chopped up the image into two images, one for each line. Both have the same dimensions and, as much as possible, the same staff positions. Their colors were then inverted so that the notes are white and the background is black.


inverted and "chopped" version of the music sheet

My approach to detecting the notes is to correlate the image with templates of the different notes. Luckily, this piece has only two kinds of notes: quarter notes and half notes. So, templates of these notes were made, with the note in white and the background in black.

quarter note and half note templates for matching

The two lines of the piece were loaded and binarized, with the threshold determined from their grayscale histograms:

black and white version of the two lines of notes
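The binarization step can be sketched in a few lines. This is a minimal Python illustration, not the Scilab code of the appendix: the tiny `image` is made up, pixel values are assumed to lie between 0 and 1, and the threshold of 0.3 is the value quoted in the appendix.

```python
def histogram(image, bins=10):
    """Count pixels per gray-level bin; pixel values are assumed to lie in [0, 1]."""
    counts = [0] * bins
    for row in image:
        for value in row:
            index = min(int(value * bins), bins - 1)
            counts[index] += 1
    return counts


def binarize(image, threshold):
    """Return a 0/1 image: 1 where the pixel is brighter than the threshold."""
    return [[1 if value > threshold else 0 for value in row] for row in image]


# Hypothetical inverted image: bright notes (near 1) on a dark background (near 0).
image = [
    [0.05, 0.10, 0.90, 0.95],
    [0.00, 0.15, 0.85, 0.10],
    [0.05, 0.05, 0.10, 0.05],
]

counts = histogram(image)    # the dark and bright pixels form two separate groups
mask = binarize(image, 0.3)  # a threshold between the two groups separates them
print(counts)
print(mask)
```

In a histogram like this one, the empty bins between the dark and the bright group show where a threshold can safely be placed.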


Then the correlation of each line with the quarter note and half note templates was taken. The result is a grayscale matrix, which was thresholded once again to reduce it to blobs. These blobs mark the locations where there is a match. Afterwards, each blob was reduced to a single pixel for the next step: determining its location relative to the staves.

line 1 correlated with the quarter note

line 1 correlated with the half note
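To illustrate what the correlation measures, here is a minimal Python sketch that slides a template over an image and sums the overlapping products. It is a direct spatial version, not the FFT-based Scilab code of the appendix, and it ignores the wrap-around at the image edges that the FFT version has. The small `image` and `template` are made up, and applying the threshold relative to the peak value is an assumption.

```python
def correlate(image, template):
    """Slide the template over the image and sum the overlapping products."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    result = []
    for top in range(ih - th + 1):
        row = []
        for left in range(iw - tw + 1):
            score = sum(
                image[top + y][left + x] * template[y][x]
                for y in range(th)
                for x in range(tw)
            )
            row.append(score)
        result.append(row)
    return result


# Hypothetical binary image containing one 2x2 "note", and a matching template.
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
template = [
    [1, 1],
    [1, 1],
]

scores = correlate(image, template)

# Keep only positions whose score is close to the peak (0.88 is the
# quarter-note threshold from the appendix; scaling by the peak is assumed).
peak = max(max(row) for row in scores)
matches = [[1 if score >= 0.88 * peak else 0 for score in row] for row in scores]
print(matches)
```

The score is highest where the template lies exactly on top of the note, which is why thresholding the correlation leaves blobs around the matches.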

black and white version of correlation of line 1 with quarter note

reduced to single pixels (line 1 corr w/ quarter note)

reduced to single pixels (line 1 corr w/ half note)
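The reduction of blobs to single pixels can be sketched as follows. This Python version is an illustration only: it finds each connected blob with a flood fill and keeps its rounded centroid, which is a swapped-in technique and not the same rule as the bwlabel-based loop in the appendix. The `mask` is made up.

```python
def blob_centers(mask):
    """Return one (row, col) point per 4-connected blob of 1s in a 0/1 mask."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    centers = []
    for r0 in range(height):
        for c0 in range(width):
            if mask[r0][c0] != 1 or seen[r0][c0]:
                continue
            # Flood fill: collect every pixel connected to (r0, c0).
            stack = [(r0, c0)]
            seen[r0][c0] = True
            pixels = []
            while stack:
                r, c = stack.pop()
                pixels.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    inside = 0 <= nr < height and 0 <= nc < width
                    if inside and mask[nr][nc] == 1 and not seen[nr][nc]:
                        seen[nr][nc] = True
                        stack.append((nr, nc))
            # Reduce the blob to its rounded centroid.
            mean_r = sum(p[0] for p in pixels) / len(pixels)
            mean_c = sum(p[1] for p in pixels) / len(pixels)
            centers.append((round(mean_r), round(mean_c)))
    return centers


# Hypothetical thresholded correlation with two blobs.
mask = [
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

centers = blob_centers(mask)
# Sort by column so that the notes come out in playing order, left to right.
ordered = sorted(centers, key=lambda point: point[1])
print(ordered)
```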

However, one problem encountered was that, because the quarter and half notes differ only slightly (filled versus hollow note head), each template also correlates strongly with the other kind of note. A single template therefore picked up both quarter and half notes, and adjusting the threshold didn't help, as the true positives would disappear along with the false ones. Instead, I remedied this by taking the correlation with both templates and comparing the two thresholded results. In the code in the appendix, the half-note correlation gives the positions of all the notes, the quarter-note correlation marks the quarter notes, and any note without a quarter-note match nearby is taken to be a half note. From this information, the type of each note can be deduced.
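The comparison of the two detections can be sketched like this. It is a minimal Python illustration with made-up point lists: `all_points` stands for the detections that contain every note, `quarter_points` for the detections of the quarter notes only, and the radius of 12 pixels mirrors the 25x25 structuring element used for the dilation in the appendix.

```python
def classify_notes(all_points, quarter_points, radius=12):
    """Label each note as 'quarter' if a quarter-note detection lies nearby, else 'half'."""
    labels = []
    for r, c in all_points:
        near_quarter = any(
            abs(r - qr) <= radius and abs(c - qc) <= radius
            for qr, qc in quarter_points
        )
        labels.append("quarter" if near_quarter else "half")
    return labels


# Hypothetical (row, col) detections, listed from left to right.
all_points = [(60, 40), (66, 90), (73, 140)]
quarter_points = [(61, 41), (72, 139)]

labels = classify_notes(all_points, quarter_points)
print(labels)
```

Checking for a detection within a small square, instead of at the exact same pixel, allows the two correlations to peak at slightly different positions for the same note.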

The next task is to determine the relative vertical positions of the notes, which give their frequencies. Because of the correlation, the detected pixel is offset from where the body of the note actually is; this is easily fixed by measuring that offset and adding it back. The positions of the staff lines, as well as the centers of the spaces between them, are also taken. Then, using an if-elseif-else statement, the position of each note relative to the staff can be determined. A tolerance of ±3 pixels is incorporated to compensate for small fluctuations in the positions of the notes.
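The mapping from vertical position to pitch can be sketched as a lookup. The levels, the offset of 17 pixels, the tolerance of 3 pixels and the frequencies below are the values from the appendix; the Python function itself is only an illustration that looks for a level within the tolerance, in place of the if-elseif-else chain of the Scilab code.

```python
OFFSET = 17     # pixels between the detected point and the body of the note
TOLERANCE = 3   # allowed deviation in pixels

# Row of each note on the staff (larger row = lower on the page = lower pitch).
LEVELS = {"C": 86, "D": 80, "E": 73, "F": 66, "G": 60}

# Frequencies in Hz, one octave above the octave of middle C.
FREQUENCIES = {
    "C": 2 * 261.63,
    "D": 2 * 293.66,
    "E": 2 * 329.63,
    "F": 2 * 349.23,
    "G": 2 * 392.00,
}


def note_name(row):
    """Return the name of the note detected at this row, or None if no level fits."""
    corrected = row + OFFSET
    for name, level in LEVELS.items():
        if abs(corrected - level) <= TOLERANCE:
            return name
    return None


# Hypothetical detected rows.
for row in (56, 45, 64):
    name = note_name(row)
    print(row, name, FREQUENCIES[name])
```

Returning `None` for a row that fits no level makes a stray detection visible instead of silently assigning it a pitch.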

Finally, by combining the frequency of each note and its timing (quarter or half), the piece can be played by the computer, and the sound file is saved using the wavwrite function. Here it is:

http://www.mediafire.com/?4wlr5qfjwba88at
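For readers without Scilab, the synthesis step can be sketched with the Python standard library. This is an illustration under stated assumptions, not the code that produced the file above: the sample rate of 22050 Hz, the output file name and the helper names `tone` and `write_wav` are all made up for the example, and only the first bar of the piece is written out. The frequencies and the durations of 0.25 s and 0.5 s are the ones used in the appendix.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 22050  # samples per second; an assumed value

# Frequencies in Hz, as in the appendix.
C, D, E = 2 * 261.63, 2 * 293.66, 2 * 329.63


def tone(frequency, seconds, rate=SAMPLE_RATE):
    """Return sine-wave samples in [-1, 1] for one note."""
    count = int(seconds * rate)
    return [math.sin(2 * math.pi * frequency * i / rate) for i in range(count)]


def write_wav(path, samples, rate=SAMPLE_RATE):
    """Save the samples as a mono 16-bit WAV file."""
    frames = b"".join(struct.pack("<h", int(32767 * s)) for s in samples)
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(frames)


# First bar of "Mary had a little lamb": six quarter notes and one half note.
melody = [(E, 0.25), (D, 0.25), (C, 0.25), (D, 0.25), (E, 0.25), (E, 0.25), (E, 0.5)]

samples = []
for frequency, seconds in melody:
    samples.extend(tone(frequency, seconds))

wav_path = os.path.join(tempfile.gettempdir(), "mary_sketch.wav")
write_wav(wav_path, samples)
print(wav_path, len(samples))
```

The notes are simply concatenated, so each one starts at zero amplitude but may end abruptly; a short fade at the end of each note would soften the clicks.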

Yay! :D

Lastly, to grade myself, I give myself a 10/10 for being able to produce the required output and to understand and integrate past lessons.

Score: 10/10

I would also like to thank Dr. Soriano, Arvin Mabilangan, Gino Leynes, Troy Gonzales, BA Racoma, Tisza Trono, and Joseph Bunao for the very helpful discussions. :)


References:
1. M. Soriano, "A11 - Playing Notes by Image Processing".
2. Physics of Music - Notes, (http://www.phy.mtu.edu/~suits/notefreqs.html) for the frequencies of the notes.


Appendix: (Code)
// AP186 Act11 Playing note by image processing

// sine wave of frequency f (in Hz) evaluated at the times t (in seconds)
function n = note(f,t)
    n = sin(2*%pi*f*t)
endfunction;


A1 = gray_imread("C:\maryline1.png");
T = gray_imread("C:\quarternote.png");

// threshold for A1 is determined to be 0.3
A1bw = im2bw(A1, 0.3);
Tbw = im2bw(T, 0.5);

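// correlate with quarternote (correlation theorem: conj(FT of image) .* FT of template);
// the image and the template must have the same dimensions
FTA = fft2(A1bw);
FTT = fft2(Tbw);
FTAconj = conj(FTA);
// a second forward fft2 is used in place of the inverse transform,
// which gives the same result up to a flip and a constant factor
B = fftshift(abs(fft2(FTAconj.*FTT)));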

//threshold chosen to be 0.88
Bbw = im2bw(B, 0.88);

[L, n] = bwlabel(Bbw);

// To reduce blobs to dots
Bbwsize = size(Bbw);
quarternotecorr = zeros(Bbwsize(1), Bbwsize(2));
for i=1:n
    // keep one representative pixel for blob number i
    [r, c] = find(L==i);
    rc = [r' c'];
    r = rc(:,1);
    c = rc(:,2);
    chalfvalue = c(int((length(c)+1)/2));
    rhalfvalue = r(int((length(find(c==chalfvalue))+1)/2));
    quarternotecorr(rhalfvalue, chalfvalue) = 1;
end

// correlate with halfnote
T = gray_imread("C:\halfnote.png");
Tbw = im2bw(T, 0.5);

FTT = fft2(Tbw);
B = fftshift(abs(fft2(FTAconj.*FTT)));

//threshold chosen to be 0.96
Bbw = im2bw(B, 0.96);

[L, n] = bwlabel(Bbw);

// To reduce blobs to dots
Bbwsize = size(Bbw);
halfnotecorr = zeros(Bbwsize(1), Bbwsize(2));
for i=1:n
    // keep one representative pixel for blob number i
    [r, c] = find(L==i);
    rc = [r' c'];
    r = rc(:,1);
    c = rc(:,2);
    chalfvalue = c(int((length(c)+1)/2));
    rhalfvalue = r(int((length(find(c==chalfvalue))+1)/2));
    halfnotecorr(rhalfvalue, chalfvalue) = 1;
end

allnotes = halfnotecorr;

// To determine which ones are halfnotes
strel = ones(25,25);
quarternotecorr_dilated = dilate(quarternotecorr, strel, [13,13]);

[r c] = find(halfnotecorr==1);
rc=[r' c'];
len = length(rc(:,1));

halfnotecorr = zeros(Bbwsize(1), Bbwsize(2)); // reuse variables
for i=1:len
    halfr = rc(i,1);
    halfc = rc(i,2);
    value = quarternotecorr_dilated(halfr, halfc);
    if value == 0 // no quarter-note match nearby, so this is a half note
        halfnotecorr(halfr, halfc) = 1;
    end
end

// Determine the positions of the blobs and timing
[L, Nnotes] = bwlabel(allnotes);
t_quarter = soundsec(0.25);
t_half = soundsec(0.5);
t_error = soundsec(3); // used when neither template matched

// frequencies in Hz from reference 2, doubled to play one octave higher
Cnote = 2*261.63;
Dnote = 2*293.66;
Enote = 2*329.63;
Fnote = 2*349.23;
Gnote = 2*392.00;

err = 3; // tolerance in pixels
offset = 17; // determined from pics
// rows of the notes on the staff
Clvl = 86;
Dlvl = 80;
Elvl = 73;
Flvl = 66;
Glvl = 60;

s=[];

halfnotecorr_dilated = dilate(halfnotecorr, strel, [13,13]);

// BWLABEL IS NOT RELIABLE!!
[row, col] = find(allnotes==1);
n = length(row);

for i=1:n
    r = row(i);
    c = col(i);
    valueQTR = quarternotecorr_dilated(r, c);
    valueHLF = halfnotecorr_dilated(r, c);
    r = r + offset; // shift from the correlation peak to the body of the note
    // pitch from the vertical position, with a tolerance of err pixels
    if r >= (Dlvl + err)
        f = Cnote;
    elseif r > (Elvl + err)
        f = Dnote;
    elseif r > (Flvl + err)
        f = Enote;
    elseif r >= (Glvl + err)
        f = Fnote;
    else
        f = Gnote;
    end
    // duration from the type of note
    if valueQTR == 1
        t = t_quarter;
    elseif valueHLF == 1
        t = t_half;
    else
        t = t_error;
    end
    s = [s note(f, t)];
    clear r;
    clear c;
end




///////////////////////////////////////////////////////



// FOR THE 2ND LINE OF MARY HAD A LITTLE LAMB
A1 = gray_imread("C:\maryline2.png");
T = gray_imread("C:\quarternote.png");

// threshold for A1 is determined to be 0.3
A1bw = im2bw(A1, 0.3);
Tbw = im2bw(T, 0.5);

// correlate with quarternote
FTA = fft2(A1bw);
FTT = fft2(Tbw);
FTAconj = conj(FTA);
B = fftshift(abs(fft2(FTAconj.*FTT)));

//threshold chosen to be 0.88
Bbw = im2bw(B, 0.88);

[L, n] = bwlabel(Bbw);

// To reduce blobs to dots
Bbwsize = size(Bbw);
quarternotecorr = zeros(Bbwsize(1), Bbwsize(2));
for i=1:n
    // keep one representative pixel for blob number i
    [r, c] = find(L==i);
    rc = [r' c'];
    r = rc(:,1);
    c = rc(:,2);
    chalfvalue = c(int((length(c)+1)/2));
    rhalfvalue = r(int((length(find(c==chalfvalue))+1)/2));
    quarternotecorr(rhalfvalue, chalfvalue) = 1;
end

// correlate with halfnote
T = gray_imread("C:\halfnote.png");
Tbw = im2bw(T, 0.5);

FTT = fft2(Tbw);
B = fftshift(abs(fft2(FTAconj.*FTT)));

//threshold chosen to be 0.96
Bbw = im2bw(B, 0.96);

[L, n] = bwlabel(Bbw);

// To reduce blobs to dots
Bbwsize = size(Bbw);
halfnotecorr = zeros(Bbwsize(1), Bbwsize(2));
for i=1:n
    // keep one representative pixel for blob number i
    [r, c] = find(L==i);
    rc = [r' c'];
    r = rc(:,1);
    c = rc(:,2);
    chalfvalue = c(int((length(c)+1)/2));
    rhalfvalue = r(int((length(find(c==chalfvalue))+1)/2));
    halfnotecorr(rhalfvalue, chalfvalue) = 1;
end

allnotes = halfnotecorr;

// To determine which ones are halfnotes
strel = ones(25,25);
quarternotecorr_dilated = dilate(quarternotecorr, strel, [13,13]);

[r c] = find(halfnotecorr==1);
rc=[r' c'];
len = length(rc(:,1));

halfnotecorr = zeros(Bbwsize(1), Bbwsize(2)); // reuse variables
for i=1:len
    halfr = rc(i,1);
    halfc = rc(i,2);
    value = quarternotecorr_dilated(halfr, halfc);
    if value == 0 // no quarter-note match nearby, so this is a half note
        halfnotecorr(halfr, halfc) = 1;
    end
end

// Determine the positions of the blobs and timing
[L, Nnotes] = bwlabel(allnotes);

halfnotecorr_dilated = dilate(halfnotecorr, strel, [13,13]);

// BWLABEL IS NOT RELIABLE!!
[row, col] = find(allnotes==1);
n = length(row);

for i=1:n
    r = row(i);
    c = col(i);
    valueQTR = quarternotecorr_dilated(r, c);
    valueHLF = halfnotecorr_dilated(r, c);
    r = r + offset; // shift from the correlation peak to the body of the note
    // pitch from the vertical position, with a tolerance of err pixels
    if r >= (Dlvl + err)
        f = Cnote;
    elseif r > (Elvl + err)
        f = Dnote;
    elseif r > (Flvl + err)
        f = Enote;
    elseif r >= (Glvl + err)
        f = Fnote;
    else
        f = Gnote;
    end
    // duration from the type of note
    if valueQTR == 1
        t = t_quarter;
    elseif valueHLF == 1
        t = t_half;
    else
        t = t_error;
    end
    s = [s note(f, t)];
    clear r;
    clear c;
end


sound(s);

--------------------------end----------------------------
