My selected piece was the nursery rhyme "Mary Had a Little Lamb" (right hand only, no bass). Special thanks to Troy Gonzales for composing a copy for me. :)
First, I split the image into two images, one for each line of the score. Both have the same dimensions and, as much as possible, the staves sit at the same positions in each. Their colors were then inverted so that the notes are white and the background is black.
My approach to detecting the notes is to correlate the images with templates of the different notes. Luckily, this piece has only two kinds of notes: quarter notes and half notes. So templates of these notes were made, with the note in white and the background in black.
The two lines of the piece were loaded and binarized using a threshold determined from their grayscale histograms (0.3 in the appendix code).
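A minimal sketch of this binarization step in Python (the Scilab code actually used is in the appendix). The `histogram` and `binarize` helpers and the small `patch` image are hypothetical, made up only for illustration; the 0.3 threshold is the one from the appendix.

```python
def histogram(image, bins=10):
    """Count pixels of a grayscale image (values in [0, 1]) per intensity bin."""
    counts = [0] * bins
    for row in image:
        for value in row:
            index = min(int(value * bins), bins - 1)  # value 1.0 goes in the last bin
            counts[index] += 1
    return counts


def binarize(image, threshold):
    """Return a 0/1 image: 1 where the pixel is brighter than the threshold."""
    return [[1 if value > threshold else 0 for value in row] for row in image]


# Hypothetical 3x4 patch of an inverted score: bright notes on a dark background.
patch = [
    [0.05, 0.10, 0.90, 0.95],
    [0.00, 0.85, 1.00, 0.10],
    [0.05, 0.05, 0.10, 0.00],
]
counts = histogram(patch)      # most pixels fall in the darkest and brightest bins
binary = binarize(patch, 0.3)  # 0.3 is the threshold used in the appendix code
```

With a histogram this clearly split into dark and bright pixels, any threshold in the empty middle range separates the notes from the background.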
Then their correlation with the quarter note and half note templates was taken. The result is a grayscale matrix, which was thresholded once again to reduce it to blobs. These blobs represent locations where there is a match. Afterwards, each blob was reduced to a single pixel for the next step: determining its location relative to the staves.
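The idea of template matching by correlation can be sketched in Python as follows. Note the assumptions: the appendix computes the correlation with FFTs, while this sketch slides the template over the image directly, and thresholding relative to the peak score is my own simplification. The function names and the tiny test image are hypothetical.

```python
def correlate(image, template):
    """Slide a binary template over a binary image and count overlapping white pixels."""
    rows, cols = len(image), len(image[0])
    t_rows, t_cols = len(template), len(template[0])
    result = []
    for r in range(rows - t_rows + 1):
        line = []
        for c in range(cols - t_cols + 1):
            score = sum(
                image[r + i][c + j] * template[i][j]
                for i in range(t_rows)
                for j in range(t_cols)
            )
            line.append(score)
        result.append(line)
    return result


def threshold_relative(scores, fraction):
    """Keep positions whose score is at least `fraction` of the maximum score."""
    peak = max(max(line) for line in scores)
    return [[1 if value >= fraction * peak else 0 for value in line] for line in scores]


template = [
    [1, 1],
    [1, 1],
]
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0],
]
scores = correlate(image, template)
matches = threshold_relative(scores, 0.88)  # only the exact match survives
```

The stray white pixel on the right gives a low score and is rejected, while the full 2x2 square gives the peak score and is kept.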
line 1 correlated with the quarter note
line 1 correlated with the half note
black and white version of correlation of line 1 with quarter note
reduced to single pixels (line 1 correlated with quarter note)
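Reducing each blob to a single pixel can be sketched like this in Python. This is an illustration under assumptions, not the method in the appendix: it labels blobs with a flood fill and uses the rounded centroid as the representative pixel, whereas the appendix uses bwlabel and picks a middle pixel of each blob. The function name and test image are hypothetical.

```python
def blob_centers(binary):
    """Find 4-connected blobs of 1s and return one (row, col) point per blob."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    centers = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Flood fill to collect every pixel of this blob.
            stack, pixels = [(r, c)], []
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            # Use the rounded centroid as the blob's representative pixel.
            mean_row = round(sum(p[0] for p in pixels) / len(pixels))
            mean_col = round(sum(p[1] for p in pixels) / len(pixels))
            centers.append((mean_row, mean_col))
    return centers


blobs = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 1],
    [1, 1, 1, 0, 0, 0],
]
centers = blob_centers(blobs)
```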
However, one problem I encountered was that, since the quarter and half notes differ only slightly (shaded versus not shaded), their correlation values with one another are high. Therefore, when I used the quarter note template, all the quarter and half notes showed up. Adjusting the threshold didn't help, as the true positives would disappear along with the false ones. Instead, I remedied this by also taking the correlation with the half note template and thresholding it such that only the quarter notes showed up. From this information, the type of each note can be deduced: a note that appears in the first result but not in the second must be a half note.
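The deduction can be sketched in Python as a comparison of two point lists: one with every detected note and one with only the quarter notes. The 12-pixel radius mirrors the 25x25 structuring element centered at [13,13] in the appendix; the function name and the sample coordinates are hypothetical.

```python
def classify_notes(all_notes, quarter_notes, radius=12):
    """Label each detected note as 'quarter' or 'half'.

    A note counts as a quarter note when a quarter-note detection lies within
    `radius` pixels of it in both directions; otherwise it is taken as a half note.
    """
    labels = []
    for row, col in all_notes:
        near_quarter = any(
            abs(row - q_row) <= radius and abs(col - q_col) <= radius
            for q_row, q_col in quarter_notes
        )
        labels.append("quarter" if near_quarter else "half")
    return labels


# Hypothetical detections as (row, col) points.
all_notes = [(60, 40), (66, 90), (73, 140)]
quarter_notes = [(62, 43), (70, 138)]
labels = classify_notes(all_notes, quarter_notes)
```

The tolerance is needed because the two correlations rarely peak at exactly the same pixel for the same note.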
The next task is to determine the notes' relative vertical positions in order to determine their frequencies. Because of the correlation, there is an offset between the position of the detected pixel and where the body of the note actually is. This is easily fixed by measuring that offset (17 pixels in the appendix code). The positions of the staff lines, as well as the centers of the spaces between them, were also taken. Then, using an if-elseif-else statement, the relative position of each note can be determined. A tolerance of ±3 pixels is included to compensate for small fluctuations in the positions of the notes.
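The mapping from vertical position to pitch can be sketched in Python as below. The pixel levels, the offset, the tolerance, and the frequencies are copied from the appendix code; the function name and the sample rows are hypothetical.

```python
# Frequencies (Hz) one octave above middle C, as in the appendix code.
C_NOTE = 2 * 261.63
D_NOTE = 2 * 293.66
E_NOTE = 2 * 329.63
F_NOTE = 2 * 349.23
G_NOTE = 2 * 392.00

# Pixel rows of the pitch levels, plus the correction values from the appendix.
D_LVL, E_LVL, F_LVL, G_LVL = 80, 73, 66, 60
OFFSET = 17  # shift from the correlation peak to the body of the note
ERR = 3      # tolerance for small fluctuations in note position


def frequency_from_row(row):
    """Return the frequency of a note detected at the given pixel row."""
    r = row + OFFSET
    if r >= D_LVL + ERR:
        return C_NOTE
    elif r > E_LVL + ERR:
        return D_NOTE
    elif r > F_LVL + ERR:
        return E_NOTE
    elif r >= G_LVL + ERR:
        return F_NOTE
    else:
        return G_NOTE


# Rows increase downward in the image, so larger rows mean lower pitches.
frequencies = [frequency_from_row(row) for row in (69, 63, 56, 49, 43)]
```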
Finally, by combining the frequency of each note and its duration (quarter or half), the piece can be played by the computer, and the sound file is saved using the wavwrite function. Here it is:
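A minimal Python sketch of this last step, using only the standard library. The sample rate, the helper names, and the output file name are assumptions for illustration; the frequencies and the 0.25-second quarter note follow the appendix code.

```python
import math
import struct
import wave

SAMPLE_RATE = 22050  # samples per second; an assumed value


def tone(frequency, seconds, rate=SAMPLE_RATE):
    """Return sine-wave samples in [-1, 1] for one note."""
    count = int(seconds * rate)
    return [math.sin(2 * math.pi * frequency * n / rate) for n in range(count)]


def write_wav(target, samples, rate=SAMPLE_RATE):
    """Write samples in [-1, 1] as 16-bit mono PCM to a path or binary file object."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        frames = b"".join(struct.pack("<h", int(32767 * s)) for s in samples)
        wav.writeframes(frames)


# (frequency in Hz, duration in seconds): E D C D, the opening of the melody.
melody = [(659.26, 0.25), (587.32, 0.25), (523.26, 0.25), (587.32, 0.25)]
signal = []
for frequency, seconds in melody:
    signal.extend(tone(frequency, seconds))

# write_wav("mary.wav", signal)  # hypothetical output file name
```

Concatenating the tones one after another is what turns the list of (frequency, duration) pairs into a playable melody.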
http://www.mediafire.com/?4wlr5qfjwba88at
Yay! :D
Lastly, to grade myself, I give myself a 10/10 for being able to produce the required output and for understanding and integrating past lessons.
Score: 10/10
I would also like to thank Dr. Soriano, Arvin Mabilangan, Gino Leynes, Troy Gonzales, BA Racoma, Tisza Trono, and Joseph Bunao for the very helpful discussions. :)
References:
1. M. Soriano, "A11 - Playing Notes by Image Processing".
2. Physics of Music - Notes, (http://www.phy.mtu.edu/~suits/notefreqs.html) for the frequencies of the notes.
Appendix: (Code)
// AP186 Act11 Playing note by image processing
// sine wave of frequency f sampled at the times in t
function n = note(f,t)
    n = sin(2*%pi*f*t)
endfunction;
A1 = gray_imread("C:\maryline1.png");
T = gray_imread("C:\quarternote.png");
// threshold for A1 is determined to be 0.3
A1bw = im2bw(A1, 0.3);
Tbw = im2bw(T, 0.5);
// correlate with quarternote
FTA = fft2(A1bw);
FTT = fft2(Tbw);
FTAconj = conj(FTA);
B = fftshift(abs(fft2(FTAconj.*FTT)));
//threshold chosen to be 0.88
Bbw = im2bw(B, 0.88);
[L, n] = bwlabel(Bbw);
// To reduce blobs to dots
Bbwsize = size(Bbw);
quarternotecorr = zeros(Bbwsize(1), Bbwsize(2));
for i=1:n
    [r, c] = find(L==i);
    rc = [r' c'];
    r = rc(:,1);
    c = rc(:,2);
    chalfvalue = c(int((length(c)+1)/2));
    rhalfvalue = r(int((length(find(c==chalfvalue))+1)/2));
    quarternotecorr(rhalfvalue, chalfvalue) = 1;
end
// correlate with halfnote
T = gray_imread("C:\halfnote.png");
Tbw = im2bw(T, 0.5);
FTT = fft2(Tbw);
B = fftshift(abs(fft2(FTAconj.*FTT)));
//threshold chosen to be 0.96
Bbw = im2bw(B, 0.96);
[L, n] = bwlabel(Bbw);
// To reduce blobs to dots
Bbwsize = size(Bbw);
halfnotecorr = zeros(Bbwsize(1), Bbwsize(2));
for i=1:n
    [r, c] = find(L==i);
    rc = [r' c'];
    r = rc(:,1);
    c = rc(:,2);
    chalfvalue = c(int((length(c)+1)/2));
    rhalfvalue = r(int((length(find(c==chalfvalue))+1)/2));
    halfnotecorr(rhalfvalue, chalfvalue) = 1;
end
allnotes = halfnotecorr;
// To determine which ones are halfnotes
strel = ones(25,25);
quarternotecorr_dilated = dilate(quarternotecorr, strel, [13,13]);
[r c] = find(halfnotecorr==1);
rc=[r' c'];
len = length(rc(:,1));
halfnotecorr = zeros(Bbwsize(1), Bbwsize(2)); // reuse variables
for i=1:len
    halfr = rc(i,1);
    halfc = rc(i,2);
    value = quarternotecorr_dilated(halfr, halfc);
    // keep the point only if no quarter note was detected nearby
    if value == 0
        halfnotecorr(halfr, halfc) = 1;
    end
end
// Determine the positions of the blobs and timing
[L, Nnotes] = bwlabel(allnotes);
t_quarter = soundsec(0.25);
t_half = soundsec(0.5);
t_error = soundsec(3);
Cnote = 2*261.63;
Dnote = 2*293.66;
Enote = 2*329.63;
Fnote = 2*349.23;
Gnote = 2*392.00;
err = 3;
offset = 17; // determined from pics
Clvl = 86;
Dlvl = 80;
Elvl = 73;
Flvl = 66;
Glvl = 60;
s=[];
halfnotecorr_dilated = dilate(halfnotecorr, strel, [13,13]);
// BWLABEL IS NOT RELIABLE!!
[row, col] = find(allnotes==1);
n = length(row);
for i=1:n
    r = row(i);
    c = col(i);
    valueQTR = quarternotecorr_dilated(r, c);
    valueHLF = halfnotecorr_dilated(r, c);
    // shift from the correlation peak to the body of the note
    r = r + offset;
    // pitch from the vertical position, with a tolerance of err pixels
    if r >= (Dlvl + err)
        f = Cnote;
    elseif r > (Elvl + err)
        f = Dnote;
    elseif r > (Flvl + err)
        f = Enote;
    elseif r >= (Glvl + err)
        f = Fnote;
    else
        f = Gnote;
    end
    // duration from the type of note
    if valueQTR == 1
        t = t_quarter;
    elseif valueHLF == 1
        t = t_half;
    else
        t = t_error;
    end
    s = [s note(f, t)];
    clear r;
    clear c;
end
///////////////////////////////////////////////////////
// FOR THE 2ND LINE OF MARY HAD A LITTLE LAMB
A1 = gray_imread("C:\maryline2.png");
T = gray_imread("C:\quarternote.png");
// threshold for A1 is determined to be 0.3
A1bw = im2bw(A1, 0.3);
Tbw = im2bw(T, 0.5);
// correlate with quarternote
FTA = fft2(A1bw);
FTT = fft2(Tbw);
FTAconj = conj(FTA);
B = fftshift(abs(fft2(FTAconj.*FTT)));
//threshold chosen to be 0.88
Bbw = im2bw(B, 0.88);
[L, n] = bwlabel(Bbw);
// To reduce blobs to dots
Bbwsize = size(Bbw);
quarternotecorr = zeros(Bbwsize(1), Bbwsize(2));
for i=1:n
    [r, c] = find(L==i);
    rc = [r' c'];
    r = rc(:,1);
    c = rc(:,2);
    chalfvalue = c(int((length(c)+1)/2));
    rhalfvalue = r(int((length(find(c==chalfvalue))+1)/2));
    quarternotecorr(rhalfvalue, chalfvalue) = 1;
end
// correlate with halfnote
T = gray_imread("C:\halfnote.png");
Tbw = im2bw(T, 0.5);
FTT = fft2(Tbw);
B = fftshift(abs(fft2(FTAconj.*FTT)));
//threshold chosen to be 0.96
Bbw = im2bw(B, 0.96);
[L, n] = bwlabel(Bbw);
// To reduce blobs to dots
Bbwsize = size(Bbw);
halfnotecorr = zeros(Bbwsize(1), Bbwsize(2));
for i=1:n
    [r, c] = find(L==i);
    rc = [r' c'];
    r = rc(:,1);
    c = rc(:,2);
    chalfvalue = c(int((length(c)+1)/2));
    rhalfvalue = r(int((length(find(c==chalfvalue))+1)/2));
    halfnotecorr(rhalfvalue, chalfvalue) = 1;
end
allnotes = halfnotecorr;
// To determine which ones are halfnotes
strel = ones(25,25);
quarternotecorr_dilated = dilate(quarternotecorr, strel, [13,13]);
[r c] = find(halfnotecorr==1);
rc=[r' c'];
len = length(rc(:,1));
halfnotecorr = zeros(Bbwsize(1), Bbwsize(2)); // reuse variables
for i=1:len
    halfr = rc(i,1);
    halfc = rc(i,2);
    value = quarternotecorr_dilated(halfr, halfc);
    // keep the point only if no quarter note was detected nearby
    if value == 0
        halfnotecorr(halfr, halfc) = 1;
    end
end
// Determine the positions of the blobs and timing
[L, Nnotes] = bwlabel(allnotes);
halfnotecorr_dilated = dilate(halfnotecorr, strel, [13,13]);
// BWLABEL IS NOT RELIABLE!!
[row, col] = find(allnotes==1);
n = length(row);
for i=1:n
    r = row(i);
    c = col(i);
    valueQTR = quarternotecorr_dilated(r, c);
    valueHLF = halfnotecorr_dilated(r, c);
    // shift from the correlation peak to the body of the note
    r = r + offset;
    // pitch from the vertical position, with a tolerance of err pixels
    if r >= (Dlvl + err)
        f = Cnote;
    elseif r > (Elvl + err)
        f = Dnote;
    elseif r > (Flvl + err)
        f = Enote;
    elseif r >= (Glvl + err)
        f = Fnote;
    else
        f = Gnote;
    end
    // duration from the type of note
    if valueQTR == 1
        t = t_quarter;
    elseif valueHLF == 1
        t = t_half;
    else
        t = t_error;
    end
    s = [s note(f, t)];
    clear r;
    clear c;
end
sound(s);
--------------------------end----------------------------