Acoustic Location

Where is that sound coming from?

For a while now I have been wondering how hard it is to find out. Smart speakers do it with arrays of microphones and sophisticated DSP algorithms, but how complicated is it really? Are Gecho's hardware and CPU power enough to get at least some meaningful results? How complex does the code need to be?

In Theory...

Sound travels at around 343 m/s in air at 20 degrees Celsius (the speed depends on temperature rather than atmospheric pressure; here is a handy calculator). In units that are easier to imagine, this is about 1235 kilometres or 767 miles per hour. Or the other way around: how long does sound take to travel, for example, 10 cm? It is 0.1 / 343 = 0.00029 seconds, or about 290 microseconds. Is that too short? Perhaps our processor can work at that time scale. Let's see.
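If you prefer to let the computer do this arithmetic, here is a minimal C++ sketch (not Gecho code). The temperature formula c ≈ 331.3 + 0.606·T m/s is a common linear approximation for dry air, and we assume it here. The function names are hypothetical.

```cpp
#include <cassert>

// Approximate speed of sound in dry air (m/s) at temp_c degrees Celsius.
// Assumed linear approximation, reasonable around room temperature.
double speed_of_sound(double temp_c)
{
    return 331.3 + 0.606 * temp_c;
}

// Time in microseconds for sound to travel distance_m metres at c_mps m/s.
double travel_time_us(double distance_m, double c_mps)
{
    assert(c_mps > 0.0);
    return distance_m / c_mps * 1e6;
}
```

For example, travel_time_us(0.1, 343.0) gives about 291.5 µs, which is the rounded 290 µs from above. At 20 °C, speed_of_sound(20.0) gives about 343.4 m/s.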

The traditional method of determining the direction of a sound source is "time difference of arrival". With only two microphones it is possible to tell where the sound source lies along one dimension. For example, if the microphones are placed side by side horizontally, we can tell left from right. If one is above the other, we can tell whether the source is above us, below us or straight in front of us. The method relies on measuring tiny differences in the timing of the two signals. If the sound source is right in front, the sound reaches both microphones at precisely the same time. But if the source is at an angle, it is closer to one microphone than to the other, so the sound waves arrive at slightly different times.
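To make the idea concrete, here is a minimal C++ sketch that computes the arrival-time difference from the exact geometry. The coordinate system is an assumption for illustration: MIC1 at x = -Dm/2, MIC2 at x = +Dm/2, and "in front" along +y. The names are hypothetical too.

```cpp
#include <cassert>
#include <cmath>

// Assumed 2-D layout: MIC1 at (-spacing/2, 0), MIC2 at (+spacing/2, 0),
// "in front" is the +y direction. All distances in metres.
struct Point { double x, y; };

// Time difference of arrival in seconds. Positive when the sound reaches MIC2
// later than MIC1, i.e. when the source is closer to MIC1.
double tdoa_seconds(Point src, double spacing, double c)
{
    assert(c > 0.0);
    double d1 = std::hypot(src.x + spacing / 2.0, src.y); // distance to MIC1
    double d2 = std::hypot(src.x - spacing / 2.0, src.y); // distance to MIC2
    return (d2 - d1) / c;
}
```

The sign of the result tells left from right. A source straight ahead (x = 0) gives exactly zero.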

Note that we can't really determine the distance, as we don't know at which moment the sound was emitted. The advantage is that we should get the same result regardless of how far away the sound source is (provided it is loud enough). A car beeping outside the window or people chatting on the sofa: all that matters is the angle.
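The "only the angle matters" claim can be checked numerically. This sketch uses an assumed two-microphone geometry and compares the exact delay with the standard far-field approximation Δt ≈ Dm·sin θ / c. The far-field formula does not depend on distance at all. All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

const double kPi = 3.14159265358979323846;

// Exact arrival-time difference (seconds) for a source at distance r (m) and
// angle theta (radians, 0 = straight ahead), measured from the midpoint between
// two mics spaced d metres apart. Positive when the source leans towards +x.
double tdoa_exact(double r, double theta, double d, double c)
{
    double x = r * std::sin(theta), y = r * std::cos(theta);
    double d_plus  = std::hypot(x - d / 2.0, y); // distance to the mic at +d/2
    double d_minus = std::hypot(x + d / 2.0, y); // distance to the mic at -d/2
    return (d_minus - d_plus) / c;
}

// Far-field approximation: depends only on the angle, not on the distance.
double tdoa_far_field(double theta, double d, double c)
{
    assert(c > 0.0);
    return d * std::sin(theta) / c;
}
```

At 30 degrees with 7.5 cm spacing, the far-field value is about 109 µs. The exact delays at 0.5 m and at 5 m both stay within a fraction of a microsecond of it.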

In Practice...

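We want to find out what difference we can expect in a real-world scenario: how much later the sound wave arrives at one of the mics than at the other. Gecho's microphones are spaced 7.5 cm apart; we will call this distance $D_m$. Let's consider a bird half a metre away, directly in front of MIC2. Its distance b to MIC2 is 0.5 m, and its distance a to MIC1 is the hypotenuse of a right triangle with sides b and $D_m$.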

The Pythagorean theorem says $a^2 = b^2 + D_m^2$. After substituting the known values (converted to metres) we get $a^2 = 0.5^2 + 0.075^2 = 0.255625$, therefore $a = \sqrt{0.255625} = 0.505593710\ldots$ So side a of the triangle is only 5.59 mm longer than side b. Is that enough to measure any change at all?
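The same Pythagorean calculation takes a few lines of C++. This is a sketch, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Extra path length (metres) to the far microphone when the source sits
// directly in front of the near microphone, b metres away, with mic spacing dm.
double path_difference(double b, double dm)
{
    assert(b > 0.0 && dm >= 0.0);
    double a = std::sqrt(b * b + dm * dm); // hypotenuse: distance to the far mic
    return a - b;
}
```

path_difference(0.5, 0.075) returns about 0.00559 m, the 5.59 mm from above.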

How long does the sound take to travel this extra distance? Dividing it (in metres) by the speed of sound gives 0.00559 / 343 = 0.000016297 seconds, or 16.3 µs. How does that compare with our sound processing speed, i.e. with one sample at 22.05 kHz? One sample takes 1/22050 = 0.000045351 seconds, or 45.35 µs, roughly three times longer. This is great: it is not 10 or 100 times longer, which means measuring such small changes is within our reach.
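It can be handy to express the delay directly in samples. This hypothetical helper multiplies the extra path by the sampling rate and divides by the speed of sound.

```cpp
#include <cassert>

// Arrival delay in (possibly fractional) samples for an extra path length
// extra_path_m (metres), speed of sound c (m/s) and sampling rate fs (Hz).
double delay_in_samples(double extra_path_m, double c, double fs)
{
    assert(c > 0.0 && fs > 0.0);
    return extra_path_m * fs / c;
}
```

For the bird example, delay_in_samples(0.00559, 343.0, 22050.0) is about 0.36 of a sample, the inverse of the "roughly three times" ratio. At 44.1 kHz the same delay becomes twice as many samples.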

If the bird moved 7.5 cm and ended up directly in front of MIC1 instead, the same situation would occur in reverse: now MIC2 would receive the signal 16.3 µs later. So this movement of 7.5 cm at a distance of 50 cm changes the relative arrival time by 32.6 µs. To produce a difference of 45.35 µs (one sample), the bird therefore needs to move by 7.5 / 32.6 * 45.35 = 10.433 cm.
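The same linear estimate can be written as a small function. This is a sketch with a hypothetical name. Like the text, it assumes the delay changes linearly with movement, which only holds near the centre.

```cpp
#include <cassert>

// Linear estimate: moving the source by move_cm changes the relative delay by
// delta_us. How far must it move to shift the signals by one whole sample at fs?
double movement_per_sample_cm(double move_cm, double delta_us, double fs)
{
    assert(delta_us > 0.0 && fs > 0.0);
    double sample_us = 1e6 / fs;          // duration of one sample in microseconds
    return move_cm / delta_us * sample_us;
}
```

movement_per_sample_cm(7.5, 32.6, 22050.0) gives about 10.43 cm. At 44100 Hz the result is half that.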

To know how far the sound source has moved, we simply measure how much the signal has shifted. With the current hardware, a shift of one sample indicates that the source has moved by approximately 10.4 cm. It follows that increasing the sampling rate to 44.1 kHz would double the precision. Similarly, if the mics were spaced twice as far apart, a proportionally smaller movement would produce the same shift in the signal. Of course, the reasoning demonstrated here is only a crude approximation and does not hold for wider angles, because the relationship is not as linear as these two cases suggest. For practical purposes, however, it is sufficient.
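To see how far from linear the relationship gets, this sketch inverts the far-field approximation (sin θ = shift·c / (fs·Dm)) for whole-sample shifts. This model and the function names are our assumptions, not part of the original code.

```cpp
#include <cassert>
#include <cmath>

// Far-field angle (degrees from straight ahead) for a delay of `shift` whole
// samples. Returns NAN when no direction can produce that shift (|sin| > 1).
double angle_for_shift_deg(int shift, double fs, double spacing_m, double c)
{
    assert(fs > 0.0 && spacing_m > 0.0);
    double s = shift * c / (fs * spacing_m);   // sin(theta)
    if (std::fabs(s) > 1.0) return NAN;
    return std::asin(s) * 180.0 / 3.14159265358979323846;
}

// Largest shift (in whole samples) that any source direction can produce.
int max_shift(double fs, double spacing_m, double c)
{
    assert(c > 0.0);
    return static_cast<int>(std::floor(fs * spacing_m / c));
}
```

Take the values above: 22.05 kHz, 7.5 cm and 343 m/s. One sample corresponds to about 12 degrees near the centre, and the steps grow wider towards the sides. If this model holds, no direction can produce a shift of more than 4 samples. Doubling either the sampling rate or the mic spacing raises that limit to 9.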

The Code

For details on what's needed to extend Gecho's functionality, see the previous tutorials, which explain how to build a new channel and how to flash the new firmware to the unit. You can implement this experiment as a stand-alone function that does not rely on other channels, as shown in the Granular Sampler example.

Let's add this test as a new channel, for example #11223. In the file Channels.cpp, function custom_program_init(), we add this if statement to invoke our new function:

```cpp
else if (prog == 11223) //acoustic location test
{
	ADC_configure_MIC(BOARD_MIC_ADC_CFG_STRING); //use built-in microphones
	acoustic_location_test();
}
```

The body of the function can be placed, for example, in the file Test_fn.cpp. Don't forget to add the function declaration to Test_fn.h too, so that it is visible to other modules. Here we allocate buffers for the sampled sound (512 samples each) and reset the codec and LEDs.

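```cpp
void acoustic_location_test()
{
	#define N_SAMPLES 512
	//one buffer per microphone, N_SAMPLES floats each (malloc needs <stdlib.h>)
	float *capture_buf_l, *capture_buf_r;
	capture_buf_l = (float*)malloc(N_SAMPLES * sizeof(float));
	capture_buf_r = (float*)malloc(N_SAMPLES * sizeof(float));
	//(codec and LED reset code not shown)
```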

The rest happens in an infinite loop. First, we capture the sound until both buffers are full. The blue "signal" LED can indicate how long this takes.

```cpp
while(1)
{
	for(int i=0;i<N_SAMPLES;i++)
	{
		capture_buf_r[i] = (float)(4096/2 - (int16_t)ADC1_read()) / 4096.0f * PREAMP_BOOST;
		capture_buf_l[i] = (float)(4096/2 - (int16_t)ADC2_read()) / 4096.0f * PREAMP_BOOST;
		//timing by Codec: wait for an empty I2S transmit buffer, so the loop runs at the sampling rate
		while (!SPI_I2S_GetFlagStatus(CODEC_I2S, SPI_I2S_FLAG_TXE));
		SPI_I2S_SendData(CODEC_I2S, 0);
	}
```
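The sample conversion used in the capture loop can be tried on a PC. This sketch assumes 12-bit ADC readings (0..4095), which is what the 4096 constant suggests. The parameter boost stands in for PREAMP_BOOST, whose value is not given here, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Convert a 12-bit ADC reading to a float sample: centre on mid-scale (2048),
// scale to roughly -0.5..+0.5, then apply the gain `boost` (stand-in for PREAMP_BOOST).
float adc_to_sample(uint16_t raw, float boost)
{
    assert(raw < 4096);
    return (float)(4096 / 2 - (int16_t)raw) / 4096.0f * boost;
}
```

Mid-scale maps to zero and the sign is inverted. That does not matter for the comparison, as long as both channels are converted the same way.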

In the following block we compute the differences between the two buffers. We do this not only for the buffers as they were recorded, but also with them shifted relative to each other by a certain number of samples in either direction. We adjust the "from" and "to" bounds to avoid reading data outside a buffer. The shift variable runs from negative to positive values, centred around 0. The range is defined by TEST_RANGE; according to our previous calculations, a span of 24 samples would correspond to the sound source moving by about 2.5 metres (if the relationship were linear).

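```cpp
	#define TEST_RANGE 24
	float differences[TEST_RANGE];
	for(int test_pos=0;test_pos<TEST_RANGE;test_pos++)
	{
		differences[test_pos] = 0;
		int shift = test_pos - TEST_RANGE/2; //from -TEST_RANGE/2 to TEST_RANGE/2-1
		int from, to;
		if(shift<0)
		{
			from = -shift; //keeps i+shift from going below 0
			to = N_SAMPLES;
		}
		else
		{
			from = 0;
			to = N_SAMPLES-shift; //keeps i+shift below N_SAMPLES
		}
		for(int i=from;i<to;i++)
			differences[test_pos] += fabs(capture_buf_l[i] - capture_buf_r[i+shift]); //fabs needs <math.h>
		differences[test_pos] /= (to-from); //average, so shorter overlaps are not favoured
	}
```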
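The shift search can be tested on a PC with a synthetic signal whose delay we know. This is a host-side sketch under our own assumptions. It averages over the number of compared samples, so that shifts with a shorter overlap are not favoured. argmin() is a simplified stand-in for find_minimum(), whose source is not shown. All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// For each candidate shift in [-range/2, range/2), compare left[i] with
// right[i + shift] over the overlapping part and store the mean absolute difference.
std::vector<float> shift_differences(const std::vector<float>& left,
                                     const std::vector<float>& right,
                                     int range)
{
    assert(left.size() == right.size() && range > 0);
    const int n = static_cast<int>(left.size());
    std::vector<float> diffs(range, 0.0f);
    for (int test_pos = 0; test_pos < range; test_pos++) {
        int shift = test_pos - range / 2;
        int from = shift < 0 ? -shift : 0;      // keep i + shift >= 0
        int to   = shift > 0 ? n - shift : n;   // keep i + shift < n
        for (int i = from; i < to; i++)
            diffs[test_pos] += std::fabs(left[i] - right[i + shift]);
        if (to > from) diffs[test_pos] /= (to - from);
    }
    return diffs;
}

// Index of the smallest element (first one on ties).
int argmin(const std::vector<float>& v)
{
    int best = 0;
    for (int i = 1; i < static_cast<int>(v.size()); i++)
        if (v[i] < v[best]) best = i;
    return best;
}
```

Suppose the right channel lags the left by 3 samples (right[i + 3] == left[i]). With range = 24, the smallest difference is then at index 3 + 24/2 = 15.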

Finally, it is time to get the result. The simplest way is to look for the minimum value in the differences array. Since the range is wider than the number of available LEDs, we need to remap it with a simple scaling function. The resulting position in the array is the sample shift with the smallest difference. It is shown on one of the 8 red LEDs, which glows for 100 ms and is then turned off.

```cpp
	int minimum_pos = find_minimum(differences,TEST_RANGE);
	if(minimum_pos>=0) //if any sound detected
	{
		//map 0..TEST_RANGE-1 to 1..8, one step per red LED
		int to_scale = minimum_pos / (TEST_RANGE / 8) + 1;
		if(to_scale>7) { LED_R8_7_ON; }
		else if(to_scale>6) { LED_R8_6_ON; }
		else if(to_scale>5) { LED_R8_5_ON; }
		else if(to_scale>4) { LED_R8_4_ON; }
		else if(to_scale>3) { LED_R8_3_ON; }
		else if(to_scale>2) { LED_R8_2_ON; }
		else if(to_scale>1) { LED_R8_1_ON; }
		else if(to_scale>0) { LED_R8_0_ON; }
		//(keep the LED lit for 100 ms, then turn the red LEDs off; code not shown)
	}
} //end while(1)
```
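The find_minimum() helper is not listed in the text. As a purely hypothetical illustration, this version returns -1 when the differences are almost flat. That is our assumed criterion for "no distinct sound", and the threshold would have to be tuned. led_index() computes the same LED number as the if/else chain above, in one line.

```cpp
#include <cassert>

// Hypothetical find_minimum(): index of the smallest value, or -1 when
// max - min is below `threshold` (treated here as "no distinct sound").
int find_minimum_sketch(const float* values, int count, float threshold)
{
    assert(count > 0);
    int min_pos = 0;
    float max_val = values[0];
    for (int i = 1; i < count; i++) {
        if (values[i] < values[min_pos]) min_pos = i;
        if (values[i] > max_val) max_val = values[i];
    }
    return (max_val - values[min_pos] < threshold) ? -1 : min_pos;
}

// Same mapping as the if/else chain: LED index 0..7 (= to_scale - 1),
// assuming range is a multiple of 8.
int led_index(int minimum_pos, int range)
{
    assert(range >= 8 && range % 8 == 0);
    return minimum_pos / (range / 8);
}
```

With TEST_RANGE = 24, zero shift (minimum_pos = 12) lights LED 4. Eight LEDs have no exact centre.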

Curious how it performs?

Check it out! :)