Rendering the audio captured by a Windows Phone device

This small app captures audio with the microphone of a Windows Phone device and displays it as a continuous waveform on the screen using the XNA framework.

A slightly modified version of the app (which changes color when you touch the screen) can be found in the Windows Phone Marketplace.


To capture audio on a Windows Phone device, you need a reference to the default microphone (Microphone.Default). You then decide how often you want to receive samples by setting the BufferDuration property, and hook up the BufferReady event. Finally, you control capturing with the Start() and Stop() methods.

The microphone delivers samples at a fixed rate of 16 000 Hz, i.e. 16 000 samples per second; the SampleRate property reports this value. According to the sampling theorem, this means you cannot capture frequencies above 8 000 Hz (half the sample rate) without distortion.
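As a worked illustration of the sampling theorem (in Python rather than the app's C#, and not part of the app itself), the sketch below computes the frequency at which a pure tone ends up after being sampled at 16 000 Hz. Tones up to 8 000 Hz come through unchanged, while higher ones fold back ("alias") to a lower frequency. The function name is hypothetical.

```python
def alias_frequency(f_hz: float, sample_rate_hz: float = 16_000) -> float:
    """Return the frequency a pure tone of f_hz appears at after sampling."""
    # Fold the input frequency into the range [0, sample_rate / 2].
    f = f_hz % sample_rate_hz
    return min(f, sample_rate_hz - f)


print(16_000 / 2)               # Nyquist frequency: 8000.0
print(alias_frequency(3_000))   # 3000 (below Nyquist, unchanged)
print(alias_frequency(10_000))  # 6000 (above Nyquist, aliased)
```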

You are also limited when choosing the value of the BufferDuration property: it must be between 0.1 and 1 second (100–1000 ms) in 10 ms steps. This means you must choose one of 100, 110, 120, …, 990 or 1000 milliseconds.
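The arithmetic behind these limits can be sketched in Python as follows. The helper names are hypothetical, and the byte count assumes mono, 16-bit samples at 16 000 Hz; in the C# app itself, XNA's Microphone.GetSampleSizeInBytes(TimeSpan) can be used to get the buffer size instead.

```python
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # signed 16-bit PCM (assumption: mono)


def is_valid_buffer_duration(ms: int) -> bool:
    """BufferDuration must be 100-1000 ms in 10 ms steps."""
    return 100 <= ms <= 1000 and ms % 10 == 0


def buffer_size_bytes(ms: int) -> int:
    """Bytes needed to hold one buffer of the given duration."""
    if not is_valid_buffer_duration(ms):
        raise ValueError(f"invalid BufferDuration: {ms} ms")
    samples = SAMPLE_RATE_HZ * ms // 1000
    return samples * BYTES_PER_SAMPLE


print(buffer_size_bytes(100))   # 3200
print(buffer_size_bytes(1000))  # 32000
```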

When the microphone's BufferReady event fires, you should call microphone.GetData(myBuffer) to copy the samples from the microphone’s internal buffer into a buffer of your own. The recorded audio arrives as a byte array, but since the samples are actually signed 16-bit integers (i.e. integers in the range -32 768 … 32 767), you will probably need to convert them before you can process them.
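A minimal Python sketch of that conversion, assuming the two bytes of each sample are stored in little-endian order; bytes_to_samples is a hypothetical name. In the C# app, BitConverter.ToInt16 would typically do the same job.

```python
import struct


def bytes_to_samples(buffer: bytes) -> list[int]:
    """Decode little-endian signed 16-bit PCM into Python ints."""
    count = len(buffer) // 2
    # '<' = little-endian, 'h' = signed 16-bit short
    return list(struct.unpack(f"<{count}h", buffer[:count * 2]))


raw = bytes([0x00, 0x00, 0xFF, 0x7F, 0x00, 0x80, 0xFF, 0xFF])
print(bytes_to_samples(raw))  # [0, 32767, -32768, -1]
```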

How this application works

The application keeps a fixed number of narrow images, here called “(image) slices”, arranged in a linked list. The images are rendered on the screen and moved smoothly from right to left. When the leftmost slice has moved off the screen, it is moved to the far right (still outside the screen) to create the illusion of an unlimited number of images.
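The recycling step can be sketched in Python with a collections.deque standing in for the linked list. Each entry is just the x position of a slice (leftmost first); the function name and this position representation are assumptions made for illustration.

```python
from collections import deque


def recycle_slices(xs: deque, slice_width: int) -> int:
    """Move every slice that has fully left the screen to the far right.

    xs holds the x position of each slice, leftmost first.
    Returns how many slices were recycled.
    """
    moved = 0
    while len(xs) > 1 and xs[0] + slice_width <= 0:
        xs.popleft()                       # drop the off-screen slice...
        xs.append(xs[-1] + slice_width)    # ...and reuse it after the last one
        moved += 1
    return moved


slices = deque([-20, 0, 20, 40])
recycle_slices(slices, 20)
print(list(slices))  # [0, 20, 40, 60]
```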

Each slice holds the rendered samples from one microphone buffer. When the microphone has filled the buffer, the rightmost slice (outside the screen) is rendered with the new samples and starts moving onto the screen.

The speed at which the slices move across the screen is tied to the buffer duration, so that the slices move exactly one slice width during the time the microphone takes to capture the next buffer.
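In numbers, the speed is the slice width divided by the buffer duration. The Python sketch below uses a hypothetical slice width of 20 pixels; with a 100 ms buffer that gives 200 pixels per second.

```python
def scroll_speed_px_per_s(slice_width_px: int, buffer_duration_ms: int) -> float:
    """Pixels per second so that exactly one slice width passes per buffer."""
    return slice_width_px * 1000 / buffer_duration_ms


print(scroll_speed_px_per_s(20, 100))  # 200.0
```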

Since each buffer of captured audio is rendered onto a texture as soon as it is received, there is no reason to keep old buffer data. The application therefore keeps only one buffer in memory, which is reused over and over.

A flag is set each time the microphone buffer is ready. Since the BufferReady event is raised on the main thread, no locking mechanism is needed.

In the Update() method of the XNA app, the flag is checked to see whether new data has arrived, and if so, the next slice in line is rendered. In the Draw() method, the slices are drawn on the screen and moved slightly as time goes by.
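The flag hand-off between the event handler and Update() can be sketched like this in Python. The class and method names are hypothetical stand-ins for the XNA Game class, and render_samples only counts calls instead of drawing.

```python
class WaveformLoop:
    """Sketch of the BufferReady flag hand-off (names are hypothetical)."""

    def __init__(self) -> None:
        self.buffer = bytearray(3200)  # single reused buffer (100 ms of audio)
        self.has_new_data = False
        self.rendered = 0

    def on_buffer_ready(self, data: bytes) -> None:
        # In the XNA app this runs on the game thread, so no lock is used.
        self.buffer[:] = data          # copy into our own buffer
        self.has_new_data = True

    def update(self) -> None:
        # Called once per frame: render only when new data has arrived.
        if self.has_new_data:
            self.has_new_data = False
            self.render_samples()

    def render_samples(self) -> None:
        self.rendered += 1             # stand-in for drawing into a slice
```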

The complete Visual Studio solution file can be downloaded from here.

Here’s a description of the structure of the main “Game”-class.

Some constants:

Fields regarding the microphone and the captured data:

Choose a color that is almost transparent (set by the last of the four parameters, which are the red, green, blue and alpha components of the color). Many samples are drawn on top of each other, and keeping each individual sample almost see-through creates an interesting visual effect.
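Why almost-transparent samples look interesting: assuming standard "over" alpha blending, drawing the same color with alpha a on top of itself n times gives an opacity of 1 - (1 - a)^n, so pixels where many samples pile up become bright while sparse areas stay faint. The alpha value of 0.05 below is an assumption, not the app's actual color; with it, ten overlapping samples reach roughly 40 % opacity and fifty reach roughly 92 %.

```python
def accumulated_opacity(alpha: float, layers: int) -> float:
    """Opacity after drawing `layers` copies of a color with `alpha`
    on top of each other using standard "over" blending."""
    return 1 - (1 - alpha) ** layers


for n in (1, 10, 50):
    print(n, round(accumulated_opacity(0.05, n), 3))
```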

The drawing classes. The white pixel texture does all the drawing.

The size of each image slice.

There’s no need to keep a reference to the linked list itself, only to the first and last links. These links keep references to their neighbors. The currentImageSlice is the one to draw on next time.

The speed of the slices moving across the screen.

To know how far the current samples should be moved, the application must keep track of when they arrived.

The signal that tells the Update() method that there is new data to handle.

The density of samples per pixel.
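The density follows directly from the buffer duration and the slice width. A Python sketch, again assuming a hypothetical slice width of 20 pixels: a 100 ms buffer holds 1 600 samples, which gives 80 samples per pixel column.

```python
def samples_per_pixel(buffer_duration_ms: int, slice_width_px: int,
                      sample_rate_hz: int = 16_000) -> float:
    """How many samples end up in each pixel column of a slice."""
    samples_per_buffer = sample_rate_hz * buffer_duration_ms // 1000
    return samples_per_buffer / slice_width_px


print(samples_per_pixel(100, 20))  # 80.0
```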

Here’s the constructor. It sets the graphics mode, wires up the microphone and tells it to start listening.

In XNA’s LoadContent nothing is actually loaded, since the app does not depend on any pre-drawn images. The SpriteBatch is created, the white pixel texture is generated and the image slices are initialized (as black images).

The CreateSliceImages method calculates how many slices are needed to cover the entire screen (plus two, so there’s room for movement). At the end of the method, the regular RenderSamples method is called to initialize all the images. Since there is no data yet (all samples are zero), it generates black images.
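The slice count can be sketched as a ceiling division plus two. Whether the original code rounds up exactly this way is an assumption; 800 pixels is the landscape width of a WVGA (800 × 480) Windows Phone screen, and the slice widths are hypothetical.

```python
def slice_count(screen_width_px: int, slice_width_px: int) -> int:
    """Slices needed to cover the screen, plus two for movement."""
    # Integer ceiling division: a partially covered slot still needs a slice.
    covering = (screen_width_px + slice_width_px - 1) // slice_width_px
    return covering + 2


print(slice_count(800, 20))  # 42
print(slice_count(480, 50))  # 12
```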

XNA’s UnloadContent just cleans up what LoadContent created.

The event handler for the microphone’s BufferReady event. It copies the data from the microphone buffer and raises the flag that signals new data has arrived.

XNA’s Update method checks the phone’s Back button to see whether it’s time to quit. It then checks the flag to see whether new data has been recorded, and if so, renders the new samples by calling the RenderSamples method.

XNA’s Draw method takes care of drawing the rendered slices. It handles both screen orientations, landscape and portrait, by scaling the images accordingly: in landscape mode the height of the images is squeezed, and in portrait mode the width is squeezed.

Once everything is set up, the method iterates through the images and renders them one by one on the screen, shifted slightly along the X axis to account for the time that has passed.
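The per-frame placement can be sketched in Python: every slice is shifted left by speed × elapsed time since the last buffer arrived. The function name and the starting positions (leftmost slice at x = 0) are assumptions made for illustration.

```python
def slice_positions(count: int, slice_width_px: int, speed_px_per_s: float,
                    seconds_since_buffer: float) -> list[float]:
    """X position of each slice, leftmost first, shifted by elapsed time."""
    offset = speed_px_per_s * seconds_since_buffer
    return [i * slice_width_px - offset for i in range(count)]


# 200 px/s for half a second moves every slice 100 px to the left.
print(slice_positions(3, 20, 200.0, 0.5))  # [-100.0, -80.0, -60.0]
```

This sketch does not cap the offset, so if a buffer arrives late the slices simply keep drifting until the next one comes in.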

RenderSamples takes a RenderTarget2D as an argument, which is the texture to draw on. The routine iterates through the samples and renders them one by one.
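One way to map each sample to a pixel, sketched in Python: x advances one column every samples_per_px samples, and y scales the 16-bit value to the texture height, with silence on the center line. This mapping is an assumption about what RenderSamples does, not the app's actual code; each resulting point would then be drawn with the white pixel texture.

```python
def sample_to_point(index: int, value: int, samples_per_px: float,
                    height_px: int) -> tuple[int, int]:
    """Map one 16-bit sample to an (x, y) pixel in a slice texture.

    x advances one pixel every `samples_per_px` samples; y = 0 is the top
    edge, so silence (value 0) lands on the vertical center line.
    """
    x = int(index / samples_per_px)
    half = height_px / 2
    y = int(half - value * half / 32768)
    return x, min(y, height_px - 1)  # keep -32768 inside the texture


print(sample_to_point(0, 0, 80.0, 480))  # (0, 240)
```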