Summary
I developed a respiration detection algorithm for a VR meditation biofeedback application. The user sees their breath in VR as a sphere that expands as they inhale and contracts as they exhale. I used a Polar H10 accelerometer to measure respiration. I was responsible for the algorithm development, data collection, user experience design, and system implementation. I worked on this project with Lynda Joy Gerry at the University of Auckland’s Empathic Computing Lab.
Project Description
Background
An academic lab is developing a VR meditation biofeedback system.
The biofeedback signal is the user’s respiration waveform (i.e., real-time respiration signal, as opposed to respiration rate).
Goal
A sphere expands and shrinks as the user breathes.
The sphere expands as the user inhales and shrinks as the user exhales.
Spec
Use the commercially available Polar H10 wearable chest strap, which provides 3-axis accelerometer data.
The Polar H10 is placed around the participant’s torso, beneath their chest.
Participants perform a meditation while seated in a stationary chair.
Receive data in Unity via Lab Streaming Layer (LSL) stream.
Process
Set up and troubleshoot a third-party application/library to evaluate viability. We opted to use this application/library.
Create accelerometer data recording interface in Unity.
Collect respiration datasets for development, placing the sensor in different positions and varying breath type (e.g. chest vs. belly).
Prototype and refine respiration detection algorithm in a Jupyter Notebook, using the development datasets (with emulated packets to match the online processing requirement).
Place the algorithm logic into a Python server. Run the server locally on the development machine and connect the Unity application to the server via websockets for low-latency communication (a minimal sketch of this server follows the list).
Write a C# script in Unity to control the biofeedback animation based on the breath events.
Assess and improve round-trip latency.
Conduct user testing.
Synthesize results and determine next steps.
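As a concrete illustration of the server step above, here is a minimal sketch assuming the `websockets` Python package (version 10+); the port, the JSON message format, and the `process_samples()` hook are placeholders rather than the production code.

```python
# Minimal sketch of the local Python server (assumes the `websockets` package, v10+).
# Port, message format, and process_samples() are illustrative placeholders.
import asyncio
import json

import websockets


def process_samples(samples):
    # Placeholder hook into the detection pipeline described in Algorithm Design below.
    return []


async def handle_client(websocket):
    # Unity forwards batches of accelerometer samples; we reply with any detected breath events.
    async for message in websocket:
        samples = json.loads(message)            # e.g. {"t": [...], "z": [...]}
        events = process_samples(samples)
        if events:
            await websocket.send(json.dumps(events))


async def main():
    async with websockets.serve(handle_client, "localhost", 8765):
        await asyncio.Future()                   # run until interrupted


if __name__ == "__main__":
    asyncio.run(main())
```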
Algorithm Strategy
When the user inhales, their chest and abdomen expand outward, so the acceleration on the z-axis will be positive. When the user exhales, the chest and abdomen fall back inward, so the z-axis acceleration will be negative. This implies that inhales and exhales both begin at zero-crossings, and the slope of the data around the zero-crossing determines whether it’s an inhale (positive slope) or an exhale (negative slope).
Since our goal is for the user to receive visual biofeedback from a sphere that grows as they inhale and shrinks as they exhale, there are at least two potential approaches.
Event-based. In this strategy, we detect discrete respiration events (most obviously, inhales and exhales, but we could get more nuanced). When we detect an event, we trigger the corresponding animation. For example, an inhale event would trigger the “grow” animation for the sphere. An exhale event, by contrast, would trigger the “shrink” animation. Another way to conceive of this strategy is that we use our prior knowledge of the respiratory waveform/physiology to define breath events which should trigger the sphere’s animation.
Pros: Simple. We don’t have to worry about mapping from acceleration to displacement (nor do we have to consider the nuances of diaphragmatic displacement vs. diaphragmatic volume vs. air flow).
Cons: The user’s control of the sphere is limited to the events we decide to detect, such as inhales and exhales. We have to test whether the user feels in control of the sphere.
Direct. In this strategy, we would work with the full respiration waveform, directly changing the sphere’s size as a function of the respiratory signal.
Pros: The biofeedback visualization is highly responsive to the user, potentially increasing the user’s felt sense of agency.
Cons: Complex. We have to transform the acceleration to displacement (or breath phase). We need to be very careful about our filtering, because the user will be able to see subtleties in the waveform that don’t necessarily correspond to their breath (e.g. a movement artifact).
For simplicity, we chose the event-based approach. As a result, we incorporate predictions of the inhale/exhale durations so we can time the animation, and we use logic in the animation code to handle over- or under-estimation of an inhale or exhale relative to its actual duration.
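The animation itself lives in the Unity C# script described in the Process section; purely for illustration, a Python sketch of the timing logic might look like the following (function and parameter names are hypothetical):

```python
# Illustrative sketch of the animation timing logic (the real version is the C# script).
# Key idea: drive the sphere toward its target size over the *predicted* breath
# duration, and clamp so a mis-estimated duration degrades gracefully.
def sphere_scale(t_since_event, predicted_duration, start_scale, target_scale):
    # Linear progress through the predicted breath duration, clamped to [0, 1].
    # Under-estimation (the real breath outlasts the prediction): the sphere reaches
    # the target size early and simply holds there until the next event arrives.
    # Over-estimation: the next breath event interrupts the animation, and the sphere
    # starts growing/shrinking again from wherever it currently is.
    progress = min(max(t_since_event / predicted_duration, 0.0), 1.0)
    return start_scale + progress * (target_scale - start_scale)
```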
Algorithm Design
Demeaning
Our scheme detects inhales from positive z-axis acceleration values and exhales from negative z-axis acceleration values, which depends on the z-axis accelerometer data being centered on 0. However, this isn’t the case in the raw data: if the z-axis of the accelerometer has any tilt relative to the plane tangent to the earth’s surface (in this case, we specifically care about tilt in the sagittal direction), gravity gives it a non-zero mean.
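To make the effect concrete (an approximation that treats gravity as the only static component): a sagittal tilt of θ adds an offset of roughly g·sin(θ) to the z-axis reading, so even a 10° tilt contributes about 1.7 m/s², typically far larger than the accelerations produced by breathing.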
Below is a real dataset I collected wearing the Polar H10. I used two different breathing styles, the boundary between which I made visible by hitting the sensor to introduce a large motion artifact (around 16 seconds). In the first breathing style, I was breathing normally without any intentional control over my breath. In the second breathing style, I was intentionally taking large, strong inhales and exhales. Immediately, it’s clear that the instructions we give to the participant about how to breathe (and the idiosyncrasies of a given participant’s respiratory physiology) will result in significant variation in the respiratory waveform.
Therefore, we need to subtract off a moving average. We tested different window sizes empirically and settled on 1000 samples (5 seconds at 200 Hz). A future improvement might be to adjust the window dynamically based on the wavelength of the breath.
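A minimal sketch of how this de-meaning step can run online, assuming samples arrive one at a time at 200 Hz; the class and method names are illustrative rather than the exact notebook/server code:

```python
# Online de-meaning with a 1000-sample (5 s at 200 Hz) causal moving average.
from collections import deque


class MovingAverageDemeaner:
    def __init__(self, window_size=1000):
        self.window = deque(maxlen=window_size)
        self.running_sum = 0.0

    def process(self, z_sample):
        # Keep a running sum so each incoming sample costs O(1).
        if len(self.window) == self.window.maxlen:
            self.running_sum -= self.window[0]   # value about to be evicted
        self.window.append(z_sample)
        self.running_sum += z_sample
        mean = self.running_sum / len(self.window)
        return z_sample - mean
```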
Notice that, due to the real-time setting, it takes ~5 s (i.e., the length of our moving-average window) before the data are centered about zero. This means that for the biofeedback system, we simply wait for the user to take several calibration breaths before we begin displaying the sphere animation.
Amplitude normalization
We opted not to normalize the amplitude of the demeaned data (by its standard deviation) because the amplitude variability does not affect our respiration detection algorithm, and the magnitude of the acceleration contains relevant information about the breath which we may use in a future iteration of the algorithm.
Smoothing
Next, we smooth the data. Given the visible high-frequency noise, we employ a low-pass filter. Additionally, this is a real-time setting where the breath events must be detected at sub-perceptual latency relative to the true event; this means our filter needs to have minimal phase delay, lest we detect a breath event well after it has occurred due to the phase shift induced by the filter.
Noise can be introduced into the signal in several ways. There is baseline high-frequency noise, either due to intrinsic properties of the sensor or subtle movements of the participant. Additionally, the user’s heartbeats are visible as vertical spikes in the data. Finally, around the 21 second mark we can see a vertical deflection in the data. This was likely a motion artifact, and rather than handling it in the filtering step we’ll handle it with dedicated event detection validation logic, since the frequency band of this motion artifact is too close to the frequency band of a true breath.
After experimentation, we selected a simple moving average with window size s and stride d. In addition to acting as a low-pass filter, this downsamples the data as well (the sensor records at 200 Hz, which is much higher than we need to detect respiration).
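A sketch of this smoothing/downsampling step; the concrete window size and stride were tuned empirically, so the defaults below are placeholders:

```python
# Causal moving average of window size s, emitting one output every d input samples.
from collections import deque


class MovingAverageSmoother:
    def __init__(self, window_size=40, stride=10):   # placeholder s and d
        self.window = deque(maxlen=window_size)
        self.stride = stride
        self._n_seen = 0

    def process(self, demeaned_sample):
        """Return a smoothed (and downsampled) value every `stride` inputs, else None."""
        self.window.append(demeaned_sample)
        self._n_seen += 1
        if self._n_seen % self.stride == 0 and len(self.window) == self.window.maxlen:
            return sum(self.window) / len(self.window)
        return None
```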
There are clearly some imperfections in the smoothed signal. For example, the large spike at 16 seconds makes it through. This spike was introduced intentionally by hitting the sensor. We also see the filtering fail at 21 seconds, as we discussed above. Around 27 seconds, at the peak, the filtered signal still shows a dip. Similar to the artifact at 21 seconds, the frequency content of this dip is too close to that of the true respiratory waveform for us to filter it out without losing information about the respiratory waveform.
Down the road, we may improve this filtering process. However, for the current system where we just need to detect the zero-crossings, this method is sufficient.
Breath event detection
Finally, we detect the breath events (a sketch tying the steps together follows below).
Detect zero crossings. We detect an inhale initiation when the demeaned, smoothed signal crosses zero moving in the positive direction. We detect an exhale initiation when the demeaned, smoothed signal crosses zero moving in the negative direction.
Validate the breath event. Upon detecting a breath event (inhale or exhale), we validate it. We tried several different validation heuristics, and ultimately determined that two simple heuristics capture a large percentage of false events.
Duration since last breath event. If we detect a breath event a very short period of time after the previous event (e.g. 200ms), we can be confident this isn’t a real breath event. Therefore we use a duration threshold to specify the minimum time between two subsequent breath events.
Data span is above a certain threshold. If the span of the accelerometer data is too small relative to the magnitude of the respiration waveform, it’s probably not a breath, either. Therefore, to validate a breath event, we check whether the span of the current zero-crossing window is above a threshold.
Predict the breath duration. Finally, since our intention is to trigger the sphere’s animation with each breath event, we need to predict how long the breath event will last. We do this simply by averaging across the last n breath events (e.g. to predict an inhale duration, we average across the duration of the last three inhales). The animation logic uses this predicted value to set the speed of the animation.
We can see that before the data are zero-centered, the algorithm misfires. This is handled by the breath calibration period at the beginning of the biofeedback experience.
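Putting the detection steps together, a minimal sketch might look like this, assuming the demeaned, smoothed samples arrive one at a time; the threshold values and names are illustrative, not the tuned ones:

```python
# Zero-crossing breath event detection with the two validation heuristics
# and the duration prediction described above. Thresholds are placeholders.
from collections import deque


class BreathEventDetector:
    def __init__(self, min_event_interval_s=0.2, min_span=0.05, n_history=3):
        self.min_event_interval_s = min_event_interval_s   # heuristic 1
        self.min_span = min_span                           # heuristic 2
        self.prev_sample = None
        self.last_event_time = None
        self.span_min = float("inf")
        self.span_max = float("-inf")
        self.inhale_durations = deque(maxlen=n_history)
        self.exhale_durations = deque(maxlen=n_history)

    def process(self, sample, timestamp_s):
        """Return ('inhale'|'exhale', predicted_duration_s) for a validated event, else None."""
        event = None
        if self.prev_sample is not None:
            if self.prev_sample <= 0.0 < sample:
                event = "inhale"    # zero-crossing with positive slope
            elif self.prev_sample >= 0.0 > sample:
                event = "exhale"    # zero-crossing with negative slope
        self.prev_sample = sample
        self.span_min = min(self.span_min, sample)
        self.span_max = max(self.span_max, sample)
        if event is None:
            return None

        # Heuristic 1: enforce a minimum time between consecutive breath events.
        if (self.last_event_time is not None
                and timestamp_s - self.last_event_time < self.min_event_interval_s):
            return None
        # Heuristic 2: the data span since the last accepted event must be large enough.
        if (self.span_max - self.span_min) < self.min_span:
            return None

        # The accepted event also ends the previous breath phase, so record its duration.
        if self.last_event_time is not None:
            ended = self.exhale_durations if event == "inhale" else self.inhale_durations
            ended.append(timestamp_s - self.last_event_time)
        self.last_event_time = timestamp_s
        self.span_min, self.span_max = float("inf"), float("-inf")

        # Predict this event's duration as the mean of the last few events of the same type.
        durations = self.inhale_durations if event == "inhale" else self.exhale_durations
        predicted = sum(durations) / len(durations) if durations else 2.0   # fallback guess
        return event, predicted
```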
Next Steps
There are various directions to pursue for improving the respiration detection algorithm, and we are prioritizing them based on the user experience. Several areas to address are:
Handling the participant and/or the sensor moving during the meditation session.
Making the algorithm more robust to sensor placement by incorporating the x-axis accelerometer data (i.e., accounting for the tilt of the sensor in the sagittal plane).
Modifying the algorithm to handle certain breathing patterns where it currently fails, such as box breathing.