Using Machine Learning to standardize sleep scores across wearables
Sleep scores are provided by various wearable manufacturers such as Oura, Fitbit, and Whoop. They're used to signify how well you slept and usually range from 0-100. The higher the score, the better you slept. We used ML to create a provider-agnostic score.
At Vital we are obsessed with data. At our company offsite we recently had a team hackathon where we looked into standardising sleep scores across various providers using machine learning.
What are Sleep Scores
Sleep scores are provided by various wearable manufacturers such as Oura, Fitbit, and Whoop. They're used to signify how well you slept and usually range from 0-100. The higher the score, the better you slept. We decided to look into how these scores are calculated to see if we could come up with our own provider-agnostic score.
1. What biomarkers correlate with Sleep Scores
The first step was to understand which biomarkers correlate most with sleep scores. To do this, we connected our own Oura rings and Fitbit devices using Vital. We then took the sleep score returned by each provider and calculated the Pearson correlation coefficient between that score and each sleep biomarker, to see which biomarkers tracked the provider's score most closely.
Correlation values close to 1 mean there is a strong positive correlation, values close to 0 mean there is weak or no correlation, and values close to -1 mean there is a strong negative correlation. As we can see, the factors most strongly associated with sleep scores are time in REM, total time asleep, the midpoint time of your sleep cycle, efficiency, and consistency (bedtime_start_delta).
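To make the correlation step concrete, here is a minimal sketch of computing Pearson coefficients between a few biomarkers and a provider's score. It is not the code we ran at the hackathon: the numbers are made up for illustration, and the column names other than bedtime_start_delta are assumptions.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Made-up nightly values, for illustration only (not real user data).
# Column names are hypothetical apart from bedtime_start_delta.
nights = {
    "rem_minutes": [95, 70, 110, 60, 85],
    "total_sleep_minutes": [450, 380, 480, 350, 420],
    "bedtime_start_delta": [10, 55, 5, 80, 30],
}
sleep_score = [84, 71, 90, 62, 78]

correlations = {name: pearson(values, sleep_score) for name, values in nights.items()}

# Rank biomarkers by the strength of the correlation, ignoring its sign.
for name, r in sorted(correlations.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name}: {r:+.2f}")
```

Ranking by absolute value matters here: a biomarker with a strong negative correlation, such as a large shift in bedtime, is just as informative to a model as one with a strong positive correlation.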
Using the above knowledge we decided to take the factors that were strongly correlated with existing sleep scores and train a model using these as input parameters.
2. Training a model
These days there are multiple libraries and prebuilt models that make it easy to train a machine learning model. After building a basic two-layer neural net in PyTorch and seeing some progress, we decided to increase the complexity of our model.
We then trained a Tabular Model that takes the above biomarkers as input parameters and outputs a predicted sleep score.
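One way to picture the model's inputs and target is as a fixed-order feature vector per night, paired with the provider's score. The sketch below is an assumption about how such rows could be assembled, not our actual pipeline: SleepSample, to_sample, and every field name except bedtime_start_delta are hypothetical.

```python
from dataclasses import dataclass

# Order in which features are fed to the model. Only bedtime_start_delta is
# a name taken from the article; the others are assumed for this sketch.
FEATURES = (
    "rem_minutes",
    "total_sleep_minutes",
    "midpoint_minutes",
    "efficiency",
    "bedtime_start_delta",
)

@dataclass
class SleepSample:
    provider: str          # kept for bookkeeping, never used as a model input
    features: list[float]  # values in FEATURES order
    score: float           # provider's own 0-100 sleep score (training target)

def to_sample(provider: str, record: dict) -> SleepSample:
    """Turn one already-normalised nightly record into a training sample."""
    missing = [name for name in FEATURES if name not in record]
    if missing:
        raise KeyError(f"record is missing features: {missing}")
    return SleepSample(
        provider=provider,
        features=[float(record[name]) for name in FEATURES],
        score=float(record["score"]),
    )

# Made-up record, for illustration only.
record = {
    "rem_minutes": 95,
    "total_sleep_minutes": 450,
    "midpoint_minutes": 210,
    "efficiency": 0.92,
    "bedtime_start_delta": 10,
    "score": 84,
}
sample = to_sample("oura", record)
print(sample.features, sample.score)
```

Keeping the provider name out of the feature vector is what lets a single model score a night of sleep without knowing which device recorded it.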
By using data that spanned multiple providers we were able to create a model that could predict sleep scores regardless of which provider the data came from. As the training and validation loss show, the loss was still decreasing after 20 epochs, so the model was learning.
We trained the model for over 100 epochs and began to see the validation loss stabilising, which meant the model had stopped learning. We then tested our model on a separate test dataset that was independent of our model training. We did this by inputting the biomarkers from the test dataset into our model and getting the predicted scores. We then compared our predicted sleep scores against the provider sleep scores and determined what the average difference in scores was. On average our model was predicting sleep scores that were only 2 points away from the sleep scores that wearable manufacturers such as Oura and Fitbit were producing.
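The comparison on the test set can be sketched as a mean absolute difference between predicted and provider scores. This assumes "average difference" means the mean of the absolute differences; the scores below are made up for illustration and are not our test data.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between paired predictions and targets."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of the same length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

# Made-up scores for five held-out nights (illustration only).
provider_scores = [84, 71, 90, 62, 78]
predicted_scores = [83, 74, 90, 60, 80]

mae = mean_absolute_error(predicted_scores, provider_scores)
print(f"average difference: {mae:.1f} points")
```

Taking the absolute value before averaging stops over-predictions and under-predictions from cancelling each other out, so the result reads directly as "points away from the provider's score".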
We're excited to see where else we can provide this standardisation of scores going forward too.