Volume estimation of fluid intake using regression models

Monitoring water intake is critical for managing the health and wellness of many groups, including young children, sick adults, the elderly, and individuals seeking better weight control. This paper studies the use of different regression methods to estimate water intake from wireless surface electromyography (sEMG). The advantage of regression is that it can provide more consistent estimates across different swallow volumes. In addition, the setup reported here uses a less controlled environment, which provides stronger evidence of its practical feasibility. Neural-network-based regression achieved an R² of 0.99 and a root-mean-square error (RMSE) of 0.14, which dropped to 0.08 after feature selection. The relative immunity of sEMG to external acoustic noise, together with the accuracy achieved with the mobile sEMG device used here, can provide a robust system for estimating fluid intake volume in real-world situations.


Introduction
Water is the most important nutrient for human health. Its health benefits are numerous [1]: it helps regulate blood pressure and body temperature, lubricates the joints, and helps deliver oxygen throughout the body, since water makes up about 90% of blood plasma, to name just a few. Fluids in general, and water in particular, are essential for body hydration. Dehydration, i.e., depletion of the body's normal fluids, can lead to physical and mental health problems. Bodily fluids are continuously lost through breathing, urine, and the skin [2]. Most healthy individuals regulate their body fluid levels by drinking in response to thirst [3]. However, this is more challenging for young children, sick adults, and the elderly. The ability to quantify fluid intake can therefore be critical for individuals who need help managing their hydration levels. Various smart bottles for tracking water intake have recently been introduced to the market [4,5]. Using a phone app connected to the bottle via Bluetooth, these bottles can remind users to drink more fluids to meet their daily hydration goals. This solution has obvious practical constraints, and such products have yet to gain mass adoption.
There are numerous methods for monitoring fluid intake; Cohen et al. provide a summary of various strategies [6]. Some of these strategies rely on swallowing activity signals, while others use wrist movement or vision-based technologies. To detect drinking activities, vision-based techniques use cameras combined with computer vision algorithms or deep learning [7][8][9][10]. Many studies in this area have used the Microsoft Kinect, which captures both depth and RGB images. Tham et al. detected numerous hand positions throughout the drinking activity using a Microsoft Kinect placed in front of the individual [7]. They relied solely on the depth information to avoid privacy concerns. Using dynamic time warping (DTW), drinking events were classified with 89% accuracy [7]. Several studies [11][12][13][14][15] employed wrist-mounted inertial measurement units (IMUs) with threshold-based algorithms to measure liquid consumption. Shen et al., for example, used thresholding on wrist roll values to segment events and reported a low sensitivity of 66-75% for drinking detection in an unconstrained setting [16].
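As an illustration of the DTW matching mentioned above, the following Python sketch computes the classic DTW distance between two 1-D sequences and uses it for nearest-template classification. It is not the pipeline of [7], which worked on Kinect-derived hand positions; the sequences, the template labels, and the `classify_nearest_template` helper are hypothetical.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance between 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] also matches earlier b
                                 cost[i][j - 1],      # b[j-1] also matches earlier a
                                 cost[i - 1][j - 1])  # one-to-one match
    return cost[n][m]


def classify_nearest_template(query, templates):
    """Hypothetical 1-nearest-neighbour classifier; `templates` maps label -> sequence."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))
```

For example, `classify_nearest_template([0, 1, 1, 2, 1, 0], {"drink": [0, 1, 2, 1, 0], "rest": [0, 0, 0, 0, 0]})` returns `"drink"`, because warping absorbs the repeated sample at no cost.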
Liquids are moved from the oral cavity to the stomach by the swallowing process [17]. Monitoring a person's swallowing activity can therefore provide an accurate and fairly unobtrusive surrogate for observing and quantifying their fluid intake [18,19].
Textile-based sensors can provide additional information, such as chewing and swallowing detection. However, they are less practical: they are often incorporated into shirts in the form of a turtleneck, i.e., bands around the neck containing electrodes that detect swallows.
Cheng et al. measured variations in capacitance in the pharynx using textile-based electrodes embedded in a turtleneck shirt [20]. This technique was used to detect eating, swallowing, speaking, and sighing in various head orientations while sitting or walking [20]. Although the authors claimed that the textile technique required neither direct skin contact nor significant body fixation, a large amount of data was lost in their preliminary tests. Using a threshold-based method, the overall classification accuracy was 77% while sitting and 69% while walking [20].
Fluid intake has been reported to be automatically monitored using a throat microphone or mechanical sensors, but that study did not calculate the volume of fluid consumed [19]. In another study, a throat microphone was used to estimate the amount of fluid a person consumed, achieving 80% accuracy for amounts between 5 and 15 ml per subject [21].
Another investigation, employing a microphone and two surface electromyography (sEMG) channels, focused on swallowing and chewing behavior as well as bolus volume and material consistency [22]. Fluid intake estimation accuracy increased from 73 to 84% when sEMG and a microphone were used together. A previously reported study estimated fluid volume intake using sEMG by modeling water sip swallows [23]. Swallows were modeled as either one sip or two sips. The results were promising for many data points (accuracy above 99.5%), but this approach was not sufficient to model all possible sip swallows. Regression techniques are among the most popular statistical techniques for predictive modeling. A regression model is fitted by minimizing a predefined error (loss) that depends on the model type. Decision tree (DT), extra trees (ET), AdaBoost (AB), gradient boosting (GB), support vector regression (SVR), and Gaussian process regression (GPR) are well-known types of regression [24,25].
• DT regression: A decision tree builds a regression model in the form of a tree structure. It breaks the dataset down into smaller and smaller subsets while incrementally developing the associated tree. A regression tree is a binary tree in which each internal node splits the data on a threshold of an input feature, chosen to best reduce the error in the output [24,25].
• ET regression: Fits a number of randomized decision trees on various subsamples of the dataset and averages their predictions to improve predictive accuracy and control over-fitting [24,25].
• AB regression: A meta-estimator that first fits a regressor on the original dataset and then fits additional copies of the regressor on the same dataset, with the instance weights adjusted according to the error of the current prediction [24].
• GB regression: Builds an additive model in a forward stage-wise manner and can optimize any differentiable loss function. At each stage, a regression tree is fitted to the negative gradient of the specified loss function [24].
• SVR: Finds a function (hyperplane) that approximates the relationship between the continuous input variables and a continuous output while minimizing the prediction error [25].
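To make the DT and GB descriptions above concrete, the Python sketch below fits a depth-1 regression tree (a stump) on a single feature by choosing the threshold that minimizes the squared error, then chains such stumps in squared-error gradient boosting, where each new tree is fitted to the current residuals (the negative gradient of the loss). It is a minimal teaching sketch under these simplifications (one feature, depth-1 trees), not the implementation used in this study; the function names are hypothetical.

```python
def fit_stump(x, y):
    """Depth-1 regression tree on one feature: choose the threshold that minimises
    the total squared error when each side predicts the mean of its targets."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, ml, mr)
    if best is None:  # constant feature: fall back to the global mean
        mean = sum(y) / len(y)
        return lambda v: mean
    _, thr, ml, mr = best
    return lambda v: ml if v <= thr else mr


def gradient_boost(x, y, n_estimators=100, learning_rate=0.1):
    """Squared-error gradient boosting with stumps. For the loss 0.5 * (y - f)^2
    the negative gradient w.r.t. f is the residual y - f, so each stump fits residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_estimators):
        residuals = [t - p for t, p in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * stump(v) for p, v in zip(pred, x)]
    return lambda v: base + learning_rate * sum(s(v) for s in stumps)
```

With x = [1, 2, 3, 4, 5, 6] and y = [10, 10, 10, 40, 40, 40], the stump splits at 3.5, and the boosted model with 100 estimators and a learning rate of 0.1 approaches 10 and 40 on the two sides, since each step removes 10% of the remaining residual.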
This study employs these regression schemes to estimate the various sip sizes that can occur in real-world situations more accurately and more generally. Another contribution of this study is an experimental setup that more closely mimics real-life, less-controlled environments.

Experiment protocol and signal acquisition
Twenty healthy individuals (age: 25 ± 1.8 years, BMI: 25.7 ± 4.5 kg/m²) participated in the experiment. All participants provided informed written consent prior to enrollment in the study, including consent to publish. None of the participants had a medical condition that could interfere with normal swallowing. Each participant performed the experiment in one 20-min session in which drinking activities were investigated. The drinking activities included 40 water sips split equally among 4 fixed volumes (10, 20, 30, and 40 ml), for a total volume of 1 l per participant. Sip volumes were controlled and verified using graduated cups before each session started. During the first 10 s of each trial, the participant did not perform any water intake activity. Within the next 10 s, the participant raised the cup and sipped the water in it without swallowing. The participant then swallowed the water during the last 10 s. The experiment protocol is shown in Fig. 1. A total of 800 water swallows were collected. Swallowing signals were collected via a wearable sensor (a mobile, single-channel BITalino sEMG). The BITalino revolution kit is a compact Bluetooth bio-signal platform designed for research purposes [26]. The sEMG electrode (Fig. 2) was placed on the left sternocleidomastoid muscle, and signals were acquired using the BITalino open-source software. The left sternocleidomastoid was chosen because it is considered the least uncomfortable neck location for electrode placement during swallowing. Previous studies have also shown that this muscle yields better-quality swallowing signals [27].
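The protocol arithmetic can be checked with a short Python sketch: 10 sips at each of the four volumes give 40 sips and 1 l per participant, 40 trials of three 10-s phases give 20 min, and 20 participants give 800 swallows. The sketch also locates the swallow phase of each trial in a recording. It assumes that trials were recorded back to back from t = 0, that the swallow phase is the third phase of each trial (S3 in Fig. 1), and a sampling rate of 1000 Hz (one of the rates BITalino supports); none of these recording details is stated in the text, and the function names are hypothetical.

```python
# Protocol constants from the text. The sampling rate is an ASSUMPTION:
# BITalino supports 1, 10, 100 or 1000 Hz, and the rate used is not stated.
SAMPLING_RATE_HZ = 1000
PHASE_S = 10                 # rest, sip-and-hold, swallow: 10 s each
PHASES_PER_TRIAL = 3
VOLUMES_ML = (10, 20, 30, 40)
SIPS_PER_VOLUME = 10


def protocol_totals(n_participants=20):
    """Sips and volume per participant, session length (min), total swallows."""
    sips = len(VOLUMES_ML) * SIPS_PER_VOLUME                # 40 sips
    volume_ml = SIPS_PER_VOLUME * sum(VOLUMES_ML)           # 1000 ml = 1 l
    minutes = sips * PHASES_PER_TRIAL * PHASE_S / 60        # 20 min
    return sips, volume_ml, minutes, sips * n_participants  # 800 swallows for 20 people


def s3_window(trial_index, fs=SAMPLING_RATE_HZ):
    """Sample range [start, stop) of the swallow phase of one trial.

    ASSUMPTION: trials are recorded back to back starting at t = 0, and the
    swallow phase (S3) is the last of the three phases of each trial.
    """
    trial_start_s = trial_index * PHASES_PER_TRIAL * PHASE_S
    start = (trial_start_s + (PHASES_PER_TRIAL - 1) * PHASE_S) * fs
    return start, start + PHASE_S * fs
```

Under these assumptions, `s3_window(0)` covers samples 20000 to 29999 and `s3_window(1)` starts at sample 50000.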

Volume prediction
The research problem was divided into two stages: swallow detection (preprocessing) and volume estimation (feature extraction, feature selection, and regression models), as illustrated in Fig. 3.

Swallow detection (preprocessing)
The collected EMG signals (phase S3 in Fig. 1) were processed using MATLAB R2020a. Each signal record was segmented into individual drinking swallows, which appear as sudden changes in the record. Many algorithms can detect sudden changes in bio-signals; one of them is the memory-based graph-theoretic technique (MB-GT), a fast adaptive method for detecting sudden changes. It can operate on arbitrary, unknown data distributions before and after a change, and it computes the average Euclidean distance between all pairs of data points before and after the hypothesized change [28]. The algorithm operates on two windows of varying sizes, considering all possible partitions of a memory buffer of size N that holds previous data readings, and it computes a Euclidean-distance-based statistic proportional to the likelihood that a change occurred within the current buffer's memory span. MB-GT has shown more accurate detection than other algorithms when applied to a record containing a single sudden change (a one-sip swallow) [25]. We applied MB-GT to the S3 segment of the records shown in Fig. 1, which corresponds to one sip event.
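The exact MB-GT statistic is defined in [28] and is not reproduced here. As a simplified stand-in in the same spirit (distribution-free and based on Euclidean distances between points before and after a candidate split), the Python sketch below scores every split of a buffer with an energy-distance statistic and returns the best one; a second round of splitting yields a start and an end point, as in a swallow detection window. This is a different, simpler technique than MB-GT: it has no memory-buffer updating and its cost is cubic in the buffer length, so a real sEMG record would first need to be reduced, e.g., to a downsampled envelope (an assumption). The function names are hypothetical.

```python
from itertools import product


def _mean_dist(a, b):
    """Mean Euclidean (absolute, for 1-D samples) distance over all pairs of a and b."""
    return sum(abs(u - v) for u, v in product(a, b)) / (len(a) * len(b))


def detect_change(x, min_size=2):
    """Return (k, score) for the split x[:k] | x[k:] that maximises an energy-distance
    statistic: large between-segment distances, small within-segment distances."""
    n = len(x)
    if n < 2 * min_size:
        raise ValueError("buffer too short for the requested min_size")
    best_k, best_score = None, float("-inf")
    for k in range(min_size, n - min_size + 1):
        left, right = x[:k], x[k:]
        gap = 2 * _mean_dist(left, right) - _mean_dist(left, left) - _mean_dist(right, right)
        score = k * (n - k) / n * gap
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


def detect_burst(x, min_size=2):
    """Start and end indices of a single burst (e.g., one swallow) found by two
    rounds of binary segmentation."""
    c1, _ = detect_change(x, min_size)
    candidates = []
    if c1 >= 2 * min_size:                      # search left of the first change
        k, s = detect_change(x[:c1], min_size)
        candidates.append((s, k))
    if len(x) - c1 >= 2 * min_size:             # search right of the first change
        k, s = detect_change(x[c1:], min_size)
        candidates.append((s, c1 + k))
    if not candidates:
        return c1, c1
    _, c2 = max(candidates)
    return tuple(sorted((c1, c2)))
```

On a synthetic burst `[0.0] * 20 + [5.0] * 10 + [0.0] * 20`, `detect_burst` returns `(20, 30)`.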

Volume estimation
Forty time-domain features commonly used with EMG signals in the literature were used in this study [29][30][31]. These features were selected for their simplicity and promising performance in previous work [30,31] and are described in Table 2 (in the Appendix). We used the chi-square test for feature selection.
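Table 2 is not reproduced in this section, so the Python sketch below only illustrates how a few time-domain features that are widely used in the EMG literature (mean absolute value, root mean square, waveform length, zero crossings, slope sign changes, and variance) are computed on one swallow segment. Whether each of these is among the forty features of Table 2 is an assumption, and the noise `threshold` for zero crossings and slope sign changes is a hypothetical parameter.

```python
import math


def emg_time_features(x, threshold=0.0):
    """Common time-domain sEMG features of one swallow segment x (a list of samples).

    `threshold` (hypothetical) suppresses zero crossings and slope sign changes
    caused by low-amplitude noise.
    """
    n = len(x)
    mav = sum(abs(v) for v in x) / n                        # mean absolute value
    rms = math.sqrt(sum(v * v for v in x) / n)              # root mean square
    wl = sum(abs(b - a) for a, b in zip(x, x[1:]))          # waveform length
    zc = sum(1 for a, b in zip(x, x[1:])                    # zero crossings
             if a * b < 0 and abs(a - b) >= threshold)
    ssc = sum(1 for a, b, c in zip(x, x[1:], x[2:])        # slope sign changes
              if (b - a) * (b - c) > threshold)
    var = sum(v * v for v in x) / (n - 1)                   # variance of a zero-mean signal
    return {"MAV": mav, "RMS": rms, "WL": wl, "ZC": zc, "SSC": ssc, "VAR": var}
```

Applied to every detected swallow segment, such a function yields one feature row per sip, which is the input format the regressors expect.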
The sip dataset was split into two groups (75%/25%): 600 sips were used for training and 200 sips for testing. Because sEMG data typically suffer from intersubject variability, we used among-subjects validation, meaning that both the training and testing data were sampled evenly from all subjects. The regression approach was used to generate direct estimates of sip volumes from the forty features described above. DT regression used the squared error to measure the quality of a split and considered all features at each split; nodes were expanded until all leaves were pure. ET regression was used with 100 trees in the forest and a squared-error criterion. AB regression used 50 estimators, a learning rate of 1, and a linear loss function when updating the weights after each boosting step. GB regression used a squared-error loss function, a learning rate of 0.1, and 100 estimators. SVR was employed with an RBF kernel and C = 1; the reported degree of 3 applies only to polynomial kernels and has no effect with an RBF kernel. GPR was also used. All models were validated in a 5-fold cross-validation manner. Furthermore, a two-layer feed-forward neural network (one hidden layer and one output layer) was tested with both Levenberg-Marquardt (NN-LM) and Bayesian regularization (NN-BR) training, using 50 hidden neurons and 100 epochs [23].

Fig. 3 A block diagram of the proposed fluid intake volume prediction method
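The among-subjects scheme can be sketched in Python as a per-subject split: each subject contributes 75% of its sips to training and 25% to testing, which for 20 subjects with 40 sips each gives 30 training and 10 testing sips per subject, i.e., 600 and 200 in total. A shuffled 5-fold split of the training indices is included as well. The random seeds, the shuffling, and applying the 5-fold cross-validation to the training portion only are assumptions, since the text does not specify them; the function names are hypothetical.

```python
import random
from collections import defaultdict


def among_subject_split(subject_ids, train_frac=0.75, seed=0):
    """Return (train, test) sample indices such that every subject contributes
    the same fraction of its sips to each set."""
    by_subject = defaultdict(list)
    for idx, subject in enumerate(subject_ids):
        by_subject[subject].append(idx)
    rng = random.Random(seed)                   # ASSUMPTION: seeded random shuffle
    train, test = [], []
    for subject in sorted(by_subject):
        idxs = by_subject[subject][:]
        rng.shuffle(idxs)
        cut = round(train_frac * len(idxs))     # 30 of 40 sips for train_frac = 0.75
        train += idxs[:cut]
        test += idxs[cut:]
    return sorted(train), sorted(test)


def k_fold_indices(indices, k=5, seed=0):
    """Yield (train, validation) index lists for shuffled k-fold cross-validation."""
    idxs = list(indices)
    random.Random(seed).shuffle(idxs)
    folds = [idxs[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]
```

Splitting per subject rather than over the pooled 800 sips guarantees that no subject is over- or under-represented in the test set, which is the point of the among-subjects scheme.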
The main evaluation metrics for regression models are the root-mean-square error (RMSE) and the coefficient of determination (R²). RMSE measures how far predictions deviate from the measured true values. To compute it, take the residual (difference between prediction and truth) for each data point, square it, average the squared residuals over all data points, and take the square root of that mean [32]. R² measures how well a statistical model predicts an outcome (the model's dependent variable) as the fraction of the outcome's variance that the model explains. For a least-squares fit it lies between 0 and 1, and the closer it is to 1, the better the model predicts [32]; on held-out data it can even be negative if the model performs worse than always predicting the mean.
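The two metrics described above can be written as RMSE = sqrt((1/n) Σ (ŷᵢ − yᵢ)²) and R² = 1 − Σ (yᵢ − ŷᵢ)² / Σ (yᵢ − ȳ)², where yᵢ are the true volumes, ŷᵢ the predictions, and ȳ the mean true volume. A minimal Python sketch follows; the toy volumes in the usage note are made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

For true volumes [10, 20, 30, 40] ml and predictions [12, 18, 30, 40] ml, `rmse` gives √2 ≈ 1.41 ml and `r_squared` gives 1 − 8/500 = 0.984.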

Results

Swallow detection
Fig. 4 shows a swallow detection window obtained with MB-GT. The window delineates the start and end points of each swallow signal.

Volume estimation
Figs. 5 and 6 show the RMSE and R² values of each regression model during training with 5-fold cross-validation. GPR, NN-LM, and NN-BR are the best models in training (lowest RMSE and highest R²). Table 1 shows the testing RMSE and R² values of these best models. NN-BR produces the most accurate results. Figure 7 depicts the relationship between predicted (output) and observed (target) swallow volumes for NN-BR using all features.

Fig. 4 MB-GT detection window
Figure 8 shows a bar plot of the chi-square test scores for the forty features. After removing the fourteen features with low scores (p-value > 5%), twenty-six features remained. Figure 9 shows the final results for the three best regression models using all features and using only the twenty-six features retained by the chi-square test.
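The p-value criterion above can be illustrated with the Python sketch below, which computes a chi-square score and p-value per feature and keeps the features with p ≤ 0.05. The section does not describe how the test was configured; this sketch assumes that the four sip volumes act as class labels and that all features are non-negative, and it uses the same statistic as scikit-learn's `sklearn.feature_selection.chi2` (class-wise sums of each feature compared with their expected values). The function names are hypothetical.

```python
import math
from collections import Counter


def chi2_sf(x, dof):
    """Survival function P(X > x) of a chi-square distribution with integer dof,
    via the closed-form recurrence Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)."""
    if dof % 2:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, k = math.exp(-x / 2), 2
    while k < dof:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def chi2_scores(X, labels):
    """(score, p-value) per non-negative feature against class labels."""
    counts = Counter(labels)
    n, classes = len(labels), sorted(counts)
    results = []
    for f in range(len(X[0])):
        total = sum(row[f] for row in X)
        stat = 0.0
        for c in classes:
            observed = sum(row[f] for row, lab in zip(X, labels) if lab == c)
            expected = counts[c] / n * total   # share of the feature mass for class c
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
        results.append((stat, chi2_sf(stat, len(classes) - 1)))
    return results


def select_features(X, labels, alpha=0.05):
    """Indices of features whose chi-square p-value is at most alpha."""
    return [f for f, (_, p) in enumerate(chi2_scores(X, labels)) if p <= alpha]
```

A feature that is identical across classes scores 0 with p = 1 and is dropped, whereas a feature whose values concentrate in some classes scores high and is kept.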

Discussion
In this study, we proposed a method for fluid intake volume estimation using sEMG. The method was evaluated on a dataset of 800 water swallows collected from 20 healthy individuals. The results show that the proposed method estimated the volume of water intake with a high degree of accuracy: a root-mean-square error (RMSE) of 0.14 (0.08 after feature selection) and a coefficient of determination (R²) of 0.99. Very few studies have attempted to estimate fluid volume using sEMG. Malvuccio et al. and Ismail et al. used sEMG recordings of both individual and continuous swallows to estimate the amount of fluid consumed; however, their studies reported higher RMSE values than ours [33,34]. Incorporating more features and different regressors yields a lower average RMSE for our system than those of other studies [33,34]. As seen in Fig. 9, feature selection with the chi-square test further improves the RMSE. Accurate fluid intake estimation is important for a number of applications related to maintaining or restoring health in individuals with certain conditions, including young children with type II diabetes and elderly people with dementia [35]. Such cases could benefit from a technology that estimates fluid or water intake in real time. For example, the hydration of an elderly person living independently could be monitored remotely using a device based on the methods reported in this paper. Similarly, a dietitian could follow up on patients by remotely monitoring their fluid intake patterns.
The proposed method has a number of advantages over other methods for fluid intake volume assessment. First, it can accurately estimate the volume of water swallowed even in the presence of external audible noise, a known weakness of microphone-based methods. Second, it can operate with good accuracy in real time. Third, it is non-invasive and does not require sophisticated equipment. Finally, it lends itself to implementation on flexible electronics following the edge-computing paradigm.
The presented study, however, has a number of limitations. First, the method was evaluated only on a small dataset of water swallows collected from healthy individuals. Further research is needed to evaluate its performance in a larger and more diverse population with sufficient representation of various health conditions. Second, the method can only assess the volume of water swallowed in a single sip. Further research is needed to develop a method that can predict the volume of water swallowed over multiple sips.
The among-subject validation scheme was adopted because of the well-known high intersubject variability of sEMG. One future direction for this platform is to investigate various pre-processing techniques for the sEMG signals and a range of features that reduce this variability under different validation schemes.
Overall, the results of this study show that the proposed method is a promising new approach for fluid intake volume estimation. By comparison, the study reported in [25] showed that sEMG could be used to estimate water intake volume with an RMSE of 0.25, but that study was conducted in a lab-controlled environment, whereas the current study was conducted in a more natural setting. The current results suggest that sEMG can quantify fluid volume intake in a more natural setting with more practical regression models.

Conclusion
In this work, we estimated fluid volume intake using a mobile, single-channel sEMG device. For the more difficult task of estimating sip amounts that correspond to actual swallows in a less controlled environment, regression models estimated water volume intake with an RMSE of about 0.14 (0.08 after feature selection). In the future, we plan to use a wearable sEMG and recruit more subjects. In addition, we plan to compile data on the typical water volumes individuals consume each day in a free-living setting and take these into account when designing setups for longer-term experiments.

Fig. 5 A bar plot that summarizes the RMSE for the seven regressors

Fig. 6 A bar plot of R² for the seven regressors

Fig. 8 Chi-square test scores (%) of the features

Table 1
RMSE and R² for testing using the best training models