Separability between signal and noise components using the distribution of scaled Hankel matrix eigenvalues with application in biomedical signals.
Authors: Alharbi, N.
Biomedical signals are records from human and animal bodies. These records are treated as nonlinear time series that hold important information about the physiological activities of organisms and span many subjects of interest. However, biomedical signals are often corrupted by artifacts and noise, which must be separated from the signal before any statistical evaluation. Another challenge in analysing biomedical signals is that the data are often non-stationary, particularly when an abnormal event such as an epileptic seizure is observed within the signal, and can also exhibit chaotic behaviour. The literature suggests that distinguishing chaos from noise remains a highly contentious issue, as it has been historically. This is because chaos and noise share common properties, which make them difficult to tell apart. We seek to provide a viable solution to this problem by presenting a novel approach to the separability of signal and noise components and the differentiation of noise from chaos.
Several methods have been used for the analysis of and discrimination between different categories of biomedical signals, but many of these are based on restrictive assumptions of normality, stationarity and linearity of the observed data. An improved technique that is robust in its analysis of non-stationary time series is therefore of paramount importance for the accurate diagnosis of human diseases. The SSA (Singular Spectrum Analysis) technique does not depend on these assumptions, which could make it very helpful for analysing and modelling biomedical data. The main aim of the thesis is thus to provide a novel approach for developing the SSA technique, and then to apply it to the analysis of biomedical signals.
SSA is a reliable technique for separating an arbitrary signal from a noisy time series (signal+noise). It is based upon two main choices: the window length, L, and the number of eigenvalues, r. These values play an important role in the reconstruction and forecasting stages. However, the main difficulty in extracting signals using the SSA procedure lies in identifying the optimal values of L and r required for signal reconstruction. The aim of this thesis is to develop theoretical and methodological aspects of the SSA technique, to present a novel approach to distinguishing between deterministic and stochastic processes, and to present an algorithm for identifying the eigenvalues corresponding to the noise component, thereby choosing the optimal value of r for the desired signal so that signal and noise can be separated. The algorithm can be regarded as an enhanced version of the SSA method, which decomposes a noisy signal into the sum of a signal and noise. Although the main focus of this thesis is on the selection of the optimal value of r, we also provide some results and recommendations on the choice of L for separability. Several criteria that characterise this separability are introduced.
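The roles of L and r can be illustrated with a minimal sketch of basic SSA reconstruction: embed the series in a Hankel (trajectory) matrix with window length L, keep the r leading singular components, and average along anti-diagonals to return to a series. This is the standard textbook procedure, not the enhanced algorithm developed in the thesis. The function names, the sinusoidal test signal, the noise level, and the choices L = N/2 and r = 2 are assumptions made only for illustration.

```python
import numpy as np


def hankel_embed(y, L):
    """Build the L x K trajectory (Hankel) matrix, K = N - L + 1."""
    y = np.asarray(y, dtype=float)
    K = y.size - L + 1
    return np.array([y[i:i + K] for i in range(L)])


def diagonal_average(X):
    """Average each anti-diagonal of X to obtain a series of length L + K - 1."""
    L, K = X.shape
    total = np.zeros(L + K - 1)
    count = np.zeros(L + K - 1)
    for i in range(L):
        # Row i holds the entries that map to series positions i .. i + K - 1.
        total[i:i + K] += X[i]
        count[i:i + K] += 1
    return total / count


def ssa_reconstruct(y, L, r):
    """Reconstruct a series from the r leading singular components."""
    X = hankel_embed(y, L)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_r = (U[:, :r] * s[:r]) @ Vt[:r]
    return diagonal_average(X_r)


# Illustrative data (assumed): a sinusoid plus Gaussian white noise.
rng = np.random.default_rng(0)
N = 200
t = np.arange(N)
signal = np.sin(2 * np.pi * t / 20)
noisy = signal + 0.3 * rng.standard_normal(N)

# A pure sinusoid has a rank-2 trajectory matrix, hence r = 2 here.
recon = ssa_reconstruct(noisy, L=N // 2, r=2)

rmse_noisy = np.sqrt(np.mean((noisy - signal) ** 2))
rmse_recon = np.sqrt(np.mean((recon - signal) ** 2))
print(f"RMSE before: {rmse_noisy:.3f}, after: {rmse_recon:.3f}")
```

In this toy setting r is known in advance because the signal is a single sinusoid; for real series that is exactly the quantity that has to be chosen.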
The proposed approach is based on the distribution of the eigenvalues of a scaled Hankel matrix, and on dynamical systems, embedding theorem, matrix algebra and statistical theory. The research demonstrates that the proposed approach can be considered as an alternative and promising technique for choosing the optimal values of r and L in SSA, especially for biomedical signals and genetic time series.
For the theoretical development of the approach, we present new theoretical results on the eigenvalues of a scaled Hankel matrix, provide some properties of the eigenvalues, and show the effect of the window length and the rank of the Hankel matrix on the eigenvalues. The new theoretical results are examined using simulated and real time series. Furthermore, the effect of window length on the distribution of the largest and smallest eigenvalues of the scaled Hankel matrix is also considered for the white noise process. The results indicate that the distribution of the largest eigenvalue for the white noise process is positively skewed for different series lengths and different values of window length, whereas the distribution of the smallest eigenvalue shows a different pattern with L: the distribution changes from left to right as L increases. These results, together with other results obtained by the different criteria introduced and used in this research, are very promising for the identification of the signal subspace.
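A simulation of this kind can be sketched as follows. The abstract does not spell out the scaling, so the code assumes that the matrix XX^T built from the L x K Hankel matrix X is divided by its trace, which makes the eigenvalues sum to one. The function names, the series length N, the window length L and the number of replications are likewise illustrative assumptions.

```python
import numpy as np


def scaled_hankel_eigenvalues(y, L):
    """Eigenvalues of XX^T / tr(XX^T), in descending order.

    The trace scaling is an assumption about what 'scaled' means here.
    """
    y = np.asarray(y, dtype=float)
    K = y.size - L + 1
    X = np.array([y[i:i + K] for i in range(L)])
    S = X @ X.T
    # eigvalsh returns ascending eigenvalues of a symmetric matrix.
    return np.linalg.eigvalsh(S / np.trace(S))[::-1]


def sample_skewness(x):
    """Moment-based sample skewness (third central moment over sd cubed)."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5


rng = np.random.default_rng(1)
N, L, n_rep = 50, 10, 2000  # illustrative settings
largest = np.empty(n_rep)
smallest = np.empty(n_rep)
for k in range(n_rep):
    lam = scaled_hankel_eigenvalues(rng.standard_normal(N), L)
    largest[k], smallest[k] = lam[0], lam[-1]

print("skewness of largest eigenvalue: ", sample_skewness(largest))
print("skewness of smallest eigenvalue:", sample_skewness(smallest))
```

Repeating the loop over a grid of L values would give the empirical distributions whose shape is discussed above.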
For the practical aspect and empirical results, various biomedical signals and genetics time series are used. First, to achieve the objectives of the thesis, a comprehensive study has been made of the distribution, pattern and behaviour of scaled Hankel matrix eigenvalues. Furthermore, the normal distribution with different parameters is considered and the effect of the scale and shape parameters is evaluated. The correlation between eigenvalues is also assessed, using parametric and non-parametric association criteria. In addition, the distribution of eigenvalues for synthetic time series generated from some well-known low-dimensional chaotic systems is analysed in depth. The results yield several important properties with broad application, enabling the distinction between chaos and noise in time series analysis. At this stage, the main result of the simulation study is that the findings for the series generated from the normal distribution with mean zero (white noise process) are totally different from those obtained for the other series considered in this research, which makes a novel contribution to the area of signal processing and noise reduction.
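To make the chaos-versus-noise comparison concrete, the sketch below computes scaled eigenvalues for a series from the logistic map and for Gaussian white noise of the same length. The abstract does not name the chaotic systems studied, so the logistic map with parameter 4 is an assumed example; the trace scaling, the centring of the chaotic series, and the values of N and L are also assumptions chosen only for illustration.

```python
import numpy as np


def scaled_eigs(y, L):
    """Descending eigenvalues of XX^T / tr(XX^T) (trace scaling is assumed)."""
    y = np.asarray(y, dtype=float)
    K = y.size - L + 1
    X = np.array([y[i:i + K] for i in range(L)])
    S = X @ X.T
    return np.linalg.eigvalsh(S / np.trace(S))[::-1]


def logistic_map(n, x0=0.3, a=4.0):
    """Iterate x[k] = a * x[k-1] * (1 - x[k-1]); a = 4 is the chaotic regime."""
    x = np.empty(n)
    x[0] = x0
    for k in range(1, n):
        x[k] = a * x[k - 1] * (1.0 - x[k - 1])
    return x


N, L = 300, 10  # illustrative settings
chaos = logistic_map(N)
noise = np.random.default_rng(2).standard_normal(N)

# Centring the chaotic series is an illustrative choice so that its
# non-zero mean does not dominate the leading eigenvalue.
eig_chaos = scaled_eigs(chaos - chaos.mean(), L)
eig_noise = scaled_eigs(noise, L)

print("logistic map:", np.round(eig_chaos, 3))
print("white noise: ", np.round(eig_noise, 3))
```

A single pair of spectra says little on its own; the study described above looks at the distribution of such eigenvalues over many simulated series.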
Second, the proposed approach and its criteria are applied to a number of simulated and real data sets with different levels of noise and different structures. Our results are compared with those obtained by common and well-known criteria in order to evaluate, enhance and confirm the accuracy of the approach and its criteria. The results indicate that the proposed approach has the potential to split the eigenvalues into two groups: the first corresponding to the signal and the second to the noise component. In addition, based on the results, the optimal value of L for reconstructing a noise-free signal from a noisy series should be the median of the series length, that is, roughly half the number of observations. The results confirm that the proposed approach can improve the quality of the reconstruction step for signal extraction.
Finally, the thesis seeks to explore the applicability of the proposed approach for discriminating between normal and epileptic seizure electroencephalography (EEG) signals, and for filtering the signal segments to make them free from noise. Various criteria based on the largest eigenvalue are also presented and used as features to distinguish between normal and epileptic EEG segments. These features can be considered as useful information for classifying brain signals. In addition, the approach is applied to the removal of nonspecific noise from Drosophila segmentation genes. Our findings indicate that when extracting signal from different genes, for optimised signal and noise separation, a different number of eigenvalues needs to be chosen for each gene.