Computer vision based posture estimation and fall detection.
Authors: Adhikari, K.
Conference: Bournemouth University, Faculty of Media and CommunicationAbstract:
Falls are a major health problem, especially in the elderly population. Increasing fall events demands a high quality of service and dedicated medical treatment which is an economic burden. Serious injuries due to fall can cost lives in the absence of immediate care and support. There- fore, a monitoring system that can accurately detect fall events and generate instant alerts for immediate care is extremely necessary. To address this problem, this research aims to develop a computer vision-based fall detection system. This study proposes fall detection in three stages: (A) Detection of human silhouette and recognition of the pose, (B) Detection of the human as three regions for different postures including fall and (C) Recognise fall and non-fall using locations of human body regions as distinguishing features. The first stages of work comprise human silhouette detection and identification of activities in the form of different poses. Identifying a pose is important to understand a fall event where a change of pose defines its characteristics. A fall event comprises of sequential change of poses and ends up in a lying pose. Initial pose during a fall can be standing, sitting or bending but the final pose is usually a lying pose. It would, therefore, be beneficial if lying pose is recognised more accurately than other normal activities such as standing, sitting, bending or crawling to address a fall. Hence in the first stage, Background Subtraction (BS) is used to detect human silhouette. After background subtraction, the foreground images were used in a Convolutional Neural Network (CNN) to recognise different poses. The RGB and the Depth images were captured from a Kinect Sensor. The fusion of RGB and Depth images were explored for feeding to a convolutional neural net- work. Depth together with RGB complimented each other to overcome their weakness respectively and proved to be a significant strategy. The classification was performed using CNN to recognise different activities with 81% accuracy on validation. The other challenge in fall detection is the tracking of a person during a fall. Background Subtraction is not sufficient to track a fallen person especially when there are lighting and viewpoint variations in the environment and present of another object like furniture, a pet or even another person. Furthermore, tracking be- comes tougher during the fall in comparison to normal activities like walking or sitting because the rate of change pose is higher during a fall. To overcome this, the idea is to locate the regions in the body in every frame and consider it as a stable tracking strategy. The location of the body parts provides crucial information to distinguish falls from the other normal activities as the person is detected all the time during these activities. Hence the second stage of this research consists of posture detection using the pose estimation technique. This research proposes to use CNN based pose estimation using simplified human postures. The available joints are grouped according to three regions: Head, Torso and Leg and then finally fed to the CNN model with just three inputs instead of several available joints. This strategy added stability in pose detection and proved to be more effective against complex poses observed during a fall. To train the CNN model, transfer learning technique was used. The model was able to achieve 96.7% accuracy in detecting the three regions on different human postures on the publicly available dataset. A system which considers all the lying poses as falls can also generate a higher false alarm. Lying on bed or sofa can easily generate a fall alarm if they are recognised as falls. Hence, it is important to recognise actual fall by considering a sequence of frames that defines a fall and not just the lying pose. In the third and final stage, this study proposes Long Short-Term Memory (LSTM) recurrent networks-based fall detection. The proposed LSTM model uses the detected three region’s location as input features. LSTM is capable of using contextual information from the sequential input patterns. Therefore, the LSTM model was fed with location features of different postures in a sequence for training. The model was able to learn fall patterns and distinguish them from other activities with 88.33% accuracy. Furthermore, the precision of the fall class was 1.0. This is highly desirable in the case of fall detection as there is no false alarm and this means that the cost incurred in calling medical support for a false alarm can be completely avoided.