Hand Gesture Recognition for AR/MR/VR Headsets

Kyeongeun Seo, Hyeonjoong Cho

Egocentric hand pose estimation is important for wearable cameras because hand interactions are captured from an egocentric viewpoint. Several hand pose estimation methods based on RGBD or RGB sensors have recently been presented. Although these methods provide accurate hand pose estimates, they have notable limitations: RGB-based techniques have intrinsic difficulty converting relative 3D poses into absolute 3D poses, and RGBD-based techniques work only in indoor environments. Stereo-sensor-based techniques have recently gained attention for their potential to overcome these limitations. However, to the best of our knowledge, few techniques and no real datasets are available for egocentric stereo vision. In this paper, we propose a top-down pipeline for estimating absolute 3D hand poses using stereo sensors, along with a novel dataset for training. The pipeline consists of two steps: hand detection, which locates the hand regions, followed by hand pose estimation, which estimates the positions of the hand joints. In particular, for hand pose estimation with a stereo camera, we propose an attention-based architecture called StereoNet, a geometry-based loss function called StereoLoss, and a novel 2D disparity map called StereoDMap for effective stereo feature learning. To build the dataset, we also propose a novel annotation method that reduces human annotation effort. Our dataset is publicly available at https://github.com/seo0914/SEH. We conducted comprehensive experiments demonstrating the effectiveness of our approach compared with state-of-the-art methods.
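As a rough illustration of the kind of stereo cue involved (not the StereoDMap construction or the StereoNet/StereoLoss design described in the paper), the sketch below computes a plain 2D disparity map from a rectified stereo pair with OpenCV and converts disparity to absolute depth via Z = f·B/d, which is why a calibrated stereo pair can yield absolute rather than relative 3D poses. The file names, focal length, and baseline are placeholders, not values from our setup.

```python
# Minimal sketch: plain disparity map from a rectified stereo pair and
# disparity-to-depth conversion. This is NOT the paper's StereoDMap; it only
# illustrates how a calibrated stereo pair provides absolute (metric) depth.
import cv2
import numpy as np

FOCAL_PX = 450.0   # placeholder focal length in pixels (camera-specific)
BASELINE_M = 0.06  # placeholder stereo baseline in meters (camera-specific)

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # hypothetical file names
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Semi-global block matching; numDisparities must be a multiple of 16.
sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)
disparity = sgbm.compute(left, right).astype(np.float32) / 16.0  # fixed-point -> pixels

# Absolute depth from disparity: Z = f * B / d (valid where d > 0).
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = FOCAL_PX * BASELINE_M / disparity[valid]
```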

[Open Dataset]

Our SEH (Stereo Egocentric Hand) dataset is the first hand pose dataset for egocentric stereo vision with accurate ground-truth (GT) 3D hand poses. It includes three types of interactions: tapping, drawing, and touch gestures. These interactions were performed on three types of backgrounds: white, wood, and cluttered.

Link: https://github.com/seo0914/SEH
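For readers new to stereo geometry, the sketch below shows one generic way a joint annotated in both rectified views can be lifted to an absolute 3D position by triangulation. It is not necessarily the annotation procedure used for SEH, and the intrinsics, baseline, and pixel coordinates are placeholder values.

```python
# Minimal sketch: triangulate a 2D joint annotated in both rectified views into
# an absolute 3D point. Generic stereo triangulation, not the SEH annotation tool.
import cv2
import numpy as np

# Placeholder rectified intrinsics (fx, fy, cx, cy) and baseline in meters.
fx, fy, cx, cy, baseline = 450.0, 450.0, 320.0, 240.0, 0.06
K = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]], dtype=np.float64)

# Projection matrices of a rectified pair: right camera offset by the baseline.
P_left = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P_right = K @ np.hstack([np.eye(3), np.array([[-baseline], [0.0], [0.0]])])

# One annotated joint (u, v) per view, shaped (2, N) as cv2 expects.
pt_left = np.array([[350.0], [260.0]])
pt_right = np.array([[330.0], [260.0]])

X_h = cv2.triangulatePoints(P_left, P_right, pt_left, pt_right)  # homogeneous 4x1
X = (X_h[:3] / X_h[3]).ravel()  # absolute 3D joint position in meters
print(X)
```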

[Related Publications]

Kyeongeun Seo, Hyeonjoong Cho, Daewoong Choi, Taewook Heo, "Stereo Feature Learning based on Attention and Geometry for Absolute Hand Pose Estimation in Egocentric Stereo Views", IEEE Access (To appear)

Kyeongeun Seo, Hyeonjoong Cho, Daewoong Choi, Sangyub Lee, Jaekyu Lee, Jaejing Ko, "TWOHANDSMUSIC: Multitask Learning-Based Egocentric Piano-Playing Gesture Recognition System for Two Hands", IEEE International Conference on Image Processing, 2019.

Kyeongeun Seo, Hyeonjoong Cho, "Tentap: a piano-playing gesture recognition system based on ten fingers for virtual piano ", ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games. ACM 2018.