7

Let me explain my need before I explain the problem. I am building a hand-controlled application: navigation using the palm and clicks using a grab/fist gesture.

Currently, I am working with OpenNI, which sounds promising and comes with a few examples that turned out to be useful in my case, as it has a built-in hand tracker in its samples. That serves my purpose for the time being.

What I want to ask is:

1) What would be the best approach to build a fist/grab detector?

I trained and used AdaBoost fist classifiers on extracted RGB data, which worked reasonably well, but it produces too many false detections to move forward with.

So, here I frame two more questions:

2) Is there any other good library capable of meeting my needs using depth data?

3) Can we train our own hand gestures, especially ones involving fingers? Some papers refer to HMMs; if that's the way to go, how do we proceed with a library like OpenNI?

Yeah, I tried the middleware libraries in OpenNI, such as the grab detector, but they won't serve my purpose, as they are neither open source nor a match for my needs.

Apart from what I asked, anything else you think could help me will be accepted as a good suggestion.

4nonymou5
  • What operating system were you using for this? Were you by any chance running a Mac, or were you on Windows? – TheLuminor Sep 30 '15 at 07:11

8 Answers

7

You don't need to train a fist classifier, since that will complicate things. Don't use color either, since it's unreliable (it mixes with the background and changes unpredictably depending on lighting and viewpoint).

  1. Assuming that your hand is the closest object, you can simply segment it out with a depth threshold. You can set the threshold manually, use the closest region of the depth histogram, or run connected components on the depth map to break it into meaningful parts first (and then select your object based not only on its depth but also on its dimensions, motion, user input, etc.). Here is the output of a connected-components method: the depth image, its connected components, and the hand mask improved with GrabCut. (A rough sketch of this step combined with step 2 follows after this list.)
  2. Apply convexity defects from the OpenCV library to find the fingers.

  3. Track the fingers in 3D rather than rediscovering them in every frame. This will increase stability. I successfully implemented such finger detection about 3 years ago.
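For steps 1 and 2, a minimal sketch in Python with OpenCV could look like the code below. It assumes the depth frame arrives as a 16-bit NumPy array in millimetres (e.g. from OpenNI bindings); the depth range, the defect-depth threshold and the function name are illustrative assumptions, not tuned values.

import cv2
import numpy as np

def segment_and_count_fingers(depth_mm, near_mm=400, far_mm=800):
    # Step 1: keep only pixels inside the assumed "hand" depth range.
    mask = ((depth_mm > near_mm) & (depth_mm < far_mm)).astype(np.uint8) * 255

    # Keep the largest connected component, presumably the hand.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    if n < 2:
        return None, 0
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    hand = np.where(labels == largest, 255, 0).astype(np.uint8)

    # Step 2: convexity defects on the hand contour to estimate extended fingers.
    contours, _ = cv2.findContours(hand, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)  # OpenCV 4.x
    cnt = max(contours, key=cv2.contourArea)
    hull = cv2.convexHull(cnt, returnPoints=False)
    defects = cv2.convexityDefects(cnt, hull)

    fingers = 0
    if defects is not None:
        for start, end, far, depth in defects[:, 0]:
            # Deep defects correspond to the valleys between extended fingers.
            if depth / 256.0 > 20:  # pixels; tune for your camera and distance
                fingers += 1
    return hand, (fingers + 1) if fingers else 0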

Vlad
  • I have this implementation. I tried it in a similar way: rather than on color, I tried convexity defects with a threshold and the depth range you mentioned, and only on the extracted hand region. The approach is very good, but it wasn't as robust as the hand skeleton. Still, out of all the approaches mentioned, this looks like the best one with a few more modifications. – 4nonymou5 Feb 28 '14 at 13:07
  • The devil is in the details, as they say. You have to explore the cases where performance is not robust; maybe you can improve your implementation. Ultimately, you can have a feedback loop from multiple analysed shapes back to the pre-processing and selection stages, meaning that your post-processing should inform your otherwise imperfect pre-processing about what to select. However, these loops are dangerous and should be built with a perfect understanding. – Vlad Feb 28 '14 at 18:03
4

Read my paper :) http://robau.files.wordpress.com/2010/06/final_report_00012.pdf

I have done research on gesture recognition for hands, and evaluated several approaches that are robust to scale, rotation etc. You have depth information which is very valuable, as the hardest problem for me was to actually segment the hand out of the image.

My most successful approach is to trace the contour of the hand and, for each point on the contour, take the distance to the centroid of the hand. This gives a set of points that can be used as input for many training algorithms.

I use the image moments of the segmented hand to determine its rotation, so there is a consistent starting point on the hand's contour. It then becomes very easy to distinguish a fist from a stretched-out hand and to count the number of extended fingers.
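A minimal sketch of that idea, assuming a binary hand mask (0/255) as input; this illustrates the centroid-distance signature and moment-based orientation, not the exact implementation from the paper:

import cv2
import numpy as np

def hand_signature(hand_mask, n_samples=64):
    # Trace the outer contour of the segmented hand.
    contours, _ = cv2.findContours(hand_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    cnt = max(contours, key=cv2.contourArea).reshape(-1, 2).astype(np.float64)

    # Centroid and orientation from the image moments of the hand mask.
    m = cv2.moments(hand_mask, binaryImage=True)
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
    angle = 0.5 * np.arctan2(2 * m["mu11"], m["mu20"] - m["mu02"])

    # Distance from each contour point to the centroid, resampled to a fixed
    # length so it can be fed to a classifier; scale-normalised by the maximum.
    dist = np.hypot(cnt[:, 0] - cx, cnt[:, 1] - cy)
    idx = np.linspace(0, len(dist) - 1, n_samples).astype(int)
    signature = dist[idx] / (dist.max() + 1e-9)

    # The orientation gives a consistent starting point on the contour, which
    # helps distinguish a fist from an open hand and count extended fingers.
    return signature, (cx, cy), angle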

Note that while it works fine, your arm tends to get tired from pointing into the air.

RobAu
2

It seems that you are unaware of the Point Cloud Library (PCL). It is an open-source library dedicated to the processing of point clouds and RGB-D data, which is based on OpenNI for the low-level operations and which provides a lot of high-level algorithms, for instance to perform registration, segmentation and also recognition.

A very interesting algorithm for shape/object recognition in general is called the implicit shape model. In order to detect a global object (such as a car, or an open hand), the idea is first to detect possible parts of it (e.g. wheels, trunk, etc., or fingers, palm, wrist, etc.) using a local feature detector, and then to infer the position of the global object by considering the density and the relative position of its parts. For instance, if I can detect five fingers, a palm and a wrist in a given neighborhood, there's a good chance that I am in fact looking at a hand; however, if I only detect one finger and a wrist somewhere, it could be a pair of false detections. The academic research article on this implicit shape model algorithm can be found here.
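To make the voting idea concrete, here is a toy sketch of the accumulation step, deliberately independent of PCL's actual implicit shape model API; the codebook structure, grid size and vote threshold are assumptions purely for illustration:

import numpy as np

def ism_vote(parts, codebook, bin_size=0.05, min_votes=4):
    # parts: list of (part_label, xyz position) for each detected local feature.
    # codebook: dict mapping part_label -> learned offsets from part to object centre.
    votes = []
    for label, pos in parts:
        for offset in codebook.get(label, []):
            votes.append(np.asarray(pos, dtype=float) + np.asarray(offset, dtype=float))
    if not votes:
        return []

    # Accumulate votes on a coarse 3D grid; dense cells suggest an object centre.
    votes = np.asarray(votes)
    cells, counts = np.unique(np.round(votes / bin_size).astype(int), axis=0, return_counts=True)
    return [cells[i] * bin_size for i in np.argsort(-counts) if counts[i] >= min_votes]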

In PCL, there are a couple of tutorials dedicated to the topic of shape recognition, and luckily, one of them covers the implicit shape model, which has been implemented in PCL. I never tested this implementation, but from what I could read in the tutorial, you can specify your own point clouds for the training of the classifier.

That being said, you did not mention it explicitly in your question, but since your goal is to program a hand-controlled application, you might in fact be interested in a real-time shape detection algorithm. You would have to test the speed of the implicit shape model provided in PCL, but I think this approach is better suited to offline shape recognition.

If you do need real-time shape recognition, I think you should first use a hand/arm tracking algorithm (which is usually faster than full detection) in order to know where to look in the images, instead of trying to perform a full shape detection on each frame of your RGB-D stream. You could, for instance, track the hand location by segmenting the depth map (e.g. using an appropriate threshold on the depth) and then detecting the extremities, as in the rough sketch below.
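A rough sketch of that tracking step, assuming a depth frame in millimetres and the hand position/depth estimated in the previous frame; the window size and depth tolerance are illustrative values, not recommendations:

import numpy as np

def track_hand(depth_mm, prev_xy, prev_depth, win=80, tol_mm=120):
    # Restrict the search to a window around the previous hand position.
    x, y = prev_xy
    h, w = depth_mm.shape
    x0, x1 = max(0, x - win), min(w, x + win)
    y0, y1 = max(0, y - win), min(h, y + win)
    roi = depth_mm[y0:y1, x0:x1]

    # Keep pixels whose depth stays close to the previous hand depth.
    ys, xs = np.nonzero((roi > prev_depth - tol_mm) & (roi < prev_depth + tol_mm) & (roi > 0))
    if len(xs) == 0:
        return None  # hand lost; fall back to a full detection pass

    # New estimate: centroid of the matching pixels, in full-image coordinates.
    new_xy = (int(xs.mean()) + x0, int(ys.mean()) + y0)
    new_depth = float(roi[ys, xs].mean())
    return new_xy, new_depth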

Then, once you know approximately where the hand is, it should be easier to decide whether it is making a gesture relevant to your application. I am not sure what exactly you mean by fist/grab gestures, but I suggest that you define and use some app-controlling gestures which are easy and quick to distinguish from one another.

Hope this helps.

BConic
  • Oh yeah, I've heard of it, but never thought of implementing it. Your explanation sounds attractive and I'll surely look into it. Yeah, you guessed it right: I am expecting real-time and robust detection. I already have a pretty robust hand tracker, which can't get much better than it is; the problem stays with the grab gesture. In clear words, with the finger gesture of a grab I want to trigger a click. Your last two paragraphs are something I tried with the AdaBoost classifier, though I'll try implementing it with PCL as it sounds good. Do you think it would work for some finger gestures, like a thumbs up? – 4nonymou5 Feb 25 '14 at 09:08
  • If you look at the end of the PCL tutorial about the implicit shape model, you'll see that the classifier is trained to distinguish 5 classes (including cat, horse, lioness and wolf, which are not so easy). So if your hand/finger gestures are discriminative enough (e.g. fist, open hand, thumb up, thumb down, etc.), this approach has a good chance to work. Anyway, it's worth experimenting with the implementation. – BConic Feb 25 '14 at 11:55
  • Yeah, sure, I will try things with this. – 4nonymou5 Feb 25 '14 at 12:08
  • PCL is pretty famous and many people are aware of it. Object registration works for creating 3D models and is pretty slow, while shape modelling is good for recognizing rigid shapes as opposed to a highly deformable hand. – Vlad Feb 28 '14 at 21:41
  • @Vlad True, however you have to distinguish two tasks in the OP's question: hand tracking and gesture recognition. I agree that the _implicit shape model_ is not appropriate for hand tracking (I said so in my answer), however it is very appropriate for gesture recognition. The only unknown is the compatibility with the real-time constraint. – BConic Mar 01 '14 at 08:42
  • Good point. I thought about very simple gestures like swipe or push/pull that are closely related to tracking. More complex gestures will of course require some learning and modeling. I would use convexity defects as features though, since standard corners model hand parts poorly. – Vlad Mar 01 '14 at 08:50
2

The short answer is: yes, you can train your own gesture detector using depth data. It is really easy, but it depends on the type of gesture.

Suppose you want to detect a hand movement:

  1. Detect the hand position (x,y,z). Using OpenNI this is straightforward, as you have a node for the hand.
  2. Execute the gesture and collect ALL the positions of the hand during the gesture.
  3. With the list of positions, train an HMM. For example, you can use Matlab, C, or Python.
  4. Test the model with your own gestures and use it to detect them.

Here you can find a nice tutorial and code (in Matlab). The code (test.m) is pretty easy to follow. Here is a snippet:

%Load collected data
training = get_xyz_data('data/train',train_gesture);
testing = get_xyz_data('data/test',test_gesture); 

%Get clusters
[centroids N] = get_point_centroids(training,N,D);
ATrainBinned = get_point_clusters(training,centroids,D);
ATestBinned = get_point_clusters(testing,centroids,D);

% Set priors:
pP = prior_transition_matrix(M,LR);

% Train the model:
cyc = 50;
[E,P,Pi,LL] = dhmm_numeric(ATrainBinned,pP,[1:N]',M,cyc,.00001);
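If you prefer Python over Matlab, roughly the same pipeline (steps 2-4 above) can be sketched with hmmlearn's GaussianHMM; the file names and array shapes below are assumptions for illustration, not part of the linked tutorial:

import numpy as np
from hmmlearn import hmm

# Suppose each recording of a gesture is an (n_frames, 3) array of hand positions.
train_sequences = [np.load("data/train/gesture_%02d.npy" % i) for i in range(20)]  # hypothetical files

X = np.concatenate(train_sequences)          # stacked observations
lengths = [len(s) for s in train_sequences]  # per-sequence lengths

# Train one HMM per gesture class; here a single 5-state model as an example.
model = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=50)
model.fit(X, lengths)

# At recognition time, score a new trajectory against each gesture's model and
# pick the class with the highest log-likelihood.
log_likelihood = model.score(np.load("data/test/gesture_00.npy"))  # hypothetical file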

Dealing with fingers is pretty much the same, but instead of detecting the hand you need to detect the fingers. As the Kinect doesn't provide finger joints, you need specific code to detect them (using segmentation or contour tracking). Some examples using OpenCV can be found here and here, but the most promising one is the ROS library that has a finger node (see the example here).

phyrox
  • Whatever you gave is really good; the links will be of great help. But your answer mostly concentrated on the hand. When I say grab, I mean capturing the finger movement. Sorry, I didn't think in that depth; I should have framed the question around finger gestures. Any specific suggestion based on finger gestures? The code/algorithms mentioned mostly deal with hand movement rather than a specific finger gesture. – 4nonymou5 Feb 25 '14 at 08:56
  • @4nonymou5 Check the edit of the post. I added one interesting reference related to finger detection. – phyrox Feb 25 '14 at 10:45
  • Yeah, I did something similar to this: used depth data with convexity defects and a few other things. It was good, but not capable of handling my needs. The ROS link looks clean, though; I'll try in that direction too. Your answer looks impressive. – 4nonymou5 Feb 25 '14 at 11:03
  • @4nonymou5 I did some work with the Kinect, and had some trouble with depth. The resolution for the IR camera wasn't great, so you may want to take that into account, especially if the depth movement is restricted to how far a finger can move. I'm obviously not familiar with your dataset, but I think it's something you should keep in mind. – wbest Feb 28 '14 at 00:26
2

If you only need to detect a fist/grab state, you should give Microsoft a chance. Microsoft.Kinect.Toolkit.Interaction contains methods and events that detect the grip / grip-release state of a hand. Take a look at the HandEventType of InteractionHandPointer. That works quite well for fist/grab detection, but it does not detect or report the position of individual fingers.

The next Kinect (Kinect One) detects 3 joints per hand (wrist, hand, thumb) and has 3 hand-based gestures: open, closed (grip/fist) and lasso (pointer). If that is enough for you, you should consider the Microsoft libraries.

Thomas Hetzer
0

1) If there are a lot of false detections, you could try to extend the negative sample set of the classifier and train it again. The extended negative image set should contain images on which the fist was falsely detected. This may help to create a better classifier.

Milan Tenk
  • Yeah, that would be a choice, but what I thought is that including depth data might be an added advantage in another algorithm, through which accuracy might be increased. So I'm more interested in that direction. – 4nonymou5 Feb 15 '14 at 11:29
0

I've had quite a bit of success with the middleware library provided by http://www.threegear.com/. It provides several gestures (including grabbing, pinching and pointing) and 6-DOF hand tracking.

Nallath
0

You might be interested in this paper & open-source code:

Robust Articulated-ICP for Real-Time Hand Tracking

Code: https://github.com/OpenGP/htrack

Screenshot: http://lgg.epfl.ch/img/codedata/htrack_icp.png

YouTube Video: https://youtu.be/rm3YnClSmIQ

Paper PDF: http://infoscience.epfl.ch/record/206951/files/htrack.pdf

masterxilo