Content Master Technology Blog: Kinect SDK: Gesture Recognition Pt II

Introduction

In my previous blog post I outlined the approach I’d be taking to gesture recognition, and discussed how I’d detect whether the user is moving or stationary. As a reminder, my high-level approach to the gesture recognition process is as follows:

1. Detect whether the user is moving or stationary.
2. Detect the start of a gesture (a posture).
3. Capture the gesture.
4. Detect the end of a gesture (a posture).
5. Identify the gesture.
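
One way to picture these steps is as a small state machine that advances once per skeleton frame. The sketch below is only an illustration of that idea. The GestureStage enumeration, the GesturePipeline class, and the isStable and postureDetected inputs are hypothetical names and are not part of my KinectManager library.

```csharp
using System;

// Hypothetical stages mirroring the steps listed above.
public enum GestureStage
{
    WaitingForStability,
    WaitingForStartPosture,
    Capturing,
    Identifying
}

public static class GesturePipeline
{
    // Returns the next stage, given the current stage and what was observed in this frame.
    // isStable and postureDetected are illustrative inputs only.
    public static GestureStage Next(GestureStage current, bool isStable, bool postureDetected)
    {
        switch (current)
        {
            case GestureStage.WaitingForStability:
                return isStable ? GestureStage.WaitingForStartPosture : current;
            case GestureStage.WaitingForStartPosture:
                if (!isStable)
                {
                    return GestureStage.WaitingForStability;
                }
                return postureDetected ? GestureStage.Capturing : current;
            case GestureStage.Capturing:
                // An end posture closes the capture.
                return postureDetected ? GestureStage.Identifying : current;
            default:
                // Once the gesture has been identified, start over.
                return GestureStage.WaitingForStability;
        }
    }
}
```

Calling Next once per frame keeps the decision about what to do next in a single place, which may make later stages easier to add.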

This blog post will focus on detecting the start of a gesture – in this case, a posture. I’d previously written a PoseRecognition class which I’ve since refactored to be more robust and extensible. The refactored PostureRecognizer class recognises three poses – the left hand waving hello, the right hand waving hello, and both hands being clasped together. New poses can easily be added to the class.

Implementation

The XAML for the UI is shown below. An Image shows the video stream from the sensor, with a Canvas being used to overlay the skeleton of the user on the video stream. A Slider controls the elevation of the sensor via bindings to a Camera class, thus eliminating some of the references to the UI from the code-behind file. TextBlocks are bound to IsSkeletonTracking, IsStable, and PostureRecognizer.CurrentPosture, to show if the skeleton is being tracked, if it is stable, and the current posture, respectively.

<Window x:Class="KinectDemo.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:conv="clr-namespace:KinectDemo.Converters"
        Title="Gesture Recognition" ResizeMode="NoResize" SizeToContent="WidthAndHeight"
        Loaded="Window_Loaded" Closed="Window_Closed">
    <Grid>
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="Auto" />
            <ColumnDefinition Width="300" />
        </Grid.ColumnDefinitions>
        <StackPanel Grid.Column="0">
            <TextBlock HorizontalAlignment="Center"
                       Text="Tracking..." />
            <Viewbox>
                <Grid ClipToBounds="True">
                    <Image Height="300"
                           Margin="10,0,10,10"
                           Source="{Binding ColourBitmap}"
                           Width="400" />
                    <Canvas x:Name="skeletonCanvas" />
                </Grid>
            </Viewbox>
        </StackPanel>
        <StackPanel Grid.Column="1">
            <GroupBox Header="Motor Control"
                      Height="100"
                      VerticalAlignment="Top"
                      Width="290">
                    <Slider x:Name="elevation"
                            AutoToolTipPlacement="BottomRight"
                            IsSnapToTickEnabled="True"
                            LargeChange="10"
                            Maximum="{Binding ElevationMaximum}"
                            Minimum="{Binding ElevationMinimum}"
                            HorizontalAlignment="Center"
                            Orientation="Vertical"
                            SmallChange="3"
                            TickFrequency="3"
                            Value="{Binding Path=ElevationAngle, Mode=TwoWay}" />
            </GroupBox>
            <GroupBox Header="Information"
                      Height="150"
                      VerticalAlignment="Top"
                      Width="290">
                <GroupBox.Resources>
                    <conv:BooleanToStringConverter x:Key="boolStr" />
                    <conv:PostureToStringConverter x:Key="postureStr" />
                </GroupBox.Resources>
                <StackPanel>
                    <StackPanel Orientation="Horizontal" 
                                Margin="10">
                        <TextBlock Text="Frame rate: " />
                        <TextBlock Text="{Binding FramesPerSecond}"
                                   VerticalAlignment="Top"
                                   Width="50" />
                    </StackPanel>
                    <StackPanel Margin="10,0,0,0" 
                                Orientation="Horizontal">
                        <TextBlock Text="Tracking skeleton: " />
                        <TextBlock Text="{Binding IsSkeletonTracking, 
                                          Converter={StaticResource boolStr}}"
                                   VerticalAlignment="Top"
                                   Width="30" />
                    </StackPanel>
                    <StackPanel Orientation="Horizontal"
                                Margin="10">
                        <TextBlock Text="Stable: " />
                        <TextBlock Text="{Binding Path=IsStable, 
                                                  Converter={StaticResource boolStr}}"
                                   VerticalAlignment="Top"
                                   Width="30" />
                    </StackPanel>
                    <StackPanel Orientation="Horizontal" 
                                Margin="10,0,0,0">
                        <TextBlock Text="Current posture: " />
                        <TextBlock Text="{Binding Path=PostureRecognizer.CurrentPosture, 
                                                  Converter={StaticResource postureStr}}"
                                   VerticalAlignment="Top"
                                   Width="160" />
                    </StackPanel>
                </StackPanel>
            </GroupBox>
        </StackPanel>
    </Grid>
</Window>

The StreamManager class (inside my KinectManager library) exposes the PostureRecognizer instance through a property. The instance is created in the StreamManager constructor.

        public PostureRecognizer PostureRecognizer { get; private set; }

The SkeletonFrameReady event handler hooks into a method in the StreamManager class called GetSkeletonStream. It retrieves a frame of skeleton data and processes it as follows: e.SkeletonFrame.Skeletons is an array of SkeletonData structures, each of which contains the data for a single skeleton. If the TrackingState field of the SkeletonData structure indicates that the skeleton is being tracked, the IsSkeletonTracking property is set to true, and the position of the skeleton is added to a collection in the BaryCenter class. Then the IsStable method of the BaryCenter class is invoked to determine if the user is stable or not. If the user is stable, the TrackPostures method from the PostureRecognizer class is invoked. Finally, the IsStable property is updated.

        public void GetSkeletonStream(SkeletonFrameReadyEventArgs e)
        {
            bool stable = false;
            foreach (var skeleton in e.SkeletonFrame.Skeletons)
            {
                if (skeleton.TrackingState == SkeletonTrackingState.Tracked)
                {
                    this.IsSkeletonTracking = true;
                    this.baryCenter.Add(skeleton.Position, skeleton.TrackingID);
                    stable = this.baryCenter.IsStable(skeleton.TrackingID);
                    if (stable)
                    {
                        this.PostureRecognizer.TrackPostures(skeleton);
                    }
                }
            }
            this.IsStable = stable;
        }
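
The BaryCenter class was covered in my previous post and is not reproduced here. As a rough idea of what an Add and IsStable pair might do, the sketch below keeps a short window of recent positions per tracking ID and treats the user as stable when every position in the window lies close to the newest one. It is not the actual BaryCenter implementation: the StabilityTracker and Point3 types, the window size and the threshold are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the SDK's Vector type, so the sketch compiles without the Kinect SDK.
public struct Point3
{
    public float X, Y, Z;
    public Point3(float x, float y, float z) { X = x; Y = y; Z = z; }
}

// Hypothetical stability tracker: a user counts as stable when the last
// WindowSize positions all lie within Threshold of the newest position.
public class StabilityTracker
{
    private const int WindowSize = 10;      // assumed window length, in frames
    private const float Threshold = 0.05f;  // assumed tolerance, in metres
    private readonly Dictionary<int, Queue<Point3>> history =
        new Dictionary<int, Queue<Point3>>();

    public void Add(Point3 position, int trackingId)
    {
        Queue<Point3> positions;
        if (!this.history.TryGetValue(trackingId, out positions))
        {
            positions = new Queue<Point3>();
            this.history[trackingId] = positions;
        }
        positions.Enqueue(position);
        if (positions.Count > WindowSize)
        {
            positions.Dequeue(); // drop the oldest sample
        }
    }

    public bool IsStable(int trackingId)
    {
        Queue<Point3> positions;
        if (!this.history.TryGetValue(trackingId, out positions) ||
            positions.Count < WindowSize)
        {
            return false; // not enough samples yet
        }
        // A queue enumerates oldest to newest, so the last item seen is the newest.
        Point3 newest = default(Point3);
        foreach (Point3 p in positions)
        {
            newest = p;
        }
        foreach (Point3 p in positions)
        {
            float dx = p.X - newest.X, dy = p.Y - newest.Y, dz = p.Z - newest.Z;
            if (Math.Sqrt(dx * dx + dy * dy + dz * dz) > Threshold)
            {
                return false;
            }
        }
        return true;
    }
}
```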

The Posture enumeration, used by the PostureRecognizer class, is shown below.

    public enum Posture
    {
        None,
        HandsClasped,
        LeftHello,
        RightHello
    }

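The PostureRecognizer class defines a CurrentPosture property with a backing field, and two constants. Epsilon is the largest distance between the hands at which they still count as clasped, and MaxRange is the tolerance used when comparing a hand position with the head position. Skeleton positions are expressed in metres.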

        private const float Epsilon = 0.1f;
        private const float MaxRange = 0.25f;
        private Posture currentPosture = Posture.None;
        public Posture CurrentPosture
        {
            get { return this.currentPosture; }
            set
            {
                this.currentPosture = value;
                this.OnPropertyChanged("CurrentPosture");
            }     
        }
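
The OnPropertyChanged method is not shown in this post. A minimal sketch of how such a method is commonly written with INotifyPropertyChanged follows. The PostureModel class name is hypothetical and stands in for whichever class hosts the property.

```csharp
using System.ComponentModel;

public enum Posture { None, HandsClasped, LeftHello, RightHello }

// Hypothetical host class showing one common way to implement OnPropertyChanged.
public class PostureModel : INotifyPropertyChanged
{
    private Posture currentPosture = Posture.None;

    public event PropertyChangedEventHandler PropertyChanged;

    public Posture CurrentPosture
    {
        get { return this.currentPosture; }
        set
        {
            this.currentPosture = value;
            this.OnPropertyChanged("CurrentPosture");
        }
    }

    protected void OnPropertyChanged(string propertyName)
    {
        // Copy the delegate so the null check and the call use the same subscriber list.
        PropertyChangedEventHandler handler = this.PropertyChanged;
        if (handler != null)
        {
            handler(this, new PropertyChangedEventArgs(propertyName));
        }
    }
}
```

WPF bindings subscribe to PropertyChanged, which is how a TextBlock bound to CurrentPosture gets refreshed when the value changes.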

The PostureRecognizer.TrackPostures method is shown below. It enumerates the joints in the skeleton and stores the positions of the head, left hand, and right hand as Vectors, skipping any joint that is not tracked or whose W value is below 0.8. It then calls two helper methods to determine if the user’s hands are clasped or if the user is waving hello with their left or right hand. If a posture is identified, the RaisePostureDetected method is invoked. If a posture is not identified, the CurrentPosture property is set to Posture.None.

        public void TrackPostures(SkeletonData skeleton)
        {
            if (skeleton.TrackingState != SkeletonTrackingState.Tracked)
            {
                return;
            }
            Vector? headPosition = null;
            Vector? leftHandPosition = null;
            Vector? rightHandPosition = null;
            foreach (Joint joint in skeleton.Joints)
            {
                if (joint.Position.W < 0.8f || 
                    joint.TrackingState != JointTrackingState.Tracked)
                {
                    continue;
                }
                switch (joint.ID)
                {
                    case JointID.Head:
                        headPosition = joint.Position;
                        break;
                    case JointID.HandLeft:
                        leftHandPosition = joint.Position;
                        break;
                    case JointID.HandRight:
                        rightHandPosition = joint.Position;
                        break;
                }
            }
            if (this.AreHandsClasped(leftHandPosition, rightHandPosition))
            {
                this.RaisePostureDetected(Posture.HandsClasped);
                return;
            }
            if (this.IsHello(headPosition, leftHandPosition))
            {
                this.RaisePostureDetected(Posture.LeftHello);
                return;
            }
            if (this.IsHello(headPosition, rightHandPosition))
            {
                this.RaisePostureDetected(Posture.RightHello);
                return;
            }
            this.CurrentPosture = Posture.None;
        }

The IsHello and AreHandsClasped helper methods are shown below. They both examine the Vector parameters to see if they have values, and if they do, perform simple arithmetic to determine whether the hello posture has been made, or whether the user’s hands are clasped together. In particular, the AreHandsClasped method uses the Vector extension methods I developed in my previous blog post.

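        // True when the hand is held out to the side of the head at roughly head height.
        private bool IsHello(Vector? headPosition, Vector? handPosition)
        {
            if (!headPosition.HasValue || !handPosition.HasValue)
                return false;
            // The hand must be at least MaxRange to the left or right of the head...
            if (Math.Abs(handPosition.Value.X - headPosition.Value.X) < MaxRange)
                return false;
            // ...while staying within MaxRange of the head's height...
            if (Math.Abs(handPosition.Value.Y - headPosition.Value.Y) > MaxRange)
                return false;
            // ...and within MaxRange of the head's distance from the sensor.
            if (Math.Abs(handPosition.Value.Z - headPosition.Value.Z) > MaxRange)
                return false;
            return true;
        }

        // True when the two hands are no more than Epsilon apart.
        private bool AreHandsClasped(Vector? leftHandPosition, Vector? rightHandPosition)
        {
            if (!leftHandPosition.HasValue || !rightHandPosition.HasValue)
                return false;
            // Distance between the hands, via the Vector extension methods.
            Vector result = leftHandPosition.Value.Subtract(rightHandPosition.Value);
            float distance = result.Length();
            if (distance > Epsilon)
                return false;
            return true;
        }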
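
To make the arithmetic concrete, the sketch below repeats the two checks against a stand-in vector type so that it can run without the Kinect SDK. The Vec3 struct, the PostureChecks class, and the Subtract and Length implementations are assumptions written for this example. The real Vector extension methods are in my previous post and may differ.

```csharp
using System;

// Stand-in for the SDK's Vector type (only the fields the checks need).
public struct Vec3
{
    public float X, Y, Z;
    public Vec3(float x, float y, float z) { X = x; Y = y; Z = z; }
}

// Assumed equivalents of the Subtract and Length extension methods.
public static class Vec3Extensions
{
    public static Vec3 Subtract(this Vec3 a, Vec3 b)
    {
        return new Vec3(a.X - b.X, a.Y - b.Y, a.Z - b.Z);
    }

    public static float Length(this Vec3 v)
    {
        return (float)Math.Sqrt(v.X * v.X + v.Y * v.Y + v.Z * v.Z);
    }
}

public static class PostureChecks
{
    private const float Epsilon = 0.1f;
    private const float MaxRange = 0.25f;

    // Same conditions as IsHello, written as a single expression.
    public static bool IsHello(Vec3 head, Vec3 hand)
    {
        return Math.Abs(hand.X - head.X) >= MaxRange
            && Math.Abs(hand.Y - head.Y) <= MaxRange
            && Math.Abs(hand.Z - head.Z) <= MaxRange;
    }

    // Same condition as AreHandsClasped.
    public static bool AreHandsClasped(Vec3 left, Vec3 right)
    {
        return left.Subtract(right).Length() <= Epsilon;
    }
}
```

With the head at (0, 0.6, 2.0), a hand at (0.4, 0.7, 2.1) is 0.4 to the side and about 0.1 away in height and depth, so IsHello returns true. A hand at (0.1, 0.7, 2.1) is too close to the head horizontally, and a hand at (0.4, 0.0, 2.0) is too low, so both return false. Hands at (0.02, 0.3, 1.8) and (-0.02, 0.3, 1.8) are 0.04 apart, which is within Epsilon, so AreHandsClasped returns true.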

The RaisePostureDetected method is shown below. It simply updates the CurrentPosture property to the identified posture, provided that the two are not identical. The method name could be considered curious, as the method does not raise an event. However, the point of recognizing a posture is that it will be the start or end point of a gesture. Therefore, this method will eventually fire a PostureDetected event that will start capturing a gesture if a start posture has been identified, or stop the gesture capture if an end posture has been identified. I will return to this topic in my next blog post.

        private void RaisePostureDetected(Posture posture)
        {
            if (this.currentPosture != posture)
            {
                this.CurrentPosture = posture;
            }
        }
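
As a rough preview of where this could go, the sketch below shows one possible shape for a PostureDetected event that fires only when the posture changes. It is an assumption about a future design and not the implementation from my next post. The PostureNotifier and PostureEventArgs types are hypothetical.

```csharp
using System;

public enum Posture { None, HandsClasped, LeftHello, RightHello }

// Hypothetical event payload carrying the posture that was detected.
public class PostureEventArgs : EventArgs
{
    public PostureEventArgs(Posture detectedPosture)
    {
        this.DetectedPosture = detectedPosture;
    }

    public Posture DetectedPosture { get; private set; }
}

public class PostureNotifier
{
    private Posture currentPosture = Posture.None;

    public event EventHandler<PostureEventArgs> PostureDetected;

    public void RaisePostureDetected(Posture posture)
    {
        if (this.currentPosture == posture)
        {
            return; // same posture as before: nothing new to report
        }
        this.currentPosture = posture;
        EventHandler<PostureEventArgs> handler = this.PostureDetected;
        if (handler != null)
        {
            handler(this, new PostureEventArgs(posture));
        }
    }
}
```

A subscriber could then start or stop gesture capture in its handler, depending on which posture arrives.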

The UI uses the PostureToStringConverter class (code not shown) to convert a Posture value to its string representation for display.
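
For readers who want a starting point, the mapping inside such a converter might look like the sketch below. This is not the actual PostureToStringConverter code. The PostureText class and the display strings are made up for illustration. In a WPF application this logic would typically be called from the Convert method of an IValueConverter implementation.

```csharp
using System;

public enum Posture { None, HandsClasped, LeftHello, RightHello }

public static class PostureText
{
    // Maps each posture to display text. The wording is illustrative only.
    public static string Describe(Posture posture)
    {
        switch (posture)
        {
            case Posture.HandsClasped:
                return "Hands clasped";
            case Posture.LeftHello:
                return "Left hello";
            case Posture.RightHello:
                return "Right hello";
            default:
                return "None";
        }
    }
}
```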

The application is shown below. It uses skeleton tracking to determine whether the user is moving or stationary, and identifies the user’s pose if they are stationary.

Conclusion

The Kinect for Windows SDK beta from Microsoft Research is a starter kit for application developers. It allows access to the Kinect sensor, and experimentation with its features. My gesture recognition process now determines whether the user is moving or stationary, and recognizes the start of a gesture – in this case, a posture. The next step in the process will be to capture the gesture that occurs between the starting posture and the ending posture.

Thursday, 11 August 2011
