Thursday 14 July 2011

Kinect SDK: Pose Recognition

Introduction

Pose recognition is an active research area in computer vision. However, thanks to the skeleton tracking provided by the Kinect SDK, it is relatively straightforward to construct a simple pose recogniser. The recogniser documented here recognises five poses: the left arm outstretched, the right arm outstretched, the left arm up, the right arm up, and both hands clasped together.

Implementation

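The XAML for the application's UI is shown below. Alongside the video stream (overlaid with a Rectangle marking the detected user's bounding box) and GroupBoxes for motor control and general information, it defines a Pose GroupBox whose TextBlocks display whether each pose has been recognised, together with the value used to decide it. A BooleanToStringConverter (not shown) converts the boolean value of the bound properties to a string (Yes/No) for display, and a BooleanToVisibilityConverter shows the bounding box only when a user is detected.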

<Window x:Class="KinectDemo.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:conv="clr-namespace:KinectDemo.Converters"
        Title="Pose Recognition" ResizeMode="NoResize" SizeToContent="WidthAndHeight"
        Loaded="Window_Loaded" Closed="Window_Closed">
    <Grid>
        <Grid.Resources>
            <conv:BooleanToStringConverter x:Key="boolStr" />
            <conv:BooleanToVisibilityConverter x:Key="boolVis" />
        </Grid.Resources>
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="Auto" />
            <ColumnDefinition Width="Auto" />
        </Grid.ColumnDefinitions>
        <StackPanel Grid.Column="0">
            <TextBlock Margin="0,10,0,10" 
                       HorizontalAlignment="Center"
                       Text="Video Stream" />
            <Viewbox Margin="10,0,10,10">
                <Grid Height="240"
                      Width="320">
                    <Image Source="{Binding ColourBitmap}" />
                    <Rectangle Height="{Binding Box.Height}"
                               HorizontalAlignment="Left"
                               RadiusX="5"
                               RadiusY="5"
                               Stroke="Red"
                               StrokeThickness="2"
                               VerticalAlignment="Top"
                               Visibility="{Binding Path=IsUserDetected, 
                                                    Converter={StaticResource boolVis}}"
                               Width="{Binding Box.Width}">
                        <Rectangle.RenderTransform>
                            <TranslateTransform X="{Binding Box.X}"
                                                Y="{Binding Box.Y}" />
                        </Rectangle.RenderTransform>
                    </Rectangle>
                </Grid>
            </Viewbox>
        </StackPanel>
        <StackPanel Grid.Column="1"
                    Margin="10">
            <GroupBox Header="Motor Control"
                      Height="100"
                      VerticalAlignment="Top"
                      Width="300">
                <StackPanel HorizontalAlignment="Center" 
                            Margin="10"
                            Orientation="Horizontal">
                    <Button x:Name="motorUp"
                            Click="motorUp_Click"
                            Content="Up"
                            Height="30"
                            Width="70" />
                    <Button x:Name="motorDown"
                            Click="motorDown_Click"
                            Content="Down"
                            Height="30"
                            Margin="10,0,0,0"
                            Width="70" />
                </StackPanel>
            </GroupBox>
            <GroupBox Header="Pose"
                      Height="150"
                      VerticalAlignment="Top"
                      Width="300">
                <Grid Margin="10,10,0,0">
                    <Grid.ColumnDefinitions>
                        <ColumnDefinition Width="130" />
                        <ColumnDefinition Width="30" />
                        <ColumnDefinition Width="40" />
                        <ColumnDefinition Width="50" />
                    </Grid.ColumnDefinitions>
                    <Grid.RowDefinitions>
                        <RowDefinition />
                        <RowDefinition />
                        <RowDefinition />
                        <RowDefinition />
                        <RowDefinition />
                    </Grid.RowDefinitions>
                    <TextBlock Text="Left arm outstretched: " />
                    <TextBlock Grid.Column="1"
                               Text="{Binding PoseRecognizer.IsLeftArmOutStretched, 
                                              Converter={StaticResource boolStr}}"
                               VerticalAlignment="Top"
                               Width="30" />
                    <TextBlock Grid.Column="2" 
                               Text="Value: " />
                    <TextBlock Grid.Column="3" 
                               HorizontalAlignment="Left"
                               Text="{Binding PoseRecognizer.LeftArmOutStretchedValue}"
                               VerticalAlignment="Top"
                               Width="40" />
                    <TextBlock Grid.Row="1" 
                               Text="Right arm outstretched: " />
                    <TextBlock Grid.Column="1"
                               Grid.Row="1"
                               Text="{Binding PoseRecognizer.IsRightArmOutStretched, 
                                              Converter={StaticResource boolStr}}"
                               VerticalAlignment="Top"
                               Width="30" />
                    <TextBlock Grid.Column="2" 
                               Grid.Row="1"
                               Text="Value: " />
                    <TextBlock Grid.Column="3" 
                               Grid.Row="1"
                               HorizontalAlignment="Left"
                               Text="{Binding PoseRecognizer.RightArmOutStretchedValue}"
                               VerticalAlignment="Top"
                               Width="40" />
                    <TextBlock Grid.Row="2" 
                               Text="Left arm up: " />
                    <TextBlock Grid.Column="1"
                               Grid.Row="2"
                               Text="{Binding PoseRecognizer.IsLeftArmUp, 
                                              Converter={StaticResource boolStr}}"
                               VerticalAlignment="Top"
                               Width="30" />
                    <TextBlock Grid.Column="2" 
                               Grid.Row="2"
                               Text="Value: " />
                    <TextBlock Grid.Column="3" 
                               Grid.Row="2"
                               HorizontalAlignment="Left"
                               Text="{Binding PoseRecognizer.LeftArmUpValue}"
                               VerticalAlignment="Top"
                               Width="40" />
                    <TextBlock Grid.Row="3" 
                               Text="Right arm up: " />
                    <TextBlock Grid.Column="1"
                               Grid.Row="3"
                               Text="{Binding PoseRecognizer.IsRightArmUp, 
                                              Converter={StaticResource boolStr}}"
                               VerticalAlignment="Top"
                               Width="30" />
                    <TextBlock Grid.Column="2" 
                               Grid.Row="3"
                               Text="Value: " />
                    <TextBlock Grid.Column="3" 
                               Grid.Row="3"
                               HorizontalAlignment="Left"
                               Text="{Binding PoseRecognizer.RightArmUpValue}"
                               VerticalAlignment="Top"
                               Width="40" />
                    <TextBlock Grid.Row="4" 
                               Text="Hands together: " />
                    <TextBlock Grid.Column="1"
                               Grid.Row="4"
                               Text="{Binding PoseRecognizer.AreHandsTogether, 
                                              Converter={StaticResource boolStr}}"
                               VerticalAlignment="Top"
                               Width="30" />
                    <TextBlock Grid.Column="2" 
                               Grid.Row="4"
                               Text="Value: " />
                    <TextBlock Grid.Column="3" 
                               Grid.Row="4"
                               HorizontalAlignment="Left"
                               Text="{Binding PoseRecognizer.AreHandsTogetherValue}"
                               VerticalAlignment="Top"
                               Width="40" />     
                </Grid>
            </GroupBox>
            <GroupBox Header="Information"
                      Height="100"
                      VerticalAlignment="Top"
                      Width="300">
                <StackPanel>
                    <StackPanel Margin="10" 
                                Orientation="Horizontal">
                        <TextBlock Text="Frame rate: " />
                        <TextBlock Text="{Binding FramesPerSecond}"
                                   VerticalAlignment="Top"
                                   Width="50" />
                    </StackPanel>
                    <StackPanel Margin="10,0,0,0" 
                                Orientation="Horizontal">
                        <TextBlock Text="Tracking skeleton: " />
                        <TextBlock Text="{Binding IsSkeletonTracking, 
                                                  Converter={StaticResource boolStr}}"
                                   VerticalAlignment="Top"
                                   Width="30" />
                    </StackPanel>
                </StackPanel>
            </GroupBox>
        </StackPanel>
    </Grid>
</Window>
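The BooleanToStringConverter used by the Pose and Information group boxes is not shown in the post. A minimal sketch of what it might look like follows, assuming it lives in the KinectDemo.Converters namespace mapped to the conv prefix and simply turns true/false into "Yes"/"No"; the author's actual implementation may differ. It implements WPF's IValueConverter interface, so it needs a WPF project to build.

```csharp
using System;
using System.Globalization;
using System.Windows.Data;

namespace KinectDemo.Converters
{
    // Sketch only: maps a bool to "Yes"/"No" for display in a TextBlock.
    [ValueConversion(typeof(bool), typeof(string))]
    public class BooleanToStringConverter : IValueConverter
    {
        public object Convert(object value, Type targetType, object parameter, CultureInfo culture)
        {
            // Anything that is not a boolean true (including null) is shown as "No".
            return (value is bool b && b) ? "Yes" : "No";
        }

        public object ConvertBack(object value, Type targetType, object parameter, CultureInfo culture)
        {
            // The TextBlock bindings in this window are one-way, so this is never needed.
            throw new NotSupportedException();
        }
    }
}
```

ConvertBack throws because TextBlock.Text bindings are one-way by default, so WPF never calls it for this UI.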

The constructor creates an instance of the StreamManager class (contained in my KinectManager library), which handles the stream processing. The DataContext of MainWindow is set to this instance, kinectStream, for binding purposes, and the UseBoundingBox property of StreamManager is set to true so that bounding box processing occurs.
        public MainWindow()
        {
            InitializeComponent();
            this.kinectStream = new StreamManager();
            this.DataContext = this.kinectStream;
            this.kinectStream.UseBoundingBox = true;
        }

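The Window_Loaded event handler initializes the Kinect runtime with the depth-and-player-index, skeletal tracking, and colour subsystems, and then opens the depth stream (320x240) and the video stream (640x480). If either step throws an InvalidOperationException, an error message is shown and the handler returns. Otherwise it sets the LastTime property of StreamManager to the current time, invokes SmoothSkeletonData, and finally registers event handlers for the skeleton, depth, and video frame events.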
        private void Window_Loaded(object sender, RoutedEventArgs e)
        {
            this.runtime = new Runtime();
            try
            {
                this.runtime.Initialize(RuntimeOptions.UseDepthAndPlayerIndex | 
                    RuntimeOptions.UseSkeletalTracking | RuntimeOptions.UseColor);
                this.cam = runtime.NuiCamera;
            }
            catch (InvalidOperationException)
            {
                MessageBox.Show
                    ("Runtime initialization failed. Ensure Kinect is plugged in");
                return;
            }
            try
            {
                this.runtime.DepthStream.Open(ImageStreamType.Depth, 2, 
                    ImageResolution.Resolution320x240, ImageType.DepthAndPlayerIndex);
                this.runtime.VideoStream.Open(ImageStreamType.Video, 2, 
                    ImageResolution.Resolution640x480, ImageType.Color);
            }
            catch (InvalidOperationException)
            {
                MessageBox.Show
                    ("Failed to open stream. Specify a supported image type/resolution.");
                return;
            }
            this.kinectStream.LastTime = DateTime.Now;
            this.SmoothSkeletonData();
            this.runtime.SkeletonFrameReady += 
                new EventHandler<SkeletonFrameReadyEventArgs>(runtime_SkeletonFrameReady);
            this.runtime.DepthFrameReady += 
                new EventHandler<ImageFrameReadyEventArgs>(runtime_DepthFrameReady);
            this.runtime.VideoFrameReady += 
                new EventHandler<ImageFrameReadyEventArgs>(runtime_VideoFrameReady);
        }

SmoothSkeletonData enables transform smoothing on the skeleton engine and sets the smoothing and filtering parameters that will be applied to the skeleton tracking data returned from the sensor. The SkeletonFrameReady event handler passes each frame to the GetSkeletonStream method of the StreamManager class for processing.
        private void SmoothSkeletonData()
        {
            this.runtime.SkeletonEngine.TransformSmooth = true;
            var parameters = new TransformSmoothParameters
            {
                Smoothing = 0.75f,
                Correction = 0.0f,
                Prediction = 0.0f,
                JitterRadius = 0.05f,
                MaxDeviationRadius = 0.04f
            };
            this.runtime.SkeletonEngine.SmoothParameters = parameters;
        }
        private void runtime_SkeletonFrameReady(object sender, 
            SkeletonFrameReadyEventArgs e)
        {
            this.kinectStream.GetSkeletonStream(e);
        }
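To give a feel for what the Smoothing parameter (set to 0.75 above) trades off, the sketch below applies plain single exponential smoothing to one joint coordinate. It is a simplified illustration only, not the SDK's filter, which is more elaborate and also uses the Correction, Prediction, JitterRadius and MaxDeviationRadius parameters; the ExponentialSmoother class is hypothetical.

```csharp
using System;

// Hypothetical illustration of single exponential smoothing of one joint coordinate.
// It is NOT the Kinect SDK's filter; it only shows the smoothing/latency trade-off.
public class ExponentialSmoother
{
    private readonly float smoothing; // 0 = no smoothing, values near 1 = heavy smoothing
    private float? previous;          // last filtered value; null until the first sample

    public ExponentialSmoother(float smoothing)
    {
        if (smoothing < 0f || smoothing >= 1f)
        {
            throw new ArgumentOutOfRangeException(nameof(smoothing), "Expected 0 <= smoothing < 1.");
        }
        this.smoothing = smoothing;
    }

    public float Filter(float raw)
    {
        // The first sample passes through unchanged; later samples are blended
        // with the previous filtered value, weighted by the smoothing factor.
        float filtered = previous.HasValue
            ? smoothing * previous.Value + (1f - smoothing) * raw
            : raw;
        previous = filtered;
        return filtered;
    }
}
```

With a smoothing factor of 0.75, a coordinate that jumps from 0 to 1 is reported as 0.25, 0.4375 and 0.578125 over the next three frames: the output is steadier, but it lags behind the raw position. Higher smoothing values increase this lag.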

Two important properties of the StreamManager class are shown below. IsSkeletonTracking is true while the skeleton of the detected user is being tracked. PoseRecognizer holds the instance of the PoseRecognition class, and is initialized in the StreamManager constructor.
        public bool IsSkeletonTracking { get; private set; }
        public PoseRecognition PoseRecognizer { get; private set; }

The GetSkeletonStream method calls RecognizePose on the PoseRecognizer object, provided that the user's skeleton is being tracked, and updates the IsSkeletonTracking property accordingly.
        public void GetSkeletonStream(SkeletonFrameReadyEventArgs e)
        {
            // Skeletons is a fixed-size array of slots, and a tracked user is not
            // guaranteed to occupy slot 0, so use the first tracked skeleton found.
            bool isTracking = false;
            foreach (SkeletonData skeletonData in e.SkeletonFrame.Skeletons)
            {
                if (SkeletonTrackingState.Tracked == skeletonData.TrackingState)
                {
                    isTracking = true;
                    this.PoseRecognizer.RecognizePose(skeletonData);
                    break;
                }
            }
            this.IsSkeletonTracking = isTracking;
            this.OnPropertyChanged("IsSkeletonTracking");
        }

The PoseRecognition class recognizes five basic poses: the left arm outstretched, the right arm outstretched, the left arm up, the right arm up, and both hands clasped together. For each pose, the class defines one backing field that records whether the pose has been recognized and another that holds the value used to decide it.
        private bool areHandsTogether;
        private bool isLeftArmOutStretched;
        private bool isRightArmOutStretched;
        private bool isLeftArmUp;
        private bool isRightArmUp;
        private float areHandsTogetherValue;
        private float leftArmOutStretchedValue;
        private float rightArmOutStretchedValue;
        private float leftArmUpValue;
        private float rightArmUpValue;

The class also defines a property for each backing field. Each property has a private setter that raises PropertyChanged, so the UI, which binds to these properties via the PoseRecognizer property of the StreamManager class, is updated whenever they are set.
        public bool AreHandsTogether
        {
            get
            {
                return this.areHandsTogether;
            }
            private set
            {
                this.areHandsTogether = value;
                this.OnPropertyChanged("AreHandsTogether");
            }
        }
        public float AreHandsTogetherValue
        {
            get
            {
                return this.areHandsTogetherValue;
            }
            private set
            {
                this.areHandsTogetherValue = value;
                this.OnPropertyChanged("AreHandsTogetherValue");
            }
        }
        public bool IsLeftArmOutStretched
        {
            get
            {
                return this.isLeftArmOutStretched;
            }
            private set
            {
                this.isLeftArmOutStretched = value;
                this.OnPropertyChanged("IsLeftArmOutStretched");
            }
        }
        public bool IsRightArmOutStretched
        {
            get
            {
                return this.isRightArmOutStretched;
            }
            private set
            {
                this.isRightArmOutStretched = value;
                this.OnPropertyChanged("IsRightArmOutStretched");
            }
        }
        public float LeftArmOutStretchedValue
        {
            get
            {
                return this.leftArmOutStretchedValue;
            }
            private set
            {
                this.leftArmOutStretchedValue = value;
                this.OnPropertyChanged("LeftArmOutStretchedValue");
            }
        }
        public float RightArmOutStretchedValue
        {
            get
            {
                return this.rightArmOutStretchedValue;
            }
            private set
            {
                this.rightArmOutStretchedValue = value;
                this.OnPropertyChanged("RightArmOutStretchedValue");
            }
        }
        public bool IsLeftArmUp
        {
            get
            {
                return this.isLeftArmUp;
            }
            private set
            {
                this.isLeftArmUp = value;
                this.OnPropertyChanged("IsLeftArmUp");
            }
        }
        public bool IsRightArmUp
        {
            get
            {
                return this.isRightArmUp;
            }
            private set
            {
                this.isRightArmUp = value;
                this.OnPropertyChanged("IsRightArmUp");
            }
        }
        public float LeftArmUpValue
        {
            get
            {
                return this.leftArmUpValue;
            }
            private set
            {
                this.leftArmUpValue = value;
                this.OnPropertyChanged("LeftArmUpValue");
            }
        }
        public float RightArmUpValue
        {
            get
            {
                return this.rightArmUpValue;
            }
            private set
            {
                this.rightArmUpValue = value;
                this.OnPropertyChanged("RightArmUpValue");
            }
        }
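All ten properties follow the same pattern: store the value, then raise PropertyChanged. As a possible refactoring (not part of the original code), a small helper can capture that pattern once. The sketch below is self-contained; NotifyPropertyChangedBase, SetField and PoseRecognitionSketch are hypothetical names, and it assumes a modern C# compiler ([CallerMemberName] needs C# 5 and .NET 4.5 or later, so on the original .NET 4 toolchain the property name would have to be passed explicitly). Unlike the original setters, it raises PropertyChanged only when the value actually changes, which avoids redundant UI updates when RecognizePose runs every frame with an unchanged result.

```csharp
using System.Collections.Generic;
using System.ComponentModel;
using System.Runtime.CompilerServices;

// Hypothetical base class capturing the "set backing field, then notify" pattern.
public abstract class NotifyPropertyChangedBase : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    protected void OnPropertyChanged(string propertyName)
    {
        PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(propertyName));
    }

    // Assigns the field and raises PropertyChanged only if the value changed.
    // [CallerMemberName] supplies the calling property's name automatically.
    protected bool SetField<T>(ref T field, T value, [CallerMemberName] string propertyName = null)
    {
        if (EqualityComparer<T>.Default.Equals(field, value))
        {
            return false;
        }
        field = value;
        OnPropertyChanged(propertyName);
        return true;
    }
}

// Two of the PoseRecognition properties rewritten with the helper.
public class PoseRecognitionSketch : NotifyPropertyChangedBase
{
    private bool isLeftArmUp;
    private float leftArmUpValue;

    public bool IsLeftArmUp
    {
        get { return this.isLeftArmUp; }
        private set { this.SetField(ref this.isLeftArmUp, value); }
    }

    public float LeftArmUpValue
    {
        get { return this.leftArmUpValue; }
        private set { this.SetField(ref this.leftArmUpValue, value); }
    }

    // Same rule as RecognizePose: the arm is up when the hand is above the head.
    public void Update(float handY, float headY)
    {
        this.LeftArmUpValue = handY - headY;
        this.IsLeftArmUp = this.LeftArmUpValue > 0;
    }
}
```

Calling Update twice with the same joint heights raises two notifications the first time (LeftArmUpValue, then IsLeftArmUp) and none the second time.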

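The RecognizePose method updates the class properties to indicate whether each pose has been recognized. It uses only the X and Y coordinates of the Position property of the relevant Joints, which are expressed in metres in skeleton space. The hands are considered together when the sum of the absolute X and Y differences between them is less than 0.04. An arm is considered outstretched when the hand is more than 0.4 from its shoulder along the X axis (in the negative X direction for the left arm, and the positive X direction for the right arm), and up when the hand is higher than the head.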
        public void RecognizePose(SkeletonData skeletonData)
        {
            this.AreHandsTogetherValue = Math.Abs(
                skeletonData.Joints[JointID.HandRight].Position.Y - 
                skeletonData.Joints[JointID.HandLeft].Position.Y) +
                Math.Abs(skeletonData.Joints[JointID.HandRight].Position.X - 
                skeletonData.Joints[JointID.HandLeft].Position.X);
            this.LeftArmOutStretchedValue = 
                skeletonData.Joints[JointID.HandLeft].Position.X - 
                skeletonData.Joints[JointID.ShoulderLeft].Position.X;
            this.RightArmOutStretchedValue = 
                skeletonData.Joints[JointID.HandRight].Position.X - 
                skeletonData.Joints[JointID.ShoulderRight].Position.X;
            this.LeftArmUpValue = 
                skeletonData.Joints[JointID.HandLeft].Position.Y - 
                skeletonData.Joints[JointID.Head].Position.Y;
            this.RightArmUpValue = 
                skeletonData.Joints[JointID.HandRight].Position.Y - 
                skeletonData.Joints[JointID.Head].Position.Y;
            this.AreHandsTogether = this.AreHandsTogetherValue < 0.04;
            this.IsLeftArmOutStretched = this.LeftArmOutStretchedValue < -0.4;
            this.IsRightArmOutStretched = this.RightArmOutStretchedValue > 0.4;
            this.IsLeftArmUp = this.LeftArmUpValue > 0;
            this.IsRightArmUp = this.RightArmUpValue > 0;
        }
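Because these rules depend only on a few joint coordinates, they can be separated from the SDK types and checked without a sensor. The sketch below restates the thresholds from RecognizePose over a hypothetical JointPosition struct standing in for a joint's Position; JointPosition and PoseRules are illustrative names, not part of the Kinect SDK.

```csharp
using System;

// Hypothetical stand-in for a joint's Position (skeleton space, metres).
public readonly struct JointPosition
{
    public JointPosition(float x, float y) { X = x; Y = y; }
    public float X { get; }
    public float Y { get; }
}

public static class PoseRules
{
    // Thresholds taken from RecognizePose above (metres).
    public const float HandsTogetherThreshold = 0.04f;
    public const float ArmOutStretchedThreshold = 0.4f;

    // Sum of the absolute X and Y differences between the two hands.
    public static float HandsTogetherValue(JointPosition handLeft, JointPosition handRight) =>
        Math.Abs(handRight.Y - handLeft.Y) + Math.Abs(handRight.X - handLeft.X);

    public static bool AreHandsTogether(JointPosition handLeft, JointPosition handRight) =>
        HandsTogetherValue(handLeft, handRight) < HandsTogetherThreshold;

    // Left hand more than 0.4 m from the left shoulder in the negative X direction.
    public static bool IsLeftArmOutStretched(JointPosition handLeft, JointPosition shoulderLeft) =>
        handLeft.X - shoulderLeft.X < -ArmOutStretchedThreshold;

    // Right hand more than 0.4 m from the right shoulder in the positive X direction.
    public static bool IsRightArmOutStretched(JointPosition handRight, JointPosition shoulderRight) =>
        handRight.X - shoulderRight.X > ArmOutStretchedThreshold;

    // Either arm: the hand is higher than the head.
    public static bool IsArmUp(JointPosition hand, JointPosition head) => hand.Y - head.Y > 0;
}
```

In the application these inputs would come from skeletonData.Joints[...].Position; keeping the geometry in plain methods makes it easy to test the thresholds with hand-picked coordinates before tuning them against live data.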

The application is shown below. Once a user is identified from the depth stream returned from the Kinect sensor, the Minimum Bounding Box (MBB) is derived and the Rectangle representing the MBB is made visible over the video stream. For an explanation of how this works, see my previous post. As the user assumes different poses, the information in the Pose group box is updated to indicate whether each pose has been recognized. It would be straightforward to add further poses to the PoseRecognition class, such as crouching and jumping.

[Screenshot: the Pose Recognition application]
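As one possible way of adding a crouching pose to the PoseRecognition class, the sketch below compares the current height of the hip centre joint with a baseline captured while the user is standing. The CrouchDetector class, the calibration step and the 0.25 m drop threshold are all assumptions for illustration, not part of the original application, and would need tuning against real skeleton data. The hip height would be read from something like skeletonData.Joints[JointID.HipCenter].Position.Y.

```csharp
using System;

// Hypothetical crouch detector: flags a crouch when the hip centre's Y coordinate
// (skeleton space, metres) drops well below a baseline recorded while standing.
public class CrouchDetector
{
    private readonly float dropThreshold; // assumed value, e.g. 0.25 m; needs tuning
    private float? standingHipY;          // baseline; null until calibrated

    public CrouchDetector(float dropThreshold)
    {
        this.dropThreshold = dropThreshold;
    }

    public bool IsCalibrated => standingHipY.HasValue;

    // Call once while the user is standing upright.
    public void CalibrateStanding(float hipCentreY)
    {
        standingHipY = hipCentreY;
    }

    public bool IsCrouching(float hipCentreY)
    {
        if (!standingHipY.HasValue)
        {
            return false; // no baseline yet, so no decision can be made
        }
        return standingHipY.Value - hipCentreY > dropThreshold;
    }
}
```

A relative baseline is used because the absolute hip height varies with the user's size and the sensor's placement and tilt; a jump could be detected in a similar way by watching for the hip rising above the baseline, although that would also need some timing logic.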

Conclusion


The Kinect for Windows SDK beta from Microsoft Research is a starter kit for application developers. It provides access to the Kinect sensor and allows experimentation with its features. Its skeleton tracking data makes simple pose recognition easy to perform, and provides a starting point for gesture recognition.
