Friday 29 July 2011

Kinect SDK: Gesture Recognition Pt 1

Introduction

Previous blog posts have looked at a number of topics including simple gesture recognition, skeleton tracking, pose recognition, and smoothing skeleton data. It’s now time to link these topics together in order to produce a robust and extensible gesture recognizer that can be used in different NUI (natural user interface) applications.

My high-level approach to the gesture recognition process will be as follows:

  • Detect whether the user is moving or stationary.
  • Detect the start of a gesture (a posture).
  • Capture the gesture.
  • Detect the end of a gesture (a posture).
  • Identify the gesture.

A gesture can be thought of as a sequence of points. The coordinates of these points depend on the user’s distance from the sensor, yet a gesture recognizer must identify a gesture regardless of that distance. Therefore, it will be necessary to scale gestures to a common reference, so that gestures captured at any distance from the sensor can be identified robustly.
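
As a sketch of what such scaling might look like, the snippet below normalises a captured sequence of points into a unit square, so that the same gesture yields comparable coordinates whatever the user’s distance from the sensor. The method name, the use of System.Windows.Point, and the bounding-box approach are illustrative assumptions rather than the final recognizer code.
        // Illustrative sketch: scale a captured gesture into a unit square so that
        // gestures performed at different distances from the sensor are comparable.
        // Requires System, System.Collections.Generic, System.Linq and System.Windows.
        private static List<Point> ScaleToUnitSquare(IList<Point> gesture)
        {
            double minX = gesture.Min(p => p.X);
            double maxX = gesture.Max(p => p.X);
            double minY = gesture.Min(p => p.Y);
            double maxY = gesture.Max(p => p.Y);

            // Guard against a degenerate gesture with no extent in one axis.
            double width = Math.Max(maxX - minX, double.Epsilon);
            double height = Math.Max(maxY - minY, double.Epsilon);

            return gesture.Select(p => new Point((p.X - minX) / width,
                                                 (p.Y - minY) / height))
                          .ToList();
        }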

There are a large number of algorithmic solutions for gesture recognition, and I will write more about this in a future blog post. The focus of this post is on detecting whether the user is moving or stationary. This can be undertaken by examining the center of mass of the user.

The center of mass is the mean location of all the mass in a system, and is also known as the barycenter. The common definition of barycenter comes from astrophysics, where it is the center of mass about which two or more celestial bodies orbit each other. The barycenter of a shape is the intersection of all straight lines that divide the shape into two parts of equal moment about the line; it can therefore be thought of as the mean of all points of the shape.
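
As a simple illustration, the barycenter of a set of points is just the mean of their coordinates. A minimal sketch using the SDK’s Vector type is shown below; the method itself is illustrative and is not part of the BaryCenter class used later in this post.
        // Illustrative sketch: the barycenter of a set of points is the mean of
        // their coordinates. Vector is the Microsoft.Research.Kinect.Nui type.
        // Assumes points is non-empty.
        private static Vector ComputeBaryCenter(IList<Vector> points)
        {
            float x = 0, y = 0, z = 0;
            foreach (Vector point in points)
            {
                x += point.X;
                y += point.Y;
                z += point.Z;
            }
            return new Vector
            {
                X = x / points.Count,
                Y = y / points.Count,
                Z = z / points.Count
            };
        }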

The application documented here uses a barycenter in order to determine whether the user is stationary (stable) or moving (not stable).

Implementation

The XAML for the UI is shown below. An Image shows the video stream from the sensor, with a Canvas being used to overlay the skeleton of the user on the video stream. A Slider now controls the elevation of the sensor via bindings to a Camera class, thus eliminating some of the references to the UI from the code-behind file. TextBlocks are bound to IsSkeletonTracking and IsStable to show whether the skeleton is being tracked and whether the user is stable, respectively.

<Window x:Class="KinectDemo.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:conv="clr-namespace:KinectDemo.Converters"
        Title="Gesture Recognition" ResizeMode="NoResize" SizeToContent="WidthAndHeight"
        Loaded="Window_Loaded" Closed="Window_Closed">
    <Grid>
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="Auto" />
            <ColumnDefinition Width="300" />
        </Grid.ColumnDefinitions>
        <StackPanel Grid.Column="0">
            <TextBlock HorizontalAlignment="Center"
                       Text="Tracking..." />
            <Viewbox>
                <Grid ClipToBounds="True">
                    <Image Height="300"
                           Margin="10,0,10,10"
                           Source="{Binding ColourBitmap}"
                           Width="400" />
                    <Canvas x:Name="skeletonCanvas" />
                </Grid>
            </Viewbox>
        </StackPanel>
        <StackPanel Grid.Column="1">
            <GroupBox Header="Motor Control"
                      Height="100"
                      VerticalAlignment="Top"
                      Width="290">
                    <Slider x:Name="elevation"
                            AutoToolTipPlacement="BottomRight"
                            IsSnapToTickEnabled="True"
                            LargeChange="10"
                            Maximum="{Binding ElevationMaximum}"
                            Minimum="{Binding ElevationMinimum}"
                            HorizontalAlignment="Center"
                            Orientation="Vertical"
                            SmallChange="3"
                            TickFrequency="3"
                            Value="{Binding Path=ElevationAngle, Mode=TwoWay}" />
            </GroupBox>
            <GroupBox Header="Information"
                      Height="200"
                      VerticalAlignment="Top"
                      Width="290">
                <GroupBox.Resources>
                    <conv:BooleanToStringConverter x:Key="boolStr" />
                </GroupBox.Resources>
                <StackPanel>
                    <StackPanel Orientation="Horizontal" 
                                Margin="10">
                        <TextBlock Text="Frame rate: " />
                        <TextBlock Text="{Binding FramesPerSecond}"
                                   VerticalAlignment="Top"
                                   Width="50" />
                    </StackPanel>
                    <StackPanel Margin="10,0,0,0" 
                                Orientation="Horizontal">
                        <TextBlock Text="Tracking skeleton: " />
                        <TextBlock Text="{Binding IsSkeletonTracking, 
                                          Converter={StaticResource boolStr}}"
                                   VerticalAlignment="Top"
                                   Width="30" />
                    </StackPanel>
                    <StackPanel Orientation="Horizontal"
                                Margin="10">
                        <TextBlock Text="Stable: " />
                        <TextBlock Text="{Binding Path=IsStable, 
                                          Converter={StaticResource boolStr}}"
                                   VerticalAlignment="Top"
                                   Width="30" />
                    </StackPanel>
                </StackPanel>
            </GroupBox>
        </StackPanel>
    </Grid>
</Window>
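
The Camera class itself is not shown in this post. As a rough sketch, a wrapper along the following lines would support the ElevationAngle, ElevationMinimum and ElevationMaximum bindings used by the Slider; the class layout here is an assumption, and the actual implementation in the KinectManager library may differ. It assumes the beta SDK’s Camera type, which exposes ElevationAngle on the instance and the static ElevationMinimum/ElevationMaximum values.
        // Hypothetical sketch of a Camera wrapper that the Slider could bind to.
        // Requires System.ComponentModel and Microsoft.Research.Kinect.Nui.
        public class Camera : INotifyPropertyChanged
        {
            private readonly Microsoft.Research.Kinect.Nui.Camera nuiCamera;

            public Camera(Microsoft.Research.Kinect.Nui.Camera nuiCamera)
            {
                this.nuiCamera = nuiCamera;
            }

            public event PropertyChangedEventHandler PropertyChanged;

            // The beta SDK exposes the supported tilt range as static values.
            public int ElevationMinimum
            {
                get { return Microsoft.Research.Kinect.Nui.Camera.ElevationMinimum; }
            }

            public int ElevationMaximum
            {
                get { return Microsoft.Research.Kinect.Nui.Camera.ElevationMaximum; }
            }

            public int ElevationAngle
            {
                get { return this.nuiCamera.ElevationAngle; }
                set
                {
                    this.nuiCamera.ElevationAngle = value;
                    this.OnPropertyChanged("ElevationAngle");
                }
            }

            private void OnPropertyChanged(string propertyName)
            {
                PropertyChangedEventHandler handler = this.PropertyChanged;
                if (handler != null)
                {
                    handler(this, new PropertyChangedEventArgs(propertyName));
                }
            }
        }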

The StreamManager class (inside my KinectManager library) contains properties and backing store for IsSkeletonTracking and IsStable. It also contains an instance of the BaryCenter class.
        private bool? isStable = null;
        private bool isSkeletonTracking;
        private readonly BaryCenter baryCenter = new BaryCenter();
        public bool IsSkeletonTracking
        {
            get { return this.isSkeletonTracking; }
            private set
            {
                this.isSkeletonTracking = value;
                this.OnPropertyChanged("IsSkeletonTracking");
                if (this.isSkeletonTracking == false)
                {
                    this.IsStable = null;
                }
            }
        }
        
        public bool? IsStable
        {
            get { return this.isStable; }
            private set
            {
                this.isStable = value;
                this.OnPropertyChanged("IsStable");
            }
        }
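
The setters above raise change notifications through OnPropertyChanged, which implies that StreamManager implements INotifyPropertyChanged. A minimal sketch of that plumbing, which may differ from the actual KinectManager implementation, is:
        // Sketch of the change-notification plumbing assumed by the setters above.
        // Requires System.ComponentModel.
        public event PropertyChangedEventHandler PropertyChanged;

        protected void OnPropertyChanged(string propertyName)
        {
            PropertyChangedEventHandler handler = this.PropertyChanged;
            if (handler != null)
            {
                handler(this, new PropertyChangedEventArgs(propertyName));
            }
        }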

The SkeletonFrameReady event handler hooks into a method in the StreamManager class called GetSkeletonStream. It retrieves a frame of skeleton data and processes it as follows: e.SkeletonFrame.Skeletons is an array of SkeletonData structures, each of which contains the data for a single skeleton. If the TrackingState field of the SkeletonData structure indicates that the skeleton is being tracked, the IsSkeletonTracking property is set to true, and the position of the skeleton is added to a collection in the BaryCenter class. Then the IsStable method of the BaryCenter class is invoked to determine if the user is stable or not. Finally, the IsStable property is updated.
        public void GetSkeletonStream(SkeletonFrameReadyEventArgs e)
        {
            bool stable = false;
            foreach (var skeleton in e.SkeletonFrame.Skeletons)
            {
                if (skeleton.TrackingState == SkeletonTrackingState.Tracked)
                {
                    this.IsSkeletonTracking = true;
                    this.baryCenter.Add(skeleton.Position, skeleton.TrackingID);
                    stable = this.baryCenter.IsStable(skeleton.TrackingID);
                }
            }
            this.IsStable = stable;
        }
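
The code-behind handler itself is then little more than a pass-through. A sketch might look like this; the handler name and the drawing comment are assumptions:
        // Sketch of the code-behind handler that forwards skeleton frames to the
        // StreamManager; the handler name is an assumption.
        private void nui_SkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
        {
            this.kinectStream.GetSkeletonStream(e);
            // The skeleton overlay on skeletonCanvas would also be redrawn here.
        }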

The Vector type in the Microsoft.Research.Kinect.Nui namespace is currently lacking many useful vector operations, including basic vector arithmetic. Therefore I defined extension methods to calculate the length of a vector, and to subtract one vector from another. The extension methods are currently defined in a Helper class. Extension methods enable you to “add” methods to existing types without creating a new derived type, or otherwise modifying the original type. There is no apparent difference between calling an extension method and calling a method that is actually defined in the type.
        public static float Length(this Vector vector)
        {
            return (float)Math.Sqrt(vector.X * vector.X + 
                                    vector.Y * vector.Y + 
                                    vector.Z * vector.Z);
        }
        public static Vector Subtract(this Vector left, Vector right)
        {
            return new Vector
            {
                X = left.X - right.X,
                Y = left.Y - right.Y,
                Z = left.Z - right.Z,
                W = left.W - right.W
            };
        }
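
For example, with these extension methods in scope, the displacement between two skeleton positions and its magnitude can be computed directly (the values below are purely illustrative):
        // Example usage of the extension methods, with illustrative values.
        Vector previous = new Vector { X = 0.10f, Y = 0.05f, Z = 2.00f };
        Vector current = new Vector { X = 0.12f, Y = 0.05f, Z = 1.95f };

        Vector displacement = current.Subtract(previous);
        float distanceMoved = displacement.Length();   // approximately 0.054 meters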

The Add method and IsStable method of the BaryCenter class are shown below. The Add method stores skeleton positions in a Dictionary that maps each tracking ID (an int) to a List of Vectors, with each list maintained as a fixed-size sliding window of the most recent positions. The IsStable method returns a Boolean value indicating whether the user is stable. If there are not enough skeleton positions stored for the tracking ID it returns false. Otherwise it subtracts each of the stored skeleton position vectors from the latest skeleton position vector and computes the length of the resulting vector; if any length is greater than a threshold it returns false, otherwise it returns true.
        public void Add(Vector position, int trackingID)
        {
            if (!this.positions.ContainsKey(trackingID))
            {
                this.positions.Add(trackingID, new List<Vector>());
            }
            this.positions[trackingID].Add(position);
            if (this.positions[trackingID].Count > this.windowSize)
            {
                this.positions[trackingID].RemoveAt(0);
            }
        }
        public bool IsStable(int trackingID)
        {
            List<Vector> currentPositions = this.positions[trackingID];
            if (currentPositions.Count != this.windowSize)
            {
                return false;
            }
            Vector current = currentPositions[currentPositions.Count - 1];
            // Compare every stored position against the most recent one.
            for (int i = 0; i < currentPositions.Count - 1; i++)
            {
                Vector result = currentPositions[i].Subtract(current);
                float length = result.Length();
                
                if (length > this.Threshold)
                {
                    return false;
                }
            }
            return true;
        }
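
The fields that these methods rely on are not shown above. A minimal sketch of what they might look like is below; the window size and the treatment of Threshold as a settable property are assumptions, not the values used in the application.
        // Sketch of the state assumed by Add and IsStable; the concrete values
        // here are illustrative assumptions. Requires System.Collections.Generic.
        private readonly Dictionary<int, List<Vector>> positions =
            new Dictionary<int, List<Vector>>();

        // Number of recent skeleton positions retained per tracking ID.
        private readonly int windowSize = 10;

        // Maximum displacement, in meters, from the latest position for the
        // user still to be considered stationary.
        public float Threshold { get; set; }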

The application is shown below. It uses skeleton tracking to derive whether the user is moving or stationary, and indicates this on the UI.

[Screenshot: the barycenter application overlaying the tracked skeleton on the video stream, with the Stable indicator in the Information panel]

Conclusion


The Kinect for Windows SDK beta from Microsoft Research is a starter kit for application developers. It allows access to the Kinect sensor, and experimentation with its features. The first part of my gesture recognition process is to determine whether the user is moving or stationary, and is performed with the BaryCenter class. Coupled with pose recognition, this will produce a robust indication of whether a gesture is being made or not.

Monday 25 July 2011

Enterprise Library and Azure – What do you want to see?

I'm currently working with the team in Microsoft's patterns & practices group who are in the early stages of a project to create a Windows Azure Integration Pack for Enterprise Library. We're currently looking for feedback about the proposed content (or indeed suggestions for content we haven't thought of), so if you're developing applications for Windows Azure or using Enterprise Library in a current project, go and take a look at what's on the list, and cast your vote for what you'd like to see included here.

Thursday 21 July 2011

Kinect SDK: Smoothing Skeleton Data

Introduction

As previously noted, the Kinect sensor does not have sufficient resolution to ensure consistent accuracy of the skeleton tracking data over time. This problem manifests itself as joint positions appearing to jitter around their true values. However, the Kinect SDK provides an algorithm for filtering and smoothing incoming data from the sensor, the code for which can be seen below. The smoothing algorithm parameters can be manipulated in order to provide the required level of filtering and smoothing for your desired user experience.

            this.runtime.SkeletonEngine.TransformSmooth = true;
            var parameters = new TransformSmoothParameters
            {
                Smoothing = 1.0f,
                Correction = 0.1f,
                Prediction = 0.1f,
                JitterRadius = 0.05f,
                MaxDeviationRadius = 0.05f
            };
            this.runtime.SkeletonEngine.SmoothParameters = parameters;

A common question I’ve seen online is how the smoothing works, and what the various parameters do. This post attempts to answer these questions.

Smoothing and Filtering Data


Filtering is one of the most commonly used signal processing techniques. Filters are usually used to remove or attenuate an undesired portion of a signal’s spectrum while enhancing the desired portions of the signal. The filter made available by the Kinect SDK is based on the Holt double exponential smoothing method, commonly used for statistical analysis of economic data. This algorithm provides smoothing with less latency than many other smoothing filter algorithms.

The exponential function is used to model the relationship in which a change in the independent variable, x, gives the same proportional change in the dependent variable, y.

y = ab^x
The double exponential function is a constant raised to the power of an exponential function.

y = a^(b^x)
The simplest way to smooth a time series (data points measured at uniformly spaced time intervals) is to calculate a simple moving average, with the output being the mean of the last n data points. Exponential smoothing builds on this by assigning exponentially decreasing weights as the data points get older; one or more smoothing parameters determine the weights assigned to the data points. Each smoothed value is a weighted average of the current data point and the previous smoothed value. However, “good smoothing” will not be achieved until several data points have been averaged together. In addition, exponential smoothing is problematic when there is a trend in the data.

Double exponential smoothing can be defined as single exponential smoothing applied to an already exponentially smoothed time series, which makes it suitable for data with a trend. It performs two smoothing passes: in the first pass the smoothed value is adjusted for the trend of the previous period, by adding the previous trend estimate to the last smoothed value; in the second pass the trend estimate itself is updated, expressed as a smoothed difference between the last two values. Holt’s double exponential smoothing algorithm removes the need for a second pass over the data by smoothing the trend values directly; two smoothing parameters determine the weights assigned to the level and the trend.
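
To make this concrete, the sketch below applies Holt’s method to a one-dimensional series; alpha smooths the level and beta smooths the trend. It is purely illustrative and is not the Kinect SDK’s implementation, which additionally applies the jitter and deviation clamping described below on a per-joint basis.
        // Illustrative one-dimensional Holt double exponential smoothing.
        // alpha smooths the level, beta smooths the trend; both lie in [0, 1].
        private static float[] HoltSmooth(float[] data, float alpha, float beta)
        {
            float[] smoothed = new float[data.Length];
            if (data.Length < 2)
            {
                data.CopyTo(smoothed, 0);
                return smoothed;
            }

            float level = data[0];
            float trend = data[1] - data[0];
            smoothed[0] = level;

            for (int i = 1; i < data.Length; i++)
            {
                float previousLevel = level;

                // Smooth the new data point against the previous level plus trend.
                level = alpha * data[i] + (1 - alpha) * (previousLevel + trend);

                // Smooth the trend using the change in the smoothed level.
                trend = beta * (level - previousLevel) + (1 - beta) * trend;

                smoothed[i] = level;
            }
            return smoothed;
        }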

The algorithm parameters that the Kinect SDK allows you to set are listed below, along with their default values.

  • Smoothing: specifies the amount of smoothing. Default value: 0.5. Higher values correspond to more smoothing, and a value of 0 causes the raw data to be returned. Increasing smoothing tends to increase latency. Values must be in the range [0, 1.0].
  • Correction: specifies the amount of correction. Default value: 0.5. Lower values are slower to correct towards the raw data and appear smoother, while higher values correct towards the raw data more quickly. Values must be in the range [0, 1.0].
  • Prediction: specifies the number of predicted frames. Default value: 0.5.
  • Jitter Radius: specifies the jitter-reduction radius, in meters. Default value: 0.05 (5 cm). Any jitter beyond the radius is clamped to the radius.
  • Maximum Deviation Radius: specifies the maximum radius, in meters, that filtered positions can deviate from the raw data. Default value: 0.04. Filtered values that would exceed this radius from the raw data are clamped at this distance, in the direction of the filtered value.

There is no set of “best” values to use for these parameters. Experimentation is required on an application-by-application basis in order to provide the required level of filtering and smoothing for your desired user experience.

Conclusion


The Kinect for Windows SDK beta from Microsoft Research is a starter kit for application developers. It allows access to the Kinect sensor, and experimentation with its features. The Kinect SDK provides access to Holt’s double exponential smoothing algorithm, which produces smoothing with less latency than many other smoothing filter algorithms. The algorithm parameters can be manipulated in order to provide the required level of filtering and smoothing for your desired user experience.

Wednesday 20 July 2011

Kinect SDK: Simple Gesture Recognition

Introduction

Gesture recognition has long been a research area within computer science, and has seen an increased focus in the last decade due to the development of different devices including smart phones and Microsoft Surface. Its aim is to allow people to interface with a device and interact naturally without any mechanical devices. Multi-touch devices use gestures to perform various actions; for instance, a pinch gesture is commonly used to scale content such as images. In this post I will outline the development of a small application that performs simple gesture recognition in order to scale an image using the user’s hands. The application uses skeleton tracking to recognise the user’s hands, then places an image of the Eye of Sauron at the center point between the user’s hands. As the user moves their hands apart the size of the image increases, and as the user brings their hands together the size of the image decreases.

Implementation

The XAML for the UI of the application is shown below. A Canvas contains an Image that is used to display the Eye of Sauron. The Image uses a TranslateTransform to control the location of the eye, and a ScaleTransform to control its size. The properties of both transforms are set using bindings, as is the Visibility property of the Image.

<Window x:Class="KinectDemo.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="Eye of Sauron" ResizeMode="NoResize" SizeToContent="WidthAndHeight"
        Loaded="Window_Loaded" Closed="Window_Closed">
    <Grid>
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="Auto" />
            <ColumnDefinition Width="200" />
        </Grid.ColumnDefinitions>
        <StackPanel Grid.Column="0">
            <TextBlock HorizontalAlignment="Center"
                       Text="Tracking..." />
            <Image Height="300"
                   Margin="10,0,10,10"
                   Source="{Binding ColourBitmap}"
                   Width="400" />
            <Canvas x:Name="canvas" 
                    Height="300"
                    Margin="10,-310,10,10"
                    Width="400">
                <Image x:Name="eye"
                       Height="400"
                       Width="400"
                       Source="Images/SauronEye.png"
                       Visibility="{Binding ImageVisibility}">
                    <Image.RenderTransform>
                        <TransformGroup>
                            <TranslateTransform X="{Binding TranslateX}"
                                                Y="{Binding TranslateY}" />
                            <ScaleTransform CenterX="{Binding CenterX}"
                                            CenterY="{Binding CenterY}" 
                                            ScaleX="{Binding ScaleFactor}"
                                            ScaleY="{Binding ScaleFactor}" />
                        </TransformGroup>
                    </Image.RenderTransform>
                </Image>
            </Canvas>
        </StackPanel>
        <StackPanel Grid.Column="1">
            <GroupBox Header="Motor Control"
                      Height="100"
                      VerticalAlignment="Top"
                      Width="190">
                <StackPanel HorizontalAlignment="Center" 
                            Margin="10"
                            Orientation="Horizontal">
                    <Button x:Name="motorUp"
                            Click="motorUp_Click"
                            Content="Up"
                            Height="30"
                            Width="70" />
                    <Button x:Name="motorDown"
                            Click="motorDown_Click"
                            Content="Down"
                            Height="30"
                            Margin="10,0,0,0"
                            Width="70" />
                </StackPanel>
            </GroupBox>
            <GroupBox Header="Information"
                      Height="100"
                      VerticalAlignment="Top"
                      Width="190">
                <StackPanel Orientation="Horizontal" Margin="10">
                    <TextBlock Text="Frame rate: " />
                    <TextBlock Text="{Binding FramesPerSecond}"
                               VerticalAlignment="Top"
                               Width="50" />
                </StackPanel>
            </GroupBox>
        </StackPanel>
    </Grid>
</Window>

The constructor initializes the StreamManager class (contained in my KinectManager library), which handles the stream processing. The DataContext of MainWindow is set to kinectStream for binding purposes.
        public MainWindow()
        {
            InitializeComponent();
            this.kinectStream = new StreamManager();
            this.DataContext = this.kinectStream;
        }

The Window_Loaded event handler initializes the required subsystems of the Kinect pipeline, and invokes SmoothSkeletonData (which has now moved to the StreamManager class). For an explanation of what the SmoothSkeletonData method does, see this previous post. Finally, event handlers are registered for the subsystems of the Kinect pipeline.
        private void Window_Loaded(object sender, RoutedEventArgs e)
        {
            this.runtime = new Runtime();
            try
            {
                this.runtime.Initialize(
                    RuntimeOptions.UseColor |
                    RuntimeOptions.UseSkeletalTracking);
                this.cam = runtime.NuiCamera;
            }
            catch (InvalidOperationException)
            {
                MessageBox.Show
                    ("Runtime initialization failed. Ensure Kinect is plugged in");
                return;
            }
            try
            {
                this.runtime.VideoStream.Open(ImageStreamType.Video, 2,
                    ImageResolution.Resolution640x480, ImageType.Color);
            }
            catch (InvalidOperationException)
            {
                MessageBox.Show
                    ("Failed to open stream. Specify a supported image type/resolution.");
                return;
            }
            this.kinectStream.KinectRuntime = this.runtime;
            this.kinectStream.LastTime = DateTime.Now;
            this.kinectStream.SmoothSkeletonData();
            this.runtime.VideoFrameReady +=
                new EventHandler<ImageFrameReadyEventArgs>(nui_VideoFrameReady);
            this.runtime.SkeletonFrameReady +=
                new EventHandler<SkeletonFrameReadyEventArgs>(nui_SkeletonFrameReady);
        }
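
SmoothSkeletonData itself is not shown here. A reasonable sketch, assuming it simply wraps the TransformSmooth configuration from the smoothing post above and uses the KinectRuntime property set a few lines earlier, is:
        // Sketch of SmoothSkeletonData inside StreamManager; assumed to wrap the
        // TransformSmooth configuration shown in the smoothing post above.
        public void SmoothSkeletonData()
        {
            this.KinectRuntime.SkeletonEngine.TransformSmooth = true;
            this.KinectRuntime.SkeletonEngine.SmoothParameters = new TransformSmoothParameters
            {
                Smoothing = 1.0f,
                Correction = 0.1f,
                Prediction = 0.1f,
                JitterRadius = 0.05f,
                MaxDeviationRadius = 0.05f
            };
        }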

The SkeletonFrameReady event handler invokes the GetSkeletonStream method in the StreamManager class, and passes a number of parameters into it.
        private void nui_SkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
        {
            this.kinectStream.GetSkeletonStream(
                e, 
                (int)this.eye.Width, 
                (int)this.eye.Height, 
                (int)this.canvas.Width, 
                (int)this.canvas.Height);
        }

Several important properties from the StreamManager class are shown below. The CenterX and CenterY properties are bound to by the ScaleTransform in the UI, and specify the point that is the center of the scale operation. The ImageVisibility property is bound to by the Image control in the UI, and is used to control the visibility of the image of the eye. The KinectRuntime property is used to access the runtime instance from MainWindow.xaml.cs. The ScaleFactor property is bound to by the ScaleTransform in the UI, and is used to resize the image of the eye by the factor specified. The TranslateX and TranslateY properties are bound to by the TranslateTransform in the UI, and are used to specify the location of the image of the eye.
        public float CenterX
        {
            get
            {
                return this.centerX;
            }
            private set
            {
                this.centerX = value;
                this.OnPropertyChanged("CenterX");
            }
        }
        public float CenterY
        {
            get
            {
                return this.centerY;
            }
            private set
            {
                this.centerY = value;
                this.OnPropertyChanged("CenterY");
            }
        }
        public Visibility ImageVisibility
        {
            get
            {
                return this.imageVisibility;
            }
            private set
            {
                this.imageVisibility = value;
                this.OnPropertyChanged("ImageVisibility");
            }
        }
        public Runtime KinectRuntime { get; set; }
        public float ScaleFactor
        {
            get
            {
                return this.scaleFactor;
            }
            private set
            {
                this.scaleFactor = value;
                this.OnPropertyChanged("ScaleFactor");
            }
        }
        public float TranslateX
        {
            get
            {
                return this.translateX;
            }
            private set
            {
                this.translateX = value;
                this.OnPropertyChanged("TranslateX");
            }
        }
        public float TranslateY
        {
            get
            {
                return this.translateY;
            }
            private set
            {
                this.translateY = value;
                this.OnPropertyChanged("TranslateY");
            }
        }

The GetSkeletonStream method, in the StreamManager class, is shown below. It retrieves a frame of skeleton data and processes it as follows: e.SkeletonFrame.Skeletons is an array of SkeletonData structures, each of which contains the data for a single skeleton. If the TrackingState field of the SkeletonData structure indicates that the skeleton is being tracked, the image of the eye is made visible and the handLeft and handRight Vectors are set to the positions of the left and right hands, respectively. The center point between the hands is then calculated as a Vector. The CenterX and CenterY properties are set to the image space coordinates returned from GetDisplayPosition, and the TranslateX and TranslateY properties are set to the location at which to display the image of the eye. Finally, the ScaleFactor property is set to the horizontal distance between the right and left hands.
        public void GetSkeletonStream(SkeletonFrameReadyEventArgs e, int imageWidth, 
            int imageHeight, int canvasWidth, int canvasHeight)
        {
            Vector handLeft = new Vector();
            Vector handRight = new Vector();
            JointsCollection joints = null;
            foreach (SkeletonData data in e.SkeletonFrame.Skeletons)
            {
                if (SkeletonTrackingState.Tracked == data.TrackingState)
                {
                    this.ImageVisibility = Visibility.Visible;
                    joints = data.Joints;
                    handLeft = joints[JointID.HandLeft].Position;
                    handRight = joints[JointID.HandRight].Position;
                    break;
                }
                else
                {
                    this.ImageVisibility = Visibility.Collapsed;
                }
            }
            // Find center between hands
            Vector position = new Vector
            {
                X = (handLeft.X + handRight.X) / (float)2.0,
                Y = (handLeft.Y + handRight.Y) / (float)2.0,
                Z = (handLeft.Z + handRight.Z) / (float)2.0
            };
            // Convert depth space co-ordinates to image space
            Vector displayPosition = this.GetDisplayPosition(position, 
                canvasWidth, canvasHeight);
            this.CenterX = displayPosition.X;
            this.CenterY = displayPosition.Y;
            // Position image at center of hands
            displayPosition.X = (float)(displayPosition.X - 
                (imageWidth / 2));
            displayPosition.Y = (float)(displayPosition.Y - 
                (imageHeight / 2));
            this.TranslateX = displayPosition.X;
            this.TranslateY = displayPosition.Y;
            // Derive the scale factor from distance between 
            // the right hand and the left hand
            this.ScaleFactor = handRight.X - handLeft.X;
        }

Skeleton data and image data are based on different coordinate systems. Therefore, it is necessary to convert coordinates in skeleton space to image space, which is what the GetDisplayPosition method does. Skeleton coordinates in the range [-1.0, 1.0] are converted to depth coordinates by calling SkeletonEngine.SkeletonToDepthImage. This method returns x and y coordinates as floating-point numbers in the range [0.0, 1.0]. The floating-point coordinates are then converted to values in the 320x240 depth coordinate space, which is the range that NuiCamera.GetColorPixelCoordinatesFromDepthPixel currently supports. The depth coordinates are then converted to colour image coordinates by calling NuiCamera.GetColorPixelCoordinatesFromDepthPixel. This method returns colour image coordinates as values in the 640x480 colour image space. Finally, the colour image coordinates are scaled to the size of the canvas display in the application, by dividing the x coordinate by 640 and the y coordinate by 480, and multiplying the results by the width or height of the canvas display area, respectively.
        private Vector GetDisplayPosition(Vector position, int width, int height)
        {
            int colourX, colourY;
            float depthX, depthY;
            this.KinectRuntime.SkeletonEngine.SkeletonToDepthImage(position, 
                out depthX, out depthY);
            depthX = Math.Max(0, Math.Min(depthX * 320, 320));
            depthY = Math.Max(0, Math.Min(depthY * 240, 240));
            ImageViewArea imageView = new ImageViewArea();
            this.KinectRuntime.NuiCamera.GetColorPixelCoordinatesFromDepthPixel(
                ImageResolution.Resolution640x480, 
                imageView, 
                (int)depthX, 
                (int)depthY, 
                0, 
                out colourX, 
                out colourY);
            return new Vector
            {
                X = (float)width * colourX / 640,
                Y = (float)height * colourY / 480
            };
        }

The application is shown below. It uses skeleton tracking to recognise the user’s hands, and once they are recognised it makes the Eye of Sauron visible and places it between the user’s hands. As the user moves their hands apart the size of the eye increases, and as the user brings their hands together the size of the eye decreases. Therefore, scaling of the image occurs in response to the application recognizing the moving hand gesture.

[Screenshots: the Eye of Sauron positioned between the user’s hands, scaling as the hands move apart]

Conclusion


The Kinect for Windows SDK beta from Microsoft Research is a starter kit for application developers. It allows access to the Kinect sensor, and experimentation with its features. The skeleton tracking data returned from the sensor makes it easy to perform simple gesture recognition. The next step will be to generalise and extend the simple gesture recognition into a GestureRecognition class.