Friday, 24 June 2011

Kinect SDK: Skeleton Tracking


A previous post examined the basics of accessing the video stream from the Kinect sensor. This post builds on that, and examines tracking the skeleton of a user.


The NUI API processes data from the Kinect sensor through a multi-stage pipeline. At initialization, you must specify the subsystems that it uses, so that the runtime can start the required portions of the pipeline. An application can choose one or more of the following options:

  • Colour - the application streams colour image data from the sensor.
  • Depth - the application streams depth image data from the sensor.
  • Depth and player index - the application streams depth data from the sensor and requires the user index that the skeleton tracking engine generates.
  • Skeleton - the application uses skeleton position data.

Stream data is delivered as a succession of still-image frames. At NUI initialization, the application identifies the streams it plans to use. It then opens streams with additional stream-specific details, including stream resolution, image type etc.

For skeleton tracking, the user should stand between 4 and 11 feet from the sensor. Additionally, the SDK beta only allows skeleton tracking for standing scenarios, not seated scenarios.

Segmentation Data

When the Kinect sensor identifies users in front of the sensor array, it creates a segmentation map. This map is a bitmap in which the pixel values correspond to the index of the person in the field of view who is closest to the camera, at that pixel position. Although the segmentation data is a separate logical stream, in practice the depth data and segmentation data are merged into a single frame:

  1. The 13 high-order bits of each pixel represent the distance from the depth sensor to the closest object, in millimeters.
  2. The 3 low-order bits of each pixel represent the index of the tracked user who is visible at the pixel’s x and y coordinates.

An index of zero indicates that no one was found at that location. Values of one and two identify users.
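
As a rough sketch of how one of these packed pixels could be unpacked, the helper below splits a 16-bit value into its distance and user index parts. The class and method names are hypothetical and are not part of the SDK, and the byte order (low-order byte first) is an assumption.

```csharp
// Hypothetical helper for illustration only; not part of the Kinect SDK.
public static class DepthPixel
{
    // Combines the two bytes of one pixel into a single 16-bit value.
    // Assumption: the low-order byte comes first in the frame buffer.
    public static int Pack(byte low, byte high)
    {
        return low | (high << 8);
    }

    // The 13 high-order bits hold the distance in millimetres.
    public static int DistanceMm(int pixel)
    {
        return (pixel >> 3) & 0x1FFF;
    }

    // The 3 low-order bits hold the user index (zero means no user).
    public static int PlayerIndex(int pixel)
    {
        return pixel & 0x07;
    }
}
```

For example, the bytes 0x42 and 0x1F pack to 0x1F42, which decodes to a distance of 1000 mm and a user index of 2.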

Skeleton Tracking

The NUI Skeleton API provides full information about the location of up to two users standing in front of the Kinect sensor, with detailed position and orientation information. The data is provided to an application as a set of endpoints, called skeleton positions, that compose a skeleton. The skeleton represents a user’s current position and pose.


The skeleton system always mirrors the user who is being tracked. If this is not appropriate for your application, you should create a transformation matrix to mirror the skeleton.
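
One way such a transformation could look, as a minimal sketch: a 2D affine reflection about the vertical centre line of the display area, applied to a point that has already been converted to display coordinates. The Mirror class is hypothetical, and working in display space rather than skeleton space is an assumption made to keep the example short.

```csharp
// Hypothetical helper for illustration only; not part of the Kinect SDK.
public static class Mirror
{
    // Applies the affine reflection
    //   | -1  0  displayWidth |
    //   |  0  1       0       |
    // to the point (x, y), which flips it about the vertical centre line.
    public static double[] Apply(double x, double y, double displayWidth)
    {
        double[,] m = { { -1, 0, displayWidth }, { 0, 1, 0 } };
        return new double[]
        {
            m[0, 0] * x + m[0, 1] * y + m[0, 2],
            m[1, 0] * x + m[1, 1] * y + m[1, 2]
        };
    }
}
```

On a display 400 units wide, the point (100, 50) maps to (300, 50). In WPF, a similar effect could probably be achieved by giving the canvas a ScaleTransform with ScaleX set to -1 and CenterX set to half the canvas width.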


The first step in implementing this application (after creating a new WPF project) is to include a reference to Microsoft.Research.Kinect. This assembly is in the GAC, and calls unmanaged functions from managed code. I then developed a basic UI, using XAML, that displays the tracked skeleton, allows motor control for fine tuning the sensor position, and displays the frame rate. The code is shown below. MainWindow.xaml is wired up to the Window_Loaded and Window_Closed events, and contains a Canvas named skeleton that will be used for displaying the skeleton tracking data.

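<Window x:Class="KinectDemo.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="Skeletal Tracking" ResizeMode="NoResize" SizeToContent="WidthAndHeight"
        Loaded="Window_Loaded" Closed="Window_Closed">
    <Grid>
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="Auto" />
            <ColumnDefinition Width="200" />
        </Grid.ColumnDefinitions>
        <StackPanel Grid.Column="0">
            <TextBlock HorizontalAlignment="Center"
                       Text="Skeletal Tracking" />
            <!-- An explicit Height is also needed here, because getDisplayPosition reads skeleton.Height; its value is not shown in this listing -->
            <Canvas x:Name="skeleton"
                    Width="400" />
        </StackPanel>
        <StackPanel Grid.Column="1">
            <GroupBox Header="Motor Control">
                <StackPanel HorizontalAlignment="Center">
                    <Button x:Name="motorUp"
                            Width="70" />
                    <Button x:Name="motorDown"
                            Width="70" />
                </StackPanel>
            </GroupBox>
            <GroupBox Header="Information">
                <StackPanel Orientation="Horizontal" Margin="10">
                    <TextBlock Text="Frame rate: " />
                    <TextBlock x:Name="fps"
                               Text="0 fps"
                               Width="50" />
                </StackPanel>
            </GroupBox>
        </StackPanel>
    </Grid>
</Window>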

The class-level declarations are shown below. The important one is jointColours, which uses a Dictionary object to associate a brush colour with each skeleton joint for use in rendering.
        private Runtime nui;
        private Camera cam;
        private int totalFrames = 0;
        private int lastFrames = 0;
        private DateTime lastTime = DateTime.MaxValue;
        private Dictionary<JointID, Brush> jointColours = new Dictionary<JointID, Brush>()
        {
            { JointID.HipCenter, new SolidColorBrush(Color.FromRgb(169, 176, 155)) },
            { JointID.Spine, new SolidColorBrush(Color.FromRgb(169, 176, 155)) },
            { JointID.ShoulderCenter, new SolidColorBrush(Color.FromRgb(168, 230, 29)) },
            { JointID.Head, new SolidColorBrush(Color.FromRgb(200, 0,   0)) },
            { JointID.ShoulderLeft, new SolidColorBrush(Color.FromRgb(79,  84,  33)) },
            { JointID.ElbowLeft, new SolidColorBrush(Color.FromRgb(84,  33,  42)) },
            { JointID.WristLeft, new SolidColorBrush(Color.FromRgb(255, 126, 0)) },
            { JointID.HandLeft, new SolidColorBrush(Color.FromRgb(215,  86, 0)) },
            { JointID.ShoulderRight, new SolidColorBrush(Color.FromRgb(33,  79,  84)) },
            { JointID.ElbowRight, new SolidColorBrush(Color.FromRgb(33,  33,  84)) },
            { JointID.WristRight, new SolidColorBrush(Color.FromRgb(77,  109, 243)) },
            { JointID.HandRight, new SolidColorBrush(Color.FromRgb(37,   69, 243)) },
            { JointID.HipLeft, new SolidColorBrush(Color.FromRgb(77,  109, 243)) },
            { JointID.KneeLeft, new SolidColorBrush(Color.FromRgb(69,  33,  84)) },
            { JointID.AnkleLeft, new SolidColorBrush(Color.FromRgb(229, 170, 122)) },
            { JointID.FootLeft, new SolidColorBrush(Color.FromRgb(255, 126, 0)) },
            { JointID.HipRight, new SolidColorBrush(Color.FromRgb(181, 165, 213)) },
            { JointID.KneeRight, new SolidColorBrush(Color.FromRgb(71, 222,  76)) },
            { JointID.AnkleRight, new SolidColorBrush(Color.FromRgb(245, 228, 156)) },
            { JointID.FootRight, new SolidColorBrush(Color.FromRgb(77,  109, 243)) }
        };

The Window_Loaded event handler creates the NUI runtime object, opens the video and skeletal streams, and registers the event handler that the runtime calls when a skeleton frame is ready.
        private void Window_Loaded(object sender, RoutedEventArgs e)
        {
            this.nui = new Runtime();
            try
            {
                nui.Initialize(RuntimeOptions.UseColor | RuntimeOptions.UseSkeletalTracking);
                this.cam = nui.NuiCamera;
            }
            catch (InvalidOperationException)
            {
                MessageBox.Show("Runtime initialization failed. Ensure Kinect is plugged in");
            }
            try
            {
                this.nui.VideoStream.Open(ImageStreamType.Video, 2, ImageResolution.Resolution640x480, ImageType.Color);
            }
            catch (InvalidOperationException)
            {
                MessageBox.Show("Failed to open stream. Specify a supported image type/resolution.");
            }
            this.lastTime = DateTime.Now;
            this.nui.SkeletonFrameReady += new EventHandler<SkeletonFrameReadyEventArgs>(nui_SkeletonFrameReady);
        }

The SkeletonFrameReady event handler retrieves a frame of skeleton data, clears any skeletons from the display, and then draws line segments to represent the bones and joints of each tracked skeleton. skeletonFrame.Skeletons is an array of SkeletonData structures, each of which contains the data for a single skeleton. If the TrackingState field of the SkeletonData structure indicates that the skeleton is being tracked, getBodySegment is called multiple times to draw lines that represent the connected bones of the skeleton. The calls to skeleton.Children.Add add the returned lines to the skeleton display. Once the bones are drawn, the joints are drawn. Each joint is drawn as a 6x6 box, in the colour that was defined in the jointColours dictionary.
        private void nui_SkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
        {
            SkeletonFrame skeletonFrame = e.SkeletonFrame;
            // Clear any skeletons drawn for the previous frame
            this.skeleton.Children.Clear();
            Brush brush = new SolidColorBrush(Color.FromRgb(128, 128, 255));
            foreach (SkeletonData data in skeletonFrame.Skeletons)
            {
                if (SkeletonTrackingState.Tracked == data.TrackingState)
                {
                    // Draw bones
                    this.skeleton.Children.Add(this.getBodySegment(data.Joints, brush, JointID.HipCenter, JointID.Spine, 
                        JointID.ShoulderCenter, JointID.Head));
                    this.skeleton.Children.Add(this.getBodySegment(data.Joints, brush, JointID.ShoulderCenter, JointID.ShoulderLeft, 
                        JointID.ElbowLeft, JointID.WristLeft, JointID.HandLeft));
                    this.skeleton.Children.Add(this.getBodySegment(data.Joints, brush, JointID.ShoulderCenter, JointID.ShoulderRight, 
                        JointID.ElbowRight, JointID.WristRight, JointID.HandRight));
                    this.skeleton.Children.Add(this.getBodySegment(data.Joints, brush, JointID.HipCenter, JointID.HipLeft, 
                        JointID.KneeLeft, JointID.AnkleLeft, JointID.FootLeft));
                    this.skeleton.Children.Add(this.getBodySegment(data.Joints, brush, JointID.HipCenter, JointID.HipRight, 
                        JointID.KneeRight, JointID.AnkleRight, JointID.FootRight));
                    // Draw joints
                    foreach (Joint joint in data.Joints)
                    {
                        Point jointPos = this.getDisplayPosition(joint);
                        Line jointLine = new Line();
                        jointLine.X1 = jointPos.X-3;
                        jointLine.X2 = jointLine.X1 + 6;
                        jointLine.Y1 = jointLine.Y2 = jointPos.Y;
                        jointLine.Stroke = jointColours[joint.ID];
                        jointLine.StrokeThickness = 6;
                        this.skeleton.Children.Add(jointLine);
                    }
                }
            }
            // Count this frame so that CalculateFps can report the frame rate
            ++this.totalFrames;
            this.CalculateFps();
        }

getBodySegment creates a collection of points and joins them with line segments. It returns a Polyline that connects the specified joints.
        private Polyline getBodySegment(JointsCollection joints, Brush brush, params JointID[] ids)
        {
            PointCollection points = new PointCollection(ids.Length);
            for (int i = 0; i < ids.Length; ++i)
            {
                points.Add(this.getDisplayPosition(joints[ids[i]]));
            }
            return new Polyline()
            {
                Points = points,
                Stroke = brush,
                StrokeThickness = 5
            };
        }

Skeleton data and image data are based on different coordinate systems, so coordinates in skeleton space must be converted to image space. This is what getDisplayPosition does. Skeleton coordinates in the range [-1.0, 1.0] are converted to depth coordinates by calling SkeletonEngine.SkeletonToDepthImage, which returns x and y coordinates as floating-point numbers in the range [0.0, 1.0]. These are then scaled to the 320x240 depth coordinate space, which is the range that NuiCamera.GetColorPixelCoordinatesFromDepthPixel currently supports. That method converts the depth coordinates to colour image coordinates in the 640x480 colour image space. Finally, the colour image coordinates are scaled to the size of the skeleton display in the application, by dividing the x coordinate by 640 and the y coordinate by 480, and multiplying the results by the width and height of the skeleton display area, respectively. getDisplayPosition is used to transform the positions of both bones and joints.
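        private Point getDisplayPosition(Joint joint)
        {
            int colourX, colourY;
            float depthX, depthY;
            // Skeleton space to normalised depth coordinates
            this.nui.SkeletonEngine.SkeletonToDepthImage(joint.Position, out depthX, out depthY);
            // Normalised depth coordinates to the 320x240 depth space
            depthX = Math.Max(0, Math.Min(depthX * 320, 320));
            depthY = Math.Max(0, Math.Min(depthY * 240, 240));
            ImageViewArea imageView = new ImageViewArea();
            // Depth space to the 640x480 colour image space
            this.nui.NuiCamera.GetColorPixelCoordinatesFromDepthPixel(ImageResolution.Resolution640x480,
                imageView, (int)depthX, (int)depthY, 0, out colourX, out colourY);
            // Colour image space to the size of the skeleton display
            return new Point((int)(skeleton.Width * colourX / 640.0), (int)(skeleton.Height * colourY / 480));
        }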
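
The two purely arithmetic steps of this conversion can be sketched in isolation, as below. This is a simplified illustration rather than the application's code: the DisplayScaling class and its method names are hypothetical, and the depth-to-colour mapping performed by the SDK is deliberately left out.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the Kinect SDK.
public static class DisplayScaling
{
    // Scales a normalised [0.0, 1.0] coordinate to a depth pixel coordinate,
    // clamping the result to the range [0, size].
    public static int ToDepthPixel(float normalised, int size)
    {
        return (int)Math.Max(0f, Math.Min(normalised * size, size));
    }

    // Scales a colour image coordinate to the display area:
    // divide by the colour image size, multiply by the display size.
    public static int ToDisplay(int colour, int colourSize, double displaySize)
    {
        return (int)(displaySize * colour / colourSize);
    }
}
```

With a 320-pixel-wide depth space, a normalised x of 0.5 becomes depth pixel 160. A colour x of 320 in the 640-pixel-wide colour image becomes 200 on a display that is 400 units wide.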

CalculateFps calculates the frame rate for skeletal processing.
        private void CalculateFps()
        {
            DateTime current = DateTime.Now;
            // Update the display once more than a second has elapsed
            if (current.Subtract(this.lastTime) > TimeSpan.FromSeconds(1))
            {
                int frameDifference = this.totalFrames - this.lastFrames;
                this.lastFrames = this.totalFrames;
                this.lastTime = current;
                this.fps.Text = frameDifference.ToString() + " fps";
            }
        }
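
The same counting logic can be separated from the UI so that it can be exercised with known timestamps. The sketch below is one possible arrangement, not the application's code: the FrameRateCounter class is hypothetical, and passing the current time in as a parameter is a design choice made here to keep the example testable.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the Kinect SDK.
public class FrameRateCounter
{
    private int totalFrames;
    private int lastFrames;
    private DateTime lastTime;

    public FrameRateCounter(DateTime start)
    {
        this.lastTime = start;
    }

    // Call once per frame. Returns the number of frames counted since the
    // last report once more than a second has elapsed, otherwise null.
    public int? Tick(DateTime now)
    {
        ++this.totalFrames;
        if (now - this.lastTime > TimeSpan.FromSeconds(1))
        {
            int frameDifference = this.totalFrames - this.lastFrames;
            this.lastFrames = this.totalFrames;
            this.lastTime = now;
            return frameDifference;
        }
        return null;
    }
}
```

A frame handler could call Tick(DateTime.Now) once per frame and update the display whenever the result is not null. Because the report fires only after slightly more than a second has passed, the value approximates frames per second rather than measuring it exactly.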

The Window_Closed event handler simply uninitializes the NUI (releasing resources etc.) and closes the environment.
        private void Window_Closed(object sender, EventArgs e)
        {
            this.nui.Uninitialize();
            Environment.Exit(0);
        }

The application is shown below. It detects and tracks the user, provided that they stand between 4 and 11 feet from the sensor.



The Kinect for Windows SDK beta from Microsoft Research is a starter kit for application developers. It enables access to the Kinect sensor, and experimentation with its features. The Kinect sensor returns skeleton tracking data that can be processed and rendered by an application. This involves converting coordinates from skeletal space to image space.
