Friday 26 August 2011

Kinect SDK: Gesture Recognition Pt III

Introduction

In my previous blog post I discussed the development of a robust and extensible PostureRecognizer class. The class is used to recognize the start of a gesture. As a reminder, my high-level approach to the gesture recognition process is as follows:

  • Detect whether the user is moving or stationary.
  • Detect the start of a gesture (a posture).
  • Capture the gesture.
  • Detect the end of a gesture (a posture).
  • Identify the gesture.
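
Before diving into the implementation, the flow of these steps can be sketched as a simple state machine: a posture starts a capture, points are recorded while capturing, and a different posture ends the capture. The following is a minimal, self-contained sketch; the Posture values and the float hand positions are simplified stand-ins for the SDK types used later in this post.

```csharp
using System.Collections.Generic;

// A simplified sketch of the gesture recognition pipeline. GesturePipeline is
// a hypothetical stand-in for the combined PostureRecognizer/GestureRecognizer
// behaviour described below.
public enum Posture { None, RightHello, RightGoodbye }

public class GesturePipeline
{
    private Posture startPosture = Posture.None;
    private readonly List<float> points = new List<float>();

    public bool IsCapturing { get; private set; }
    public List<float> Points { get { return this.points; } }

    // Called once per skeleton frame with the detected posture and a hand position.
    public void OnFrame(Posture posture, float handPosition)
    {
        if (!this.IsCapturing && posture != Posture.None)
        {
            // A posture marks the start of a gesture.
            this.startPosture = posture;
            this.IsCapturing = true;
        }
        else if (this.IsCapturing && posture != Posture.None &&
                 posture != this.startPosture)
        {
            // A different posture marks the end of the gesture;
            // identification of the captured points would happen here.
            this.IsCapturing = false;
            return;
        }
        if (this.IsCapturing)
        {
            this.points.Add(handPosition);  // capture the gesture
        }
    }
}
```

The real implementation splits these responsibilities across several classes, as shown in the remainder of this post.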

This blog post will focus on capturing the gesture that occurs once the starting posture has been identified.

Implementation

The UI is unchanged from my previous post, apart from an additional Canvas on which the captured gesture is drawn.

The StreamManager constructor creates an instance of the PostureRecognizer class and registers an event handler for the PostureDetected event. It then builds the path to the file in which the gesture data will be saved, and opens a stream on that file. An instance of the GestureRecognizer class is then created, and the Canvas on which the captured gesture will be drawn is passed to its DrawGesture method.

        public StreamManager(Canvas canvas)
        {
            this.PostureRecognizer = new PostureRecognizer();
            this.PostureRecognizer.PostureDetected += 
               new Action<Posture>(OnPostureDetected);
            
            this.gestureCanvas = canvas;
            this.path = Path.Combine(Environment.CurrentDirectory, @"data\gestures.dat");
            this.stream = File.Open(this.path, FileMode.OpenOrCreate);
            this.gestureDetector = new GestureRecognizer(this.stream);
            this.gestureDetector.DrawGesture(this.gestureCanvas, Colors.Red);
        }

The StartGestureCapture and EndGestureCapture methods are shown below. The StartGestureCapture method ends the capture if a gesture is already being captured; otherwise it invokes the StartGestureCapture method in the GestureRecognizer class. The EndGestureCapture method invokes the EndGestureCapture and SaveGesture methods in the GestureRecognizer class, provided that a gesture is being captured.
        private void StartGestureCapture()
        {
            this.gestureDetector.GestureName = this.GestureName;
            if (this.gestureDetector.IsRecordingGesture)
            {
                this.gestureDetector.EndGestureCapture();
                return;
            }
            this.gestureDetector.StartGestureCapture();
        }
        private void EndGestureCapture()
        {
            if (this.gestureDetector.IsRecordingGesture)
            {
                this.gestureDetector.EndGestureCapture();
                this.gestureDetector.SaveGesture(stream);
            }
        }

The GetSkeletonStream method is unchanged from the previous post, and invokes the TrackPostures method in the PostureRecognizer class. If a posture is identified, the RaisePostureDetected method is invoked, which sets the CurrentPosture property to the identified posture and raises the PostureDetected event. The handler for the event is the OnPostureDetected method in the StreamManager class, which was registered in the StreamManager constructor.
        public event Action<Posture> PostureDetected;
        private void RaisePostureDetected(Posture posture)
        {
            if (this.currentPosture != posture)
            {
                this.CurrentPosture = posture;
            }
            if (this.PostureDetected != null)
            {
                this.PostureDetected(posture);
            }
        }

The OnPostureDetected method is shown below. If a gesture is not being captured, it invokes the StartGestureCapture method, and stores the identified posture in the startPosture variable. If a gesture is being captured and the posture is different from the posture stored in the startPosture variable, the EndGestureCapture method is invoked. The detected Joints are then enumerated, and the Add method in the GestureRecognizer class is invoked for the right hand joint, as gestures are currently captured from the right hand only.
        private void OnPostureDetected(Posture posture)
        {
            if (this.gestureDetector.IsRecordingGesture)
            {
                if ((posture != Posture.None) &&
                    (posture != startPosture))
                {
                    this.EndGestureCapture();
                }
            }
            else
            {
                this.StartGestureCapture();
                this.startPosture = posture;
            }
            foreach (Joint joint in this.skeleton.Joints)
            {
                if (joint.Position.W < 0.9f || 
                    joint.TrackingState != JointTrackingState.Tracked)
                {
                    continue;
                }
                if (joint.ID == JointID.HandRight)
                {
                    this.gestureDetector.Add(joint.Position, 
                                             this.KinectRuntime.SkeletonEngine);
                }
            }
        }

The GestureData class is used to model the gesture data. It contains a Position and the Time at which the Position was captured.
    public class GestureData
    {
        public Vector Position { get; set; }
        public DateTime Time { get; set; }
    }

The GestureRecognizer class is shown below. Position samples are stored in a collection named gestures, of type List&lt;GestureData&gt;. An instance of the TemplateRecognizer class is also created, which will use template matching to perform gesture identification. The bulk of the work in this class is performed in the Add method. It stores the passed-in Vector as a GestureData object, and adds it to the gestures collection. It then draws the position as a small ellipse on the Canvas held in the displayCanvas field. Then, if the gestures collection contains more samples than the value of WindowSize, the oldest sample is removed from both the collection and the displayCanvas, so that the collection acts as a sliding window over the most recent positions. The position data is then added to an instance of the Gesture class. The EndGestureCapture and SaveGesture methods delegate to the TemplateRecognizer class, while StartGestureCapture simply creates a new Gesture instance.
    public class GestureRecognizer
    {
        private Gesture gesture;
        private string gestureName; 
        private readonly List<GestureData> gestures = new List<GestureData>();
        private readonly TemplateRecognizer.TemplateRecognizer templateRecognizer;
        private readonly int windowSize;
        private DateTime lastGestureDate = DateTime.Now;
        private Canvas displayCanvas;
        private Color displayColour;
        protected List<GestureData> Gestures
        {
            get { return this.gestures; }
        }
        public string GestureName
        {
            get { return this.gestureName; }
            set { this.gestureName = value; }
        }
        public bool IsRecordingGesture
        {
            get { return this.gesture != null; }
        }
        public TemplateRecognizer.TemplateRecognizer TemplateRecognizer
        {
            get { return this.templateRecognizer; }
        }
        public int WindowSize
        {
            get { return this.windowSize; }
        }
        public GestureRecognizer(Stream stream, int windowSize = 100)
        {
            this.templateRecognizer = new TemplateRecognizer.TemplateRecognizer(stream);
            this.windowSize = windowSize;
        }
        public GestureRecognizer(string gestureName, Stream stream, int windowSize = 100)
        {
            this.gestureName = gestureName;
            this.templateRecognizer = new TemplateRecognizer.TemplateRecognizer(stream);
            this.windowSize = windowSize;
        }
        public void Add(Vector position, SkeletonEngine engine)
        {
            GestureData newGesture = new GestureData
            {
                Position = position,
                Time = DateTime.Now
            };
            this.gestures.Add(newGesture);
            if (this.displayCanvas != null)
            {
                Ellipse ellipse = new Ellipse
                {
                    HorizontalAlignment = HorizontalAlignment.Left,
                    VerticalAlignment = VerticalAlignment.Top,
                    Height = 8,
                    Width = 8,
                    StrokeThickness = 8,
                    Stroke = new SolidColorBrush(this.displayColour),
                    StrokeLineJoin = PenLineJoin.Round
                };
                float x, y;
                engine.SkeletonToDepthImage(position, out x, out y);
                x = (float)(x * this.displayCanvas.ActualWidth);
                y = (float)(y * this.displayCanvas.ActualHeight);
                Canvas.SetLeft(ellipse, x - ellipse.Width / 2);
                Canvas.SetTop(ellipse, y - ellipse.Height / 2);
                this.displayCanvas.Children.Add(ellipse);
            }
            if (this.gestures.Count > this.WindowSize)
            {
                GestureData gestureToRemove = this.gestures[0];
                if (this.displayCanvas != null)
                {
                    this.displayCanvas.Children.RemoveAt(0);
                }
                this.gestures.Remove(gestureToRemove);
            }
            if (this.gesture != null)
            {
                Vector2 vector = new Vector2
                {
                    X = position.X,
                    Y = position.Y
                };
                this.gesture.Points.Add(vector);
            }
        }
        public void DrawGesture(Canvas canvas, Color colour)
        {
            this.displayCanvas = canvas;
            this.displayColour = colour;
        }
        public void StartGestureCapture()
        {
            this.gesture = new Gesture(this.WindowSize);
        }
        public void EndGestureCapture()
        {
            this.templateRecognizer.Add(gesture);
            this.gesture = null;
        }
        public void SaveGesture(Stream stream)
        {
            this.templateRecognizer.Save(stream);
        }
    }
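
The sliding-window behaviour of the Add method can be isolated into a small sketch. SlidingWindow here is a hypothetical stand-in for the gestures collection and its eviction logic, with int samples standing in for GestureData entries.

```csharp
using System.Collections.Generic;

// Illustrative sketch of the sliding window used by GestureRecognizer.Add:
// only the most recent windowSize samples are retained, so a long capture
// does not grow without bound.
public class SlidingWindow
{
    private readonly List<int> samples = new List<int>();
    private readonly int windowSize;

    public SlidingWindow(int windowSize)
    {
        this.windowSize = windowSize;
    }

    public void Add(int sample)
    {
        this.samples.Add(sample);
        if (this.samples.Count > this.windowSize)
        {
            this.samples.RemoveAt(0);  // evict the oldest sample, as Add does
        }
    }

    public int Count { get { return this.samples.Count; } }
    public int Oldest { get { return this.samples[0]; } }
}
```

In the real class the same eviction also removes the oldest ellipse from the display canvas, keeping the drawn trail in step with the stored samples.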

The Gesture class stores all the positions of the captured gesture in a collection called points, of type List&lt;Vector2&gt;, matching the Vector2 values added by the Add method.
    [Serializable]
    public class Gesture
    {
        private List<Vector2> points;
        private readonly int samplesCount;
        public List<Vector2> Points
        {
            get { return this.points; }
            set { this.points = value; }
        }
        public Gesture(int samplesCount)
        {
            this.samplesCount = samplesCount;
            this.points = new List<Vector2>();
        }
    }

The TemplateRecognizer class will perform the bulk of the processing required for gesture identification, which will be covered in a future blog post. At the moment the class simply handles serialization and deserialization of the gesture data, by using a BinaryFormatter. The constructor deserializes the gesture data if it exists, while the Save method serializes the gesture data to a file.
    public class TemplateRecognizer
    {
        private readonly List<Gesture> gestures;
        public TemplateRecognizer(Stream stream)
        {
            if (stream == null || stream.Length == 0)
            {
                this.gestures = new List<Gesture>();
                return;
            }
            BinaryFormatter formatter = new BinaryFormatter();
            this.gestures = (List<Gesture>)formatter.Deserialize(stream);
        }
        public void Add(Gesture gesture)
        {
            this.gestures.Add(gesture);
        }
        public void Save(Stream stream)
        {
            // Clear the stream first, so that repeated saves overwrite,
            // rather than append to, the previously serialized data.
            stream.SetLength(0);
            BinaryFormatter formatter = new BinaryFormatter();
            formatter.Serialize(stream, this.gestures);
        }
    }
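
The serialization round trip can be demonstrated in isolation. The following sketch uses a MemoryStream in place of the gestures.dat file; PointList and GestureStore are hypothetical stand-ins for the [Serializable] Gesture class and the TemplateRecognizer's constructor and Save method.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Stand-in for the [Serializable] Gesture class.
[Serializable]
public class PointList
{
    public List<double> Points = new List<double>();
}

public static class GestureStore
{
    // Serialize the gesture list to a stream, then read it back:
    // TemplateRecognizer's Save method and constructor in miniature.
    public static List<PointList> RoundTrip(List<PointList> gestures)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            formatter.Serialize(stream, gestures);   // as in Save
            stream.Position = 0;                     // rewind before reading
            return (List<PointList>)formatter.Deserialize(stream);  // as in the constructor
        }
    }
}
```

Note the rewind before deserializing: a stream's position sits at the end after writing, so it must be reset before the data can be read back.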

The application is shown below. It determines whether the user is moving or stationary, and recognizes the start of a gesture – in this case, a posture. Once a posture is identified (currently hard-coded to RightHello), the gesture points are stored in a collection and drawn on the canvas. When a different posture is identified, the gesture capture stops.


Conclusion


The Kinect for Windows SDK beta from Microsoft Research is a starter kit for application developers. It allows access to the Kinect sensor, and experimentation with its features. My gesture recognition process now determines whether the user is moving or stationary, recognizes a posture, captures the gesture that follows, and saves the captured gesture to a file. The next step of the process will be to scale the gesture data to a reference size before it is saved.
