Designing Kinect-Based Experiences

More than 10 million Kinects have been sold since the depth- and gesture-recognition Xbox accessory first launched in November 2010 (selling so quickly in its first 60 days that it beat out the iPhone and iPad for a Guinness World Records award). Given that it’s aimed at a much broader consumer audience than the Xbox itself, and with Microsoft’s announcement of an official Kinect SDK on April 13, it’s likely that more of us will be designing Kinect-based interfaces in the near future.

I recently partnered with two talented developers to prototype a Kinect-based experience. We had the opportunity to observe more than 30 people use the prototype, which allowed for some great, ad hoc user research.

After the jump, you can read my takeaways and design recommendations based on observations from our experiment. I’ll also try to post any new Kinect info I gather at MIX11 next week.

Some basics

Opportunities provided by a Kinect:
  • Depth recognition, allowing an interpretation of 3D space
  • Gesture/motion recognition
  • Voice recognition
  • A very passionate community of enthusiast hackers creating a lot of cool stuff for the platform
  • Limited hardware needed–just a Kinect, your PC, and an idea
Constraints existing at the time of this writing:
  • 6-8 foot clearance needed between the subject and the Kinect
  • No official Microsoft SDK (a problem no longer, thanks to the April 13th announcement of an official SDK!)
  • Different TV sizes & living room configurations
  • No standardized UX or design guidelines

Make use of and get inspiration from existing hacks

There is a passionate community of enthusiasts who have created free, open source libraries for activating mouse cursors, setting up skeletal tracking, and lots more.  Even with an official SDK, time can be saved by building on top of what others have already explored.  A prominent resource in the community is the Open Kinect site. At the very least, you’ll leave with some fresh inspiration provided by these very creative folks.

Rely on gestures & voice over chrome

Using the “body as the interface” is one of the key tenets of Natural User Interfaces, or NUIs, like the Kinect.  But, if you’re developing very quickly and on the fly, it can be easy to fall back on UI controls (“chrome”) because they’re quicker to implement and more familiar to most designers and developers. We were guilty of overusing chrome in our first iterations.

Arguably, the Kinect has only two gestures that could be considered standard. One of these is the calibration pose (also called the PSI pose), which is used to help the Kinect acquire and track a body. This is generally done once per user, because the Kinect is pretty good at staying locked on someone once recognized. There are times when the Kinect can acquire someone without the aid of the calibration pose, and in other cases an Xbox calibration card can be used as a calibration alternative.

An example of the calibration pose, courtesy of rock-vacirca.blogspot.com

The other common gesture is the mouse cursor invocation, which is characterized by a vigorous waving motion. Once the cursor appears, a person needs to drive it over to a button (no easy task) and then push their hand forward in a sort of “high-five” manner to press it.

The mouse cursor invocation wave
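To make the “push to press” part more concrete, here’s a minimal sketch of how it could be detected, assuming you already get hand-joint positions from a skeletal-tracking library (OpenNI, the official SDK, or similar). The class name, thresholds, and synthetic data below are all illustrative, not tuned values.

```python
# A minimal sketch of "push to press" detection. Real code would feed in
# hand-joint depth values from whatever skeletal-tracking library you use;
# here the frames are synthetic.

from collections import deque

class PushDetector:
    """Fires when the tracked hand moves toward the sensor fast enough."""

    def __init__(self, push_mm=120, window=8):
        self.push_mm = push_mm        # how far the hand must travel forward (mm)
        self.history = deque(maxlen=window)

    def update(self, hand_z_mm):
        """Feed one frame of the hand's depth; returns True on a 'press'."""
        self.history.append(hand_z_mm)
        if len(self.history) < self.history.maxlen:
            return False
        # A press = the hand is now meaningfully closer to the sensor than it
        # was at the start of the window (smaller z means closer).
        return self.history[0] - self.history[-1] >= self.push_mm

if __name__ == "__main__":
    detector = PushDetector()
    # Simulated hand depth: hovering around 2000 mm, then a quick push forward.
    for z in [2000, 1998, 2001, 1999, 2000, 1995, 1930, 1860]:
        if detector.update(z):
            print("Press detected at z =", z)
```

In practice you’d probably also want to require the hand to stay roughly over the same button while it pushes, so that a quick sweep across the screen doesn’t register as a press.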

Watching users play with our prototype, we noticed they appeared more comfortable and capable in the parts of the experience that allowed for wide and loose gestures. Several became visibly agitated when they had to activate and drive the mouse cursor. The cursor’s required precision seems antithetical to the fun, lightweight experience people expect from a Kinect, and it seemed difficult for people to shift from making big, spatial movements to small, linear ones.

Additionally, although we only had four primary buttons in our interface, each extremely large and spaced apart by a minimum of 50 pixels on a 1280 x 720 screen, we observed many people accidentally hitting the wrong button.  The time it took our users to correct these kinds of mistakes and drive the cursor around easily sucked up more than half of their interaction time.

I recommend that designers and developers make the extra effort to really define, implement and iterate on gestures and voice before falling back to chrome for Kinect UI navigation.  Hopefully the work needed to specify gestural and vocal triggers will be decreased with the official SDK.

Provide anchors and instructions

You may want to build your gesture-driven app so that only one major task is required per screen or step. This will reduce the need for your audience to memorize more than one activity at a time.

Kinectimals is a game that is broken up into singular, fun activities

If you need to build an experience that has more than one activity that can occur at a time, make sure to have an anchor point in case people get lost or forget gestural triggers.  You can also help by using gestures that are based on a natural, contextual relationship to the content on the screen.

We learned from one failure of our original prototype: we didn’t provide our users with an in-app explanation of the calibration gesture. We instead demonstrated the pose in person. Problems came to light during the few cases when our Kinect lost the user it was tracking in the middle of their session. Because the calibration pose is awkward and unnatural, not often reused, and therefore not easily recalled from muscle memory, our users needed our help to remember it.

It would have been better if we had created an overlay guide of the gesture on the screen when new users got started. Once the Kinect recognized them, the overlay would disappear.  We should have also created an instructions panel for users to re-access in case they needed to jog their memory.

Kinect Dance Central makes gestural guides a primary part of the game experience
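For illustration, here’s roughly how that overlay guide could be wired up as a tiny state machine. The event hooks and draw functions are placeholders for whatever your tracking library and renderer provide; the point is simply showing the pose guide whenever nobody is calibrated and hiding it once tracking starts.

```python
# A rough sketch of the on-screen calibration guide we wished we had built.
# Hook the on_* methods up to your tracking library's "user calibrated" /
# "user lost" events (names here are placeholders).

class CalibrationGuide:
    """Shows the PSI-pose guide until someone is calibrated, and again if they're lost."""

    def __init__(self):
        self.tracking = False

    def on_user_calibrated(self, user_id):
        self.tracking = True

    def on_user_lost(self, user_id):
        self.tracking = False

    def render(self, draw_scene, draw_pose_overlay):
        draw_scene()
        if not self.tracking:
            draw_pose_overlay()   # e.g. a pose silhouette + "Stand like this to start"

if __name__ == "__main__":
    guide = CalibrationGuide()
    guide.render(lambda: print("scene"), lambda: print("overlay: show PSI pose"))
    guide.on_user_calibrated(user_id=1)
    guide.render(lambda: print("scene"), lambda: print("overlay: show PSI pose"))
```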

Similarly, if you have an interface with multiple screen flows, provide some kind of home or back navigation (many Kinect games have players angle their left hand 45 degrees downward, toward the lower right corner). However, I feel that having a visual button as an anchor will save people the hassle of recalling a word or gesture in a time of need. It’s possibly the most justified use of chrome in the experience.
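If you do go with a gestural anchor, a sketch like the following shows one way to detect a “hold your arm out at roughly 45 degrees” trigger from shoulder and hand joints. Coordinates are assumed to be simple (x, y) pairs with y pointing up, and the angle tolerance and hold time are guesses you’d tune in testing.

```python
# A sketch of detecting a held "arm at ~45 degrees" anchor gesture.
# Joint coordinates are assumed (x, y) pairs with y increasing upward;
# names and thresholds are illustrative.

import math

def arm_angle_deg(shoulder, hand):
    """Angle of the shoulder-to-hand vector below horizontal, in degrees."""
    dx = hand[0] - shoulder[0]
    dy = shoulder[1] - hand[1]   # positive when the hand is below the shoulder
    return math.degrees(math.atan2(dy, abs(dx)))

class HoldGestureDetector:
    def __init__(self, target_deg=45, tolerance_deg=10, hold_frames=30):
        self.target = target_deg
        self.tolerance = tolerance_deg
        self.hold_frames = hold_frames   # ~1 second at 30 fps
        self.count = 0

    def update(self, shoulder, hand):
        """Feed joints each frame; returns True once the pose has been held long enough."""
        angle = arm_angle_deg(shoulder, hand)
        if abs(angle - self.target) <= self.tolerance:
            self.count += 1
        else:
            self.count = 0
        return self.count >= self.hold_frames

if __name__ == "__main__":
    det = HoldGestureDetector(hold_frames=3)
    # Hand held out and down at about 45 degrees for three frames.
    for _ in range(3):
        print(det.update(shoulder=(0.0, 0.0), hand=(0.5, -0.5)))
```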

Optimize your layout

While you may be building or even testing your app initially on a PC, bear in mind that the most common use case will be on a TV.  Make sure to test it out  as such and design for the 10-foot UI.  Our experiments were optimized for a 1280 x 720 screen.

I drafted up the following templates as a guide to UI placement. These are based on my own interpretation of what I observed as comfortable vs. uncomfortable arm movement in our users. Your own explorations may incorporate other parts of the body or more intense movements.  Click to get a larger template for your own use.

As mentioned earlier, the mouse cursor can be a bit awkward. If you have buttons, keep them few and far between. Even with large hit areas of 100 pixels and spacing between buttons of 50 pixels, we noticed a lot of accidental hits as people tried to manipulate the cursor.
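Here’s a small layout sketch using those numbers: a 1280 x 720 canvas, four large buttons with at least 50 pixels between them, and a hit test padded with a little slop to forgive a jittery cursor. The exact dimensions and button names are illustrative, not a prescription.

```python
# A minimal layout/hit-test sketch for a 10-foot UI. Sizes are illustrative.

from dataclasses import dataclass

CANVAS_W, CANVAS_H = 1280, 720
BUTTON_W, BUTTON_H = 250, 150
GAP = 50   # minimum spacing between hit areas

@dataclass
class Button:
    name: str
    x: int
    y: int
    w: int = BUTTON_W
    h: int = BUTTON_H

    def hit(self, cx, cy, slop=0):
        """True if the cursor lands inside the button, padded by `slop` pixels."""
        return (self.x - slop <= cx <= self.x + self.w + slop and
                self.y - slop <= cy <= self.y + self.h + slop)

# Four buttons laid out in a single row along the bottom of the canvas.
buttons = [
    Button(name, x=GAP + i * (BUTTON_W + GAP), y=CANVAS_H - BUTTON_H - GAP)
    for i, name in enumerate(["Play", "Replay", "Gallery", "Quit"])
]

def button_under_cursor(cx, cy):
    for b in buttons:
        if b.hit(cx, cy, slop=20):   # a little forgiveness for a jittery cursor
            return b.name
    return None

if __name__ == "__main__":
    print(button_under_cursor(100, 600))   # lands on "Play"
```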

Multiplayer support

You’ll also want to decide whether your experience should support multiple users. Consider both physical multiplayer (several people in the same space being interpreted by a single Kinect) and virtual multiplayer (individuals in different locations, each with their own Kinect, linked together into one experience). According to Microsoft, the Kinect has the potential to recognize an unlimited number of users in a single environment.

Even if you decide only to support a single user, take into consideration that other people may be moving around in their environment.  Especially if you are building an experience based on a hack, the Kinect can sometimes “lose” the primary player it was tracking and switch its focus to someone who has walked into the scene.  Just make sure to build around that possibility.
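One way to build around it is to latch onto a single user ID as the primary player and ignore the other bodies the sensor reports, rather than always following whoever was detected most recently. The event names and grace period below are placeholders for whatever your tracking library emits; the latching logic is the point.

```python
# A sketch of keeping one "primary player" even when bystanders walk through
# the scene or the sensor drops the player for a few frames.

class PrimaryPlayer:
    """Latches onto one tracked user id instead of following whoever appeared last."""

    def __init__(self, grace_frames=90):
        self.primary_id = None
        self.missing = 0
        self.grace_frames = grace_frames   # ~3 seconds at 30 fps before giving up

    def on_users_visible(self, user_ids):
        """Call each frame with the ids of everyone the sensor currently sees."""
        if self.primary_id in user_ids:
            self.missing = 0
            return self.primary_id            # keep following the same person
        if self.primary_id is not None:
            self.missing += 1
            if self.missing < self.grace_frames:
                return None                   # briefly lost; don't switch to a bystander
        # No primary (or the old one has been gone too long): promote someone new.
        self.primary_id = user_ids[0] if user_ids else None
        self.missing = 0
        return self.primary_id

if __name__ == "__main__":
    p = PrimaryPlayer(grace_frames=2)
    print(p.on_users_visible([3]))      # 3 becomes the primary player
    print(p.on_users_visible([3, 7]))   # a bystander walks in; still 3
    print(p.on_users_visible([7]))      # 3 briefly dropped out; no switch yet
    print(p.on_users_visible([7]))      # still gone after the grace period; 7 is promoted
```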

Emphasize the fun

Most people expect an active, entertaining experience if they’re firing up their Kinect. We noticed that people were a lot more forgiving of some of the more utilitarian parts of our app because they were able to play around and move more freely in other parts.

My recommendation is to integrate a good dose of activity into your Kinect experience.  Take advantage of the platform’s capabilities.  It’s probably best not to build an app for, say, doing taxes (unless you can make tax filing easier AND more effective AND entertaining, in which case, please go for it!).  If you are building an entertainment app or a game that needs to support some utilitarian tasks, try putting those actions into a secondary panel or screen.

Think about other postures

While most Kinect experiences today are optimized for standing subjects and the 6-8 foot experience, the Kinect can still recognize folks who are sitting or otherwise positioned. Since the common location of a Kinect may be in the living room, consider whether you’d like to support a use case of someone sitting a bit more passively on their couch.

Live camera feed or 3D avatar?

In general, you have two options in how you can display the user on the screen. You can composite your UI with a live camera feed or you can generate a 3D avatar and environment.

Using a live camera feed is probably the easiest to start with. But, if you try to composite UI chrome and 3D textures with this feed, you may find yourself spending a lot of time tweaking things to achieve an acceptable visual result. I’d recommend this approach if you have a lightweight UI you’d like to lay over a camera feed or a more task-focused application that may not rely heavily on figural representation. We were able to create a simple UI using textures output from Photoshop as TGAs with an alpha channel.
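For reference, the compositing itself boils down to a standard alpha blend of the RGBA texture over the RGB camera frame. The sketch below uses synthetic numpy arrays in place of a real Kinect feed and a real TGA, so only the blend is shown.

```python
# A sketch of alpha-blending an RGBA UI texture over an RGB camera frame.
# Both images are synthetic numpy arrays; real code would pull frames from
# your Kinect library and load the TGA textures.

import numpy as np

def composite(frame_rgb, overlay_rgba):
    """Blend an RGBA overlay onto an RGB frame of the same width/height."""
    alpha = overlay_rgba[..., 3:4].astype(np.float32) / 255.0
    fg = overlay_rgba[..., :3].astype(np.float32)
    bg = frame_rgb.astype(np.float32)
    out = alpha * fg + (1.0 - alpha) * bg
    return out.astype(np.uint8)

if __name__ == "__main__":
    frame = np.full((720, 1280, 3), 60, dtype=np.uint8)     # fake camera feed
    overlay = np.zeros((720, 1280, 4), dtype=np.uint8)      # fully transparent
    overlay[600:700, 50:300] = (255, 255, 255, 180)         # one semi-opaque "button"
    result = composite(frame, overlay)
    print(result[650, 100], result[0, 0])                   # blended pixel vs untouched pixel
```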

Creating a 3D avatar and environment will be more time consuming to start (OpenGL, Physics engines, yay!), but the end result is certainly more cohesive-looking. I’d recommend this approach if you want an immersive experience or have an interface with heavy figural representation.

An example of 3D avatars created on a Kinect

Handle mistakes with grace

Inevitably you’ll stumble across some odd issues, especially when it comes to acquiring and tracking a person. For example, we noticed that our prototype was failing to acquire certain users sporadically throughout our trials. This led to some humorous discussion in our group about what criteria the Kinect might be using to discriminate against some people.  We could not figure out what was going on until one of our problematic participants removed their baggy jacket. Voila, solved! Baggy clothing appeared to be at fault.

Have some fun creating your own Fail Whales for these kinds of situations. Don’t just have your Kinect react when the user does something correctly, have it recognize if something is not being done the right way and offer some advice.  Let it be conversational; if your user becomes idle, maybe the Kinect can quip, “So…great weather we’re having, right?”
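A sketch of that idle-quip idea: watch for a stretch with no meaningful hand movement and surface a friendly message instead of staying silent. The timing, movement threshold, and message below are arbitrary examples.

```python
# A sketch of idle detection with a conversational nudge. Feed it the tracked
# hand position each frame; thresholds are illustrative.

class IdleWatcher:
    """Notices when the player stops moving and offers a conversational nudge."""

    def __init__(self, idle_frames=300, move_threshold=0.05):
        self.idle_frames = idle_frames         # ~10 seconds at 30 fps
        self.move_threshold = move_threshold   # metres of hand travel that counts as activity
        self.frames_idle = 0
        self.last_pos = None

    def update(self, hand_pos):
        """Returns a quip when the player has been idle long enough, else None."""
        if self.last_pos is not None:
            moved = sum((a - b) ** 2 for a, b in zip(hand_pos, self.last_pos)) ** 0.5
            self.frames_idle = 0 if moved > self.move_threshold else self.frames_idle + 1
        self.last_pos = hand_pos
        if self.frames_idle >= self.idle_frames:
            self.frames_idle = 0               # don't repeat the line every frame
            return "So...great weather we're having, right?"
        return None

if __name__ == "__main__":
    watcher = IdleWatcher(idle_frames=3)
    for pos in [(0.0, 0.0, 2.0)] * 5:          # a hand that isn't going anywhere
        quip = watcher.update(pos)
        if quip:
            print(quip)
```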

Go modal for messaging

With such an active interface, and given the limits of the 10-foot UI, your users are going to miss any important messages that don’t appear in their line of sight. After all, they’re busy having fun! So, if you need to grab their attention, aim for a modal display. This also saves them from having to navigate over to your messaging in the middle of their activity.
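Sketched out, a modal message can be as simple as a state that swallows gesture input until it’s dismissed, so the rest of the UI can’t be driven by accident while the message is up. The gesture names below are placeholders.

```python
# A small sketch of routing important messages through a modal state.
# While a message is showing, gesture input only goes to the dialog.

class ModalMessenger:
    def __init__(self):
        self.message = None

    def show(self, text):
        self.message = text          # rendered centred, in the player's line of sight

    def handle_gesture(self, gesture, app_handler):
        if self.message is not None:
            if gesture == "push":    # any simple, well-known gesture to dismiss
                self.message = None
            return                   # swallow everything else while the modal is up
        app_handler(gesture)

if __name__ == "__main__":
    modal = ModalMessenger()
    modal.show("Lost track of you! Step back into view.")
    modal.handle_gesture("wave", lambda g: print("app got", g))   # swallowed
    modal.handle_gesture("push", lambda g: print("app got", g))   # dismisses the modal
    modal.handle_gesture("wave", lambda g: print("app got", g))   # app got wave
```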

More resources:

  • Microsoft’s main Kinect page
  • Kinect Gestural UI: First Impressions – a great usability review from Jakob Nielsen
  • Open Kinect – a community of enthusiasts working on free, open source libraries that enable the Kinect to be used with Windows, Linux, and Mac
  • Kinect-Hacks – a great inspiration and news source
  • Open Natural Interaction (OpenNI) – an organization formed to certify and promote the compatibility and interoperability of NUI apps, devices & middleware