Microsoft is no longer building any Kinect devices for consoles or PCs, writes Fast Co. Design. Since their 2010 introduction on the Xbox 360 and through major updates for Xbox One and PC, the sensors combined a depth-sensing camera, a regular video camera, and a microphone array into a device that Microsoft hoped would usher in a new wave of games and apps packed with voice and motion-based controls. Microsoft’s own marketing promised gamers that they themselves would be the controller.
With Kinect, Microsoft appeared to be leading the tech world with a rich mix of voice and motion controls that nobody else could match, all neatly packaged into a consumer-friendly box.
But a world of compelling voice and motion games never really materialized, so a device that once held such promise—as a box in your living room that you talk to; as a showcase of machine vision; as a basis for complex multimodal input mixing controllers, movement, and voice—is being killed off after never quite living up to the potential we felt it had. And as someone who still uses a Kinect every single day, I’m more than a little saddened.
Being different is difficult
Why did this happen? Much as Nintendo discovered with the Wii, novel control mechanisms, such as Kinect’s combination of body motion and voice, have proven challenging for game developers to exploit successfully.
This is partly because they’re a poor match for a world dominated by cross-platform titles built for a generic game console rather than for any one device in particular. Why make a title for the Kinect when doing so means leaving behind every PlayStation 3 owner, along with every Xbox 360 owner who didn’t splurge on the $150 peripheral? Microsoft did what it could to make the hardware more appealing in this regard. The Xbox One initially shipped with a revised Kinect sensor in the box to ensure that, if nothing else, developers for the Xbox One could rely on the hardware at least being available to the target audience.
Even this move proved to be something of a liability. The inclusion drove up the cost of the hardware, and it made the Xbox One a somewhat less attractive development platform. Initially, the Xbox One dedicated a certain amount of memory and processor power to handling Kinect input, so even non-Kinect games could not tap the full power of the machine. This might have been acceptable if the bundling had produced a spate of high-quality, Kinect-exploiting games.
But it didn’t, because of a deeper and more enduring problem: these novel input systems pose a certain conceptual difficulty. The metaphors and design elements that work with controllers, mice, and keyboards are all well established and well understood by developers and players alike. Corresponding concepts for Kinect, as for the Wii before it, don’t really exist; an equivalent vocabulary hasn’t yet been developed. Even if they could guarantee that the hardware was present, developers never arrived at a strong sense of what to actually do with it.
On top of this, and especially with the first version for the Xbox 360, developers struggled to get the latency—the time between a player performing an action in real life and seeing that reflected on-screen—low enough for games to feel comfortable. The inability to track individual fingers similarly made it hard to develop games that needed fine control. The Xbox One’s Kinect was much better in both regards, but the lack of engagement with the original meant that it no longer mattered; game developer interest had evaporated.
The demise of the hardware will be mourned by a certain kind of more experimental developer. Microsoft produced Kinect hardware that worked with Windows PCs, along with an SDK to develop software for it, and a variety of researchers have taken advantage of this, using the peripheral for things like building machine vision systems for robots and user interface prototyping and development.
Laying the groundwork
The Kinect concepts, however, live on, and a case could be made that Kinect was the right technology, just delivered in not quite the right way. Microsoft’s HoloLens augmented reality headset, for example, uses a derivative of the Kinect for its depth-sensing camera, and it uses its integrated sensors to scan the room around you and detect objects within it. This dedicated depth hardware has in turn given way to systems that combine a conventional camera with accelerometer data to track motion, something that underpins the virtual and augmented reality system in the latest release of Windows 10.
Google has undergone a similar evolution, from dedicated Kinect-like hardware called Project Tango to a camera- and accelerometer-based system. Apple, meanwhile, bought PrimeSense, the company that made the sensors for the original Xbox 360 Kinect. A highly miniaturized version of that same technology is what powers the Face ID feature coming imminently with the iPhone X. Here, the depth sensing is used to better track the face rather than build a map of the environment or track the movements of your body. Apple also has an augmented reality framework that again uses regular cameras and accelerometers.
Voice control is also proliferating in the same ambient style as Kinect offered. The Xbox One with Kinect sits quietly in the corner of the room, listening for the command to be turned on; when on, you can talk to it from across the room to watch a video, start a game, or play some music. The Amazon Echo and Google Home devices develop this same core idea, working as headless agents to do your bidding. You just speak utterances into the void; the little cylinder in the corner of the room hears you and reacts accordingly.
Microsoft has never quite allowed Cortana on the Xbox to live up to this same potential. Last week, a standalone Cortana speaker from Harman Kardon hit the market, and this device goes head to head with the Google and Amazon offerings. Standalone speaker devices are essential to truly reach into the living room—a cost comparison between the Echo and the Xbox One makes that much clear. But one feels that the Xbox could and should have offered a similar experience. It should have been leading the way as a voice-controlled agent, with Microsoft using the Xbox to ensure that “Hey Cortana” is every bit as iconic and capable as “Alexa” or “OK Google.”
But the Xbox didn’t offer the same experience, and with the end of Kinect, it looks like it never will. The Xbox One, unlike those cheaper devices, has no eyes or ears of its own; it relied on the Kinect to provide that information. While the Xbox One S and Xbox One X are both compatible with the Kinect (albeit requiring an adaptor to convert from Kinect’s proprietary connector to standard USB), neither of them ships with it, and with the discontinuation, neither ever will. As Kinect hardware becomes harder to get hold of, it will likewise become harder to give Cortana the ears that she so desperately needs. A Cortana that you can’t speak to might as well not exist.
Microsoft’s “solution,” such as it is, is to plug a headset into your Xbox controller and use that to give commands to Cortana. This is a very different experience to the ambient style found with Alexa or Google Home, and if Microsoft thinks that donning a headset just to get Netflix to pause or wake the Xbox up so that I can watch something while cooking dinner is a reasonable suggestion, I’ve got news: it ain’t.
It makes for a very strange gap: while the Windows, Bing, and Research divisions of Microsoft are investing heavily in Cortana and machine learning to build an ever-richer, more capable digital agent, the Xbox team has killed off what was arguably one of the most natural and convenient ways of actually using that agent. I could live without the cameras and motion input (my New York apartment isn’t big enough for that style of gaming anyway, so it has never been my priority), but for the life of me I can’t understand why the Xbox One S and Xbox One X do not include a built-in far-field microphone array so that they can continue to support voice commands from across the room, even without a Kinect attached.
The technology isn’t the only part of the Kinect that will stick around; so too will the conceptual difficulties. Voice controls remain frustratingly inflexible, and the success or failure of augmented and virtual reality will depend not only on developers producing compelling software to take advantage of the hardware but also on producing the metaphors and idioms to make this software feel familiar and predictable. Motion control is difficult, especially on the whole-body scale. Compared to our fingers, our bodies move slowly and imprecisely, and room-scale VR games especially need to be sensitive to this, just as Kinect titles did. We have better motion controllers today, but we still lack the ability to track individual finger movements.
The industry is investing heavily in AR and VR technology, but nobody can yet claim to have solved the human interaction side of things. There’s still every chance that the fate of VR, AR, and even voice controls could ultimately mirror that of the device that was in so many ways their precursor.