When Siri was first introduced, people thought it was much smarter than it actually is. I heard kids giggling for hours, asking it silly questions. In effect, Siri was good for executing Web searches by voice and giving sassy answers to questions about itself. Neat trick, but not very sophisticated. After a few months, most people quit using Siri because, honestly, it just wasn't that practically useful.
The Amazon Echo was widely mocked when it was introduced. Who is going to pay $200 for a speaker? It became a surprise smash hit, not because people needed another speaker but because it had an extensible API that allowed 3rd party developers to code new capabilities for it. It quickly found multiple unserved niches, particularly in home automation. "Alexa, turn off the lights." People who own Echos almost universally say they use it every single day and find it has become an integral part of their experience at home.
The core difference between these two experiences is the existence of an API. The Echo has thousands of 3rd party developers thinking up new ideas for the platform and teaching it new skills, and Siri has Apple. A 3rd party developer who wants to make their app work with Siri has no option other than to index their app and hope it comes up as a search result on a Siri voice search.
There was a brief glimmer of hope recently when Apple introduced SiriKit. Finally, Apple was going to make it possible for 3rd party developers to integrate their apps with Siri! Not so fast, enterprising developers… SiriKit only supports about a dozen canned interactions. They support Ride Booking (for example, book an Uber), person to person payments (Send $20 to a friend on Venmo), starting and stopping a workout, and some basic carplay commands. Although this is some progress, this canned set of actions merely opens up a handful of possibilities for Siri. Apple is still a first-class citizen when it comes to integrating their own apps with Siri and the 3rd party marketplace is relegated to 3rd class citizens in steerage.
Privacy Concerns and Actions
Many of the limitations on integration with virtual assistants boils down to privacy concerns. Google Now reads all of my Gmail messages to provide me with helpful information. I don't want every app I install on my phone to start reading my email, too.
As a result of these privacy concerns, the better virtual assistant APIs are currently limited to being able to register your app for action commands. Google Voice Actions, Cortana, and Amazon all allow you to define phrases that your application can execute on. This is a good start and it allows for a reasonable level of integration with these virtual assistant platforms.
Context Is Needed
Being able to register for context is half of the battle. The platforms with action APIs will allow you to register for a command like, "Send flowers to Mom," and activate your flower ordering app. The problem is that the app doesn't know who your Mom is even though Google does. The user's intent in this case is clearly to share your mother's name and address with the flower ordering app.
To make virtual assistants truly useful for end-users, these platforms need a way to integrate with 3rd party applications that include context without putting people's data at risk. I would propose that this could be done by allowing apps a richer method of registering not only the action commands they can respond to but the context they need to deliver on the user's action.
For example, you could register your car insurance company as subscribed to topics about insurance, cars, and household budgeting. Within each of these topics, you would need to define the moments in natural language terms, like "If the user is in a car accident" would define the broad topic areas that are relevant to your application. If these topic areas are triggered, the virtual assistant platform could pass a pre-defined set of context information that is relevant to this experience, such as the type of car being considered for purchase. Within these topics, your application could define its more specific actions that it can handle using that general context.
Air bags deployed, insurance assistant can proactively pipe in and ask if you'd like a claims agent to meet you or, in the case of Google Now, put a card at the top of your list with a button to summon an insurance agent.
Real magic can happen if virtual assistants can start allowing 3rd parties to collaborate together to deliver more value to the customer. For example, in a household budgeting scenario, multiple apps could collaborate to provide more information than any one company could do by themselves. For example, your bank, credit card company, wealth advisor, insurance, cable, telephone, and so forth, all have a piece of your household's budgetary picture. The problem then arises with making all of these companies behave more in the interest of the user than themselves.
Each company is incented to push themselves to the forefront. The insurance company wants to sell car insurance, the wealth management company wants you to put more money under their management, and the cable company wants you to expand your channel line-up. If you asked your assistant to help you understand your budget, each of these providers screaming at you to sign up for more services would hardly be helpful.
As a result of the need to drive this collaboration, virtual platforms will need to evolve to allow 3rd party applications to describe the services they can perform in a situation like this. The virtual assistant can provide the appropriate context and the 3rd party application can describe what they can do for that context. The virtual assistant then will need to make the decision as to which of the various 3rd party applications has the most relevant input to the current need.
To create a true virtual assistant platform that can unlock the power of the entire marketplace, 3rd party applications need:
- Context: When the user is considering something in this topical area, this is the context information I need to be relevant.
- Actions: These are the specific actions I can do.
- Proposals: For situations where multiple applications can help, this is how I can propose what I can do for the user.
This could potentially require more abstract reasoning than is available under the hood of the simpler assistants like Siri currently can muster. The more advanced recognition systems like Watson would have no trouble assembling these pieces.
It's past time to open up virtual assistant APIs. New entrants like Viv are going to eat the lunch of these closed platforms. Truly open APIs allow a marketplace of innovation that is broader than a dozen canned possibilities to create amazing, surprising, and memorable experiences.