David Galbraith has a smart + straightforward way to frame how AI will change the user interface.
First he imagines taking prompts and wrapping them up as buttons:
The best prompts now are quite simple, leaving AI to handle how to answer a question. Meanwhile AI chat suffers from the same problem from command lines to Alexa - how can I remember what to ask? Only this time the problem is exacerbated by the fact that AI is capable of practically anything, making the task less one of remembering commands but a creative one of coming up with great questions or delivering an interface to discover them and then wrapping the resulting prompts as buttons.
(Which honestly would be amazing on its own: I have a few prompts I use regularly including Diane, my transcription assistant, and I have nowhere to keep them or run them or share them except for text files and my terminal history.)
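Here is roughly what I mean, as a rough Python sketch. Only Diane comes from real life; the complete() stub stands in for whatever model API you would actually call, and the other button is made up:

```python
# Sketch: saved prompts exposed as named "buttons".
# complete() is a stand-in for whichever model API you actually use;
# only the Diane prompt is real, the rest is invented for illustration.

def complete(prompt: str, user_input: str) -> str:
    """Send a wrapped prompt plus the user's input to a model. Stub."""
    raise NotImplementedError("wire this up to your model API of choice")

# Each button is just a name bound to a prompt: somewhere to keep it, run it, share it.
BUTTONS = {
    "diane": ("You are Diane, a transcription assistant. Turn this raw "
              "dictation into clean prose, keeping my voice."),
    "explain": "Explain this text to me as if I'm smart but new to the topic.",
}

def press(button: str, user_input: str) -> str:
    """'Pressing' a button runs its stored prompt against whatever I give it."""
    return complete(BUTTONS[button], user_input)

if __name__ == "__main__":
    import sys
    # e.g.  python buttons.py diane < raw-dictation.txt
    print(press(sys.argv[1], sys.stdin.read()))
```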
And then he uses the concept of buttons to explain how a full AI interface can be truly different:
AI buttons are different from, say Photoshop menu commands in that they can just be a description of the desired outcome rather than a sequence of steps (incidentally why I think a lot of agents’ complexity disappears). For example Photoshop used to require a complex sequence of tasks (drawing around elements with a lasso etc.) to remove clouds from an image. With AI you can just say ‘remove clouds’ and then create a remove clouds button. An AI interface is a ‘semantic interface’.
Aha!
The buttons concept is not essential for this insight (though it’s necessary for affordances); the final insight is what matters.
I would perhaps say “intent” rather than “semantic.”
i.e. the user expresses the intent to remove clouds and then, today, is required to follow interface bureaucracy to achieve that. AI removes the bureaucracy.
And then: there are some intents which are easy to say but can’t be simply met using the bureaucracy of interface elements like buttons, drop-downs, swipes and lists. There are cognitive ergonomic limits to the human interface with software; with hardware there are physical limits to the control panel too. This constrains what we can do with our products as much as if they didn’t have that functionality at all.
So removing the interface bureaucracy is not about simplicity but about increasing expressiveness and capability.
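To make the contrast concrete, here is a sketch; every function and class name in it is a stand-in I have invented, not anybody's actual API:

```python
# Sketch contrasting the two styles; every name here is a hypothetical stand-in.
from dataclasses import dataclass
from typing import Any, Callable

# Interface bureaucracy: the user drives a procedure, step by step.
def remove_clouds_by_hand(image: Any, tools: dict[str, Callable]) -> Any:
    selection = tools["lasso_select"](image, region="sky")   # draw around elements
    mask = tools["feather"](selection, radius_px=8)           # soften the selection edge
    return tools["content_aware_fill"](image, mask)           # patch the hole

# Intent interface: the user states the outcome; the system owns the procedure.
@dataclass
class IntentButton:
    label: str
    intent: str  # a description of the desired outcome, not a recipe of steps

    def press(self, image: Any, edit: Callable[[Any, str], Any]) -> Any:
        return edit(image, self.intent)

remove_clouds = IntentButton("Remove clouds", "Remove the clouds from this image")
```

The first version encodes the procedure; the second encodes only the outcome, which is the whole point.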
What does it look like if we travel down the road of intent-maxing?
There’s a philosophy from the dawn of computing, DWIM a.k.a. Do What I Mean (Wikipedia).
Coined by computer scientist Warren Teitelman in 1966 and here explained by Larry Masinter in 1981: DWIM embodies a pervasive philosophy of user interface design.
DWIM is an embodiment of the idea that the user is interacting with an agent who attempts to interpret the user’s request from contextual information. Since we want the user to feel that he is conversing with the system, he should not be stopped and forced to correct himself or give additional information in situations where the correction or information is obvious.
Yes!
Squint and you can see ChatGPT as a DWIM UI: it never, never, never says “syntax error.”
Now, arguably it should come back and ask for clarifications more often, and in particular DWIM (and AI) interfaces are more successful the more they have access to the user’s context (current situation, history, environment, etc.).
But it’s a starting point. The algo is: design for capturing intent and then DWIM; iterate until that works. AI unlocks that.
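As a sketch of that algo, hedged heavily (the Interpretation shape, the 0.8 threshold, and the callables are all invented for illustration):

```python
# Sketch of "design for capturing intent and then DWIM; iterate until that works".
# The Interpretation shape and the interpret/act callables are assumptions.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Interpretation:
    action: str        # what the system believes the user wants done
    confidence: float  # how sure it is, 0..1
    question: str      # what it would ask if it isn't sure

def dwim_loop(get_utterance: Callable[[str], str],
              interpret: Callable[[str, dict], Interpretation],
              act: Callable[[str], str],
              context: dict) -> str:
    """Never say 'syntax error': either act on the inferred intent,
    or ask a clarifying question and go round again."""
    prompt = "What would you like to do?"
    while True:
        utterance = get_utterance(prompt)
        guess = interpret(utterance, context)   # context: situation, history, environment
        if guess.confidence >= 0.8:             # threshold is arbitrary for the sketch
            return act(guess.action)
        prompt = guess.question                 # clarify rather than reject
```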
This perspective sheds some light on why OpenAI + others are chasing the mythical Third Device (The Verge). (Maybe it’s a hat.)
A DWIM AI-powered UI needs maximum access to context (to interpret the user and also for training) and to get as close as possible to the point of intent.
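Concretely, and purely as a guess at the shape of it, the context bundle you would want to attach to every captured intent might look something like this (the fields are mine, not a spec):

```python
# Sketch: the context a DWIM UI would want alongside every intent.
# These fields are illustrative guesses, not anybody's specification.
from dataclasses import dataclass, field

@dataclass
class IntentContext:
    situation: str = ""                                # what the user is doing right now
    history: list[str] = field(default_factory=list)   # recent requests and outcomes
    environment: dict = field(default_factory=dict)    # location, device, time of day, etc.

@dataclass
class CapturedIntent:
    utterance: str            # what the user said or typed
    context: IntentContext    # interpreted together, not separately
```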
btw I’m not convinced the answer looks like One Device To Rule Them All but that’s another story.
It’s interesting to consider what a philosophy of Do What I Mean might lead to in a physical environment rather than just phones and PCs, say with consumer hardware.
Freed from interface bureaucracy, you want to optimise for capturing user intent with ease, expressiveness, and resolution – very different from the low-bandwidth interface paradigm of jabbing single fingers at big buttons.
So I’ve talked before about high-bandwidth computer input, speculatively in terms of Voders, pedals, and head cursors (2021), or more pragmatically with voice, gesture, and gaze for everything.
But honestly as a vision you can’t do better than Put-That-There (1982!!!!) by the Architecture Machine Group at MIT.
Here’s a short video demo: multimodal voice + pointing with a big screen and two-way conversation.
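To gesture at the mechanics (and only gesture: the data shapes below are invented, not how the 1982 system actually worked), the core trick is resolving the deictic words against where the hand is pointing at the moment each one is spoken:

```python
# Sketch of Put-That-There-style fusion: resolve "that"/"there" in a spoken
# command against where the user was pointing when each word was uttered.
# Data shapes and names are hypothetical, not from the 1982 system.
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    t: float          # time the word was spoken, in seconds

@dataclass
class PointingSample:
    t: float          # time of the sample
    target: str       # whatever the pointing ray hits: an object id or a screen location

def nearest_target(samples: list[PointingSample], t: float) -> str:
    return min(samples, key=lambda s: abs(s.t - t)).target

def resolve(words: list[Word], pointing: list[PointingSample]) -> dict:
    """Turn 'put that there' + pointing into {object, destination, verb}."""
    resolved = {}
    for w in words:
        if w.text in ("that", "this"):
            resolved["object"] = nearest_target(pointing, w.t)
        elif w.text in ("there", "here"):
            resolved["destination"] = nearest_target(pointing, w.t)
    resolved["verb"] = words[0].text   # crude: assume the command starts with the verb
    return resolved

# resolve([Word("put", 0.0), Word("that", 0.4), Word("there", 1.1)],
#         [PointingSample(0.4, "blue circle"), PointingSample(1.1, "top right")])
# -> {"object": "blue circle", "destination": "top right", "verb": "put"}
```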
Like, let’s just do that?
(One observation is that I don’t think this necessarily leads to a DynamicLand-style programmable environment; Put-That-There works as a multimodal intent interface even without end-user programming.)
Anyway.