When it comes to productivity tools, we’ve reached peak complexity in graphical user interfaces (GUIs): cascading menus, endless dialogue boxes, a plethora of buttons. When you need a search engine or a user-generated tutorial to figure out how to change a setting in your email, slide deck, or image editor, the interface is no longer discoverable. So if using a GUI now feels like trying to work out how to write a query in SQL, what’s the way back to discoverability?
For many years at Uncharted, our role has been designing interfaces that facilitate complex interactions between humans and machines. Now we’re actively investigating large language models (LLMs) as a way to trade traditional click and command methods for new yet familiar interactions via dialogue.
LLMs promise to help users easily complete tasks within the scope of their applications. In analytical applications, for example, they can answer questions about data or analyze it across dimensions such as time, geography, categorization, connection, correlation, and rank without requiring users to navigate 10 different tabs, menus, and buttons. But LLM responses may be wrong due to ambiguous user phrasing, inexact questions, or unconstrained scope. How can we leverage these answers—and the reasoning behind them—to improve understanding of analytical results regardless of whether the LLM is on the right track?
Reasoning as context for multimodal interfaces
Whenever an LLM generates a data-driven answer, it can articulate the context of the solution: Did it use geographic reasoning? Temporal reasoning? Graph reasoning? What scope did it use?
This context can drive relevant visualization interfaces within a system, such as maps, timelines, or scatterplots. The visualization then helps the user understand the answer and becomes an additional interface for refining it: simply click the visualization controls to narrow the timeframe, zoom in to a region, or expand a local set of connections.
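As a rough sketch of how that hand-off might work (not a description of our implementation), the snippet below asks the LLM to return its answer alongside structured reasoning context, then maps that context to a companion view. The `call_llm` helper, the prompt wording, and the JSON fields are all assumptions standing in for whatever model client and response schema a real system would use.

```python
import json

# Hypothetical prompt: ask for the answer plus the reasoning context behind it.
CONTEXT_PROMPT = """Answer the question about the data, then return JSON with:
  "answer": your answer,
  "reasoning": one of "temporal", "geographic", "graph", "categorical",
  "scope": the time range, region, or subgraph you considered.
Question: {question}"""

# Map each kind of reasoning to the view that best frames (and refines) the answer.
VIEW_FOR_REASONING = {
    "temporal": "timeline",
    "geographic": "map",
    "graph": "node-link diagram",
    "categorical": "bar chart",
}

def answer_with_context(question: str, call_llm) -> dict:
    """call_llm is a stand-in for any chat-completion client that returns text."""
    raw = call_llm(CONTEXT_PROMPT.format(question=question))
    result = json.loads(raw)  # in practice, validate and retry on malformed output
    result["view"] = VIEW_FOR_REASONING.get(result.get("reasoning"), "table")
    return result
```

The returned "view" and "scope" are what the surrounding application would use to instantiate the map, timeline, or graph controls that the user can then manipulate directly.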
Thus, LLMs become part of a larger multimodal interface, with each component contributing what it does well: LLMs have the potential to excel at large-scale or language-specific questions, while visualizations aid exploration and refinement. We’re investigating this in a few different projects.
Language analytics and query visualization
In one project, we’ve inserted an LLM into a language analytics workflow. Traditional NLP tools analyze the text to summarize the distribution of parts of speech or highlight commonalities, while the LLM lets users employ simple natural language commands to find elements within the text, such as idioms or rhetorical devices.
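Here is one way the LLM half of that workflow could be wired up, offered as an illustrative sketch rather than our production code; `call_llm` is the same hypothetical model wrapper as above, and the prompt and field names are placeholders.

```python
import json

# Hypothetical span-extraction prompt built from a plain-language request.
FIND_PROMPT = """In the text below, find every instance of: {request}.
Return a JSON list of objects with "quote" (an exact excerpt) and "label"
(e.g. "idiom", "metaphor", "rhetorical question").
Text:
{text}"""

def find_language_features(text: str, request: str, call_llm) -> list[dict]:
    matches = json.loads(call_llm(FIND_PROMPT.format(request=request, text=text)))
    # Anchor each quote to a character offset so the UI can highlight it in place;
    # drop anything the model paraphrased rather than quoted verbatim.
    for match in matches:
        match["start"] = text.find(match["quote"])
    return [m for m in matches if m["start"] >= 0]

# e.g. find_language_features(article_text, "idioms and rhetorical devices", call_llm)
```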
In another project, the LLM turns a prompt into the initial query from which visualizations are configured and rendered. Modifying the query modifies the visualization; modifying the visualization resubmits a refined query. Every prompt yields both a visual and a textual set of answers.
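A minimal sketch of that round trip, with `call_llm`, `run_query`, and the `QuerySpec` fields all hypothetical: the LLM translates the prompt into a structured query spec, the spec configures the chart, and a brush on the chart edits the spec and resubmits it.

```python
import json
from dataclasses import dataclass

@dataclass
class QuerySpec:
    measure: str               # e.g. "revenue"
    group_by: str              # e.g. "region"
    time_range: tuple[str, str]  # (start, end) as ISO dates

# Hypothetical prompt that converts a natural language request into a query spec.
SPEC_PROMPT = """Translate the request into JSON with fields "measure",
"group_by", and "time_range" (a [start, end] pair of ISO dates).
Request: {prompt}"""

def prompt_to_spec(prompt: str, call_llm) -> QuerySpec:
    fields = json.loads(call_llm(SPEC_PROMPT.format(prompt=prompt)))
    return QuerySpec(fields["measure"], fields["group_by"], tuple(fields["time_range"]))

def on_brush(spec: QuerySpec, start: str, end: str, run_query):
    # A brush or zoom on the chart narrows the spec and reruns the query,
    # so editing the visualization is the same as refining the prompt.
    spec.time_range = (start, end)
    return run_query(spec)
```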
What’s next?
What if we combine these advances with our work in AR and gestural interfaces? (Perhaps you know someone who likes to gesture when they talk?) Even simple gestures such as swipes and pinches can map to pan, scroll, and zoom actions that expand or steer responses. Gestures, clicks, and language—whether written or spoken—can all become part of the multimodal interface.
As a result, we think that the interfaces of analytical applications with highly structured workflows will increasingly incorporate unstructured conversation. The key challenge then becomes how best to merge traditional human-machine interface (HMI) design with LLMs to create a best-of-both-worlds solution that supports analytical decisions, a paradigm with the potential to empower novice users and break down the barriers imposed by esoteric interfaces.
Read more: Collect and Connect the Dots with LLMs