The latest ChatGPT-4o, released on Tuesday, allows users to input audio, images and even documents; earlier versions supported only text input. AI models such as Baidu’s Wenxin Yiyan already allow users to chat with them using images, audio and documents, a front on which OpenAI, long the industry leader, had fallen behind.
That it has taken timely steps to catch up reaffirms its pursuit of a “human-level response”, which OpenAI CEO and co-founder Sam Altman has described as the future of AI technology.
AI is often seen as an information-processing assistant that “uses human forms of interaction to communicate”. The addition of real-time voice interaction undoubtedly brings the user experience of large models closer to what people expect of an “AI assistant”.
Users can even point their smartphone camera at a book or notebook and ask ChatGPT to act on what it sees. In fact, according to OpenAI’s product release, ChatGPT can even read a user’s expression through the camera and offer comfort if the user seems tense.
With these new “screen-viewing” and “emotion-sensing” features, users will feel that AI is serving them, rather than, as in the past, the other way around.
ChatGPT-4o offers its services to users free of charge. OpenAI Chief Technology Officer Mira Murati and CEO Altman have stressed that the “free-to-use” strategy is the future of the company. But it should be noted that free services benefit not only users but also the service provider, enabling it to expand its business, collect more data from free users to train its large language model, and then develop new high-end products for paying users. That is a virtuous cycle that makes the free model sustainable.