Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
Google’s release of Gemini 2.0 Flash this week, which gives users a way to interact with live video of their surroundings, has set the stage for what could be a significant shift in how businesses and consumers engage with technology.
These releases, along with announcements from OpenAI, Microsoft, and others, are part of an ongoing shift in the field toward “multimodal AI.” The technology lets you take video, audio, or images from your computer or phone and ask questions about them.
It also underscores the intensifying competition between Google and its rivals, OpenAI and Microsoft, in the development of AI technology. More importantly, it feels like it defines the next era of interactive computing: agentic computing.
This moment in AI feels to me like the “iPhone moment” of 2007-2008, when Apple released a device that, through internet access and a simple user interface, changed everyday life by putting a powerful computer in people’s pockets.
Although OpenAI’s ChatGPT may have kicked off this latest AI moment with its powerful human-like chatbot in November 2022, Google’s releases at the end of 2024 feel like a strong continuation of that arc, arriving at a time when many observers were worried about potential downsides of AI technology.
Google’s Gemini 2.0 Flash allows real-time interaction with video captured via a smartphone. Unlike earlier controlled demonstrations (for example, Google’s Project Astra in May), this technology is now available to everyday users through Google’s AI Studio.
I encourage you to try it yourself. I used it to see and discuss my surroundings, which this morning meant my kitchen and dining room. You can immediately see how this enables education and other uses. You can also see why creator Jerrod Lew posted on X yesterday that he was astonished when he used Gemini 2.0’s real-time capabilities to edit a video in Adobe Premiere Pro. “This is so crazy,” he said, after Google walked him, a novice user, through the edit in just a few minutes.
Sam Witteveen, a prominent AI developer and co-founder of Red Dragon AI, got early access to test Gemini 2.0 Flash. He highlighted that its speed (it is twice as fast as Google’s previous flagship, Gemini 1.5 Pro) and its “insanely cheap” pricing make it not just a showcase for developers to try new things, but a practical tool for businesses managing AI budgets. (To be clear, Google hasn’t announced pricing for Gemini 2.0 Flash yet; it’s a free preview. Witteveen is basing his expectations on the pricing Google set for the Gemini 1.5 series.)
For developers, the model’s Multimodal Live API offers great potential, as it enables seamless integration of real-time audio and video into applications. The API is available for public use, a demo application is available, and Google has published a blog post for developers.
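To make the integration concrete, here is a minimal sketch of how a client might package a single camera frame for streaming over the Live API’s WebSocket connection. The message field names (`realtime_input`, `media_chunks`, etc.) are assumptions drawn from my reading of Google’s developer documentation, not verbatim from it, so verify them against the official docs before building on this.

```python
import base64
import json

def build_frame_message(jpeg_bytes: bytes) -> str:
    """Wrap one camera frame in the JSON shape the Live API expects:
    a realtime_input message carrying base64-encoded media chunks.
    Field names here are assumptions -- check Google's docs."""
    payload = {
        "realtime_input": {
            "media_chunks": [
                {
                    "mime_type": "image/jpeg",
                    "data": base64.b64encode(jpeg_bytes).decode("ascii"),
                }
            ]
        }
    }
    return json.dumps(payload)

# A real client would open a WebSocket to the Live API endpoint, send one
# such message per captured frame, and read the streamed model responses.
msg = build_frame_message(b"\xff\xd8\xff\xe0 stand-in for real JPEG bytes")
```

In a production client, this framing step would sit inside a capture loop that also handles the audio stream and the model’s incremental responses.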
Programmer Simon Willison called the streaming API next-level: “These things are straight out of science fiction: getting to talk with an LLM that is aware of what it can ‘see’ through your camera is one of those ‘living in the future’ moments.” He also examined how the API can be asked to enable a code-execution tool, which lets the model write Python code, run it, and incorporate the results into its answers, all on the fly.
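The code-execution capability Willison describes is enabled through the request itself. Below is a hedged sketch of building a `generateContent`-style request body with that tool turned on; the field names follow my reading of the public Gemini REST API and should be checked against the official reference before use.

```python
import json

def build_code_exec_request(prompt: str) -> dict:
    """Build a generateContent-style request body with the model's
    code-execution tool enabled, so it can write Python, run it, and
    fold the results into its answer. Field names are assumptions
    based on the public Gemini REST API -- verify against the docs."""
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "tools": [{"code_execution": {}}],
    }

body = build_code_exec_request(
    "Write and run Python to sum the integers from 1 to 100."
)
print(json.dumps(body, indent=2))
```

POSTing a body like this to the API (with an API key) is what lets the model interleave generated code, execution output, and prose in a single response.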
This technology reflects a new environment of user expectations. Imagine being able to analyze live video during a presentation, suggest edits, or troubleshoot problems in real time.
Yes, the technology is fun for consumers, but it’s important for business users and leaders to understand it as well. These innovations are the basis for a new way of working and interacting with technology, pointing to future productivity gains and new directions for products.
Google’s Wednesday release of Gemini 2.0 Flash comes amid a flurry of releases by Google and its major competitors, who are rushing to ship their latest technologies before the end of the year. They all promise consumer-ready capabilities such as live video, image processing, and audio synthesis, but some are not yet fully developed or widely available.
One reason for the urgency is that these companies are pushing to deliver significant products before the end of the year, and there is prestige in being first. Being first can also lock in users, as OpenAI showed in 2022, when its ChatGPT became the fastest-growing consumer product in history. Although Google had similar technology, it was not ready for public release, and the company was caught flat-footed. Observers have criticized Google for being slow ever since.
Here is what other companies have announced in the past few days, all helping usher in this new era of multimodal AI.
Although these technologies are revolutionary, challenges remain. Even so, the benefits outweigh the obstacles, and there is no doubt that developers and companies will rush to adopt them over the next year.
As developer Sam Witteveen and I discussed in our podcast, recorded Wednesday night after Google’s release, Gemini 2.0 Flash is an impressive achievement, a marker of the moment multimodal AI became real. Google’s progress sets a new benchmark, though its lead may be short-lived; OpenAI and Microsoft are hot on its tail. We are still at the very beginning of this revolution, much as in 2008, just after the iPhone’s release, when it was not yet known how Google, Nokia, and RIM would respond. History showed that Nokia and RIM failed to respond, and they died. Google responded very well and has given the iPhone a run for its money.
Likewise, it is clear that Microsoft and OpenAI are squarely in this race with Google. Apple, meanwhile, has chosen to partner for the technology, and this week announced a further integration with ChatGPT, but it is certainly not trying to win this new multimodal era outright.
In our podcast, Sam and I also explored Google’s unique advantages in the browser. For example, its Project Mariner, a Chrome extension, lets an AI browse the web on your behalf in real time, with more functionality than the competing technologies offered by Anthropic (called Computer Use) and Microsoft’s OmniParser (still in research), though it is true that Anthropic’s interface gives the model broader access to your computer. All of this gives Google a lead in the race to push agentic AI technologies forward in 2025, even though Microsoft appears to be ahead on the practical side of delivering business solutions. Agentic AI assistants will perform complex tasks autonomously, with little human intervention; for example, they will soon carry out advanced research and database analysis before executing ecommerce purchases, stock trades, or real-estate deals.
Google’s focus on making Gemini 2.0’s capabilities accessible to both developers and consumers is smart, as it ensures the company is addressing the market with a comprehensive plan. Until now, Google has had a history of not focusing on developers as much as Microsoft has.
The question for decision makers is not whether to adopt these tools, but how quickly to integrate them into their workflows. It will be interesting to see where the next year takes us. Be sure to listen to our take for business users in the video below: