A Guide to Google I/O 2024
2 hours of keynotes and 100 updates later: LearnLM, AI Agents, Project Astra, AI Teammate, Gemini 1.5 Pro explained
It’s been a big week for AI, with back-to-back launches: OpenAI’s livestream on Monday and Google I/O on Tuesday. We’re back with a primer to cut through the noise and highlight key updates from Google I/O. (In case you missed it, here’s the guide to GPT-4o and OpenAI’s announcements).
Google I/O’s two-hour keynote was a whirlwind of AI news and features, with Google announcing around 100 different updates. Curious about Project Astra? Wondering what AI agents are? Puzzled by the difference between Gemma and Gems? What’s the purpose of LearnLM? What does this all mean?!
Google is rapidly integrating AI across its ecosystem, including Search, Workspace, Android, Google Classroom and more. There’s a lot to unpack - here are some highlights that stood out:
LearnLM: Circle to Search, Learning Coach Gems, YouTube Quizzes, Google Classroom, Illuminate, Learn About
AI Agents: Project Astra, What are AI Agents?, AI Teammate in Google for Workspace
NotebookLM
Gemini 1.5 Pro and Gemini 1.5 Flash
Veo
Gems
Other Ecosystem and Technical Releases
Early Thoughts…
LearnLM
What if everyone, everywhere, could have a personal AI tutor on any topic? What if every educator could have their own assistant in the classroom?
LearnLM is Google’s new family of models, based on Gemini and fine-tuned for learning. Grounded in educational research, LearnLM aims to make learning experiences more personal and engaging. It will be integrated into existing Google products - Search, Android, Gemini and YouTube. Google is partnering with Columbia Teachers College, ASU, MIT Raise and Khan Academy to test and improve these new capabilities.
They are also collaborating directly with educators to develop more tools with LearnLM. You can read LearnLM’s comprehensive technical report here.
Practical applications:
Circle to Search: Android users can now work through math or physics word problems directly from their phones and tablets. Later this year, users will be able to solve complex problems involving symbolic formulas, diagrams, graphs and more.
Gems for Education: Google is creating pre-made Gems (customized versions of Gemini, akin to OpenAI’s GPTs), including one called Learning Coach. Learning Coach offers step-by-step study guidance and leverages practice and memory techniques to build understanding, rather than simply sharing the answer.
YouTube Integration: LearnLM will make educational videos more interactive. Users can ask clarifying questions, get helpful explanations, and take quizzes. Given the Gemini models’ long-context capabilities, these features will even work for long lectures or seminars.
Google Classroom: LearnLM will help teachers simplify and improve the lesson planning process to meet the individual needs of students. A demo showcased a tool that can simplify or re-level content for a target grade level. Future features may help teachers discover new ideas and unique activities.
Illuminate: Google is also building a new tool in Google Labs that will break down research papers into short audio conversations. In minutes, it will generate audio with two AI-generated voices discussing key insights from the paper. They will soon roll out the ability for users to ask follow-up questions. Join the waitlist here.
Learn About: another Labs experience, Learn About is a conversational learning companion that adapts to your unique curiosity and learning goals. It combines high-quality content, learning science, and chat experiences. Users can ask questions and be guided through topics at their own pace using pictures, videos, webpages, and activities. Files or notes can also be uploaded for further clarification. Sign up to be an early tester here.
Availability: in the coming months. The YouTube LearnLM features and Circle to Search have already rolled out to select Android users.
Project Astra
Project Astra is Google DeepMind’s vision for the future of AI assistants (akin to what OpenAI showcased with GPT-4o, an assistant that is able to reason across text, video and audio). It’s described as an “advanced seeing and talking responsive agent”. During a pre-recorded demo, users pointed their AI assistant at physical objects in the real world and received answers in real-time. The agent processes a constant stream of audio and video input, allowing it to understand and interact with its environment dynamically and conversationally.
To be truly useful, an agent needs to understand and respond to the complex and dynamic world just like people do — and take in and remember what it sees and hears to understand context and take action. It also needs to be proactive, teachable and personal, so users can talk to it naturally and without lag or delay. - Demis Hassabis, Co-founder and CEO of DeepMind
A few applications from the demo:
Identifying Locations: point the camera at your window and ask which neighborhood you’re in
Engineering Assistance: point the camera at an engineering diagram and ask it for advice
Object Recall: ask “where did I last leave my glasses” and Project Astra can recall the location of your object
Comparison to GPT-4o: Early testers report that Project Astra has higher latency and less emotional intelligence and tonal range than GPT-4o. However, it boasts strong text-to-speech and potentially better continuous video understanding thanks to long-context support. Unlike GPT-4o, which relies on a single model for all of these capabilities, Project Astra chains multiple separate AI models that handle different tasks, which can result in reduced performance.
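To make that architectural contrast concrete, here is a purely illustrative sketch of a cascaded pipeline - every function below is a hypothetical stub, since Google hasn’t published Astra’s actual design. The point is structural: each stage hands its output to the next, so latency and errors accumulate, whereas a single end-to-end model like GPT-4o collapses the stages into one.

```python
def transcribe(audio: bytes) -> str:
    # Hypothetical speech-to-text stage.
    return "where did I last leave my glasses?"

def describe_frame(frame: bytes) -> str:
    # Hypothetical vision stage.
    return "glasses on the desk, next to a red apple"

def llm_respond(question: str, context: str) -> str:
    # Hypothetical language-model stage.
    return f"Your glasses are on the desk ({context})."

def synthesize(text: str) -> bytes:
    # Hypothetical text-to-speech stage.
    return text.encode()

def cascaded_assistant(audio: bytes, frame: bytes) -> bytes:
    # Each hop adds latency; a mistake in one stage propagates to the next.
    question = transcribe(audio)
    context = describe_frame(frame)
    answer = llm_respond(question, context)
    return synthesize(answer)

print(cascaded_assistant(b"<mic input>", b"<camera frame>").decode())
```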
Availability: not yet available to users; Project Astra has only been shown through pre-recorded videos.
What are AI Agents?
AI agents have been the hot topic of 2024.
Current AI models like ChatGPT and Gemini possess vast knowledge, having ingested a massive corpus from the internet. However, they don’t perform tasks end-to-end - they still require human prompts and often need human intervention to finish. AI agents can execute multiple tasks independently, potentially keeping the human “out of the loop” entirely. With traditional SaaS, we augmented human work and services with software. Now, the software itself could do the whole job - “imagine what you could do if you commanded an army of 1,000 AI agents”. A minimal sketch of what such an agent loop looks like follows the examples below.
AI Agents in Action:
Email Management: an AI agent that continuously organizes and tracks all receipts in your inbox, compiling them into a spreadsheet
Moving to a New City: Gemini and Chrome can collaborate to give you a list of services near your new home and update your address across dozens of websites/accounts
Return Assistance: an AI agent that helps you return shoes by searching your inbox for the order number, filling out the return form, and scheduling a pick-up appointment
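To ground the idea, here is a minimal sketch of the receipt-tracking agent from the first example. Everything named here is hypothetical - fetch_unprocessed_emails stands in for a real inbox API and extract_receipt for an LLM extraction call - but it shows the shape of an agent loop: the human states the goal once, and the loop runs without further prompting.

```python
import csv

def fetch_unprocessed_emails():
    # Hypothetical stand-in for an inbox query such as "subject:receipt".
    return [{"subject": "Your receipt from Cafe Uno", "body": "Total: $14.20"}]

def extract_receipt(email):
    # In a real agent this would be an LLM call parsing merchant and amount.
    return {"merchant": email["subject"].split("from ")[-1],
            "amount": email["body"].split("$")[-1]}

def run_agent(outfile="receipts.csv"):
    # The agent works through every pending email with no human in the loop.
    with open(outfile, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["merchant", "amount"])
        writer.writeheader()
        for email in fetch_unprocessed_emails():
            writer.writerow(extract_receipt(email))

run_agent()  # receipts.csv now holds one row per receipt
```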
What about an AI teammate agent that lives inside Google Workspace?
AI Teammate: Gemini for Workspace
Paying users will soon have an AI assistant integrated into their Google apps, across Gmail, Docs, Sheets, Slides and more. Gemini can access, retrieve, and summarize content across your entire Google ecosystem. “Hey Gemini, can you identify all the leads from the email chain with our Sales team and populate them into a new Google Sheet?”
Google is also developing the next evolution of these features called AI Teammate. AI Teammate can function in multi-user environments, acting like a virtual co-worker. It can join chat groups, emails, and documents, participating just as any other employee would. AI Teammate can be assigned specific roles and objectives (and even be given a name), and it can use information from emails or files it’s added to for its responses. The key advantage is that the AI Teammate builds a collective knowledge base from shared team information, which it can then distribute to everyone. Looks like Google is naturally entering the enterprise search and knowledge discovery space.
Availability: available to consumers through Google One AI Premium and to business customers through the Gemini for Workspace add-on.
NotebookLM
Users can upload learning and course materials, and NotebookLM instantly generates a notebook guide with a helpful summary; it can also generate FAQs and quizzes.
They’ve also prototyped a new feature with Gemini called Audio Overviews. This feature transforms your materials into engaging audio discussions. You can provide information in any format, and Gemini will personalize and make that learning material interactive for you.
Gemini 1.5 Pro and Gemini 1.5 Flash
Google unveiled two versions of its flagship Gemini AI model: Gemini 1.5 Pro and Gemini 1.5 Flash. These models are natively multimodal, capable of processing any combination of text, image, audio, and video input.
Gemini 1.5 Pro, revealed earlier this year, now has double the context length: 2M tokens, which translates to the model “remembering” approximately 1.4M words or 2 hours of video.
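That 1.4M-word figure follows from the common rule of thumb that a token is roughly 0.7 English words (a heuristic, not an official conversion - the actual ratio varies with the text):

```python
tokens = 2_000_000        # Gemini 1.5 Pro's expanded context window
words_per_token = 0.7     # rough heuristic for English prose
print(f"~{tokens * words_per_token:,.0f} words")  # ~1,400,000 words
```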
Gemini 1.5 Flash is a lightweight, fast and cost-efficient version of the Pro model. It’s also multimodal, with a context length of 1M tokens.
Availability: the Gemini API is available in 200+ countries. Gemini 1.5 Flash costs $0.35 per million input tokens, with context caching set to launch next month. The 2M-token context window for Gemini 1.5 Pro is available via waitlist to select developers building through the API.
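For developers, calling Flash through the API takes only a few lines with the google-generativeai Python SDK. This is a minimal sketch based on the SDK as it shipped at launch - the model id and method names may evolve, so treat the details as assumptions to verify against current docs:

```python
# pip install google-generativeai
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key from Google AI Studio

model = genai.GenerativeModel("gemini-1.5-flash")  # launch-time model id
response = model.generate_content("Summarize Google I/O 2024 in two sentences.")
print(response.text)
```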
Veo
This is Google DeepMind’s most advanced video generation model to date (serving as their answer to OpenAI’s Sora). Veo produces high-quality, 1080p-resolution videos over a minute long from text, image, and video prompts, and it can handle a wide range of cinematic and visual styles. Veo also supports masked editing, allowing users to alter specific regions of a video with a text prompt.
Availability: waitlist; it will likely be a couple of months before consumers can use it. Rumors suggest that OpenAI’s Sora won’t be available until December this year.
Gems
Gems, which appear to be Google’s version of GPTs, are customized versions of Gemini that can act as personal assistive experts on any topic. Gemini Advanced subscribers can create any Gem - a gym buddy, a sous chef, a coding partner - simply by describing it: “You’re my running coach; give me a daily running plan and be positive, upbeat and motivating.”
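Gems themselves are a consumer feature inside Gemini Advanced, but developers can approximate one through the API with a system instruction. A minimal sketch, reusing the running-coach prompt above (the persona wording is just that example, and the SDK details are the same launch-time assumptions as earlier):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Approximate a "Gem": a persona baked in via a system instruction.
coach = genai.GenerativeModel(
    "gemini-1.5-flash",
    system_instruction=("You're my running coach. Give me a daily running "
                        "plan and be positive, upbeat and motivating."),
)
chat = coach.start_chat()  # the persona persists across the whole chat
print(chat.send_message("What should I run today?").text)
```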
Other Ecosystem Updates:
Ask Photos: Google is upgrading photo search capabilities. Users can make complex requests by voice: “show me images of my son playing with our dog”. It can also help extract information from images and video: “remind me what themes we’ve had for my daughter’s birthday parties” or “show me how my child’s swimming has progressed”.
Gmail Summary: Gemini can summarize email threads, search for information across your entire inbox, and draft emails.
Trip Planning: Gemini gathers information from apps like Search and Maps, and takes into account your priorities and constraints. It produces a personalized vacation plan - for example, if it knows that you’re landing late in the afternoon, it will find you a nice dinner spot instead of planning a big activity.
Imagen 3: their most capable image generation model to date.
Music AI Sandbox: a suite of AI tools to transform how music can be created. Users can create new instrumental sections from scratch, transfer styles between tracks, and more.
Technical Updates:
Gemini Nano: Google’s smallest AI model, able to run entirely on-device. It powers features like Circle to Search, multimodal TalkBack, and Scam Detection, and Google is also building it directly into Chrome and Android.
Gemma 2 and PaliGemma: two new open models. PaliGemma is Google’s first open vision-language model (see the loading sketch after this list).
RecurrentGemma: an open model built on Google’s Griffin architecture research, which can match or exceed the performance of transformer-based models while being more memory-efficient - promising LLM performance in resource-limited environments.
Trillium chips: Google’s 6th generation TPUs, which are 4.7x faster than their predecessors and will be available late 2024 to Cloud customers.
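Because the Gemma family ships with open weights, the new models should be loadable with standard tooling once released. A sketch with Hugging Face transformers - the google/gemma-2-9b model id is our guess from the first generation’s naming, and the gated weights require accepting Google’s license on the Hub first:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b"  # assumed id, following Gemma 1 naming
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "The key idea behind open-weight models is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```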
Early thoughts…
AI Agents
AI assistants + Gemini integrations into your entire workspace and files + models with multi-step reasoning = a potentially powerful and well-informed AI agent that can perform tasks on your behalf. Sundar Pichai said AI agents are in their early days, but the vision is that AI can complete complex multi-step tasks across Google’s entire product ecosystem.
For competitors and many early consumer agent startups, it will certainly be challenging to match Google’s vast and integrated services. Google has both sides of the equation: product AND distribution, and will continue to leverage both to maintain its commanding lead.
Consider the scale: over 2 billion people - roughly 1 in 4 email users worldwide - use Gmail, and that’s just one product. Google owns the vast majority of internet search, and 3.4 billion people use Chrome as their main browser. This massive distribution network positions Google uniquely to deliver and scale its AI innovations.
Announcement ≠ Delivery
It’s important to note that many of these features and announcements are rolling releases. There are numerous pre-recorded demos and waitlists for products.
Will Google deliver on their demos? While the announcements are definitely impressive, the true test will be when users get access and the products can be tested in real-world scenarios.
Common Themes
It’s interesting that both Google and OpenAI have shared features that converge on 4 common themes: (1) multimodal AI models (GPT-4o and Gemini 1.5 Pro), (2) a real-time AI assistant with multistep reasoning (GPT-4o and Project Astra), (3) customizable chatbots for specific tasks (GPTs and Gems), and (4) tutoring demos and use cases.
Although we are still in the early innings and one should always take demos with a grain of salt, this has been a huge week for AI in 2024. Two of the biggest players in the AI space are investing heavily in tangible and compelling education use cases. There’s evidently a growing recognition of how transformative this technology will be for education and learning, and we’re so excited to see what the future holds.