Google I/O 2024: Project Astra, a Real-Time Multimodal AI Assistant, Is the Future. Here's How It Works

At its I/O developer conference, Google showed off Project Astra, a real-time AI assistant designed to become a universal helper for users. Demis Hassabis, head of Google DeepMind, demonstrated how the assistant can easily handle tasks such as finding your glasses. As a multimodal AI, Astra is pitched as helping with complex tasks, including making sense of the visual world, identifying objects, locating lost items, and more.

During the I/O event, Hassabis showed off Astra's capabilities in a demo video in which the model scanned its surroundings through a phone camera and answered questions in real time. You can ask questions simply by pointing the camera at something, and Astra gives information about the object in view. For example, you can open the camera, draw a circle or arrow around an object visible on the screen, and ask "What is this called?"; the phone immediately responds with a spoken description of the object and the relevant details. This is just one example: Astra can answer many complex and even humorous questions in real time.
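Project Astra itself is not publicly available, but the Gemini 1.5 Pro model that underlies it (per the article) can already answer similar image-plus-text questions through Google's public google-generativeai Python library. The snippet below is a minimal sketch of a single-frame "What is this?" query under that assumption; the API key placeholder, the file name, and the prompt wording are illustrative, not taken from the Astra demo.

```python
# Sketch: ask a multimodal "What is this?" question about a single camera frame
# using the public Gemini API (google-generativeai). This approximates one step
# of the Astra demo; it is not Astra itself and does not run continuously.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")          # assumption: your own API key
model = genai.GenerativeModel("gemini-1.5-pro")  # model named in the article

frame = Image.open("camera_frame.jpg")           # hypothetical saved camera frame
response = model.generate_content(
    [frame, "What is this object called? Describe it briefly."]
)
print(response.text)                             # spoken output would need a separate TTS step
```

A real-time assistant like Astra would stream frames and audio continuously rather than sending one image at a time; this sketch only shows the basic image-plus-question pattern.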

During I/O, Hassabis emphasized the importance of AI agents that not only converse but also execute tasks on behalf of users. He believes the future of AI lies not in fancy technology but in practical use: agents that do not just talk but actually get things done for you. In his view, the future will bring different types of agents, from simple assistants to more advanced ones, depending on your needs and situation.

He says Astra's development was made possible by improvements to Google's Gemini 1.5 Pro model. Over the past six months, the team worked on making Astra faster and more responsive, which involved not only refining the model but also making sure everything ran smoothly at a larger scale.

Astra is just one of many announcements at this year's I/O. Alongside Gemini 1.5 Pro, Google also highlighted the Gemini 1.5 Flash model, developed to perform tasks such as summarization and captioning more quickly. Veo, which can generate videos from text prompts, was also announced, as was Gemini Nano, a model designed to run locally on a device such as your phone.
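For reference, the lighter Flash model is exposed through the same public API, so switching to it for a quick summarization task is mostly a matter of changing the model name. The sketch below assumes that workflow; only the model name comes from the article, and the input text is a placeholder.

```python
# Sketch: fast text summarization with Gemini 1.5 Flash via the same public API.
# Placeholder input text; only the model name is taken from the article.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")            # assumption: your own API key
flash = genai.GenerativeModel("gemini-1.5-flash")

article_text = "..."                               # placeholder for the text to summarize
response = flash.generate_content(
    f"Summarize the following article in two sentences:\n\n{article_text}"
)
print(response.text)
```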
