Hi, this is Wayne again with the topic “Google AI DEMOS vs REAL LIFE tests! Do they actually work? | Google I/O 2024”.
I’m here at Google I/O 2024, and if you watched the keynote today, you saw Google launch a bunch of really cool new AI features. The problem is that when you do a launch on stage like that, you sometimes use pre-recorded demos, and even the demos that are real are staged in such a way that you know they’re not going to fail. So what I wanted to do is walk around Google I/O, try those things out for myself, and see how they really work. One of the biggest things Google announced today is Project Astra, a new multimodal AI system.
Multimodal means it can accept different types of input, not just text. In the case of Project Astra, the most notable thing is that it can work from real-world visual input: you can put something in front of it and it can tell you what that object is.
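If you’re wondering what “multimodal” looks like in practice, here’s a rough sketch of the same image-plus-question idea using Google’s public generative AI Python SDK. To be clear, Project Astra itself isn’t something you can call from code yet, so this is just an illustration with a regular Gemini multimodal model, and the model name, API key, and file name are all placeholders.

```python
# Illustrative sketch only: Project Astra has no public API, so this uses the
# publicly documented google-generativeai SDK with a Gemini multimodal model
# to show the same "image in, description out" idea.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")              # placeholder key
model = genai.GenerativeModel("gemini-1.5-flash")     # model name is an assumption

image = Image.open("tattoo.jpg")                       # hypothetical local photo
response = model.generate_content(
    [image, "What is this tattoo a reference to? Explain how you can tell."]
)
print(response.text)
```

The point is simply that the model receives an image and a question together and answers based on what it sees.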
But how good is Project Astra, really? We went into the Project Astra booth twice, and the first time we couldn’t film, which is unfortunate because that first demo ended up being the better of the two. In that first run, I had a question: I have a tattoo that is a Battlestar Galactica reference, so if I show it to Project Astra, can it identify that it’s from Battlestar Galactica without me telling it so? I showed it my tattoo, which says “so say we all,” and it identified it immediately: it said this is a Battlestar Galactica reference tattoo, which is really cool. So we went to a different booth to try it again, this time to catch it on video, but unfortunately, on the second run Project Astra didn’t properly identify the tattoo as a Battlestar Galactica reference. Instead, it thought it was from Game of Thrones, which, if you’re a Battlestar fan, is pretty weird. When I asked, “Are you sure it’s from Game of Thrones?”, it then correctly identified that it was from Battlestar Galactica.
So we have a Project Astra demo that was both successful and not successful, but it certainly shows that this wasn’t nearly as smooth as what Google showed on stage at Google I/O. Next up, we wanted to check out Gemini’s drag-and-drop feature for Android. This lets you create something in Gemini, such as an image, and then drag that image directly into the conversation underneath the Gemini prompt. We got to try this on an actual Pixel phone, and it worked really well.
Generating the image obviously took a little bit of time because it happens on Google’s cloud servers rather than on-device, but we were able to generate an image of a cat playing a guitar and then drag and drop that image directly into an email. That was really cool, and it worked just as well as we expected it to. However, it should be noted that we weren’t able to touch the phone ourselves, and we also weren’t able to write our own prompt.
So it’s possible there was some trickery here, but from where we were standing it looked really legit, and it seems like what we’re going to get when it eventually rolls out to Pixel phones later this year. Something else coming to Gemini later this year is the ability to be context-aware. What does that mean? It means that when you pull up the Gemini overlay over something, it will be able to understand what you’ve pulled it up over.
In this case it’s a PDF, and when you pull the overlay up over a PDF, a new button appears that says “Ask this PDF.” Essentially, you hit that button, Gemini reads the PDF, and then you can ask it questions about what’s in it. This is especially helpful for documents that are really long, like, in this case, the manual for an air fryer. What we did was pull up the Gemini overlay over the manual and ask how to change the clock on the air fryer, and it responded pretty much instantly, not only with the answer to the question but also with where to go in the PDF to read the full context. This could be incredibly helpful for really long PDFs or, as this example shows, complicated manuals for products you already own.
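You can’t script the on-phone “Ask this PDF” button, but the underlying pattern, handing a long PDF to a Gemini model and asking questions about it, is available through Google’s public Python SDK and its File API. Here’s a rough sketch; the model name, API key, and file path are placeholders, and the actual Android feature may well work differently under the hood.

```python
# Illustrative sketch only: upload a long PDF via the Gemini File API, then
# ask a question about its contents, roughly mirroring "Ask this PDF".
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")                 # placeholder key
manual = genai.upload_file("air_fryer_manual.pdf")       # hypothetical PDF path

model = genai.GenerativeModel("gemini-1.5-flash")         # model name is an assumption
response = model.generate_content(
    [manual, "How do I change the clock on this air fryer? Tell me which page covers it."]
)
print(response.text)
```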
We also wanted to take a look at Imagen 3. Imagen is Google’s image-generation model; think of it like Midjourney. The most notable thing about Imagen 3, as announced on the keynote stage, is that it can now render letters more accurately. This has been a big problem with image-generation systems for a long time.
If you don’t believe me, go to any image generator and try to create something with a lot of text involved; it will probably mess it up. What Google showed us was the ability to create a full alphabet of letters using various prompts to make something fun and unique. I decided I wanted to make something involving my cat Luther, who is an orange cat, and we tried to have the cat sitting on a purple couch or ottoman. It would then try to create multiple letters of the alphabet using that prompt. Unfortunately, it didn’t work out so well.
I think our prompt was just a little too complicated, because when we simplified the prompt it was able to do it, but even then some of the images it created looked a bit wonky. When it created the letter D out of cats, for example, it made it out of what looked like a hundred cats, and when you zoomed in on the cats’ faces you could tell they weren’t quite there. So this one clearly needs a little more work, but the fact that Google is tackling the problem of image-generation software not being able to render text accurately and reliably is a big deal.
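If you want to experiment with the same kind of prompt yourself, Google exposes its image-generation models through the Vertex AI SDK. The sketch below is only an assumption-heavy example: the project ID, model version, and prompt are placeholders, and it will use whichever Imagen version your account has access to rather than Imagen 3 specifically.

```python
# Illustrative sketch only: generate an image from a text prompt with the
# Vertex AI SDK's preview image-generation API. Project ID and model version
# are placeholders; swap in whatever your project has access to.
import vertexai
from vertexai.preview.vision_models import ImageGenerationModel

vertexai.init(project="your-gcp-project", location="us-central1")   # placeholders
model = ImageGenerationModel.from_pretrained("imagegeneration@006")  # assumed model ID

images = model.generate_images(
    prompt="The letter D formed out of orange cats lounging on a purple ottoman",
    number_of_images=1,
)
images[0].save("letter_d_cats.png")
```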
What we’ve learned here at Google I/O 2024 is that, even though Google showed off a whole slew of really cool AI features on stage at the keynote, not all of those features are actually ready for prime time. Granted, some of them did work really well, but a few either didn’t work as well as we thought they would or didn’t work at all. The important thing to remember is that just because Google shows you something on stage at Google I/O doesn’t mean it’s actually ready to go. Hopefully, though, these features will be ready by the end of the year, because a lot of them are launching to the public then.
We’ll just have to wait and see whether Google irons out all the kinks. In the meantime, I will see you guys in the next article.