Okay, so OpenAI had this spring update, and out of that came the GPT-4o model, with the "o" standing for Omni. The first and probably one of the biggest things about this is that it's much more of a fully multimodal model than what they previously had. Even with the GPT-4V models, the vision models, they were limited to basically just taking images in. With this new model, not only are you able to put text in and get text out, you can put images in and get text out. You can also do things like voice in, akin to what Google's Gemini models can do. And there are even precursors of being able to do image in, image out, and even things like 3D out.
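Just to make the text-plus-image part concrete, here's roughly what that looks like through the API with the OpenAI Python SDK. This is only a minimal sketch: the image URL is a placeholder, and the audio side isn't exposed through the API at the time of recording.

```python
# Minimal sketch: sending text plus an image to GPT-4o via the OpenAI Python SDK.
# The image URL is a placeholder; audio in/out is not part of the API here.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is happening in this photo?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```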
So OpenAI showed a bunch of things that were interesting around productizing this model, with the key thing being that they're making the play to actually make this best model available to people for free. That's a huge change, in that now people will be able to experience some of the really high-quality models without having to pay the $20 a month and without having to use some kind of API to access this kind of thing. I think that alone is going to change the whole startup scene and the people who are trying to build startups on this kind of thing, when everyone can now use one of the best models for free. You've also got to wonder what that can actually mean for OpenAI's business model. Will people who are paying the $20 a month keep paying it when they realize that perhaps they didn't need as many calls as they thought, and they can actually get away with the free tier?
Another thing that they introduced alongside this model was a whole new user interface: the desktop app, and using the desktop app to be able to do things. Now, this is a great way for OpenAI to basically get some training data from people as to what they're actually doing on their desktop and what they want to use a language model for on their desktop. And you've got to think that this is a precursor to some of the more advanced agents, and to taking on some startups like MultiOn, where it can basically access your web browser and automate things in your web browser. It's not too much of a jump to imagine that we go from this to either having a plugin or having a browser in the actual OpenAI app, which can be fully automated to do various tasks for you in the near future.
Now, they point out that this new model can be used just like the previous models in ChatGPT Plus for things like analyzing data, creating charts, chatting about photos, uploading files, and a whole bunch of these things. And don't forget that over the past couple of months they've been adding memory and much more personalization to that ChatGPT Plus interface.
Another benefit that comes with this model is that now, because it can take audio in and create audio out, you've actually got a much more powerful voice interface. In the past, OpenAI has been using separate TTS and ASR models: things like their Whisper model for doing the transcription, and then their TTS model, which they've made available for certain uses in the API, et cetera, though they certainly haven't released those models publicly like they did with Whisper.
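For context, that cascaded setup looks roughly like the sketch below with the OpenAI Python SDK: Whisper for speech-to-text, the chat model for the reply, and the separate TTS endpoint for speech out. It's just an illustration of the old-style pipeline that a natively audio-in, audio-out model collapses into one step; the file names and voice choice are placeholders.

```python
# Sketch of the cascaded voice pipeline (separate ASR + LLM + TTS) that a natively
# audio-in/audio-out model replaces. File paths and the voice are placeholders.
from openai import OpenAI

client = OpenAI()

# 1. Speech -> text with the Whisper API
with open("question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)

# 2. Text -> text with the chat model
reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": transcript.text}],
)
answer = reply.choices[0].message.content

# 3. Text -> speech with the separate TTS model
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=answer)
speech.write_to_file("answer.mp3")
```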
This new way of doing it allows for a huge advance in the whole concept of prosody in TTS. This is where we can get it to be much more emotional, we can get it to basically play different emotions, and we can get it to be much more dynamic in its range and in the speed that it speaks at, et cetera, right through to some of the demos where they show that it can actually do singing and sometimes even try to harmonize with itself as it's singing. So that voice is a big part.
Now, unfortunately, it doesn't look like that has come to the API yet. I'd love to think that sometime in the future it's going to come to the API, but I guess we have to wait and see. It certainly makes this whole new model much more powerful by giving it this user interface, which is essentially like the movie "Her". I think you've probably seen lots of people talking about the idea that this is now replicating that movie, where the lead character is talking to an AI girlfriend all the time and is able to get lots of information, get advice, and get updated on a whole variety of different things. You've got to think that's basically here now; it's just going to come down to how you do the prompting and what the sources of information are going to be, and stuff like that. And OpenAI does still seem to be pushing for the MyGPTs, so basically making a custom GPT on top of this, which is going to be pretty interesting to see where that goes now.
So the model capabilities are definitely outstanding from everything that we see here. We can see, going through these videos, examples of getting it to talk to itself, getting it to monitor things, and getting it to describe what it's seeing in a video. Now, we don't know the technical details, like how many times a second it is sampling from that video. In the live demos that they showed on the live stream, it definitely seemed to get certain things wrong: when the person accidentally had the back camera on, and there was a split second of video of the wooden table before it switched to their face, the model was still talking about the wooden table. That kind of thing makes me think that they're probably only sampling a reasonably low number of frames per second, just like Gemini and other models out there are doing for that kind of thing.
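That low-frame-rate sampling is also how you'd have to approach video with the current API yourself, since there's no native video input: grab a frame every so often and pass the frames in as images. Here's a rough sketch, assuming OpenCV is installed; the file name, the one-frame-per-second rate, and the cap on frames sent are all arbitrary choices.

```python
# Sketch: sample a video at roughly 1 frame per second and send the frames
# to GPT-4o as images. "clip.mp4" and the sampling rate are placeholders.
import base64
import cv2
from openai import OpenAI

client = OpenAI()

video = cv2.VideoCapture("clip.mp4")
fps = video.get(cv2.CAP_PROP_FPS) or 30
frames = []
index = 0
while True:
    ok, frame = video.read()
    if not ok:
        break
    if index % int(fps) == 0:  # keep roughly one frame per second
        ok_jpg, jpg = cv2.imencode(".jpg", frame)
        if ok_jpg:
            frames.append(base64.b64encode(jpg.tobytes()).decode("utf-8"))
    index += 1
video.release()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [{"type": "text", "text": "Describe what happens in this video."}]
            + [
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{f}"}}
                for f in frames[:20]  # cap how many frames get sent
            ],
        }
    ],
)
print(response.choices[0].message.content)
```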
The whole idea of going from text to image here is pretty amazing too. They said this is not available currently, but it is something that will be available in the near future, and I think it's looking like it's going to be stronger than the current DALL-E models. Really, in some ways, why would you have a DALL-E 4 if you can fold the whole thing into one model? And eventually, somewhere down the track, you could imagine that this kind of model could do something like Sora as well by combining these things.
So they've published a number of evaluations here. These are kind of interesting to look at to see, okay, how does it compare to the others? One thing that I find interesting is that they constantly compare this to GPT-4 Turbo, even though the other models shown are way below both GPT-4 Turbo and the new GPT-4o here. Again, this really is an example of: be careful with model evaluations. You really want to make your own set of evaluations that you can try things out on, and try to keep those private. Don't make the mistake that I made last year: when I was showing off all the evaluations I did, pretty soon people were actually training and fine-tuning on them so that their models would show up as quite good at those sorts of things.
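A private eval set doesn't need to be fancy, either. Here's a minimal sketch of the kind of loop I mean, reading your own question/expected-answer pairs from a local JSONL file; the file name, prompts, and the naive exact-match scoring are all just illustrative choices.

```python
# Minimal sketch of a private evaluation loop over your own question/answer pairs.
# "my_private_evals.jsonl" is a placeholder; each line looks like:
# {"question": "...", "expected": "..."}
import json
from openai import OpenAI

client = OpenAI()

def run_evals(path: str, model: str = "gpt-4o") -> float:
    correct, total = 0, 0
    with open(path) as f:
        for line in f:
            example = json.loads(line)
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": example["question"]}],
                temperature=0,
            )
            answer = response.choices[0].message.content or ""
            # Naive exact-substring scoring; swap in whatever grading you trust.
            if example["expected"].lower() in answer.lower():
                correct += 1
            total += 1
    return correct / total if total else 0.0

print(f"accuracy: {run_evals('my_private_evals.jsonl'):.2%}")
```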
The next thing that I want to talk about is, for me, perhaps one of the most interesting things, and something that people haven't really been talking about at all: the whole language tokenization issue. They mention that the model has a new tokenizer, and that tokenizer is a lot better at multilingual things. If you think back, I made a video quite a while ago, I think sometime last year, about the different tokenizers, showing that something like GPT-4 with the old tokenizer really couldn't do a lot of the multilingual stuff because it just needed so many more tokens. And you can see here that they're showing that, for a lot of these languages, the number of tokens needed for a multilingual response has not just halved, but dropped to a third, a quarter, or even sometimes a fifth of what it has been in the past. That's really interesting going forward, and of course it means that the outputs are actually much faster for multilingual things.
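You can get a feel for this yourself with the tiktoken library, assuming you have a version recent enough to include the new o200k_base encoding used by GPT-4o alongside GPT-4's cl100k_base; the sample sentence here is just an arbitrary example.

```python
# Sketch: compare token counts for a non-English sentence under GPT-4's tokenizer
# (cl100k_base) and GPT-4o's new tokenizer (o200k_base) using tiktoken.
import tiktoken

old_enc = tiktoken.get_encoding("cl100k_base")  # GPT-4 / GPT-4 Turbo
new_enc = tiktoken.get_encoding("o200k_base")   # GPT-4o

sample = "नमस्ते, आप कैसे हैं?"  # arbitrary Hindi example sentence

print(f"cl100k_base: {len(old_enc.encode(sample))} tokens")
print(f"o200k_base:  {len(new_enc.encode(sample))} tokens")
```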
But the one thing that I don't see anyone talking about, which I think is the most interesting thing about this, is that normally, if you're going to go for a new tokenizer, you're going to train a new model from scratch. And if this is a model that's been trained from scratch, that means they've decided to make a whole new GPT-4-class model with this new tokenizer, and perhaps we're seeing the benefits of all the learning that they've had from making other models and making smaller models for testing things out. But the thing I'm not hearing anyone say is that this could not only be a fully new, trained-from-scratch model; it could actually be a very early checkpoint of GPT-5.
So one of the things that we're seeing now with a lot of the companies that are training models is that they're releasing version 1.5 of things. We've got Gemini Pro 1.5, where Demis Hassabis has actually said publicly that they didn't intend to release it, but they realized it was so good that they decided to release it. And you've got to see that this has also happened with some of the Chinese models, where we've had the Qwen 1.5 models and, just yesterday, the Yi 1.5 model. The reason is that as people are training a much bigger model, or a version of the model trained on a lot more tokens, they're starting to realize that, oh, okay, somewhere in the middle of there is a model much better than what we released previously, but hopefully not as good as the 2.0 version will be. So as this rolls out, we've got a number of model makers now that have these 1.5 models, which really are a stepping stone to the 2.0 models. Now, in OpenAI's case, you've got to think that this is a stepping stone to the GPT-5 model. I'm not saying that this is GPT-5 for sure; this is probably a model that they've used to try out some of the ideas that they're planning to do at a much bigger scale for GPT-5, at a GPT-4 size, or probably, I would say, smaller than the original GPT-4 size, which is how we get the faster responses, the costs going down, and stuff like that. Basically, any time you see cost go down at the moment, it's because compute costs are going down for these companies. People are not going out there making huge profits and then deciding to cut into those profits; they're looking at how to distill the models and, with that, be able to run a version of those models that can get similar results for a far smaller amount of compute spend.
So for me, this is one of the most interesting parts of this whole release: looking at it and stepping back and seeing that, okay, they've updated this tokenizer. And it is quite funny that they've updated the tokenizer just as Llama 3 has adopted the first hundred thousand tokens of the GPT-4 tokenizer. So while Meta have made their choice to basically catch up, OpenAI has realized that it's probably better to go to a new tokenizer to be able to deliver a lot of the multilingual stuff. And you could imagine that this is a test for the GPT-5 model, which I think is more likely than it actually being an early checkpoint of GPT-5, but certainly we're seeing some really interesting new things coming out with this model.
So later in the week, I'll have a play with this with code, and we'll talk about doing some of the things with code. Hopefully they will also release some of the interesting things around how to access the voice via code and doing some of that kind of stuff. I think it's going to be really interesting to see what third-party people actually do with that, rather than having to conform to the OpenAI way of using it.
Another quick thing on the multilingual side: I'm actually in the Bay Area at the moment, and a number of us were testing this model for multilingual things at dinner tonight. It's amazing that it's able to do live translation between a variety of different languages.
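In the app that happens with voice, but you can approximate the translation piece through the API today with nothing more than a prompt. Here's a minimal sketch; the language pair and the system prompt wording are my own illustrative choices.

```python
# Sketch: a simple two-way translation prompt, approximating the "live interpreter"
# pattern with text. The language pair and wording are illustrative choices.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a live interpreter between English and Japanese. "
                "When you receive English, reply only with the Japanese translation; "
                "when you receive Japanese, reply only with the English translation."
            ),
        },
        {"role": "user", "content": "Where is the nearest train station?"},
    ],
)
print(response.choices[0].message.content)
```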
That alone, you've got to think, totally changes the need for a lot of startups that are trying to do these very specific use cases. And this is representative, again, of OpenAI kind of steamrolling a lot of startups as they roll out these smaller features inside their newer models and the bigger direction that they're going in.
Anyway, as I'm recording this, we've got Google I/O on Tuesday, and we've got new models coming out there. I will be making some more videos about that later in the week, and also some videos coming back to this and playing with code with this new GPT-4o model, seeing what you can do with it and what you can actually make out of it. So anyway, as always, if you've got any comments or questions, please put them in the comments below. Otherwise, I will talk to you in the next video. Bye for now.