First question, a simple one: why is it
called Dolly?
It's because Dolly was the first cloned sheep, and Dolly, the open-source
large language model that we released, is very, very similar.
It's almost like a clone of these other existing models, but its main
difference is that it cost us only thirty dollars to produce, using just
three machines, whereas all these other models that it's a clone of have
used hundreds of thousands of hours and been trained on trillions and
trillions of documents, you know. So that's why we call it
Dolly. It also kind of sounds like DALL-E, the
product that was put out by OpenAI. OK.
OK, so a narrower dataset. You've talked about the price
comparison, but think about the technology.
What are the similarities and differences between what you think your
model is capable of and what ChatGPT is offering now?
Yeah, I think that's the key point. The thing that took us all by
surprise in November last year, when ChatGPT came out, was this ability to
have this kind of human interaction, where it's going back and forth and
reasoning with you, and people thought you needed tens of thousands of
machines and lots of money, billions. And that's the thing that we did
for 30 dollars. So that's the thing that actually,
frankly, surprised us. But the key thing here is that we
open-sourced it, so anyone can actually use it.
Any enterprise, any organization can use it, and then they can own the model
themselves, whereas with OpenAI and these other models, those are
proprietary and owned by a specific company like OpenAI,
Anthropic or Cohere. Let's go deep, deeper into the
nitty-gritty, Ali. This is a years-old large language
model, an LLM, right?
Talk to me about architecture. OK.
But things are moving really fast right now.
What is the architecture? This question comes from Rachel Metz, who has just
joined us at Bloomberg News. She is across everything, and
I'm really, really following her work closely.
She's basically asking me: is this a generative system using
transformer architecture? What is new here?
Yes, absolutely. So, yes, it is using the transformer architecture, which is
the sort of secret sauce behind these LLMs. It is a generative model.
It's just quite small, it hasn't been trained on lots and lots of documents,
and not a lot of money has been spent on it. So we think that the secret sauce
here lies in the small dataset that we trained it on.
So we had a dataset of questions and answers showing how humans like to have
this kind of dialogue that you and I are having.
And that small dataset apparently is the secret sauce.
And it turns out that maybe you don't need these huge models.
Maybe the industry has been going in the wrong direction, training bigger and
bigger models and spending more and more money.
All you need is this specific dataset, right?
And that's how you crack the code on this human interaction.
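The recipe described here, where a small dataset of human-style question-and-answer dialogue is used to tune an existing generative model, can be sketched roughly as a data-preparation step. This is an illustrative sketch only, not Databricks' code. The `QAPair` type, the `build_dataset` helper, and the Alpaca-style prompt template are hypothetical choices made for this example. The interview does not specify the exact format Dolly was trained on.

```python
# A minimal sketch of turning question/answer pairs into training text for
# instruction fine-tuning. The template is a hypothetical, Alpaca-style format
# chosen for illustration; it is NOT confirmed as Dolly's actual format.
from dataclasses import dataclass


@dataclass
class QAPair:
    """One human-written question and the answer we want the model to give."""
    question: str
    answer: str


# Hypothetical prompt template (an assumption, not taken from the interview).
TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{question}\n\n### Response:\n{answer}"
)


def format_example(pair: QAPair) -> str:
    """Render one question/answer pair as a single training string."""
    return TEMPLATE.format(question=pair.question.strip(),
                           answer=pair.answer.strip())


def build_dataset(pairs: list[QAPair]) -> list[str]:
    """Format every pair; a pretrained base model would then be fine-tuned on these strings."""
    return [format_example(p) for p in pairs]


if __name__ == "__main__":
    data = [QAPair("Why is the model called Dolly?",
                   "Because Dolly was the first cloned sheep.")]
    for text in build_dataset(data):
        print(text)
```

The design point the interview stresses is that the base model stays the same, and only this comparatively small set of formatted dialogue examples changes its behavior.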
I tweeted that you were coming on the show, and one of the audience questions
was about the ethical considerations around this.
Essentially, they're asking how Databricks approaches the ethical side of
what it's offering. Yeah, that's super important to us, of
course. We think that the best way is for the
community to collaborate on an open model that we can actually understand,
where we have the source code rather than it being locked down somewhere.
So we think overall it's super, super important.
We've done research on the ethical aspects of this, but let it be open.
So open-sourcing it is really, really critical, making sure that
every organization can have these models so they can understand what they're
doing with their data and understand what their model is doing when it's
generating responses, rather than it coming out of a black-box API
somewhere else that you don't actually have control over.
Only a handful of companies have control over it.
Ali, there's just intense interest in this space right now.
You heard us talk about free sites and its oversubscribed IPO over in Abu
Dhabi. Will Databricks IPO when the window
finally opens? Yes, that's a stepping stone for us that
we will pass. We're more excited because
we think that over the next 10 years this stuff is gonna have a huge impact.
And there's no doubt for us that Databricks will be an immensely successful
company in its own right over the next decade or so.
So I feel it's just a stepping stone. We're not super obsessed with whether it
happens in the next six months or a year or whatever it is.
Right. We will do that when the market
conditions are appropriate. But you could do it within the next 12
months. You could be ready.
I mean, we're already ready, right? You know, we've already
shared that we're at over a billion in revenue, with over 6,000 employees.
You know, we have the finances. So even though we're private, we're
operating, you know, as a public company.
Final question on Dolly. If you make Dolly cheaper than what's
out there, what is the risk that it opens up access to bad actors?
Yes. Look, I think the ethical aspects of
this are super important. I just think models like Dolly are
super powerful, and they can help us do things better.
They can help us make education way better, health care way better, passing
way better. Right.
So they're going to be great for humanity, but bad actors can also use them
to do bad things. You know, not just Dolly: any of these
models, any machine learning, any technology can be used by the good guys
and the bad guys. And I think we need regulation, and we
need to understand how these models work.
And the best way to do that is by opening them up and having them be open
source rather than proprietary, so that every company can leverage the
technology, because we think that in every industry over the next decade, the
winners are gonna be data companies. It doesn't matter which industry
vertical, they're all going to be data and AI
leaders, and they're gonna be leveraging this kind of technology.
So let's open it up and understand what they're doing.