Greg: Evolution is one of the most interesting
aspects of informational science because it's the ultimate bootstrap system. You've got
these letters strung together on DNA that have, over billions of years, encoded themselves into
the most sophisticated system on the planet, and it's everywhere around us. In theory, artificial
intelligence could look at that and understand every piece of it the same
way that every cell does.
Lukas: You're listening to Gradient Dissent, a
show about machine learning in the real world. I'm your host, Lukas Biewald.
Today, I am talking to Greg Hannum, the VP of AI Research at Absci, and Sean
McClain, the founder and CEO of Absci. I'm talking with them about drug discovery and
development and manufacturing and how ML fits into that, and that's what Absci does. This is a super
interesting conversation that I really enjoyed.
Lukas: Why don't we start with you, Sean?
Maybe you could explain to our audience what Absci does. This might be like
explaining it to your mother or something, right? Everyone's sort of interested in these
applications, but maybe doesn't really understand the deep biology or really even the industry
that you're in. How do you think about that?
Sean: Yeah, it's pretty simple. We are merging
biology and AI together. One of the really exciting aspects of our technology is that we are
able to screen or look at billions of different drug candidates, looking at the functionality
of those drugs as well as the manufacturability. That's compared to what the industry
is currently doing, which is looking at drug candidates in the tens of thousands.
If you look at a protein-based sequence like a monoclonal antibody — you're all familiar
with COVID, Lilly's antibody that came out, that's a protein — and if you look at a protein
sequence, there are more sequence variants in an antibody than there are atoms in the universe.
What we're essentially doing is feeding in all these billions of different data points on
the protein functionality and manufacturability to ultimately be able to predict the best
drug candidate for a particular disease or indication. Essentially our vision
is to become the Google index search of drug discovery and biomanufacturing where we can
take patient samples, find the specific biomarker or target for that particular disease,
and then utilize deep learning and AI to predict the best drug candidate for
that particular target or biomarker. All at the click of a button, and totally
changing the paradigm of healthcare and biotech, and ultimately getting the absolute best drug
candidates to patients at truly unprecedented speeds. It's this really exciting forefront
of, again, merging biology and AI together.
Lukas: Do you ultimately take these drugs
to market and sell them? How far do you go in this process? Do you just invent them
and then hand them off? How does that work?
Sean: It's really a perfect marriage
of what we do and what pharma does. Pharma's really good at being able to design
clinical trials, take the drugs through the clinical trials, and then ultimately market
them. Where we come in is being able to assist the pharma and biopharma companies with
actually designing and creating the drug itself. Then we out-license it to the large
pharma to take through the clinical trials as well as commercializing it.
We get milestones and royalties on that, which essentially, in the world of
tech, is another version of a SaaS model, but based on the clinical trials and
ultimately the approval of the drug product.
Lukas: How far along is this? What's the drug
where you've used these techniques that's closest to something that cures a disease?
Sean: Yeah, so we have one product that we're working on right now that is in Phase III.
They are planning on implementing our technology post-BLA approval. Assuming the
drug gets approved, we're potentially a few years away from actually seeing that drug on the market. So that would be
our first drug candidate that would make it to the market utilizing our technology.
Lukas: What does it do?
Sean: Unfortunately due to confidentiality,
I can't disclose that, but I'm hoping here in the very near future that we will be able
to disclose that. I will say in general, most of the programs that we work on are either
on immuno-oncology or in infectious diseases. But our platform's really agnostic to the types of
indications or diseases that we can go after, but we really focus on where the industry's
focused, and a lot of that is on oncology.
Lukas: Is that because cancer's such a big deal
and so many people get it or some other reason?
Sean: Yeah, I would say that that is one of the
big diseases that the industry is focused on and where a lot of innovation can be. Our technology
is really an enabling technology, so we take the ideas that our pharma partners have, they're
the experts on the biology, and saying, "Hey, we need to design a drug that has
these attributes that can do this." We can then enable them to do that and that's
across really all diseases and indications.
Lukas: Forgive me for such basic questions, but
I'm really curious how this works. So a pharma company would come to you and say... Is it as
simple as, "We want to cure this specific disease and we need a molecule that cures this disease?"
Do I have that right? I mean, how does that happen? Then what do you deliver?
Is it like, "Here's a molecule," or "Here's 20 you should try," or "Here's how we think about it?"
Sean: Yeah, I mean, the simplest way of looking at it, it's exactly how you described it. So they
come to us and say, "Hey, we have this particular target or indication and this is the biology.
If we design a drug that has these attributes, we think that this drug candidate
then could kill this cancer cell." They then have to perform the animal
models and then ultimately take it into the clinic to prove their hypothesis on
that, and we're assisting them in being able to discover the drug candidate that
has the properties that are needed to solve the biology problem that they
have determined is going to ultimately cure or improve that particular disease.
Lukas: When you say drug candidate, is that literally a molecule?
Sean: That is. In our case, that is a protein that is being used as a drug. There's protein-based
drugs and then there are small molecule-based drugs. So small molecule drugs, Advil,
Vicodin. Basically a pill in a bottle. Then you have the protein-based drugs or
biologics, such as insulin and a lot of the exciting monoclonal antibodies. Again, going
back to Lilly's COVID antibody or Regeneron's COVID antibody, these are all protein-based drugs.
The interesting thing with protein-based drugs is you can't chemically synthesize them. You actually
have to make it in a living organism. That adds more complexity to discovering these
molecules as well as manufacturing them.
Lukas: Can you predict exactly what the
protein's going to look like and then look at it and see if it does it? Is that
all in simulation or are there surprises when you actually try to manufacture it?
Sean: Yeah, so there are a lot of surprises that can occur. We are not to the point where we
can predict drug functionality. That's ultimately where we're headed with all of this.
A lot of times, if you can predict the functionality of a protein, that doesn't
necessarily mean that you can manufacture it. So many times we see with large pharma,
they discover these really exciting novel breakthrough protein therapies, but ultimately
can't take them to the clinic because they can't manufacture them. You not only have to predict
the protein functionality, but you also have to be able to predict the manufacturability of it
as well. We're really looking at both of those. AlphaFold has been able to
predict the protein structure based off of the amino acid
sequence; where we're headed is being able to predict the protein function or protein-protein
interaction. So it's the other side of the coin. AlphaFold was a huge breakthrough for
basic research. What we're doing is going to be a huge breakthrough in drug discovery and
biomanufacturing. Again, that's the opposite side of the coin from what AlphaFold has done.
Lukas: I want to make sure I heard you right. Did you say you're not predicting the functionality?
Sean: We are predicting the protein functionality.
Lukas: The functionality is how
it interacts with another protein?
Sean: Exactly. It's "How tight does it
bind to another protein?" Then also, we take into consideration immunogenicity.
Is it going to react in the body once it's administered? Then also taking
a look at the CMC or manufacturing aspects. Is it soluble and stable? Can it
be produced at high yields? These are other predictions that we take into account or
other attributes we take into account.
Lukas: Interesting. I want to hear more about how
this actually works, but, I guess, one question I want to make sure that I asked you is that I saw
that you started your company in, I think, 2011, right? It seems like ML as applied
to medicine has changed so much. I'm curious if you started your company with
this perspective or how different it was, and also how your perspective on machine
learning has changed as machine learning has evolved and deep learning's come along.
Sean: We did not start off as an AI company. I would say we are very similar
to Tesla's evolution. Tesla started off as an electric car manufacturer. They started
collecting all this data from their sensors, built an AI team around that, and now they're a
fully autonomous self-driving car tech company. That's a very similar evolution that Absci is on.
We started out on the biology side and engineering E. coli to be more mammalian-like to
really shorten the development times and decrease manufacturing costs. We then built
out this technology that allowed us to screen billions of different E. coli cells and look
at different variants of proteins, looking at basically the drug functionality and then also
looking at, "Can you actually manufacture this?" We started generating all this data, billions of
different data points on the protein functionality and the manufacturability. We knew that if we
could leverage that data with deep learning, we could get to the point where we could predict
the protein functionality needed for every type of target or indication, and that's ultimately
what led us to apply our Denovium pioneering deep learning technology for protein engineering.
But it really started off with the data. Data is so key and we have proprietary data that no one
else has that we are then leveraging deep learning to mine that, to get us to the point where we
can ultimately predict protein functionality. Where we're currently at right now is being able
to leverage the data we already have and be able to predict the best billion-member libraries we
should be screening for, for every new target and indication we work on. Eventually, as we train the
model with more and more of our proprietary data, the more and more predictive it's going to get.
Instead of predicting a billion-member library, it starts predicting a million, a thousand,
and then ultimately predicting the absolute best drug candidate for a given target or
indication, looking at what modality should it be, the affinity, low immunogenicity, all the
manufacturing attributes that you want. Right now, it's a race to feed as much data as we possibly
can, but it all started off with the biology technology that we had originally developed.
Lukas: For you, Sean, as CEO of a company that's not a deep learning
company, I'm curious how you first got exposed to deep learning and what
made you think that it might be useful, and then how you got conviction
around making these large investments in deep learning that you're doing now.
What were you seeing that made you feel like it would work? It seems like you're more
bullish on it than maybe a lot of your peers and I wonder where that might be coming from.
Sean: I'm bullish because we have the data. Again, it all goes back to data. We have high-quality
data on the protein functionality and manufacturability. It goes back to an earlier
point that I made, which was that there are more sequence variants in an antibody than
there are atoms in the universe. There's no screening technology that we could ever create
that would allow us to mine that big of a space. That's really where the deep learning comes
into play, is being able to essentially sift through all of the potential evolutionary paths
that a drug could be created in and figure out what is that best drug candidate,
basically mine that whole search space, and ultimately come to the point where
we're creating the best drugs for patients. I think we've seen huge...once we've
implemented the deep learning technology, we've already seen huge gains in terms of
yields and the types of drugs that can be discovered when taking our data and pairing it
up with deep learning. Ultimately where I see us going is becoming a full tech company once we
have enough data here. I'm extremely bullish on AI and what it can do within healthcare.
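[Editor's illustration, not part of the conversation: the comparison between antibody sequence space and the number of atoms in the universe can be sanity-checked with a few lines of arithmetic. This is a rough sketch under stated assumptions: 20 standard amino acids, a variable region of roughly 110 residues, and the commonly quoted order-of-magnitude estimate of 10^80 atoms in the observable universe. The function names are hypothetical, and the billion-variant screen size simply echoes the figure mentioned in the interview.]

```python
import math

AMINO_ACIDS = 20               # standard amino acid alphabet
ATOMS_IN_UNIVERSE = 10 ** 80   # order-of-magnitude estimate (assumption)


def sequence_space_size(length: int, alphabet: int = AMINO_ACIDS) -> int:
    """Number of distinct sequences of the given length over the alphabet."""
    return alphabet ** length


def screens_needed(length: int, variants_per_screen: int = 10 ** 9) -> int:
    """Screens required to test every sequence exactly once (ceiling division)."""
    total = sequence_space_size(length)
    return -(-total // variants_per_screen)


if __name__ == "__main__":
    region_length = 110  # assumed length of one antibody variable region
    total = sequence_space_size(region_length)
    print(f"20^{region_length} is roughly 10^{math.log10(total):.0f} sequences")
    print("Exceeds the atom estimate:", total > ATOMS_IN_UNIVERSE)
    n_screens = screens_needed(region_length)
    print(f"Billion-variant screens needed: roughly 10^{math.log10(n_screens):.0f}")
```

[Under these assumptions, exhaustive screening is out of reach by many orders of magnitude, which is the gap a predictive model is meant to narrow.]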
Lukas: It's interesting talking to you in that we work with, I guess, a lot of pharma
companies, which I see are slightly different in what they do than you, but it seems like their
perspective is "interested in deep learning, but probably not at the CEO level," except
in the sense that they're making, I'd say, small or medium investments whereas you want to
transform your entire company in this direction. Do you think that you're doing something different
than your competitors around deep learning? Do you think that you can be
the best at this in some way?
Sean: I do think that we can be the best. I would
say that the industry is starting to understand the benefits of what deep
learning and ML can provide. Biotech probably doesn't have as great an
appreciation for tech and machine learning and what that really means, and
vice versa, that the tech industry doesn't quite understand all that goes into biology. It's
really exciting to be able to take two industries, two cultures, and merge them together to really
create something that's going to be hugely impactful for patients and ultimately the world.
Lukas: That's super cool. I mean, thanks for doing an interview like this. I think this is really
great for cross-pollinating ideas. I love these. I have a lot of maybe slightly more technical
questions. Greg, feel free to jump in if you like.
Lukas: One thing I wonder about
with ML applied to this stuff is, do you feel like it was always a latent
possibility to successfully be able to make these predictions that you're doing
now and it was just a matter of getting enough data? Or do you feel like there's been
breakthroughs in machine learning, in model architectures or something like that that have
actually made this a more practical application?
Greg: Yeah, thank you. It's a great question.
I would say that it's a little bit of both. There has always been potential for ML in
bio, and it has been very successful in the past in some of these same indications, but it's been
limited on both sides. The data collection side is not stagnant, it's moving in incredible
ways, the same way that the AI community has. And on the AI modeling side, recent advances in
large-scale architectures, transformers, and a lot of different techniques for getting these models
to converge successfully and to be very predictive have been incredible breakthroughs as well.
Essentially now I'm less concerned about the AI holding back any sort of success as I am
about making sure that we can marry these two communities, make sure that what is always
an intrinsically messy process of collecting biological data is actually connected to
the inputs and outputs of that AI. As Sean will be the first to tell you, this is
a great place to be able to do that, because a lot of that hard work of actually developing
these assays and working through that challenging space is part of the bread and butter of Absci.
Lukas: Could you give me maybe a concrete example of an ML breakthrough that would help with this?
For example, I think of transformers as... I know them as technology mostly for natural language
processing. I could sort of imagine how this might apply to what you're doing, but maybe could
you walk me through some kind of architecture, some kind of new way of doing things, and how you
framed the biology in this machine learning world?
Greg: I'll give a couple of examples
that have come over the last few years. The biggest is related to scaling.
The biological problems are necessarily complex. Evolution is one of the most interesting aspects
of informational science because it's the ultimate bootstrap system. You've got these letters strung
together on DNA that have, over billions of years, encoded themselves into the most sophisticated
system on the planet. It's everywhere around us. In theory, an artificial intelligence
could look at that and understand every piece of it the same way that every cell does.
What you need to do to connect these dots now is collect enough data from different parts of
the system. Namely, you need a lot of nucleotide data, so we need to do DNA sequencing.
But we need that from lots of different organisms, and we need to understand how they
translate into proteins, how those proteins act and function, whether
they bind together, and how they fold together. There's an incredible number of pieces that need
to come together to see that big picture. This is where scale becomes very important. It's
a bigger problem than some traditional ML or even the original deep learning architectures
are capable of solving, because it simply requires more parameters, requires more
complexity, requires better understanding. NLP-based models and transformers in general
are really good for this domain because a lot of what we operate on is in sequence space. But
I wouldn't say that they're the only approach to this either. But those advancements in
letting us get to larger and larger models to create the GPT-3 of DNA is something
that really gives us, for the first time, a real handle on these challenges.
Lukas: There is this trend in NLP — which I'm much more familiar with — of models becoming
more and more black boxes. Less and less informed maybe by linguists. I don't know if every linguist
I've had on this podcast would agree with that, but I think broadly as the data increases and the
model complexity increases, they become more opaque. Is there a similar trend in these applications,
where maybe the chemistry and physics matters less and you just treat it as this
translation from letters to "Did the drug get successfully produced or not?" or do you still
inject your knowledge of biology or chemistry or physics to make the whole system work?
Greg: Yeah, it's been moving in that direction, but we're not there yet. Biology is...those
two communities still haven't fully been united. There have been some big advancements
recently in the protein-biology space. The MSA Transformer is a big example of
this, where something that bioinformaticians and computational biologists
have been doing for years, aligning sequences to see what kind of patterns they share in nature, can
be used as an input directly with a special kind of architecture to let models learn from that.
These sorts of biologically inspired architectures are still coming. AlphaFold is another great
example of one where they used a number of relatively novel techniques, and combining
them together was really key to the success. The black box approach is powerful
and I wouldn't downplay it, but there's still plenty of room for improvement.
Sean: But I think that's ultimately where we want this to go. You can input in a target sequence
and be able to have the output be the sequence for the drug candidate and predict all the
binding just based off the sequence itself. We've already seen some really interesting
discoveries that have occurred from...our deep learning model showed that we got an increase
in overall yields from this protein that wasn't necessarily classified as a chaperone, but our
deep learning model predicted that it would be. I think these are some of the really interesting
discoveries that are going to be occurring at a very rapid pace by bringing
the AI and biology together.
Lukas: Sean, how do you think about
investing in data collection versus your ML team? There's maybe two
ways to improve your models. Going out and collecting more data, which
is probably really one type of investment, versus building up ML expertise. Do you think
about it that way and do you feel like there's a trade-off there? How do you look at that?
Sean: I think investment in both is absolutely critical. You can't invest in one
and neglect the other. You really have to make the strong investments in both. Right
now, a big investment of ours is, "What is all the data that we want to be feeding
into the models?" Looking out 10 years, are we going to regret not collecting this piece of data?
Then how do we build our databases and scale the amount of data that's needed in the future? How
do we collect it as quickly as we possibly can to then hand it over to our ML team to be able
to continue to train and improve the models? We have made huge investments in both, from the
wet lab side, the data capture, and the database and scaling that along with the AI team.
Lukas: As more of a computer scientist, I'm definitely enamored at the idea of a
wet lab. Could you describe what happens and what that collection process looks like?
Sean: We just built out a, I think it was 88,000 square foot campus. Half of the campus is office
space and then the other half is an actual lab. The lab is super key to what we do. It ranges
all the way from the drug discovery team all the way down to our fermentation and purification team
that grow up the cells and ultimately purify them. A lot of the data that we're feeding into our deep
learning models is Next Generation Sequencing data and flow cytometry data. That's really key.
Some of the breakthroughs within NGS and the speed at which we can process NGS data is
really enabling us to do what we do. It's really fun to be able to grow a team that's both
on the wet lab side and then the AI and ML side. Also, I would say an AI scientist
that understands the biology is absolutely critical to what we do and
the talent on that side is...there is not a lot of it out there, but we have
done a really amazing job of building out talent that understands both aspects.
Lukas: Maybe this is a stupid question, but what goes on in a wet lab these days? Is it
like beakers full of proteins? Is it microfluidics arrays? I don't know. How does it work? How
fast can you actually collect meaningful data?
Sean: We build these...so we start off with
building these large libraries. We work with what's called a plasmid. It's basically
circular DNA and that encodes the drug product. We vary that DNA to look at various
different drug candidates. In a single small test tube, we basically take all of those billions of
different plasmids and put that into an E. coli. It's extremely small and you look at it and be
like, "Wow, there's trillions of cells in there," and it's pretty incredible. Then we take all of
that, we screen it, and then ultimately we find the drug candidate and the cell line. Then
we grow it up in big fermentation reactors. Think of beer and brewing beer. It's essentially
big vats that are highly controlled and then you just grow up the bugs in there and basically give
them the genetic code to make the drug candidate and then you scale it up from there. But
yeah, it's all beakers, fermentation, purification. You name it, we've got it.
Greg: I'd add a little color to that as well, in that from a background of somebody who
doesn't spend every day inside the wet lab, it feels a lot like stepping into Wonka-land.
You have an amazing amount of human ingenuity sitting on every desk, whether it's a mass
spectrometer or some sequencing technology or...all these devices have very specific and very
incredible capabilities and a bunch of people who know what to do with them and know how to put all
the pieces together to make this stuff happen.
Sean: It's so funny. I actually
don't think I've ever had anybody ask me, "What does a wet lab do?" I was searching
for the words to describe it. I probably did a terrible job. But it's like-
Lukas: I thought it was great, what you provided.
Sean: You don't really quite understand the magnitude until
you step in and really understand every intricate aspect that's being done.
Lukas: I remember the first time I ever went into one of our customer's wet labs. I felt like,
"Oh, this is what I thought science was like when I was a kid." I love it.
Greg: I'm still disappointed I don't get to show up in
a lab coat. I might just start doing that now.
Sean: Yeah.
Lukas: It's funny. I never thought about this, but we do a lot of ML experiment tracking, but I would
imagine there's a lot of parallels to tracking all the experiments that you're doing in the lab. Do
you have software that does that? You've probably written a lot of software to just keep track
of everything that's happening in there, right?
Sean: We've actually decided to build a lot
of this out ourselves and Jonathan Eads, who's our VP of Data Science, he and his team are
actually working on building out a database where we track everything internally based
off of the software that they have developed. This is really because there is no software
solution out there that really met our needs. We actually just got a demo of it the
other day and it's really incredible, what it's going to allow us to do. Not only in the
data capture, but also being able to track where programs are at in the lab, where we have
bottlenecks. I mean, it's really this brilliant software that is really going to help expedite
what we currently do and to be able to capture the data that's needed for the long-term success.
Lukas: Very cool. I'm curious about how you think about where this goes. Where do you imagine ML
taking you as you collect more data? Do you think the whole process moves to this? Do you think you
could run clinical trials essentially in ML and know if they're going to be successful or not?
Sean: I won't say that we'll be able to run ML for clinical trials, but the drugs
that we do design, if indeed we are predicting the best drug candidates for various
indications, it's going to increase the overall success rate. That in turn is going to lead to
shorter clinical trial timelines and being able to rapidly progress new drug candidates
through, and ultimately lead to the point where we can do personalized medicine because we
have shown that the success rates dramatically increase and allow for that personalized medicine.
But who knows? We could here in the future be able to use ML for clinical trial
design and prediction as well. One of our core values here is believing in
the impossible, so I feel bad for not saying, "Yes, ML will be able to predict clinical
trials and not actually have to go through it." It'll be really interesting to see
what's done on that front in the future.
Lukas: What is a typical
clinical trial success rate?
Sean: Right now, it's right around 4%.
Lukas: 4%.
Sean: Yeah.
Lukas: But there's different stages, right? Or how does that work?
Sean: Yeah. There's three stages. You have your Phase I, your Phase II, Phase III, and
then ultimately approval. So going from Phase I all the way through approval,
it's about a 4% success rate.
Lukas: Wow.
Sean: Yeah.
Lukas: Just as another CEO, it sounds totally
harrowing to me to have my revenue depend on a 4% success rate process. How do you
stay sane in a market like that?
Sean: The way we structure our
revenue is one, the pharma partner pays us to actually develop the drug candidate
and the cell line. We're getting paid for that. Then we get paid on milestone payments as they
progress through the clinical trial. You get a milestone payment at Phase I, Phase II, Phase
III, ultimately approval, and then royalties.
Sean: Even if a drug doesn't make it to
the clinic, you can still get paid these milestone payments, which are 100% pure margin.
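[Editor's illustration, not part of the conversation: a rough way to see how a roughly 4% Phase I-to-approval rate interacts with the number of programs in a portfolio. The sketch assumes every program is independent and has the same 4% chance, which is a simplification for illustration only; real programs differ by stage and indication. The function names are hypothetical and the portfolio sizes in the loop are arbitrary.]

```python
def prob_at_least_one_approval(n_programs: int, p_success: float = 0.04) -> float:
    """Chance that at least one of n independent programs reaches approval."""
    return 1.0 - (1.0 - p_success) ** n_programs


def expected_approvals(n_programs: int, p_success: float = 0.04) -> float:
    """Expected number of approvals across n independent programs."""
    return n_programs * p_success


if __name__ == "__main__":
    # Arbitrary portfolio sizes, chosen only to show the trend.
    for n in (1, 10, 25, 50, 100):
        chance = prob_at_least_one_approval(n)
        print(f"{n:>3} programs: P(at least one approval) = {chance:.3f}, "
              f"expected approvals = {expected_approvals(n):.2f}")
```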
Then it's a law of large numbers. It's just growing the number of programs you have as quickly
as you can. You ultimately get to the point where you do get drugs approved and you get
royalties coming in for 10 to 15 years off of that. But you grow the revenue base just
by growing the number of programs every year.
Lukas: Can you say order of magnitude how many
of these you're doing? Is it like thousands?
Sean: We currently have nine active programs
ongoing. Our goal for this year is five programs, which we're on track for, and then
increasing those year over year. But no, it's definitely not thousands.
It's more on the tens instead of thousands.
Lukas: Do the programs inform each other? Is this
similar to natural language where you can have one big model and then fine tune
it on the different cases?
Greg: Yeah. That's actually a big part
of why we think this is so exciting, is because it really is one physical
system underlying a lot of these drugs. Creating a model that can understand this for
one drug is useful. Then for the second one, it presumably will need less training data because
it can transfer learn what it understands about the first one. Then you go to the third and the
fourth, and before long, as Sean was saying, the number of shots you need on goal becomes
reduced to the point where any novel drug then becomes a one-shot learning problem.
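[Editor's illustration, not part of the conversation: the idea that a second, related task needs less data once a model has learned the first can be shown with a deliberately tiny toy. This is not Absci's method or architecture. It is a linear model fit by plain gradient descent, where the "first drug" has plenty of data and the "second drug" has only three examples. All names, dimensions, and numbers are made up for the sketch.]

```python
import random


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def gradient_descent(xs, ys, w_init, lr=0.05, steps=500):
    """Least-squares fit by plain gradient descent, starting from w_init."""
    w = list(w_init)
    n = len(xs)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = predict(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5


def make_data(rng, w_true, n):
    """Noise-free toy data: random inputs, outputs from the true weights."""
    xs = [[rng.gauss(0, 1) for _ in w_true] for _ in range(n)]
    ys = [predict(w_true, x) for x in xs]
    return xs, ys


rng = random.Random(0)
dim = 8
w_first = [float(i + 1) for i in range(dim)]  # toy "first drug" ground truth
w_second = [w + 0.1 for w in w_first]         # toy "second drug", close to the first

# Plenty of data for the first task.
xs_big, ys_big = make_data(rng, w_first, 200)
pretrained = gradient_descent(xs_big, ys_big, [0.0] * dim)

# Only three examples for the second task.
xs_small, ys_small = make_data(rng, w_second, 3)
from_scratch = gradient_descent(xs_small, ys_small, [0.0] * dim)
fine_tuned = gradient_descent(xs_small, ys_small, pretrained)

err_scratch = distance(from_scratch, w_second)
err_tuned = distance(fine_tuned, w_second)
print(f"error from scratch: {err_scratch:.3f}")
print(f"error fine-tuned:   {err_tuned:.3f}")
```

[With three examples and eight unknowns, the from-scratch fit cannot pin down the weights, while the fine-tuned fit starts close to the answer and stays close. That is the intuition behind needing fewer "shots on goal" for each new program, shown here only in toy form.]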
This is exactly where we see it going.
Lukas: Is it possible for you guys to engage
with the academic community at all? I feel like you're actually adjacent to two very different
academic cultures, right? There's the ML culture, which I know well, but seems like it might be
tricky to share data with and then the vast medical literature, which I know less well. Are
these communities relevant to you at all? Do you try to do any publishing or engage in some way?
Sean: Yeah, definitely. We love to engage in the academic community and we are looking to publish
some papers here in the near future, both on the work that we're doing, but also in collaboration
with some of the leading new academic professors in our area. We see this as ways to continue
to validate the work that we're doing and improve the science that we have and
leverage domain expertise that we don't have. The academic community for us is really essential
to the work that we do. We very much foster those partnerships and collaborations.
Lukas: Cool. Well, I know a lot of ML practitioners that
I think would be interested in working in your domain. Can you say anything
about what you look for in hiring an ML practitioner that might be different
than, I don't know, a Google or an OpenAI?
Greg: I can speak to some of what we've looked
for on our team and what we continue to look for going forward. There's a lot of the strengths
that naturally come from the AI community that we like to keep going forward. The way that
we think about problems, the way that the... how we understand the implementation
details. As you know, AI can be tricky to execute on both the compute and the setup
and understanding all the different systems and software that goes into that.
But on the totally different side, you have all the biological complexity and it's
an entirely different field to be learning...you need a whole other degree to learn about
all the complexities that come from that. Lab scientists and the close relationship
with them is an important piece there. I guess what I'm trying to get at is that it's
that capability to learn, because there's so few people who naturally are in both spaces anyways.
So it's a capability to learn, the patience and the rigor to go through and understand all sides
of the problem, and how to make an impact therein. It's never as easy as a lot of AI
problems often are where it's like, "Here's your inputs, here are your outputs.
Now, maximize some scoring function." It's a lot trickier than that. The scientists
live that day to day. To some extent, it's like, "Well, welcome to our world." And that's great
because it means that when...we can also say, "This is how AI can address these challenges. It
can help clean up that noise. We can help better understand what's going on with this process, and
then, yes, ultimately build systems that speed up and maybe even replace a lot of these processes."
Lukas: Sean, I guess in that vein, as you have transitioned from not doing a lot of machine
learning to really making this heavy investment in machine learning and building out these
teams, have there been any kind of unexpected cultural issues or team issues that you've
had to work through that might have happened because of adding all these ML nerds?
Sean: Yeah, I think that it's having everyone recognize that by combining both ML with
biology and the lab scientists, that it ultimately is getting to our vision quicker and that it
ultimately is impacting patients' lives in ways that we couldn't do without combining it together.
I think the first thought is, "Oh, my gosh, Sean, you're bringing in all these AI and ML
experts. Are they just going to automate my job away and they're going to be able to
predict everything and there is going to be no need for me?" It's like, "Absolutely not."
Biology is so complex. We have so many problems to solve. Once we solve one
problem with AI and we have the data, we then need the biology and wet lab expertise
then to solve the next problem and the next problem after that. It's never going to go away.
You need both. At the end of the day, you can't stop the wet lab and the biology side because
that's what feeds the data and both are absolutely critically important. I just love the different
perspectives that both sides bring to the table to make our company the best it possibly can be.
Lukas: It sounds like a lot of fun. Have you gotten any questions from your ML
team where you're just like, "Man, we're just miles apart here," like you
just don't understand what we're doing?
Sean: No, I think honestly everyone has really
done a great job of understanding the other side's perspective. Sometimes the AI team may not
be getting data as quickly as they would like, but then they dive in with the scientists and they're
like, "Oh, I understand you ran into this problem. Can we work together to increase the throughput?"
Or it's like, "Hey, I gave you all this data. I'm not seeing any improvements yet. When are we going
to start seeing improvements from our AI models?" I think it creates patience and collaboration and,
I think, a respect for each other's part that they play in the overall bigger picture.
Lukas: Greg, do you agree with this?
Should I ask you separately?
Greg: No, no. I think you nailed it. You started
by saying it's exciting and I couldn't agree more. It's an opportunity of a lifetime to be at
the intersection of something like this. It's wonderful to see such smart
people and such talented people who are respected in their own field and then
coming together. There's something very humbling always to be on the other side of things and
realizing, "Wow, there's always more to learn." It's very healthy, as Sean said. It does give
you a greater sense of context and perspective.
Lukas: We always end with two questions, and I
think you both are coming from super different perspectives, but I'd love to hear both of
your answers to this. One question we always end with is what's a topic in ML that you
feel is underrated versus its impact? I mean this very broadly. I mean, I guess, Sean, what
skills do you feel like people should be showing up with that they're not, maybe?
Sean: When folks come to Absci, we're solving very big complex problems. Our
mantra and our number one value is there for a reason, which is believe in the impossible. We are
always looking for people that are wanting to push the limits on both the AI side as well as the
biology side and really bringing that together. We are creating this new ecosystem that really
hasn't existed and this understanding of what ML can do for biology and vice versa.
We just want to bring in people that want to think about things differently and change paradigms.
I'm super excited about where the future lies with AI and biology together and we're really
on the forefront of that. Yeah, couldn't be more excited about where the industry's headed.
Greg: All right. Yeah, I guess I'll give my different take here on what's the underappreciated
side of ML. I'd say one thing that definitely has some appreciation, but could have more, is the
capability of deep learning and artificial intelligence to do integrative work.
We see an awful lot of research solving specific problems, often hard problems, and they
compete against each other on performance scores and evaluation. But the real value, I think, in
the practical world for AI is how well it ties different kinds of information together.
We use this at Absci in trying to collect dozens of different kinds of assays and we can
understand, "All right, in context for just one of them, this is a spreadsheet of data. It's not
even that large. But maybe if I relate that to the embedding space projection of a different model
that was trained on a different task, it can tell me something useful about the problem that I'm
working on now." This is a philosophy that we're big proponents of, of integrating large multitask
systems that can leverage the commonalities in the data and understand...putting them all together.
This is an advantage, not just that you get to use all your data on hand, that you get
information on that, but it also creates a simplicity to everything where instead of having
to run all these different pieces, you can ask from maybe one piece of data what
the other pieces would look like. You can take a lot of what might be, let's say,
in the case of bioinformatics, we have a lot of computational tools for understanding protein
function. You can run dozens of these different tools and try to get them all to work together
and set up your environments, or you can have one AI model that knows these answers and can give
them to you in a millisecond. Full appreciation of how well it can simplify problems and bring
different kinds of problems together is something that I think deserves more attention.
Lukas: This really works for you. I mean, I feel like a lot of people talk about this
multitask learning and combining problems, but it's always felt a little theoretical to me. Do
you actually find that it meaningfully helps on tasks to incorporate data from other tasks?
Greg: Oh, absolutely. This was a big part of what we did at Denovium: taking our DNA
models and protein models, tying them together, two entirely different domains of data. But it
allowed us to...users could essentially take a DNA sequence and then just one artificial
intelligence model, find all of the proteins, what do they do, characterize them with
700,000 different labels. Very multitask. We had something like 25-some-odd different
databases that were all tied together in different...it essentially had to multitask
quite a bit to solve those challenges. But it both worked and it really sped up the
progress of what we could do with it, as well as allowed some really unconventional approaches.
So Sean was talking earlier about the chaperone discovery work where we could use these protein
models to understand what a protein would do if it otherwise hadn't been understood by science.
These sorts of models, because they're generalized over so many different kinds of tasks, were not
burdened with memorization and they can say, "Oh, yeah. Well, hey, look, this looks an
awful lot like this. It should do this," and we can trust it to step outside its box.
Lukas: Is there any paper or something that you could point people to who
want to learn more about this? Have you been able to publish any of this work?
Greg: There is a legacy work that was somewhat of a precursor to it. We can pull up the paper later.
Lukas: That'd be awesome.
Greg: Yeah.
Lukas: Cool. We'll put it in the notes. Our final question, and Sean, I'm
really curious to get your take on this one. Sean, you've been super positive about the
promise here, but you guys are actually doing ML and trying to get real results, and so
I'm sure that you're running into problems. What has been the biggest unexpected problem trying
to go from this idea of something you want to do to actually making it really work in reality?
Sean: Oh, man, there's problems every which way. I would say first, it's actually convincing
the scientific community and our partners that deep learning and AI are the future and
showing them work and showing that this can actually happen. That's the first hurdle.
Then I would say, the other biggest hurdle and challenge that we've had
to work through is being able to develop the technologies that get us the data
— get us the data in a clean format — and then scaling that data and then building out
a world-class AI team. Greg and Ariel and myself with Matthew, we're always looking for
the best talent and how do we bring them in. But as you know, as a fellow co-founder,
it's like once you think things are going well, you're always thrown in at the
deep end and going down another path and having to solve another problem. It's continuous
problem solving, but that's the fun of it. We've made so much progress and we're going
to continue. I think that's just so much of the fun of growing a company and doing what we do.
Lukas: Greg, anything you want to add of unexpected hurdles along the way?
Greg: Unexpected hurdles? I mean, that's every day.
Lukas: Well, give me one. Give me one real story from the trenches.
Greg: Oh, let's see. What's a good one that we've discovered recently? It's always getting back
to the fact that biological data is messy and a lot of scientists are exceptional at what they
do, but things come back that you're surprised at. For example, we assemble these plasmids, these
long stretches of DNA that are a circle that can essentially convey various information
about how to construct the drug and how to manufacture at scale. A lot of the technology
that we're developing is trying to say, "Okay, if you put in this sequence, it will do this.
If you put in this sequence, it'll do that." In the process of building the precursors for that
— I'm not going to credit deep learning here, just credit the infrastructure development underneath
that — we discover, "Oh, hey, in some of our assays, whole sections of the DNA have just been
cut out and have been looped together into a smaller shape. What's going on with that?"
This was nobody's plan. Your AI is not going to say, "Wow, that was a really
interesting phenomenon. You should go..." These are the sorts of things where it is that
collaboration environment where an AI scientist can, even just in the process of getting things
ready for ingestion to an AI, can really make sure that all the data is together and understood
and a lot of these things are overcome. Then, of course, on top of it, now you get the insights
of, okay, now for the ones that are together, what do we see here? What is interesting?
Sean: I think it all goes back to the hardest part that we deal with, is the biology. We can predict
these billion-member plasmid libraries to build, but it could take us a week to build one
or it could take us two months depending on the complexity of it and we just don't know
because it's biology. It keeps it interesting.
Lukas: Well, awesome. Thanks so much for
your time, guys. This was really fun. Really appreciate it.
Sean: Thanks so much, Lukas.
Greg: Thank you.
Lukas: If you're enjoying these interviews and you
want to learn more, please click on the link to the show notes in the description where you can
find links to all the papers that are mentioned, supplemental material, and a transcription that
we worked really hard to produce. So check it out.