Sean and Greg — Biology and ML for Drug Discovery

Greg: Evolution is one of the most interesting aspects of informational science because it's the ultimate bootstrap system. You've got these letters strung together on DNA that have, over billions of years, encoded themselves into the most sophisticated system on the planet, and it's everywhere around us. In theory, artificial intelligence could look at that and understand every piece of it the same way that every cell does.

Lukas: You're listening to Gradient Dissent, a show about machine learning in the real world. I'm your host, Lukas Biewald. Today, I am talking to Greg Hannum, the VP of AI Research at Absci, and Sean McClain, the founder and CEO of Absci. I'm talking with them about drug discovery, development, and manufacturing, and how ML fits into that, which is what Absci does. This is a super interesting conversation that I really enjoyed.

Lukas: Why don't we start with you, Sean? Maybe you could explain to our audience what Absci does. This might be like explaining it to your mother or something, right? Everyone's sort of interested in these applications, but maybe doesn't really understand the deep biology or really even the industry that you're in. How do you think about that?

Sean: Yeah, it's pretty simple. We are merging biology and AI together. One of the really exciting aspects of our technology is that we are able to screen or look at billions of different drug candidates, looking at the functionality of those drugs as well as the manufacturability. Compare that to what the industry is currently doing, which is looking at drug candidates in the tens of thousands.

If you look at a protein-based sequence like a monoclonal antibody — you're all familiar with COVID; Lilly's antibody that came out, that's a protein — there are more possible sequence variants in an antibody than there are atoms in the universe.
What we're essentially doing is feeding in all these billions of different data points on protein functionality and manufacturability to ultimately be able to predict the best drug candidate for a particular disease or indication. Essentially, our vision is to become the Google index search of drug discovery and biomanufacturing, where we can take patient samples, find the specific biomarker or target for that particular disease, and then utilize deep learning and AI to predict the best drug candidate for that particular target or biomarker.

All at the click of a button, totally changing the paradigm of healthcare and biotech, and ultimately getting the absolute best drug candidates to patients at truly unprecedented speeds. It's this really exciting forefront of, again, merging biology and AI together.

Lukas: Do you ultimately take these drugs to market and sell them? How far do you go in this process? Do you just invent them and then hand them off? How does that work?

Sean: It's really a perfect marriage of what we do and what pharma does. Pharma's really good at being able to design clinical trials, take the drugs through the clinical trials, and then ultimately market them. Where we come in is being able to assist the pharma and biopharma companies with actually designing and creating the drug itself. Then we out-license it to the large pharma to take through the clinical trials as well as commercializing it.

We get milestones and royalties on that, which essentially, in the world of tech, is another version of a SaaS model, but based on the clinical trials and ultimately the approval of the drug product.

Lukas: How far along is this? What's the drug where you've used these techniques that's closest to something that cures a disease?

Sean: Yeah, so we have one product that we're working on right now that is in Phase III. They are planning on implementing our technology post-BLA approval.
Assuming the drug gets approved, we're potentially a few years away from actually seeing that drug on the market. So that would be our first drug candidate that would make it to the market utilizing our technology.

Lukas: What does it do?

Sean: Unfortunately, due to confidentiality, I can't disclose that, but I'm hoping here in the very near future that we will be able to disclose it. I will say, in general, most of the programs that we work on are either in immuno-oncology or in infectious diseases. Our platform's really agnostic to the types of indications or diseases that we can go after, but we really focus on where the industry's focused, and a lot of that is on oncology.

Lukas: Is that because cancer's such a big deal and so many people get it, or some other reason?

Sean: Yeah, I would say that that is one of the big diseases that the industry is focused on and where a lot of innovation can be. Our technology is really an enabling technology. We take the ideas that our pharma partners have, since they're the experts on the biology, saying, "Hey, we need to design a drug that has these attributes that can do this." We can then enable them to do that, across really all diseases and indications.

Lukas: Forgive me for such basic questions, but I'm really curious how this works. So a pharma company would come to you and say... Is it as simple as, "We want to cure this specific disease and we need a molecule that cures this disease?" Do I have that right? I mean, how does that happen? Then what do you deliver? Is it like, "Here's a molecule," or "Here's 20 you should try," or "Here's how we think about it?"

Sean: Yeah, the simplest way of looking at it, it's exactly how you described it. They come to us and say, "Hey, we have this particular target or indication and this is the biology.
If we design a drug that has these attributes, we think that this drug candidate could kill this cancer cell."

They then have to perform the animal models and ultimately take it into the clinic to prove their hypothesis. We're assisting them in being able to discover the drug candidate that has the properties needed to solve the biology problem that they have determined is going to ultimately cure or improve that particular disease.

Lukas: When you say drug candidate, is that literally a molecule?

Sean: That is. In our case, it is a protein that is being used as a drug. There are protein-based drugs and then there are small molecule-based drugs. Small molecule drugs are things like Advil or Vicodin, basically a pill in a bottle.

Then you have the protein-based drugs or biologics, such as insulin and a lot of the exciting monoclonal antibodies. Again, going back to Lilly's COVID antibody or Regeneron's COVID antibody, these are all protein-based drugs. The interesting thing with protein-based drugs is you can't chemically synthesize them. You actually have to make them in a living organism. That adds more complexity to discovering these molecules as well as manufacturing them.

Lukas: Can you predict exactly what the protein's going to look like and then look at it and see if it does it? Is that all in simulation, or are there surprises when you actually try to manufacture it?

Sean: Yeah, there are a lot of surprises that can occur. We are not to the point where we can predict drug functionality. That's ultimately where we're headed with all of this. A lot of times, even if you can predict the functionality of a protein, that doesn't necessarily mean that you can manufacture it.

Many times we see with large pharma, they discover these really exciting, novel, breakthrough protein therapies, but ultimately can't take them to the clinic because they can't manufacture them.
You not only have to predict the protein functionality, but you also have to be able to predict the manufacturability of it as well. We're really looking at both of those.

AlphaFold was able to predict the protein structure based off of the amino acid sequence. Where we're headed is being able to predict the protein function or protein-protein interaction. So it's the other side of the coin.

It was a huge breakthrough for AlphaFold for basic research. What we're doing is going to be a huge breakthrough in drug discovery and biomanufacturing. Again, that's the opposite side of the coin from what AlphaFold has done.

Lukas: I want to make sure I heard you right. Did you say you're not predicting the functionality?

Sean: We are predicting the protein functionality.

Lukas: The functionality is how it interacts with another protein?

Sean: Exactly. It's "How tightly does it bind to another protein?" Then also, we take into consideration immunogenicity: is it going to react in the body once it's administered? Then also taking a look at the CMC or manufacturing aspects. Is it soluble and stable? Can it be produced at high yields? These are other predictions, or other attributes, that we take into account.

Lukas: Interesting. I want to hear more about how this actually works, but one question I want to make sure that I ask you: I saw that you started your company in, I think, 2011, right? It seems like ML as applied to medicine has changed so much.

I'm curious if you started your company with this perspective or how different it was, and also how your perspective on machine learning has changed as machine learning has evolved and deep learning's come along.

Sean: We did not start off as an AI company. I would say we are very similar to Tesla's evolution. Tesla started off as an electric car manufacturer.
They started collecting all this data from their sensors, built an AI team around that, and now they're a fully autonomous self-driving car tech company.

That's a very similar evolution that Absci is on. We started out on the biology side, engineering E. coli to be more mammalian-like to really shorten the development times and decrease manufacturing costs. We then built out this technology that allowed us to screen billions of different E. coli cells and look at different variants of proteins, looking at basically the drug functionality and also at, "Can you actually manufacture this?"

We started generating all this data, billions of different data points on protein functionality and manufacturability. We knew that if we could leverage that data with deep learning, we could get to the point where we could predict the protein functionality needed for every type of target or indication, and that's ultimately what led us to apply our pioneering Denovium deep learning technology to protein engineering.

But it really started off with the data. Data is so key, and we have proprietary data that no one else has, which we are leveraging deep learning to mine, to get us to the point where we can ultimately predict protein functionality.

Where we're at right now is being able to leverage the data we already have to predict the best billion-member libraries we should be screening, for every new target and indication we work on. The more we train the model with our proprietary data, the more predictive it's going to get.

Instead of predicting a billion-member library, it starts predicting a million, a thousand, and then ultimately predicting the absolute best drug candidate for a given target or indication, looking at what modality it should be, the affinity, low immunogenicity, all the manufacturing attributes that you want.
Right now, it's a race to feed in as much data as we possibly can, but it all started off with the biology technology that we had originally developed.

Lukas: For you, Sean, as CEO of a company that's not a deep learning company, I'm curious how you first got exposed to deep learning and what made you think that it might be useful, and then how you got conviction around making these large investments in deep learning that you're doing now.

What were you seeing that made you feel like it would work? It seems like you're more bullish on it than maybe a lot of your peers, and I wonder where that might be coming from.

Sean: I'm bullish because we have the data. Again, it all goes back to data. We have high-quality data on protein functionality and manufacturability. It goes back to an earlier point that I made, which was that there are more sequence variants in an antibody than there are atoms in the universe. There's no screening technology that we could ever create that would allow us to mine that big of a space.

That's really where the deep learning comes into play: being able to essentially sift through all of the potential evolutionary paths that a drug could be created in, figure out what is that best drug candidate, basically mine that whole search space, and ultimately come to the point where we're creating the best drugs for patients.

Once we implemented the deep learning technology, we already saw huge gains in terms of yields and the types of drugs that can be discovered when taking our data and pairing it up with deep learning. Ultimately, where I see us going is becoming a full tech company once we have enough data here. I'm extremely bullish on AI and what it can do within healthcare.
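An aside on the scale Sean keeps returning to: his "atoms in the universe" comparison can be sanity-checked with quick back-of-the-envelope arithmetic. The sketch below assumes an antibody variable region of roughly 110 amino acid positions (an illustrative figure, not from the interview) with 20 possible amino acids at each position, and compares that naive sequence space against the commonly cited estimate of about 10^80 atoms in the observable universe:

```python
import math

AMINO_ACIDS = 20              # standard amino acid alphabet
POSITIONS = 110               # illustrative length of an antibody variable region
ATOMS_IN_UNIVERSE = 10 ** 80  # commonly cited rough estimate

# Size of the naive sequence space: every position varied independently
sequence_space = AMINO_ACIDS ** POSITIONS

# Compare on a log10 scale, since the numbers are astronomically large
log10_space = POSITIONS * math.log10(AMINO_ACIDS)

print(f"~10^{log10_space:.0f} possible sequences vs ~10^80 atoms")
print(sequence_space > ATOMS_IN_UNIVERSE)
```

Even if only a few dozen positions are actually varied (say, 20 CDR positions, giving about 20^20 ≈ 10^26 sequences), the space still dwarfs anything a physical screen could enumerate, which is the gap Sean says deep learning has to close.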
Lukas: It's interesting talking to you, in that we work with, I guess, a lot of pharma companies, which I see as slightly different in what they do than you. It seems like their perspective is "interested in deep learning, but probably not at the CEO level," except in the sense that they're making, I'd say, small or medium investments, whereas you want to transform your entire company in this direction.

Do you think that you're doing something different than your competitors around deep learning? Do you think that you can be the best at this in some way?

Sean: I do think that we can be the best. I would say that the industry is starting to understand the benefits of what deep learning and ML can provide.

Biotech probably doesn't have as great an appreciation for tech and machine learning and what that really means, and vice versa: the tech industry doesn't quite understand all that goes into biology. It's really exciting to be able to take two industries, two cultures, and merge them together to really create something that's going to be hugely impactful for patients and ultimately the world.

Lukas: That's super cool. I mean, thanks for doing an interview like this. I think this is really great for cross-pollinating ideas. I love these. I have a lot of maybe slightly more technical questions. Greg, feel free to jump in if you like.

One thing I wonder about with ML applied to this stuff is, do you feel like it was always a latent possibility to successfully make these predictions that you're doing now, and it was just a matter of getting enough data? Or do you feel like there have been breakthroughs in machine learning, in model architectures or something like that, that have actually made this a more practical application?

Greg: Yeah, thank you. It's a great question.
I would say that it's a little bit of both. There has always been potential for ML in bio, and it has been very successful in the past in some of these same indications, but it's been limited both on the data collection side — which is not stagnant; it's moving in incredible ways, the same way that the AI community has — and on the AI modeling side. Recent advances in large-scale architectures, transformers, and a lot of different techniques for getting these models to converge successfully and to be very predictive have been incredible breakthroughs as well.

Essentially, now I'm less concerned about the AI holding back any sort of success than I am about making sure that we can marry these two communities, making sure that what is always an intrinsically messy process of collecting biological data is actually connected to the inputs and outputs of that AI. And, as Sean will be the first to tell you, this is a great place to be able to do that, because a lot of the hard work of actually developing these assays and working through that challenging space is part of the bread and butter of Absci.

Lukas: Could you give me maybe a concrete example of an ML breakthrough that would help with this? For example, I think of transformers as... I know them as technology mostly for natural language processing. I could sort of imagine how this might apply to what you're doing, but maybe could you walk me through some kind of architecture, some kind of new way of doing things, and how you framed the biology in this machine learning world?

Greg: I'll give a couple of examples that have come over the last few years. The biggest is related to scaling. The biological problems are necessarily complex. Evolution is one of the most interesting aspects of informational science because it's the ultimate bootstrap system.
You've got these letters strung together on DNA that have, over billions of years, encoded themselves into the most sophisticated system on the planet. It's everywhere around us. In theory, an artificial intelligence could look at that and understand every piece of it the same way that every cell does.

What you need to do to connect these dots is collect enough data on different parts of the system. Namely, you need a lot of nucleotide data, so we need to do DNA sequencing. But we need that from lots of different organisms, and we need to understand how they translate into proteins, how those proteins act and function, whether they bind together, how they fold together. There's an incredible number of pieces that need to come together to see that big picture.

This is where scale becomes very important. It's a bigger problem than some traditional ML, or even the original deep learning architectures, are capable of solving, because it simply requires more parameters, requires more complexity, requires better understanding.

NLP-based models and transformers in general are really good for this domain because a lot of what we operate on is in sequence space. I wouldn't say that they're the only approach to this, either. But those advancements in letting us get to larger and larger models, to create the GPT-3 of DNA, are something that really gives us, for the first time, a real handle on these challenges.

Lukas: There is this trend in NLP — which I'm much more familiar with — of models becoming more and more black boxes, less and less informed, maybe, by linguists. I don't know if every linguist I've had on this podcast would agree with that, but I think broadly, as the data increases and the model complexity increases, they become more opaque.
Is there a similar trend in these applications, where maybe the chemistry and physics matter less and you just treat it as this translation from letters to "Did the drug get successfully produced or not?" Or do you still inject your knowledge of biology or chemistry or physics to make the whole system work?

Greg: Yeah, it's been moving in that direction, but we're not there yet. Biology is...those two communities still haven't fully been united. There have been some big advancements recently in the protein-biology space, and the MSA Transformer is a big example of this, where something that bioinformaticians and computational biologists have been doing for years, aligning sequences to see what kind of patterns they share in nature, can be used as an input directly, with a special kind of architecture that lets models learn from it.

These sorts of biologically inspired architectures are still coming. AlphaFold is another great example, where they used a number of relatively novel techniques, and combining them together was really key to the success. The black box approach is powerful and I wouldn't downplay it, but there's still plenty of room for improvement.

Sean: But I think that's ultimately where we want this to go. You can input a target sequence and have the output be the sequence for the drug candidate, and predict all the binding just based off the sequence itself.

We've already seen some really interesting discoveries occur from this. Our deep learning model showed that we got an increase in overall yields from a protein that wasn't necessarily classified as a chaperone, but our deep learning model predicted that it would be. I think these are some of the really interesting discoveries that are going to be occurring at a very rapid pace by bringing the AI and biology together.
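To make the "translation from letters" framing in this exchange concrete: treating proteins like text starts with mapping each residue to an integer token, exactly as NLP models tokenize words. The following is a minimal, hypothetical sketch (not Absci's actual pipeline), assuming the standard 20-letter amino acid alphabet plus a few special tokens:

```python
# Minimal per-residue tokenizer for protein sequences (illustrative only).
# Special tokens come first, then the 20 standard amino acids in a fixed order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SPECIAL = ["<pad>", "<cls>", "<eos>"]
VOCAB = {tok: i for i, tok in enumerate(SPECIAL + list(AMINO_ACIDS))}

def encode(sequence: str) -> list[int]:
    """Map a protein sequence to integer token ids, NLP-style:
    a <cls> token up front and <eos> at the end."""
    ids = [VOCAB["<cls>"]]
    ids += [VOCAB[residue] for residue in sequence.upper()]
    ids.append(VOCAB["<eos>"])
    return ids

# A short, arbitrary heavy-chain-like fragment
print(encode("EVQL"))  # <cls>, E, V, Q, L, <eos>
```

Public protein language models (ESM, ProtBERT, and similar) use essentially this framing; the transformer then learns contextual embeddings over the token ids, which is what makes the "GPT-3 of DNA" analogy from earlier in the conversation more than a metaphor.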
Lukas: Sean, how do you think about investing in data collection versus your ML team? There are maybe two ways to improve your models: going out and collecting more data, which is probably really one type of investment, versus building up ML expertise. Do you think about it that way, and do you feel like there's a trade-off there? How do you look at that?

Sean: I think investments in both are absolutely critical. You can't invest in one and neglect the other. You really have to make strong investments in both. Right now, a big investment of ours is, "What is all the data that we want to be feeding into the models?" Looking out 10 years, are we going to regret not collecting this piece of data? Then, how do we build our databases and scale the amount of data that's needed in the future? How do we collect it as quickly as we possibly can, to then hand it over to our ML team to continue to train and improve the models?

We have made huge investments in both, from the wet lab side, the data capture, and the database and scaling, along with the AI team.

Lukas: As more of a computer scientist, I'm definitely enamored with the idea of a wet lab. Could you describe what happens and what that collection process looks like?

Sean: We just built out a campus, I think it was 88,000 square feet. Half of the campus is office space and the other half is an actual lab. The lab is super key to what we do. It ranges all the way from the drug discovery team down to our fermentation and purification team that grow up the cells and ultimately purify them.

A lot of the data that we're feeding into our deep learning models is Next Generation Sequencing data and flow cytometry data. That's really key. Some of the breakthroughs within NGS, and the speed at which we can process NGS data, are really enabling us to do what we do. It's really fun to be able to grow a team that's both on the wet lab side and the AI and ML side.

Also, I would say an AI scientist that understands the biology is absolutely critical to what we do, and there is not a lot of that talent out there, but we have done a really amazing job of building out talent that understands both aspects.

Lukas: Maybe this is a stupid question, but what goes on in a wet lab these days? Is it like beakers full of proteins? Is it microfluidics arrays? I don't know. How does it work? How fast can you actually collect meaningful data?

Sean: We start off with building these large libraries. We work with what's called a plasmid. It's basically circular DNA, and that encodes the drug product. We vary that DNA to look at various different drug candidates. In a single small test tube, we basically take all of those billions of different plasmids and put them into E. coli.

It's extremely small, and you look at it and you're like, "Wow, there are trillions of cells in there." It's pretty incredible. Then we take all of that, we screen it, and ultimately we find the drug candidate and the cell line. Then we grow it up in big fermentation reactors.

Think of brewing beer. It's essentially big vats that are highly controlled, and then you just grow up the bugs in there, basically give them the genetic code to make the drug candidate, and then you scale it up from there. But yeah, it's all beakers, fermentation, purification. You name it, we've got it.

Greg: I'd add a little color to that as well. From the background of somebody who doesn't spend every day inside the wet lab, it feels a lot like stepping into Wonka-land.
You have an amazing amount of human ingenuity sitting on every desk, whether it's a mass spectrometer or some sequencing technology. All these devices have very specific and very incredible capabilities, and a bunch of people who know what to do with them and know how to put all the pieces together to make this stuff happen.

Sean: It's so funny. I actually don't think I've ever had anybody ask me, "What does a wet lab do?" I was searching for the words to describe it. I probably did a terrible job. But it's like-

Lukas: I thought it was great, what you provided.

Sean: You don't really quite understand the magnitude until you step in and really understand every intricate aspect that's being done.

Lukas: I remember the first time I ever went into one of our customers' wet labs. I felt like, "Oh, this is what I thought science was like when I was a kid." I love it.

Greg: I'm still disappointed I don't get to show up in a lab coat. I might just start doing that now.

Sean: Yeah.

Lukas: It's funny. I never thought about this, but we do a lot of ML experiment tracking, and I would imagine there are a lot of parallels to tracking all the experiments that you're doing in the lab. Do you have software that does that? You've probably written a lot of software to just keep track of everything that's happening in there, right?

Sean: We've actually decided to build a lot of this out ourselves, and Jonathan Eads, who's our VP of Data Science, he and his team are working on building out a database where we track everything internally, based off of the software that they have developed.

This is really because there was no software solution out there that met our needs. We actually just got a demo of it the other day, and it's really incredible what it's going to allow us to do. Not only in the data capture, but also being able to track where programs are at in the lab, where we have bottlenecks.
I mean, it's really this brilliant software that is really going to help expedite what we currently do and capture the data that's needed for long-term success.

Lukas: Very cool. I'm curious about how you think about where this goes. Where do you imagine ML taking you as you collect more data? Do you think the whole process moves to this? Do you think you could run clinical trials essentially in ML and know if they're going to be successful or not?

Sean: I won't say that we'll be able to run ML for clinical trials, but for the drugs that we do design, if indeed we are predicting the best drug candidates for various indications, it's going to increase the overall success rate. That in turn is going to lead to shorter clinical trial timelines and being able to rapidly progress new drug candidates through, and ultimately lead to the point where we can do personalized medicine, because we will have shown that success rates dramatically increase, which allows for that personalized medicine.

But who knows? In the future, we could be able to use ML for clinical trial design and prediction as well. One of our core values here is believing in the impossible, so I feel bad for not saying, "Yes, ML will be able to predict clinical trials and not actually have to go through them." It'll be really interesting to see what's done on that front in the future.

Lukas: What is a typical clinical trial success rate?

Sean: Right now, it's right around 4%.

Lukas: 4%.

Sean: Yeah.

Lukas: But there are different stages, right? Or how does that work?

Sean: Yeah. There are three stages. You have your Phase I, your Phase II, Phase III, and then ultimately approval. Going from Phase I all the way through approval, it's about a 4% success rate.

Lukas: Wow.

Sean: Yeah.

Lukas: Just as another CEO, it sounds totally harrowing to me to have my revenue depend on a 4% success rate process.
How do you stay sane in a market like that?

Sean: The way we structure our revenue is, one, the pharma partner pays us to actually develop the drug candidate and the cell line. We're getting paid for that. Then we get paid milestone payments as they progress through the clinical trial. You get a milestone payment at Phase I, Phase II, Phase III, ultimately approval, and then royalties.

Even if a drug doesn't make it through the clinic, you can still get paid these milestone payments, which are 100% pure margin. Then it's a law of large numbers. It's just growing the number of programs you have as quickly as you can. You ultimately get to the point where you do get drugs approved and you get royalties coming in for 10 to 15 years off of that. But you grow the revenue base just by growing the number of programs every year.

Lukas: Can you say order of magnitude how many of these you're doing? Is it like thousands?

Sean: We currently have nine active programs ongoing. Our goal for this year is five programs, which we're on track for, and then increasing those year over year. But no, it's definitely not thousands. It's more in the tens than the thousands.

Lukas: Do the programs inform each other? Is this similar to natural language, where you can have one big model and then fine-tune it on the different cases?

Greg: Yeah. That's actually a big part of why we think this is so exciting: it really is one physical system underlying a lot of these drugs. Creating a model that can understand this for one drug is useful. Then for the second one, it presumably will need less training data, because it can transfer-learn what it understands about the first one. Then you go to the third and the fourth, and before long, as Sean was saying, the number of shots you need on goal becomes reduced, to the point where any novel drug becomes a one-shot learning problem.

This is exactly where we see it going.
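Greg's point, that each additional program should need less data because the model transfer-learns from earlier ones, can be illustrated with a deliberately tiny toy. The sketch below is purely illustrative (a one-parameter linear model, nothing like a real protein model): fine-tuning on a related task from a pretrained weight converges in fewer gradient steps than training from scratch.

```python
def train_steps(x, y, w, lr=0.05, tol=0.01, max_steps=1000):
    """Gradient descent on mean squared error for the model y = w * x.
    Returns (final_w, steps_taken_until_the_gradient_is_small)."""
    for step in range(max_steps):
        # Gradient of mean squared error with respect to w
        grad = 2 * sum(xi * (w * xi - yi) for xi, yi in zip(x, y)) / len(x)
        if abs(grad) < tol:
            return w, step
        w -= lr * grad
    return w, max_steps

x = [1.0, 2.0, 3.0]
task_a = [2.0 * xi for xi in x]   # "program A": y = 2.0 x
task_b = [2.1 * xi for xi in x]   # "program B": a closely related task

# Pretrain on program A from scratch, then fine-tune that weight on program B
w_a, _ = train_steps(x, task_a, w=0.0)
_, steps_finetune = train_steps(x, task_b, w=w_a)

# Training on program B from scratch takes more steps
_, steps_scratch = train_steps(x, task_b, w=0.0)

print(steps_finetune < steps_scratch)  # True: the warm start converges faster
```

The gap between `steps_finetune` and `steps_scratch` is the toy analogue of Greg's "fewer shots on goal": the closer the new task is to what the model already knows, the less additional data and training it needs.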
Lukas: Is it possible for you guys to engage with the academic community at all? I feel like you're actually adjacent to two very different academic cultures, right? There's the ML culture, which I know well, but which seems like it might be tricky to share data with, and then the vast medical literature, which I know less well. Are these communities relevant to you at all? Do you try to do any publishing or engage in some way?

Sean: Yeah, definitely. We love to engage with the academic community, and we are looking to publish some papers here in the near future, both on the work that we're doing and in collaboration with some of the leading academic professors in our area. We see this as a way to continue to validate the work that we're doing, improve the science that we have, and leverage domain expertise that we don't have. The academic community for us is really essential to the work that we do. We very much foster those partnerships and collaborations.

Lukas: Cool. Well, I know a lot of ML practitioners that I think would be interested in working in your domain. Can you say anything about what you look for in hiring an ML practitioner that might be different than, I don't know, a Google or an OpenAI?

Greg: I can speak to some of what we've looked for on our team and what we continue to look for going forward. There are a lot of strengths that naturally come from the AI community that we like to keep going forward: the way that we think about problems, how we understand the implementation details. As you know, AI can be tricky to execute on, both the compute and the setup, and understanding all the different systems and software that go into that.

But on the totally different side, you have all the biological complexity, and it's an entirely different field to be learning. You need a whole other degree to learn about all the complexities that come from that.
Lab scientists and the close relationship with them are an important piece there. I guess what I'm trying to get at is that it's the capability to learn, because there are so few people who are naturally in both spaces anyway. So it's a capability to learn, the patience and the rigor to go through and understand all sides of the problem, and how to make an impact therein.

It's never as easy as a lot of AI problems often are, where it's like, "Here are your inputs, here are your outputs. Now, maximize some scoring function." It's a lot trickier than that. The scientists live that day to day. To some extent, it's like, "Well, welcome to our world." And that's great, because it means we can also say, "This is how AI can address these challenges. It can help clean up that noise. We can help better understand what's going on with this process, and then, yes, ultimately build systems that speed up and maybe even replace a lot of these processes."

Lukas: Sean, I guess in that vein, as you have transitioned from not doing a lot of machine learning to really making this heavy investment in machine learning and building out these teams, have there been any kind of unexpected cultural issues or team issues that you've had to work through that might have happened because of adding all these ML nerds?

Sean: Yeah, I think it's having everyone recognize that combining ML with biology and the lab scientists ultimately gets us to our vision quicker, and that it ultimately impacts patients' lives in ways we couldn't without combining them. I think the first thought is, "Oh, my gosh, Sean, you're bringing in all these AI and ML experts. Are they just going to automate my job away? Are they going to be able to predict everything so there's no need for me?" It's like, "Absolutely not." Biology is so complex. We have so many problems to solve.
Once we solve one problem with AI and we have the data, we then need the biology and wet-lab expertise to solve the next problem, and the next problem after that. It's never going to go away. You need both. At the end of the day, you can't stop the wet lab and the biology side, because that's what feeds the data, and both are absolutely critically important. I just love the different perspectives that both sides bring to the table to make our company the best it possibly can be.

Lukas: It sounds like a lot of fun. Have you gotten any questions from your ML team where you're just like, "Man, we're just miles apart here," like you just don't understand what we're doing?

Sean: No, I think honestly everyone has really done a great job of understanding the other side's perspective. Sometimes the AI team may not be getting data as quickly as they would like, but then they dive in with the scientists and they're like, "Oh, I understand you ran into this problem. Can we work together to increase the throughput?" Or it's like, "Hey, I gave you all this data. I'm not seeing any improvements yet. When are we going to start seeing improvements from our AI models?" I think it creates patience, collaboration, and a respect for the part each plays in the overall bigger picture.

Lukas: Greg, do you agree with this? Should I ask you separately?

Greg: No, no. I think you nailed it. You started by saying it's exciting, and I couldn't agree more. It's an opportunity of a lifetime to be at the intersection of something like this. It's wonderful to see such smart, talented people who are respected in their own fields coming together. There's something very humbling about always being on the other side of things and realizing, "Wow, there's always more to learn." It's very healthy, as Sean said. It does give you a greater sense of context and perspective.
Lukas: We always end with two questions, and I think you both are coming from super different perspectives, but I'd love to hear both of your answers to this. One question we always end with is: what's a topic in ML that you feel is underrated versus its impact? I mean this very broadly. I mean, I guess, Sean, what skills do you feel like people should be showing up with that they're not, maybe?

Sean: When folks come to Absci, we're solving very big, complex problems. Our mantra and our number-one value is there for a reason, which is: believe in the impossible. We are always looking for people who want to push the limits on both the AI side as well as the biology side and really bring that together. We are creating this new ecosystem that really hasn't existed, and this understanding of what ML can do for biology and vice versa. We just want to bring in people who want to think about things differently and change paradigms. I'm super excited about where the future lies with AI and biology together, and we're really on the forefront of that. Yeah, couldn't be more excited about where the industry's headed.

Greg: All right. Yeah, I guess I'll give my different take here on what's the underappreciated side of ML. Something that definitely has some appreciation, but could have more, is the capability of deep learning and artificial intelligence to do integrative work. We see an awful lot of research solving specific problems, often hard problems, and they compete against each other on performance scores and evaluation. But the real value, I think, in the practical world for AI is how well it ties different kinds of information together. We use this at Absci in trying to collect dozens of different kinds of assays, and we can understand, "All right, in context, for just one of them, this is a spreadsheet of data. It's not even that large.
But maybe if I relate that to the embedding-space projection of a different model that was trained on a different task, it can tell me something useful about the problem that I'm working on now." This is a philosophy that we're big proponents of: integrating large multitask systems that can leverage the commonalities in the data and understand them all together.

This is an advantage not just because you get to use all the data you have on hand, but also because it creates a simplicity to everything, where instead of having to run all these different pieces, you can ask from maybe one piece of data what the other pieces would look like. Let's say, in the case of bioinformatics, we have a lot of computational tools for understanding protein function. You can run dozens of these different tools and try to get them all to work together and set up your environments, or you can have one AI model that knows these answers and can give them to you in a millisecond. How well it can simplify problems and bring different kinds of problems together is something that I think could use more appreciation.

Lukas: This really works for you? I mean, I feel like a lot of people talk about this multitask learning and combining problems, but it's always felt a little theoretical to me. Do you actually find that it meaningfully helps on tasks to incorporate data from other tasks?

Greg: Oh, absolutely. This was a big part of what we did at Denovium: taking our DNA models and protein models and tying them together, two entirely different domains of data. It allowed users to essentially take a DNA sequence and, with just one artificial intelligence model, find all of the proteins, what they do, and characterize them with 700,000 different labels. Very multitask.
We had some 25-odd different databases that were all tied together; it essentially had to multitask quite a bit to solve those challenges. But it both worked and really sped up the progress of what we could do, as well as allowed some really unconventional approaches.

So Sean was talking earlier about the chaperone discovery work, where we could use these protein models to understand what a protein would do even if it otherwise hadn't been understood by science. These sorts of models, because they're generalized over so many different kinds of tasks, aren't burdened with memorization, and they can say, "Oh, yeah. Well, hey, look, this looks an awful lot like this. It should do this," and we can trust them to step outside their box.

Lukas: Is there any paper or something that you could point people to who want to learn more about this? Have you been able to publish any of this work?

Greg: There is a legacy work that was somewhat of a precursor to it. We can pull up the paper later.

Lukas: That'd be awesome.

Greg: Yeah.

Lukas: Cool. We'll put it in the notes.

Lukas: Our final question, and Sean, I'm really curious to get your take on this one. Sean, you've been super positive about the promise here, but you guys are actually doing ML and trying to get real results, and so I'm sure that you're running into problems. What has been the biggest unexpected problem in going from this idea of something you want to do to actually making it really work in reality?

Sean: Oh, man, there's problems every which way. I would say first, it's actually convincing the scientific community and our partners that deep learning and AI is the future, and showing them work, showing that this can actually happen. That's the first hurdle.
Then I would say the other biggest hurdle and challenge that we've had to work through is being able to develop the technologies that get us the data — get us the data in a clean format — and then scaling that data and building out a world-class AI team. Greg and Ariel and myself, with Matthew, are always looking for the best talent and how to bring them in.

But as you know, as a fellow co-founder, it's like once you think things are going well, you're thrown off the deep end, going down another path and having to solve another problem. It's continuous problem solving, but that's the fun of it. We've made so much progress, and we're going to continue. I think that's just so much of the fun of growing a company and doing what we do.

Lukas: Greg, anything you want to add on unexpected hurdles along the way?

Greg: Unexpected hurdles? I mean, that's every day.

Lukas: Well, give me one. Give me one real story from the trenches.

Greg: Oh, let's see. What's a good one that we've discovered recently? It always gets back to the fact that biological data is messy, and a lot of scientists are exceptional at what they do, but things come back that surprise you.

For example, we assemble these plasmids, these long stretches of DNA in a circle that can essentially convey various information about how to construct the drug and how to manufacture it at scale. A lot of the technology that we're developing is trying to say, "Okay, if you put in this sequence, it will do this. If you put in this sequence, it'll do that."

In the process of building the precursors for that — I'm not going to credit deep learning here, just the infrastructure development underneath it — we discovered, "Oh, hey, in some of our assays, whole sections of the DNA have just been cut out and looped together into a smaller shape. What's going on with that?" This was nobody's plan.
Your AI is not going to say, "Wow, that was a really interesting phenomenon. You should go..." These are the sorts of things where, in that collaborative environment, an AI scientist, even just in the process of getting things ready for ingestion into an AI, can really make sure that all the data is together and understood, and a lot of these things are overcome. Then, of course, on top of it, you get the insights: okay, for the ones that are together, what do we see here? What is interesting?

Sean: I think it all goes back to the hardest part that we deal with, which is the biology. We can design these billion-member plasmid libraries to build, but it could take us a week to build them or it could take us two months, depending on the complexity, and we just don't know, because it's biology. It keeps it interesting.

Lukas: Well, awesome. Thanks so much for your time, guys. This was really fun. Really appreciate it.

Sean: Thanks so much, Lukas.

Greg: Thank you.

Lukas: If you're enjoying these interviews and you want to learn more, please click on the link to the show notes in the description, where you can find links to all the papers that are mentioned, supplemental material, and a transcription that we worked really hard to produce. So check it out.
Info
Channel: Weights & Biases
Views: 1,946
Id: -CVJZQa-lvc
Length: 55min 25sec (3325 seconds)
Published: Thu Dec 02 2021