Open for (neuro)science Symposium: Day 1

if you are having any connectivity issues we do have a couple of options for you the first being you can watch the event via our youtube live stream you can access that from the link in the chat or on the event login page that we sent to registered attendees on friday or you can visit our youtube channel directly at youtube.com/alleninstitute you are also welcome to call in from your phone for better audio you access that next to the mute button in the audio settings menu and you click switch to phone audio and that will help bandwidth on your computer here is a brief overview of today's agenda we'll start with some comments from our symposium committee chairs and a brief introduction to the allen brain map which is our public data portal where you can find all the resources that will be featured over the course of this symposium and the tutorials happening this week and the tutorials that we've hosted over the past couple months which are also available on our youtube channel each day we'll start the talks by hearing from some allen institute scientists on their research using our publicly available resources and then hear from some researchers at other institutions who have also used these public resources today these researchers will be talking about their work using the allen cell types database and we'll hear about some other data resources from the allen institute for brain science tomorrow and wednesday as a reminder at the end of each day of the program we will be having a tutorial on how to use the resources you'll have just seen featured in this symposium and learn about what the allen brain map can do for your research today's tutorial will be on the transcriptomics data from the allen cell types database this will be in a separate zoom webinar which you'll find on the same alleninstitute.org login page as the webinar for today's symposium the tutorial links are there as well we hope you can stay for the entire program but understand if you 
have to step away a recording of today's talks will be available on youtube as well as the tutorial featured at the end of the day and now we're at the top of the hour i'm happy to turn it over to the chairs of the next generation leaders keith hengen and tom nowakowski good morning or afternoon whatever time it is megan am i on yep okay great so thank you megan and everybody else who helped organize this and everybody who's shown up for the open for neuroscience symposium i'm keith hengen i'm an assistant professor of neuroscience at washington university in st louis and i'm here co-chairing this with tom nowakowski an assistant professor of neuroscience at university of california san francisco so tom and i are the co-chairs of the next generation leaders council at the allen institute so we've been working as outsiders with allen for the last three years and as the pandemic struck we thought we might be able to take advantage of the movement to everything being online to try to pull together a conference or symposium that blended three major features of the allen institute that we've never seen put together in the same event and so we're very proud of this so everybody i think here probably knows that the allen institute produces fantastic open source tools data sets analysis tools then there's also very strong sort of independent hypothesis driven science within the allen institute and then there are external researchers whose research relies upon and is sort of closely woven into the resources from the allen institute and so we want to sort of merge all three of those things together into the same symposium so that you could hear about the tools and the data sets and how those are created then we can hear about how they're deployed and used within the institute and outside of the institute and then actually have workshops so that people who are interested in this research can then learn how to use those very tools that are 
freely available to anybody throughout the world so with that i want to make a quick plug for the ngl it has been a fantastic opportunity for me and i think for tom and our third co-chair here eric yttri i encourage anybody who's on the trajectory of leaving a postdoc and trying to start their own lab to apply for this and i want to thank everybody at the allen institute for the opportunity to have worked with you for the last three years i've learned so much and hopefully today we can kind of showcase the sort of fantastic international thing that is the allen institute so tom with that i'll hand it over to you because you're in charge officially today great hello everyone thanks for the introduction megan and keith and so just to add a couple of notes here so the workshop is really organized around the resources the different types of resources that the allen brain map and the allen institute for brain science have generated over the years and so today you're going to hear about research and resources that are centered around the allen cell types database tomorrow you will hear more about the allen brain observatory which was produced by the allen institute mindscope program and finally on wednesday you will hear more about the common coordinate framework and the microns explorer electron microscopy data and so if you think you might be interested in some of those resources and the research that these resources and these data sets have really led to make sure to log in and participate not just today but also tomorrow and on wednesday i'd like to also thank the program committee and everyone who contributed to organizing this event the committee consisted of nathan gouwens jeremy miller saskia de vries sharmishtaa seshamani and kaitlyn casimo and with that i just want to make a final reminder before we get started as 
megan mentioned earlier we'll take live questions via the q&a panel and please remember to preface your question with the name of the speaker so that we can more easily handle those questions i would really like to thank our events team for facilitating the symposium megan whitening christina jarvis and again kaitlyn casimo finally really special thanks to paul allen who was the founder of the allen institute without whom this institute this event and much of the research wouldn't be possible and so to get us started on this showcase of research using the data from the allen brain map i'm going to now hand it over to kaitlyn casimo training and outreach specialist at the allen institute she's going to give us an introduction to these open resources and a preview of the tutorials that we'll be hosting at the end of each day of the event kaitlyn over to you all right thanks tom so as he said i'm gonna give us a brief preview of what we're going to be learning about from the brain map from our open resources so i'm kaitlyn casimo the training and outreach specialist at the allen institute i was on the organizing committee for this event and everything that i do at the institute is about helping users like you learn how to use our open data for research or for teaching all right get the slide advanced all right so we've talked about the theme of this event but who is the allen institute the allen institute is a biological sciences nonprofit we're located in seattle and our focused research areas are not only in neuroscience but also in cell biology and immunology and we support cutting edge research across the world and across biology through the paul g allen frontiers group and across these institutes our scientific process is focused on tackling hard complex problems in foundational biology and the approach is guided by these three core principles big team and open science which i'll return to 
in a moment and the work generates data knowledge and analysis tools all of which we share publicly and to date we've collected about 14 petabytes of data at the allen institute for brain science which is 2.8 million dvds worth so it's right in the name at this event we're talking about open science but what exactly does that mean it means that we share all of our data and tools for you to use in your work as soon as it's been quality controlled not once we're done doing all of the analyses we're going to do on it and our data sets are designed to be expansive and complex and so we're never going to do all of the possible analyses anyway and that's even before we get into ways that you can combine the allen institute's data and tools with your own all of the talks and tutorials at this event are going to feature these open data sets including the talks from the institute scientists we hope that you'll walk away feeling inspired about how you could use them the other two core principles are team science and big science big science i've already touched on 14 petabytes of data but specifically we aim to be as comprehensive and robust with that data as we can and producing big data requires big coordinated teams with varied expertise those teams include neuroscientists physicists mathematicians animal care specialists computer scientists and many more so we've been collecting and sharing our open data since day one of the institute back in 2003 so where do you find it brain-map.org that's where most of our neuroscience data lives and you can find links to the couple of data sets that live on their own sites like the microns explorer we also encourage you to check out allencell.org which is the data portal from the allen institute for cell science for tools and resources that are useful in neuroscience as well like fluorescently tagged human induced pluripotent stem cell lines which you can differentiate into neurons or whatever other cell type you want and image analysis 
tools so the four resources that we're featuring in this symposium as keith and tom said are the allen cell types database today which is a series of projects and data sets aiming to define all the cell types in the human and mouse brain characterize those types and study how they act and interact tomorrow will be the allen brain observatory functional exploration of the visual system and wednesday first we'll tackle the common coordinate framework which is an anatomical atlas and coordinate system of the mouse brain which also underlies several of our other resources as well as being a resource itself and then second on wednesday will be the microns explorer which is capturing electron microscopy images and doing detailed neuronal reconstructions and more so we're highlighting a few selected resources in this symposium so what can you do with these resources for each one we share the data organized and quality controlled and provide online viewers and we've also shared our analysis tools the protocols and methods that we use to produce the data our papers that we've produced on them and when applicable lab resources like mouse lines it's not just the data set it's also all of these other parts that go around it you should be able to understand the methods that we used well enough to analyze the data and run the analysis on your own data and if you so choose you should be able to replicate the methods yourself you don't need permission or a license to use our data for research the data and analysis tools and everything else we produce is free to use all you have to do is cite it that's it but we love it when you tell us what you're doing so please drop us a line you can use the contact form on our website or tweet at us or whatever and maybe you could end up the presenter in a symposium someday so with our data you can do things like using it as the primary data for analysis you can use data from tissues 
that you might not otherwise be able to access volumes of data that are out of reach for individual labs methods and techniques that might not be available to you like access to an electron microscope you can augment and accelerate your research use these data sets as healthy controls compare across species or use our analysis tools and lab resources and you can use them as teaching tools give your students access to methods and sample sizes that aren't practical in a teaching lab let them design their own experiments develop analysis skills and much more we have classroom resources designed for educators that tie into the open data at alleninstitute.org/learn and on wednesday we're going to highlight some specific opportunities and programs at the allen institute for brain science besides working with the data like the ngl program and some more so we want you to walk away from this event feeling inspired to use these resources for your own work and for the practical questions of how to actually do that we have tutorials at the end of each day of the program which will feature the resource highlighted in the talks for that day so today that's going to be the allen cell types database starting at 11 pacific that tutorial is specifically going to focus on the rna sequencing data and then we have the other tutorials on the other days at the end of the program on each day so i hope you'll join us for these practical demos and we will be recording those as well so you can refer back to them while you're actually using the data set to get some of those details and work them into your own analyses and we also ran a series of pre-symposium tutorials for the last few months that covered the allen developing mouse and human brain atlases our modeling tools some additional resources from the cell types database and a broad overview of brain-map.org for new users to get oriented so we'll drop a link to those in the chat you can find them on our youtube channel and 
they're also linked on the page where you registered for this event all right so brain-map.org let's go over there all right so brain-map.org is where all of our data from the allen institute for brain science either lives or is linked from the speakers and the tutorial presenters are going to dive into this in much more detail for their specific data sets and you can check out that broad overview tutorial as well so this is the homepage of brain-map.org you can navigate the data sets here using this drop down menu or using these icons on the main page again we share all of the data once it's been processed and quality controlled and all of the analysis tools that go with them so for each data set you can view and analyze the data using browser tools right here online this is really good for getting oriented to the data preliminary explorations student projects and especially for our gene expression analysis and you can download all of the data and access them programmatically through our sdk for more robust analyses so for example in the allen mouse brain atlas which is our oldest and still most popular resource you can use this online viewer to do some common analyses like searching for differential expression or just searching for the expression of a single gene i'm going to search for a gene and check out that overview tutorial for more detail on how to navigate this and you can analyze the data which in this case is these images of ish showing expression of a gene across the mouse brain you can look at them right here in the browser or you can download them straight from here or using our api to download a bunch at once on this page and on the landing page for each data set you'll also find documentation which provides detailed information about how the data set was collected so that you could replicate it yourself and to inform your own analyses back on the home page while our data sets are our most known resource under technical resources you'll 
also find all of our tools our sdk and api all of our analysis code on our github page mouse lines hardware schematics protocols and more these explore pages group related data sets and analysis tools by theme and link out to relevant papers and tools so you can start with one of those and find more information relevant to you i do also want to mention two additional pages where our data is produced so we have the microns explorer which is our electron microscopy data this is microns-explorer.org and this data is produced in collaboration with baylor and princeton and you can learn much more about that on wednesday and biccn.org includes more cell types data from the brain initiative cell census network of which we are a member and we're going to learn more about that in the tutorial later today so finally please remember that we want you to use these resources so if you are ever stuck please use the resources we've made available to unstick you the documentation on each page would be the place to start and if that doesn't answer the question please hop over to our community forum you can find it in this drop-down menu or just go straight to community.brain-map.org search past questions ask your question and let us know what you're working on so i'm going to hand it back to tom please go ahead and post questions now or at any time during the program if you have questions about how to access the data or any tools that we're talking about we want you to use them tom great thank you kaitlyn for this great introduction to the resources and so without further ado i would like to kick us off on today's research talks and so today's talks will be subdivided into those that are presented from allen institute researchers and we'll hear about some of the exciting research and insights in the allen cell types database so the general goal of this program is to generate a cell census of the brain and that includes both mouse brain and 
also non-human primate and human brain so really looking across species and this is really exciting because if we can understand the cellular composition what cell types build the brain then this is a really good springboard for understanding and studying the function of these cell types and so with this it is my distinct pleasure to introduce bosiljka tasic who is the director of molecular genetics and who will be presenting the work that she's been leading at the allen cell types database bosiljka take it away hello thank you tom thank you organizers for inviting me i think i unmuted myself but it said i should unmute myself can everybody hear me yes we can hear you okay great so i'm going to talk about basically what we need to do before we get to the allen cell types database and why do we want a database and then i will highlight how we organize our data within the database to enable researchers to access these organized and analyzed data so to start with i thought i should i lost those arrows do one extra click okay there you are okay so we have been as of recently sort of defining and redefining what the allen institute for brain science is and in alignment with our tradition we decided for the next roughly 10 years to focus on complete accurate and permanent types of data collection that means we would like to really collect an impactful and complete data set on the brain in this particular case we're focusing on brain cell types therefore the definition is we are an institute focusing on cap data sets and experimental tools towards gaining fundamental knowledge of the identity development evolution connectivity and function of cell types why study cell types well we want to study the brain our mission is to study the mammalian brain human ultimately but mouse is a very important 
model and in order to study this very complicated system it is very clear that understanding building blocks is essential i mean this has been clear since more than a century ago the times of ramón y cajal so really understanding the system means understanding its parts creating a parts list once one creates a parts list one is able to have reproducible access to those same parts to study their function in the context of a tissue organ or organism ultimately that leads to discovery what do these parts do how do they work together and in the context of different species or in the context of studying how these cell types arise we can understand development evolution and then ultimately if some of these cell types are subject to dysfunction how their dysfunction leads to disease how does one study building blocks of a system i would say the ultimate really the holy grail definition of cell types is in my opinion sets of cells that participate in the same function however it is quite difficult to do it that way it is at this moment i would say almost impossible therefore we resort to quantitative phenotypic measurements measurements of any type of property we can in a very systematic way so we measure morphological features we measure physiological features and molecular signatures for example single cell transcriptomes what does one do when one measures many many properties in a single cell well the idea is once we measure all these properties we want to use those properties to group cells into categories we usually call the high level categories classes and as we go to finer and finer categories we reach types but the idea is really that taking all these properties together we will define similarities among cells and then in this particular case we'll use transcriptome similarities to group cells into buckets and then ask how similar are these buckets to each other we create a tree this is one of the 
representations we create a tree that says well these particular buckets some of them are very similar to each other and some of them are less similar et cetera all the way to the so-called root this is a very simplified yet convenient way to represent cell types it assumes that every cell can be put in a particular bucket which may not be entirely true but it's a very convenient simplification and the one that people are very used to seeing for example many of us almost all of us i assume have seen dendrograms that represent relatedness of let's say animal species what defining cell types using transcriptomics allows us is not only to group these cells into buckets based on gene expression but once we have defined a taxonomy in this particular case a cortical taxonomy based on two cortical areas this taxonomy itself has a built-in feature which is it shows us which genes are expressed in particular groups of cells and that's how we defined for example very broadly classes in this case for example classes of neurons they share many many genes that are common to neurons in this particular case di1 a very well-known neuronal gene then as we go to finer and finer divisions we see that some groups of cells share for example gabaergic markers gad1 some smaller groups share somatostatin and then there are some cells that share only a particular group of genes that's not expressed in any other cell type for example a particular type of cell called sst chodl so with this i would like to introduce a definition of types based on a single data modality we commonly at least at the allen institute and we would like this to maybe become a general terminology call these transcriptomic cell types the reason is that we have defined these groups based on a particular data type a particular data modality single cell transcriptomic measurements and for short we call them t-types that doesn't mean that is the only way to define types 
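a minimal sketch of the bucket-and-tree idea described above, assuming average-linkage agglomerative clustering on per-group mean expression, which is one common way to build such a dendrogram and not necessarily the method used for the allen institute taxonomy. the group names, the four gene columns and every number below are hypothetical toy values chosen only for illustration, not data from the cell types database.

```python
from itertools import combinations
from math import dist

# Hypothetical mean expression per cell group (toy numbers, NOT real data).
# Columns: pan-neuronal gene, GABAergic marker, somatostatin, non-neuronal gene.
profiles = {
    "sst_type_a": (9.0, 8.0, 9.0, 0.0),
    "sst_type_b": (9.0, 8.0, 7.0, 0.0),
    "excitatory": (9.0, 0.0, 0.0, 0.0),
    "non_neuronal": (0.0, 0.0, 0.0, 9.0),
}


def build_tree(profiles):
    """Average-linkage agglomerative clustering; returns a nested-tuple tree."""
    # Each active node (a leaf name or a merged tuple) maps to its leaf profiles.
    nodes = {name: [vec] for name, vec in profiles.items()}

    def linkage(a, b):
        # Mean Euclidean distance over all leaf pairs drawn from the two nodes.
        pairs = [(x, y) for x in nodes[a] for y in nodes[b]]
        return sum(dist(x, y) for x, y in pairs) / len(pairs)

    while len(nodes) > 1:
        # Merge the two most similar nodes; the last merge becomes the root.
        a, b = min(combinations(nodes, 2), key=lambda pair: linkage(*pair))
        nodes[(a, b)] = nodes.pop(a) + nodes.pop(b)
    return next(iter(nodes))


tree = build_tree(profiles)
print(tree)
```

with these toy values the two somatostatin groups merge first, the excitatory group joins them next because it shares the pan-neuronal gene, and the non-neuronal group attaches last at the root, which mirrors the class-then-type nesting described in the talk.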
before i go into other ways of defining cell types i want to mention that the simplest way of describing these groups of cells and their relatedness is in fact a tree but there are many other ways to describe relatedness among cell types and some of them are not as rigid as the tree why do i call the tree rigid because it's very discrete and the relationships between cell types are very hierarchical on the other hand one can represent these cell type landscapes in a very continuous fashion or in a fashion that allows both discreteness and continuity some of them include representation of every single cell in reduced dimensionality representations for example many of you have probably heard of t-sne and umap reduced dimensionality spaces but there are also ways to represent this in between for example with graphs that allow some hierarchical and non-hierarchical relationships to exist and also can incorporate a lack of discreteness so i just want to say we use many of these different ways to describe our cell type landscapes and for different purposes we can use one or another depending on what we want to showcase and what we want to highlight going from transcriptomic types i would like to emphasize that single cell transcriptomics is not the only way to describe types one of the main reasons why single cell transcriptomics is employed heavily in our investigations and many other labs is because it is highly scalable our databases already include millions of cells and will include many many more millions in years to come beyond transcriptomics one can measure properties in cells in different ways one can measure epigenetic properties for example chromatin accessibility projectional properties where do cells project morphological properties and electrophysiological properties and also one can do individual modalities or individual data types in isolation or one can combine these modalities to measure for example transcriptomic and epigenomic properties in the 
same cell or measure them separately and then integrate them post data collection one can measure projectional properties together with transcriptomics within a method we call retro-seq or other methods and that's what we would call for example pt-types projectional transcriptomic types or ultimately what is really a very very exciting development and a data set that will be showcased on our website soon most likely in april are the morphological electrophysiological and transcriptomic types from a single cell one can measure morphology electrophysiology and transcriptomics and use all these three data types to define cell types we call these met-types so with this i would like to just summarize that what we are trying to do at the allen institute is define building blocks of the nervous system at the level of cells using mostly three different modalities transcriptomes and morphologies and electrophysiologies but we also try to perform cross-correlation between these data modalities using multimodal data collection patch-seq mfish and genetic tools i don't know why i'm not advancing oh there it is maybe i'm too slow and genetic tools so there are ways to collect individual data sets that can have a single modality or there are ways that one can collect multimodal data sets for example patch-seq ultimately we would like to present the neuroscience community with an integrative view of cell types that will include as many multimodal data sets as possible and as many correlations between unimodal and multimodal data sets we also have mostly focused on adult and so far on two species with introduction of a third species recently mouse and human and macaque we have started collecting some of the data and macaque will increasingly be featured in our cell types database as well as we are venturing into development again 
mostly mouse for the moment but mouse human and macaque as well so what is our standard process towards generating these resources we first decide what type of data set we want to collect and there is a lot of thought there are a lot of comparisons usually at the sort of r&d level as we call them where we will take these different types of methods we will compare them and we'll decide which one is the best or we may actually decide to go with two so for example for transcriptomics what we have decided to do is to pursue two different data modalities one that collects a single cell per well and treats a single cell as a sample that is using the smart-seq approach and the other one which is 10x genomics where a group of cells is profiled at the same time but these individual cells are segregated in a machine that is provided by 10x genomics into individual aqueous droplets and each cell is treated as its own reaction but they're all ultimately sequenced together however post sequencing each sequencing read can be assigned to a particular cell once we collect these data the key aspect is to perform qc quality control and all of our data sets that are presented on the website are always processed through multiple steps of quality control and in fact most of the transcriptomic data sets that you will see are analyzed the analysis is in fact essential we do provide some data sets prior to analysis but in order for a user to be able to navigate the data we do analyze these data and we provide them as part of a taxonomy we publish the data on our website that's one of the web products the cell types database and we also publish accompanying scientific papers the data sets that will be featured today are included in the cell types database and they include electrophysiology morphology transcriptomics and models and are mostly focused on mouse and human we have these 
very interesting graphics that showcase what is the proportion of different types of data that are contained on our website and as you enter the summary of these data sets you will see that we have electrophysiology data we have different types of models and morphology data but then if you click on an icon below you will see that if you include transcriptomics data this is the disproportionate amount of transcriptomics data we have compared to all other data and the reason is transcriptomics is much more scalable transcriptomics is easier to collect due to the nature of nucleic acid molecules and our ability to amplify them and to do that with new technologies on thousands to millions of cells the majority of the data we have now on the website is transcriptomic the original cell types database actually featured ephys and morphology data and individual cells were presented for their morphology and transcriptomics the one that is more recent and where we have actually featured different types of products and we're ultimately now focusing on is our transcriptomics explorer it features now cells from mouse and human brain based on profiling with two different platforms as i mentioned smart-seq which is high gene detection but lower throughput but when we talk about gene detection it is really amazing for large cells for example subcortically projecting layer 5 cells we detect 10 000 genes per cell these cells have been as i mentioned quality controlled their transcriptomes have been quality controlled examined they have been clustered and they're presented on the website in the transcriptomics explorer what you see as you enter is not a small dendrogram like the one i presented before but a larger dendrogram this dendrogram now features integrated data sets from both smart-seq and 10x so not only did we generate 10x and smart-seq data but we also integrated them into the same reduced dimensionality space and 
we have integrated them at the level of clusters the dendrogram that you see on top is the dendrogram featuring all the cell types at this moment from our cortical data set all regions of the cortex and hippocampus and using our colors that i'm very partial to warm colors for inhibitory neurons cool colors for the excitatory neurons and then in grayish colors all the non-neuronal cells i would encourage any of you who is interested in single cell transcriptomics to visit our transcriptomics explorer and to start playing with the data it takes some time to get used to it but the major aspects of the data one can learn is the arrangement of cells in the lower dimensionality spaces like the umap space here and the representation within the dendrograms and then learning the patterns of genes and interrogating patterns of genes patterns of well-known genes like marker genes or unknown genes one can examine the pattern of any gene in any group of cells and for example here i've just highlighted expression of a particular gene in this particular group of cells in the visual cortex this is a group of cells that is a particular type of projection neurons that express the car3 gene and are named after that gene what is coming is very exciting as i mentioned we are always aiming towards integrating data sets integrating data sets of different types and soon in april patch-seq data are going to become browsable patch-seq data will integrate transcriptomics electrophysiology and morphology on individual cells and a user will be able to browse all our gabaergic cells that have been recently published in an integrative manner their transcriptomics their electrophysiology and their morphology for every single cell reported in our recent gouwens et al publication in cell and then further down the road something that many of us are working on are what we call cell type cards we would really love for users to be able to access not only cells
through these individual cells but also at a summary level a cell type card that will summarize all the features of a cell type in all different modalities what are the challenges well the main challenge is as we started presenting some of the data we were working with thousands of cells and that was doable currently managing data size is becoming really a problem not only for data collection but once you want to start analyzing these data and once you want to start to present them well you can't use just a simple dendrogram and just to showcase the complexity and the challenges we're facing this is now the dendrogram for the full cortex hippocampus taxonomy we have it is really very difficult to navigate and we are constantly thinking of new ways to actually allow users to enter these data and to navigate them in a meaningful way so dendrograms will become increasingly challenging we're thinking of hierarchical dendrograms collapsible dendrograms collapsible umaps neighborhoods within umaps that are expandable or collapsible etc and ultimately what do we want we would like all of you to be able to use these data this molecular and anatomical description of cell types built as an architecture of the mammalian brain in your work we hope you will be able to examine with our data sets development evolution use these cell types to build genetic tools to examine their role in behavior and circuit function build models of circuits and ultimately the whole brain then learn about changes of these cell types cell types are not static cell types keep changing during the life of an animal to enable plasticity and learning and then of course human populations are very diverse we would like to understand how differences in diversity of cell types and genetic diversity affect for example propensities for different diseases and
selective vulnerability of certain cell types in diseases and ultimately we would like to be able to create disease models and enable therapies impact we would ultimately really like to enable the community to understand and treat brain diseases starting with neurodevelopmental diseases through diseases of adulthood mostly psychiatric mood and metabolic disorders all the way to diseases of neurodegeneration neurological and neurodegenerative diseases we would like to understand genetic underpinnings selective cell type vulnerability build predictive disease models and circuit-based therapies and this is of course the dream of how our tools will be used and how our resources will be used we need you to use our resources to enable this dream to come true with this i would like to thank again paul allen none of this would be possible without his vision encouragement and support i joined the allen institute almost 10 years ago and i feel extremely fortunate to have been and continue to be part of this effort and with this i would like to highlight that we indeed do what we preach we really practice team science and that means that our scientists and support staff are organized in multiple teams that are cross-functional we have project teams we have departments we have core services and i would just like to highlight some of them here i'm sure i will miss some genomics in vitro single cell characterization which produces patch-seq data genetic tools electron microscopy and multiple core teams which all enable us to collect these data analyze them and then ultimately our technology department who generates the web tools that you get to use this program has also had amazing leadership hongkui zeng and ed lein and has been supported by the allen institute for brain science but we also have received very generous support from the nih especially from the brain initiative and i want to just highlight the three
people from the institute nick rohan and rebecca who will be giving you talks on specific aspects or specific types of data and potentially new features that we would like to incorporate into our data sets and tools that will be available on our website and with this i will end my talk and i'm ready for some questions okay i already see some questions so let's see q a question from zinnia should i just read the question caitlyn can you help me should i just read the question i forgot what was done yeah okay so you can go ahead and read them and eric and keith might have more questions for you too okay so the first question do you expect to identify more transcriptomic cell types as or if you expand the sample size or single cell sequence yes yes i hope at some point we will reach some level of saturation but i also expect as the measurements become more accurate what do i mean by more accurate more genes are measured more isoforms are measured i expect that finer and finer divisions will be detected so i don't think we're there yet i think we can say that some of the divisions and some of the groupings of cells can be detected very reliably but i'm pretty sure there are some that we are not detecting reliably and that we will be detecting in the future okay let's see more questions would it be interesting oh there are more questions going whoa whoa whoa they're zooming in would it be interesting to have cell types represented as a spectrum a blended rainbow of the taxonomy tree as these are ultimately transcriptomic cell types which are based on stringent statistical tools and as you mentioned cell types change in our lifetime yes yes yes yes yes to all of these questions initially when we published our first paper or when we were trying to publish our first paper and i insisted that i don't want buckets i want buckets with some spillover it was extremely actually challenging to present this
because discreteness is really seductive people love discreteness we like clarity we like simplicity and even then the first single cell transcriptomics paper was published in 2016 we actually had resistance both discussions internally but also from the reviewers why are we providing this fuzziness to our cell types why are we not just clearly claiming that a cell type is a cell type and i think the reason is because in fact due to noise either biological or measurement noise cell types are not completely discrete so yes i completely agree with you should i okay so i should be clicking on these questions answer live and okay i'm sorry if i clicked on something that i shouldn't have are there any plans to expand the program to another model like macaques or rats macaques definitely rats not yet maybe we have to be reasonable we have to set ourselves up for success and if we expand to too many things we just can't accomplish the goal we have set for ourselves which is cap complete accurate and permanent of course nothing is entirely complete there will always be better methods nothing is entirely accurate there is always noise complete well maybe you can achieve it we always hope but i would say stay tuned there will be more macaque coming right not yet okay i hope you guys are clearing up whatever i answer are we doing good okay looking at online tools that you presented does the tool allow for original analysis of your data or just provide raw data to analyze off the platform if so can i download the results of my data performed on the site you can absolutely okay so you can absolutely i'm sorry somebody entered the room and i got distracted quickly raw data are all available you can download all the raw data the analysis is limited to the types of things we allow you to do on the website we're trying to add more and more features can you download the results of my data performed on
your site can you put your data on the website look at it in our ecosystem and say for example where do your cells map within our taxonomies not yet but we are absolutely thinking about it many of us are dreaming about it and we would love for that to become a reality not yet but i think in the future i'm one of the supporters that wants to enable this do you have plans to collect patch-seq during behavior that is learning would be lovely to do we have limited sort of forays and they're usually part of collaborations we don't have systematic plans to do that and again the reason is we have to be reasonable and we have to set ourselves up for success so we have to limit the number of dimensions we are going to explore when we also ask about behavior or let's say learning well which behaviors which for example learning paradigms etc so yes but in a limited context are you guys using ai unsupervised learning techniques to help understand patterns in all of the massive data that you're collecting yes absolutely yes so we use machine learning we use clustering for any data sets we collect the transcriptomic data all have been processed through multiple we usually present one outcome but the outcome is a result of usually very careful iterative and bootstrapped machine learning to learn patterns to characterize patterns of for example gene expression and then to group these cells and usually to not only group them as i said in these firm buckets but to actually provide probabilistic cell type calling along this cell type landscape but what you will hear is that we are trying to do this not only on the single modality but we're trying to do completely unsupervised machine learning on multi-modal data sets so rohan gala today will talk about machine learning using neural networks on patch-seq data for example so stay tuned how would you achieve the cell types as a
spectrum would you need a novel bioinformatic pipeline as opposed to clustering great question clustering likes to bucketize clustering likes to segregate but we know that the truth well what we measure we know what we measure is a combination of discreteness and continuity and we want to stay as true to our measurements as possible so what we do to simulate and to enable not only bucketization but also continuity is repeated clustering usually in a bootstrap manner to enable this probabilistic cell type calling also we are working but we are not yet there hoping to enable some maybe more principled ways of really describing these cell type landscapes that will incorporate both clustering and continuous cell type variation stay tuned stay tuned i don't know how much time i have am i running out of time or should i just continue going through these keep going go on okay you can take maybe five more minutes of questions if you'd like i think this is interesting i can continue thank you okay so i did answer the do you expect to identify more transcriptomic cell types from john simpson on youtube dr tasic are you planning to link these data to brain atlases to be able to open a particular feature hierarchy relating to some arbitrary brain region yes i want that i want my allen institute to do that for me also not only for other users i would love to be able to wander into a brain region and say what are the cell types in this brain region what do i know about cell types in this brain region how do our old tools which are still extremely popular like the allen brain atlas how do they relate single gene per brain measured but how does that relate to my transcriptomic measurement in order to do this we need more multimodal data and we probably well not probably i would say almost certainly need higher quality spatial transcriptomic data so also stay tuned spatial transcriptomic data are coming currently most of our single cell
transcriptomic data is from relatively crude dissections we are also planning to present those dissections on the web but ultimately what is really needed are spatial transcriptomic data they will be the rosetta stone between transcriptomics and many other data modalities that can be collected from tissue either in vitro or in vivo that preserve the tissue context we are interested in non-canonical imprinting in specific cell types is there a way to integrate your single cell data with newly generated single cell imprinting info very interesting we have not done that i'm not aware that we are planning to do this but we are collecting more and more single cell chromatin accessibility data and i suspect that since these are genome-wide data sets we will be able to collect data on imprinted genes but one thing that one would probably love if we're talking about imprinting is to have alleles be different most of our data is collected on black six mice at least the mouse tissue it would be lovely to maybe include some f1 crosses between wild type inbred and black six but that's not yet there i think caitlyn is saying something like this am i done or should i one more one more okay i was wondering where i might read more about the hierarchical okay i don't know if that's a word i'm learning it non-hierarchical graph analysis you could be performing on the cellular data so you can for now please take a look at our 2018 tasic et al nature paper but there is more coming there also we've been performing these graph analyses on our human data you can look at the hodge and bakken papers that are on human data and then i think we have a next iteration of our graph representations and analysis coming in the cortex hippocampus paper that is currently being reviewed so i would say look at those papers look on github for code older or more updated code yeah that would be my
recommendation okay am i done i think so okay thank you everybody thank you very much caitlyn i believe that there's a way to sort of continue answering q a for bosiljka's talk even as we move on is that correct that's correct okay great so now we'll hear from some scientists at the allen about their research using the allen cell types database we're gonna start with nick jorstad so nick is a scientist in the human cell types team he's part of the allen cell types database and he will be presenting single nucleus rna-seq profiling of middle temporal gyrus across the great apes and monkeys nick do i have that right yep great thanks all right nick thank you very much and take it away and thanks bosiljka for that excellent introduction on a lot of these topics so i don't have to spend a lot of time on those yeah so my name is nick i'm on the human cell types team here at the allen institute i work under ed lein and trygve bakken and i really deal with kind of cross-species analysis and profiling cell types across species compared to human and so i'd like to dive into this but let's see here there we go but before i get into the actual results on comparing non-human primates i'd like to kind of explain where this work fits into the field in general and at the institute and so many groups at the allen institute have looked at and established cell type taxonomies in different brain regions and in different species for instance here is a dendrogram showing the cell types in human mid-temporal gyrus which we're going to be talking about this was from a 2019 paper by hodge bakken et al and then you just heard a little bit about the mouse taxonomy as well from tasic et al bosiljka's work from 2018 and this is the dendrogram of cell types in a different brain region actually two regions the primary visual cortex and the anterior lateral motor cortex but what you can see and appreciate from
these two taxonomies is that there's actually pretty broad correspondence between species and even between different brain regions and so humans have lamp5 types mice have lamp5 types humans have vip and sst neurons and mice have them as well and so there's really a broad conservation of cell types across species and across regions so in hodge et al in this 2019 paper they actually compared these two different cortical regions from two different species and this is just kind of a cartoon of how that was done the cell types are colored over here and the species have different shapes so humans are the circles mice are the squares and what you can do is use different computational algorithms such as scalign or seurat these are r and python packages that can be used to integrate two different data sets together and essentially what you do is you find anchors or you find common genes that are expressed in each species and use those to line them up in this reduced dimension space and then you can perform unsupervised clustering methods there's a whole variety of them but you can perform them on this integrated space and identify cell types that have contributions from both species to kind of get an idea of which cell types are lining up across these different species and so these t-snes on the right show the inhibitory neurons that were integrated between human and mouse so human is blue mouse is orange you can see that they mix pretty well across the whole space and even these rare types out on the edges have contributions from both species and so when we clustered and actually annotated these cell types these are the inhibitory neurons that were identified as cross-species cell types or cell types that are found in both mouse and human another way of looking at this data is as what we call kind of cluster overlap heat maps and so here we have as rows these are all the human inhibitory neurons that i previously showed you and then as columns i just cut
the labels off so i could blow this up a bit but as columns would be the mouse cell types for the inhibitory neurons and you can see that there are instances of really coarse resolution so like for instance right here there might be four or five different human cell types that group or integrate with four or five different mouse cell types and we can't really get any deeper resolution on that there's too many species specific features that prevent you from going any deeper than that resolution and so we would call this a cross species pvalb 1 cluster but there are also instances of really high resolution mapping for instance these chandelier cells here human has one type that was identified and mouse has one type and they align one to one but looking across this whole taxonomy of inhibitory neurons you see that both species really have kind of complementary components and they both have very conserved cell types there's just kind of different levels of species specificity within it and so the data i just showed you again is on the brain-map.org website this is just a screenshot of where it's located so under human mtg you can explore the raw data or you can download the analyzed data with all the metadata and all that interesting stuff and then here's the mouse data set so now to get back to the topic of this talk which is talking about the mid-temporal gyrus across primates so here's a little cartoon of a brain here's the mid-temporal gyrus in the human brain and this is a really interesting structure so it among other structures is thought to really contribute to our humanness it's involved in language processing and integrating different sensory inputs together it's been implicated in many different neurodegenerative diseases such as alzheimer's disease it's implicated in neuropsychiatric diseases so it's a very human-centric brain region that has a lot to do with who we are and how we navigate the world and so one
interesting thing we can do in this cross primate study is looking at what makes us unique compared to our closest living relatives and so that's one of the goals of this study i'm going to highlight three goals here but this first goal is really to explore conserved and divergent features of our closest living relatives and so for this data set we were able to generate single nucleus rna-seq data sets from human chimpanzee gorilla rhesus macaque and marmoset and then over here you can see how many million years have passed since our last common ancestor so chimp and gorilla for instance shared a common ancestor with human seven to eight million years ago so those are pretty much our most recent ancestors and then i just love this visual on the right this was from a biorxiv preprint where they ran mri scans on 33 different primate species and so you can see the human chimpanzee gorilla rhesus macaque and a marmoset brain all on the same five centimeter scale to kind of appreciate what we're comparing here but so another goal of this study is to expand upon our previous human taxonomy so i just showed you the mtg data set from bakken and hodge et al that was generated on 15 000 nuclei so now we actually have over 150 000 nuclei for human tenfold greater coverage so this was getting to one of the questions that was asked is do you expect to see more cell types as you increase the cell number that's exactly what we're doing here and that's exactly what we see and then the third goal for this is to establish a really high quality mid temporal gyrus reference data set that we can use to map to different data sets in different studies we have going on here at the allen institute for example we have an alzheimer's study where we're sampling different brain regions from many people who have had alzheimer's and different disease states and so one of those regions we're sampling is mtg and so having a really robust high quality reference data set
that we can map these disease brain data sets to we can see if certain cell types have different proportions if preferential types are dying off we can look for disease mechanism associated gene modules those sorts of things so it's very handy to have a high quality reference additionally we're generating multiple cortical regions across the human brain and having a really high quality reference that we can compare those to will help that as well we have an exciting study coming up called the human variation study and so we have around 80 samples from mid temporal gyrus from 80 different donors that we have already generated data from we're trying to look at how variable cell types are across the population or at least these 80 donors and so having a really high quality reference to map that is useful as well and then lastly you've heard a little bit about patch-seq where you stick an electrode in a neuron you record its electrical properties and then you can suck out the nucleus sequence that data and map it back to this reference to find out what cell type you actually had and what electrical features are associated with it so you can start integrating these different data types together and so now let's actually get into the data so here is the data set that we generated for this great ape and monkey study and so we have human through marmoset over here we generated a 10x version 3 data set across all six cortical layers the whole thickness of the cortex we generated around a hundred thousand nuclei per species and so this is really to capture all the diversity that we can we really cast a broad net and tried to get around a hundred thousand nuclei per species interestingly layer five has a bunch of so cortical layers one through six layer five has a bunch of very distinct and very rare neuron types and so we really wanted to boost the sampling of those so we generated a layer five micro dissected data set for
human with around 38 to 40 000 nuclei to really boost our sampling of those rare types now we also have what i previously showed from bakken and hodge was this 15 000 nuclei data set for smart-seq this was individually layer dissected so we can paint that layer information onto the final clusters that we generate and so for this study we also generated 4 and 4 500 nuclei for gorilla and chimp as well and we can perform isoform analysis with smart-seq that's kind of another topic i won't have time to get into that on the right here are the number of donors and sexes that were from each species so we have pretty good coverage across all of them with the notable exception of rhesus macaque we only had three female donors four but we were able to get seven chimpanzee mtg samples which is just incredible for a total of around six hundred thousand nuclei and so i don't have time to get into the analytical pipeline i know there's been some questions on that but i really want to reiterate the importance of integrating these data sets so if i take all the chimp 10x nuclei that we've generated and throw them into umap we get the cell types over here and we can see that they split by sex and they can split by donor and this is really a problem for human and chimpanzee in these non-inbred non-mouse data sets like mice tend to integrate pretty well without having to integrate them but some of these more outbred species not so and so the problem is if we perform clustering in this non-integrated space we'll generate cell types that are sex specific or donor specific and they might not be that helpful for having a reference that encapsulates everything so what i do is i take each donor and integrate them together with seurat and we can see now that the donors mix very nicely you see all the colors are nicely distributed and here we can even split it out by technology so the 10x nuclei and the smart-seq nuclei perfectly overlap and so you can take
annotations from the layer dissected smart-seq data and paint those onto the final clusters you determine and so this is the integrated space that i actually perform clustering on to establish these taxonomies and so i did this for each species but what i'm just showing here is the new human mtg taxonomy so we went from 75 cell types to now 150 cell types so we've actually doubled our resolution many of those original ones split into two now that we have more cells and here you can see as a bar plot the distribution and cell type proportions of those and then down below this heat map it starts at layer one and goes down to layer six these are the layer dissections from that smart-seq data we had and so you can see the layer two three neurons they actually come from layers two and three just like you would expect them to and then here is the chimp taxonomy and one thing that you really can appreciate here is that the two species have comparable levels of cell type diversity meaning if we look at a given subclass like these sst these orange neurons there's around 20 25 different types and if we look in chimp there's around 20 different types as well whereas if we look at let's see here like these layer 5 et types which is this very distinct branch on the dendrogram we only have like three types there and we only have two types in chimps so there are comparable levels of diversity across cell types and so as bosiljka mentioned we can break these taxonomies up by kind of neighborhoods and so here i've broken the data up into five neighborhoods we have two inhibitory neighborhoods two excitatory and a non-neuronal neighborhood and you can break those up in chimp gorilla all the way back to mouse you can still kind of establish these five main taxonomies of neighborhoods and then below the neighborhood level you can break them up into subclasses so for instance the red types here are the parvalbumin types the orange types are somatostatin and so on and so on and so one of the
first interesting analyses we can do is we can look for marker genes of let's say this sst subclass in human we want to know what genes make this subclass specific so we can find those genes we can do the same thing in chimp we can do the same thing in gorilla and then we can compare those genes and see which ones are the same which transcriptional profiles are conserved and how do they differ and so this heat map is pretty much showing the result of that so i don't expect you to be able to read all the small labels here but down below are all the neuron subclasses as different colors we have the inhibitory neurons over here and the excitatory neurons here and then each column is a different species so we have human chimp gorilla rhesus and marmoset and we can see that a particular gene that marks a subclass can be conserved across all five species and it can be very specific and so these might be transcription factors these might be receptors and we can start exploring them and find out what key genes mark this as a cell type across different species so that's kind of at the subclass level we can also dive in a little deeper and go and compare leaf nodes on this tree so let's just take this inhibitory neuron neighborhood for human and compare it to the inhibitory neuron neighborhood for chimp and so here's the integrated umap space with the human clusters painted onto it and here are the chimp clusters painted onto it and so over here in this heat map we have all the human cell types as rows and the chimp cell types as columns and this heat map is showing how far away these clusters are from each other in reduced dimension space so a value of zero would be the darkest blue color and would be a zero distance so they'd be completely overlapped in however many dimensions you're looking at and so what we see is that human and chimp have a lot of one-to-one relationships there's one human cell type that aligns perfectly with a chimp cell type maybe
one to two or one to three ratios but if we go back to the previous plot i showed comparing human to mouse you see that these are a lot coarser resolutions where you have many to many relationships so it's nice to see that with our closest relatives you can align better and establish homologous types between species so i'm going to skip that slide but the last little section here i wanted to show is so everything i've shown you is species conservation and i wanted to show an example of something that might be an interesting example of species specialization and so here is one of the excitatory neuron neighborhoods this is looking in the individual species space so these are not integrated together these are just the human nuclei the chimp nuclei etc and we can take a look at this purple subclass which are these layer 6 it car3 neurons and we see that in human they form two really distinct islands they're two very distinct types that exist and in chimp we can see that you still see kind of two types they're a little bit closer together now and in gorilla they almost look like a butterfly they're still kind of linked together but then we go back to marmoset and there's really one homogeneous population of these and so this might be an example where a single cell type bifurcated in the great apes and started to specialize and so i took all these layer 6 it car3 neurons integrated them together here's the umap colored by species so they're all mixed together and then i broke them up in this feature plot here so here are all the human nuclei here all the chimp etc and i looked for genes that distinguished these two human islands from each other and so you can still make out the two human islands in this integrated space and one of the main categories that came out was these serotonin receptors htr2a b and c and so this right island was enriched for the 2a isoform in human and chimp and gorilla they both have this 2a and 2c profile but
going back a little bit further in evolution we see that uh rhesus and marmoset actually don't really have the 2c at all and they kind of have ubiquitous expression of 2b and 2a and so this is just an example of the kinds of analyses and kinds of hypotheses you can generate with this cross-species data and hopefully this will be available within the next year and so you guys can start analyzing it too we're still working on the qc and all that with that i'd uh really like to thank the allen institute i'd like to thank chet sherwood who procured these really precious chimp and gorilla samples that we're very privileged to work with fenna krienen who we're going to hear from i think at 10 30 who contributed the marmoset data the allen institute rna-seq brain core team and rebecca hodge who actually perform all these dissections all the facs sorting everything to get to the data generation which is a huge effort so thank you guys and then um my mentors trygve and ed and of course sorry uh paul allen for his uh just vision and generous philanthropy without whom we would not have this so thank you and i'll take a couple questions thank you nick for sharing this fantastic and really inspiring um presentation and your insights this is really uh this is really fantastic thank you um so as we wait for some questions maybe to trickle in as a reminder please utilize the q a um maybe i can uh start and ask you the first question since i'm as you see very excited about this topic and i was just wondering whether you have looked at the expression of some of the human specific genes or gene duplications as you know in the genome over the course of the past you know six million years of evolution or so that separate us from chimpanzee there have been some sort of notable genomic events such as certain gene duplications notch2nl and arhgap11b and i'm just wondering if you see evidence of these genes being expressed in any sort of interesting ways
that would you know give you some hints about what could um you know really underlie the recent evolution of the human brain yeah so i'll start off by saying i don't have any crazy bombshells to drop right now i've looked uh tentatively at this and i do see some genes that actually were in those uh even like human accelerated regions that are enriched across all neurons so it doesn't seem like one particular neuron subclass is enriched for those human accelerated genes that's kind of an initial analysis now we also as i said have smart-seq data from human chimpanzee and gorilla which allows us to look at the isoforms that are actually expressed because many genes can be broken up into dozens of different splice variants and so we have collaborators at cold spring harbor who are currently analyzing the smart-seq data and looking for differences in isoform uh expression especially in those human accelerated regions across human chimp and gorilla so right now i see a little bit of a signal across all the cell types but nothing at kind of the gene level so now we're taking a deeper look at the isoform level to see if there's anything really interesting there super interesting thank you um i look forward to seeing that and are you also tying this to um you know epigenomic chromatin analysis as well that you think could provide some novel insights or are you mostly focusing on transcriptomics on the coding region so this study is only going to be a transcriptomic study we do have i believe we have mtg single nucleus atac-seq and multimodal atac and rna-seq coming down the pipeline so eventually we will be able to incorporate that data into this reference but for now it's just transcriptomics fantastic and we've got a question here in the q a hello from um any rude hello great talk i had a technical and biology related question so two questions one i wanted to know how you are comparing expression
profiles across species in terms of normalization so that's a technical question and then a second question based on your work can you comment on um improvements in drug translatability i guess in other words do you think that understanding sort of the differences between species could provide you some insights into predicting whether certain types of drugs would have different effects across different species both very important questions um the first one comparing expression across species this is a tricky one for normalizing um the simplest way to do it is just like a uh global log2 cpm normalization which actually does a pretty good job and then you can do like spearman correlations and rank tests to see where in the expression intensity uh genes rank with each other but there are other ways so like deseq2 for instance has a way of comparing between species and that has a generalized linear model a glm that is fit to the data and then you're kind of looking at what's the expected expression and then which genes are over expected or under expected and so performing de gene tests with uh these like pseudobulk methods with deseq2 i found very helpful um but there's like 10 different ways to normalize the data and so it's hard to dive into all those here um for your second question though based on your work can you comment on improvement of drug translatability yeah i think that's important because a lot of pre-clinical work is done in uh different species right and we don't know if the receptors that drugs are targeting are expressed in the exact same cell types in those two species or three species or whatever like is a drug tested in mouse hitting the same cell types as it would in human and so i think this work is really going to help shine a light on that and hopefully improve the fail rate of a lot of these drugs in the clinic great fantastic thank you nick for these answers and if
there are further questions uh i think please put them in q a uh otherwise i think in the interest of time although i know this is really fascinating we should move on um so thank you nick again great thanks tom and i'll be fielding questions in the q a so please ask them and next up is rohan gala who is a scientist in the informatics and data team um and he will tell us about um consistent cross-modal identification of cortical neurons with coupled autoencoders thank you rohan and over to you all right so my name is rohan gala i work as a scientist at the allen institute with uygar sümbül and team and what we primarily focus on is trying to integrate different modalities of data with which neurons are being profiled at the institute in particular i'll be looking at transcriptomic and electrophysiological properties of neurons that are collected through patch-seq data so we are analyzing these data sets with machine learning models to try to come up with consistent cell types to try to come up with consistent ways of identifying these cell types given a particular kind of data so before i get into the details let me just kind of frame what we mean by a cell type because i feel people have very strong opinions about this so i would just want to step aside and for the purpose of this work i would like to define what we think of as cell types right so in biology we are trying to define cell types as an abstraction to exploit structure function relationships so there is structure in the data that hopefully informs us about the function of these particular cells that we are looking at in the brain so eventually we are interested in understanding how the brain works so to try and understand the parts list in terms of the function we are looking at structure in the data right and so cell types are this abstraction that we are looking at and now cell types are expected to capture cellular identity what i
mean by that is these individual cells they demonstrate properties that are stereotyped across many different individuals so cell types are trying to capture that notion of cellular identity and cellular identity is revealed to us through observed properties so for example if you're a geneticist you may be looking at gene expression in cells if you're a physiologist you're looking at the physiological properties the firing properties uh the characteristics of the spikes that these neurons are showing and so you have many different ways of observing a given particular neuron whatever you observe about this given particular neuron is informing you about what this cell is and so with this uh notion of what i mean by cell types i want to kind of talk about what we've already seen so far in nick's talk for example where we have uh observed gene expression through uh transcriptomic profiling of cells through single cell transcriptomic experiments and so when we look at similarities of gene expression um across this set of cells across thousands of cells we can try to cluster them together we can look at the similarities and come up with measures to group them together and a kind of common way of expressing these similarities is through these hierarchical dendrogram relationships so here i'm showing you a portion of the dendrogram that is related to mouse cortical gabaergic neurons and you can see here the major classes of neurons the lamp5s the sncgs the vips ssts and pvalbs that are further split into many different cell types these are genetic cell types and these are genetic cell types just because they are inferred from observed gene expression right all right so these cell types are also telling you something about the stereotypical properties of these particular individuals that you're observing so given a particular cell you can tell something about its properties about its gene expression just based on knowing its cell type
identity um so just because you are an sst chodl cell it tells you something about the fact that you may have a chodl gene that is being expressed in a relatively higher proportion compared to other genes all right so when it comes to multiple modalities you may be a physiologist then you may have your own notion of cell types and those cell types inform you about the physiological properties of cells so when we put these two together side by side the cell types that say a geneticist would be interested in versus cell types that a physiologist might be defining and be interested in uh can be pretty different so these are you know t types and e types that vasuka mentioned earlier in her talk and so what is one way of trying to come together to a consensus to make sure that we are talking the same language that we understand something about different modalities and eventually understand something about the identity of the cell so this is being enabled by paired observations of the cells so for example these days it's possible through patch-seq technology to measure gene expression and physiology in the same cell and uh there's a nice review paper summarizing this technology by one of the uh panel members today shreejoy tripathy and group so this patch-seq technology is allowing people to measure multiple modalities of the data from the same particular cell of course this also involves morphology and there are other modalities that we will care about in the near future but for the purpose of this talk i'm just going to focus on the transcriptomic and electrophysiological uh properties of the neurons that were measured from the same cells so the dataset that i'm going to be referring to is this patch-seq data set from the allen institute that was published in 2020 this has more than 3 000 cells being profiled with both modalities and the technical details if you're interested were recently published in uh in this paper a couple of months ago and i will link to that later
on in the talk as well but for details for this talk please refer to the paper and i'm just going to try to demonstrate what this tool is capable of what are the consequences of having an aligned cell type space okay so the intuition that i've been trying to develop is that you have some observations that you made you have a data matrix in this case for the case of transcriptomics you have a cells by genes matrix from that data we are trying to come up with a representation that allows us to identify similarities and differences in the individual cells and from this representation you can reconstruct the data so this low dimensional representation or the summary of the data allows you to tell something about the data itself so given a cell type you can infer what the expected properties would be so here what we are doing is we are learning these relationships from the data to the representation and from the representation to the reconstruction through neural networks so neural networks can just be viewed as non-linear dimensionality uh reduction methods uh in this particular case we're using particular neural networks called uh autoencoders so here in autoencoders you have an encoder network that is taking you from a high dimensional data space to a lower dimensional representation and a decoder network which is taking you from the representation to the reconstruction of the data in a similar way you can have another autoencoder which is looking at a different data modality in this case you're looking at the electrophysiological features across the different cells and learning a different representation ze in this case and with the paired observations what we are trying to do is align these two representations so at the end of the day we want to end up with two representations of the individual data modalities that are so well aligned that you can use them to infer properties across modalities all right so what do these
representations that we learn look like so here i'm showing you the two different representations zt and ze that are obtained by starting from the gene expression on the left and from the electrophysiological features on the right so you might be confused for a second to be seeing maybe the same figure on the left and right but in fact these are two different figures if you look closely you see some dots that are not in the exact same space but what this is suggesting is that these two data modalities can be aligned to a very very high degree in a low dimensional space and so the analysis of cell types can happen in this single aligned space to try to understand cross-modal properties and to try to understand cell types that are consistent across these modalities so what i'm highlighting with the colors here are the same transcriptomic types that people have previously defined right so in this data set the cells were independently assigned cell types just based on their gene expression and this is showing that the groups of cells that you see here so each dot is a single cell the clumps of uh dots are suggesting that there are these natural clusters in the data set and now by overlaying the genetic type labels on top of these individual points you can see that the major classes of cells are preserved in this aligned space not only the major classes of cells if you look at the vip or the sst branches or the lamp5 branch here you will see that the colors within that blob so the different shades of pink and red and yellow they are also clustering together which is suggesting that it's not only these five classes of cells that can be grouped together there is further subdivision there that might be worth exploring so in terms of cross-modal or multimodal cell types what we did is we tried to do unsupervised clustering uh on these representations to try to understand how many distinct types of cells can we identify in this
aligned space and it turns out to be roughly 30 and these are consistent across the different modalities meaning irrespective of whether you start from electrophysiological profiles or whether you start from gene expression you end up calling the cell the same cell type which is great because irrespective of the observation modality you're telling me something about the identity of the cell and this aligned space is capturing that identity of the cell furthermore these unsupervised clusters that we obtained in this aligned space which had nothing to do with genetic cell types so far we just tried to compare how they line up with genetic types or t types for these cells and it turns out that they are indeed very integrated as we saw with the clumping together of related colors and a further analysis of this is uh presented in the paper as well okay so what good are these representations other than you know clustering into however many clusters we can define in this space so these aligned representations firstly these are obtained with nonlinear transformations right and that is crucial because these relationships across different modalities are not simply related so there are methods like cca that can perform these kinds of alignments but at the end of the day they will fall short of allowing you to reconstruct the data from this low dimensional space because the relationships are indeed very complicated so we've done these tests in the paper and i encourage you to look at the baselines that we've performed with linear methods but now what we can actually do with this aligned space is that we can perform cross-modal data prediction so if we start from the electrophysiological profile for instance and we try to ask the question what is the expected gene expression so what you can do is input just the electrophysiological profile encode it into your ze uh into your low dimensional electrophysiological representation use that representation to try to decode the
transcriptomic properties of the cells so here i'm showing results that are grouped together by a particular t type and in the columns you see individual genes so here you see that the pattern on the left is showing measured gene expression for these particular genes averaged across these individuals that belong to the same t type and on the right i'm showing you the predictions for the gene expression values uh that were inferred using the aligned representations right and you can see that the pattern on the left and the pattern on the right is nearly identical and we can do this for all the different uh cell types in our data set or all the different individuals in our data set and then group them together by the t types and we see that the gene expression properties that are predicted for these cells are nearly identical so this is suggesting that i can take in electrophysiological properties as the input and predict gene expression with a very high degree of accuracy these are post-average results but we can do this at the individual sample level which is very exciting because if you are an electrophysiologist wanting to know something about the genotype of your cell about the gene expression in that cell this model this aligned space that has been trained on the reference patch-seq data set allows you to do these kinds of experiments in a similar way if you start from the gene expression and for instance you're interested in the electrophysiological properties you can do that as well because the spaces are so well aligned okay and finally where this work is kind of going is that we are essentially trying to build these models that serve as dictionaries or serve as lookup tables for somebody else's data so say you are not able to measure all the different modalities but you are interested in knowing something about those different modalities when we train on patch-seq data for instance we can have
the transcriptomic profiles the electrophysiological profiles and the morphological profiles in this case and so once you have a model that is trained where all these cells are aligned in their representations it allows you to identify properties of these cells in modalities that you might not be observing with your own experiment so we've done some preliminary experiments on including uh morphology data in our experiments also so just like you had two autoencoders looking at two different modalities and they were coupled together in this case we have three autoencoders that are all coupled to each other and so the representations that you get from such an experiment are shown in the figure above and these representations are still preliminary we are still trying to refine this further but hopefully this will allow us to say something about the morphological properties of the cell too in the near future so all the stuff that i've described in the talk today the results are all present in the first uh reference uh in the second reference there are some technical details uh for the um technical stuff that we had to improve upon for these machine learning models to work the way we intended them to and with that i'll answer a few different questions and uh you know thank the institute this is a huge project all the data collection efforts are just incredible and so this is just a small slice of looking into that data set and trying to understand how to make it more useful for the broader community so i should also mention that in our paper we link to a bunch of different resources there are interactive platforms where you can run our code run some of these numerical experiments yourself without needing access to a gpu or anything like that there's also an open code repository and we also share some data as part of the paper uh so with that i'll take some questions great thank you so much for this beautiful talk rohan so there's a couple of questions
trickling in already so let's jump to that so the first question is from alex malin how is it that the ze is very similar or almost equal to zt is there some aspect of training of these autoencoders that rewards similar representations uh yes indeed so the autoencoder by itself is trying to minimize the reconstruction error uh through the low dimensional representations but we introduce a coupling loss so meaning if the representations are not identical or not similar they will be penalized a little bit so what we've done in this work is essentially framed the problem of coming up with consistent cell types as an optimization problem where the individual autoencoders are trying to just do the best job at capturing the structure in that modality but at the same time we are trying to align the different modalities and that's the penalty that the autoencoders have to deal with or incorporate so this is the way we are trying to um make the two representations similar that's great and then um another question is from um stephen smith uh is it just a coincidence that all the genes on slide seven are neuropeptide related yeah that's one of the details that i didn't really have much time to get into but it's a really interesting question right so these neuropeptide genes were found to be a set of genes so these are signaling molecules that are found to be cell type specific so there was a recent paper in 2019 that actually showed by looking at just transcriptomic data that these neuropeptide genes are actually cell type specific because they are cell type specific and they are involved in communication so neuropeptide genes are known as communication molecules as well um because they are involved in this peptidergic communication they were very interesting to us to just see you know are these gene sets somehow predictable from physiological measurements and it turns out that they are so it's not a coincidence that
i showed those genes but the results of being able to predict gene expression from electrophysiology generalize beyond only the neuropeptide genes so we've done these experiments with marker genes that are included as part of the paper and the results are more or less the same great wonderful thank you so much for these answers and there will be more questions for you to answer in the q a by writing and um now uh we'll need to move on in the interest of time uh we're a little bit behind thank you so much rohan and we'll hand it over to rebecca hodge who is an assistant investigator in the human cell types um part of the allen cell types database and she will present on cell type diversity in the human cerebral cortex revealed by single nucleus rna-seq over to you rebecca um thanks tom can everyone hear me okay all right so today i am going to talk a little bit more about our work in human cortex specifically i'm going to build on some of the concepts that nick introduced nicely in his talk and i'm going to focus in specifically on how this ability to integrate different data sets from different species allows us to dive into the biology of the human cortex in a bit more detail and in particular i'm going to focus on kind of one cell type as a case study and on using the mapping to mouse to really allow us to understand what these different cell types are in human right so as nick mentioned one of our first kind of forays into doing single nucleus rna sequencing in human was looking at middle temporal gyrus and this uh taxonomy that i'm showing comes from our smart-seq data set that he had discussed um and there are a number of things that we can learn just from looking at the taxonomy of cell types in human um one of which is that we can see that these cell types split into major classes so for example we see inhibitory types and excitatory types splitting apart into different aspects of this dendrogram and as well non-neuronal
types being their own kind of main branch we can also see that many of these clusters are rare so for example with the inhibitory types over here in this section of the dendrogram you can see that this bar chart is showing how many nuclei comprise these different clusters and they're all relatively small indicating that these are pretty rare types there are some more common types for example in layers two and three in this area of cortex but for the most part these cell types are rare and as nick mentioned we do these laminar dissections for these smart-seq data sets which allows us to retain some of that sort of spatial origin of the cell types and we can see that some of them are localized to specific layers for example this set of inhibitory types is largely found in layer one others kind of cross multiple layers shown here and again the excitatory types tend to be somewhat laminar although overlapping different layers some with more focused kind of localization for example these layer six types here but for human this is kind of really what we can get to from this data we lack the ability to measure a lot of properties to infer different properties about these types in human for example we don't have great tools to genetically access these types like you have in mice it's very very difficult to measure the connectivity of these types in human for example long-range connectivity is essentially impossible to measure uh in human cortex currently and so what i wanted to talk about today is how we can use some of the information that we know from mouse where a lot of this information has been experimentally collected in these animals and how we can use that to infer properties in human cell types so nick already introduced this concept of aligning data sets across different species and here i'm showing the same slide that he showed previously of how we did this initially aligning human middle temporal gyrus to mouse visual cortex and
anterior lateral motor cortex data sets and basically you know the concept is that you're using sort of conserved shared gene expression to align these cell types and creating a set of homologous cell types from that information and so as nick mentioned a lot of these map at sort of the subclass level they don't map kind of one-to-one across species except for very specific types like chandelier cells but despite that we can look at how these types line up in human and mouse and try to understand a bit more about humans so for example here i'm showing the excitatory neuron homology that we created for that initial study of middle temporal gyrus and what i'm showing here is i'm highlighting one particular cell type so in human there's one type of this particular cell type called the excitatory layer 4-5 fezf2 scn4b type and we can see that that maps to four different mouse types that are classed in this uh diagram as corticofugal which is subcortically projecting or we have now started calling them extratelencephalic projecting cells so layer 5 et neurons um and we can see specifically that the human type maps to all of these different mouse types now in mouse you're able to understand where these cells project to by doing retrograde labeling of the cells and then collecting those cells and profiling them using transcriptomics and so by mapping from mouse to human we can see now that we've uncovered a type in human that appears to be a sort of long-range projecting type that we otherwise wouldn't have any information about in human and what we can do with that information is quite a few different things now we can go and take the marker genes that we're able to uncover using transcriptomics um and look at those cell types in tissue in mouse versus human and so what we see here is we're showing in situ hybridization where the red dots whoops sorry getting way ahead of myself there let's go back okay so where the red dots are um cells that are
labeled with marker gene fam84b which is a common layer 5 et marker gene in mouse and human and what you can immediately see is that there's a really big discrepancy in the frequency of these cell types in human versus mouse so these cells are very very rare in human and significantly more abundant in mouse layer five and so we can start to understand how the composition of cortex varies across the species using these types of methods so as part of our work um in human cortex what we have wanted to do is to start to expand beyond just looking at middle temporal gyrus and getting into many different areas of cortex um and one of our sort of next kind of forays into creating a taxonomy in human cortex was doing a deep dive into the primary motor cortex and this is an interesting area because it's very functionally conserved across species and also pretty anatomically stereotyped so we can go in and localize this area with pretty high confidence in human and it's also an area that's characterized by a particular kind of layer 5 cell type that's been noticed for many years which is that it contains these cells that are called betz cells and these are very very large motor neurons that are thought to project to the spinal cord um and they're really massive in human they're up to 100 microns in diameter and so what we wanted to do with this particular data set and focusing in on these layer five et types is understand whether or not betz cells comprise some of these layer five et neurons and so here i'm just showing a snippet of this primary motor cortex data set um here just showing you that this is a 10x genomics data set so quite a large data set around 76 000 um nuclei that we profiled and we were able to find three different um cell types that correspond to et neurons again by mapping to mouse primary motor cortex to understand what these cell types are and we see now that we can take marker genes
derived from this omics data and apply that to tissue to understand if any of these three putative et cell types correspond to these betz cells and what we find when we go in and look in the tissue and i'm showing here in situ hybridization in combination with immunohistochemistry to outline the shape and size of the cells is that there's at least two of these clusters that include betz cells so basically with this data we've been able to confirm that these uh betz cells correspond to a long-range projecting layer 5 et type and that there's some uh transcriptional heterogeneity within these different clusters of betz cells that we can potentially use to kind of dive even deeper into their biology eventually and understanding maybe a little bit more about how their connectional properties might differ for example and so in the next part of the talk i want to kind of dive in a little bit more detail to another study that we completed to look at a particular kind of morphologically defined cell type in human brain that's been really the subject of a lot of interest um in the literature but very very difficult to study and these are von economo neurons they're found in only a couple regions of human cortex so the anterior cingulate cortex and frontal insula and they're only found in a subset of species so for example they're not present in rodents but they're present in humans great apes and a number of species that you can see kind of outlined in this table here these are cell types that have been defined by having a very characteristic morphology so they have this long spindle shape that you can see here in these images across different species they also have been the subject of a lot of interest because they appear to be selectively vulnerable to different neuropsychiatric and neurodegenerative diseases for example they seem to be selectively lost in behavioral variant frontal
temporal dementia however they're really really hard to study because they're not in a good experimentally tractable species like mouse but as it turns out we can use some of the information that we've learned from mouse to understand more about these specific cells so this actually was one of the first data sets that we generated so in comparison to a lot of the data sets that you'll see on our website it's tiny it's a very small number of nuclei and we're using again the smart-seq technology this is a data set that's only made up of 561 nuclei and they're specifically dissected just from layer 5 of frontal insular cortex in human postmortem tissue and that's outlined here showing how we did this experiment so basically we went in we dissected this layer from the specific region and profiled the gene expression within those nuclei and we again created a taxonomy this is a small version of a taxonomy but what we saw immediately was that there were a couple of marker genes that we had previously seen in the literature for this area that were specific for von economo neurons and those are the genes gabrq and adra1a and we saw one cluster of cells that contained nuclei that expressed those particular genes and so we wanted to dive in a little bit deeper to try and figure out if that was the cluster that included these von economo neurons and so we did the same thing that we did previously with comparing human middle temporal gyrus to mouse cortex data again using the strategy to map cell types from these different species to each other and what we find again is that there's one particular cluster of layer 5 et neurons that includes this cluster in frontal insula and that maps to the layer 5 et neurons in mouse it also maps one to one to the layer 5 et neurons that we had previously defined in middle temporal gyrus so this was the first suggestion that these von economo neurons are likely a long
range projecting extratelencephalic projecting cell type we then wanted to go again taking these novel marker genes that we can infer from the omics data and putting that into tissue and looking at where these cells are and whether they correspond morphologically to von economo neurons and so i'm just showing a couple of examples of that here where we're labeling with these et specific markers pou3f1 and fezf2 as well as slc17a7 which is a marker of excitatory neurons and we can see labeling with all of these different marker genes in cells that have this characteristic long spindle shape in cortex and that's shown a couple of different ways with different markers here but interestingly what we found from this study is that these clusters did not exclusively contain von economo neurons they also contained neurons that had different morphologies for example what's shown here in this image is a fork cell and this is another kind of characteristic morphological type that has these interesting extensions of dendrites off of the top of the cell and as well we saw just more traditional pyramidal looking cells so we quantified that over here and showed that actually the bulk of the clusters were pyramidal neurons and then von economo neurons and fork cells comprise a smaller rarer proportion of those and it's interesting that there's not obvious gene expression differences that distinguish these morphologically distinct cell types within these clusters granted again this is a small data set so perhaps if we went a little deeper in the sampling we would see more subtle gene expression differences there but in any case we can now start to look in a little more detail at what are the gene expression signatures of von economo neurons versus other types of extratelencephalic projecting cell types in human cortex so here we're just showing a comparison of gene
expression of the matched cell types these layer 5 et types in human frontal insula versus in the middle temporal gyrus and what you can see is that we can now start to understand specific patterns of gene expression within these different areas of cortex so here we're highlighting a lot of the specific genes that are expressed in von economo neurons in frontal insula and we find many novel markers that haven't been described for these cell types previously in the literature which again allows us to understand a little bit more potentially about the function of these cells and also just allows for better ability to go in and interrogate these cells in tissue using these novel markers and so beyond just looking at one example of variation across the cortex we now want to go in and compare cell types in human cortex across multiple different regions and one of the data sets that is present on our website that you can go in and look at now is a data set that looks at six different areas of human cortex and creates a taxonomy from those six different areas of cortex the data that's on the website currently is again smart-seq based data and we can infer a lot of interesting things about human cortex from just looking at how the different cell types are distributed in this data and what's the frequency of those cell types and one of the things that we see again focusing in on these et neurons is that they vary a lot in frequency across these different cortical areas so for example if you look at primary auditory cortex a1 and primary visual cortex v1 the cell types there are very very rare so they're less than one percent of the excitatory neuron population in layer 5 so extremely sparse reminiscent of what we saw in middle temporal gyrus when we compared to mouse but if you look across the cortical sheet in human you can see that there's some areas where those are fairly
abundant in cingulate gyrus which is another area that contains von economo neurons those are up to seven and a half percent of the excitatory neuron population in layer 5 versus in m1 where again we looked at those betz cells they're in a kind of middle range of abundance of those et cells and so we can really start to understand with this type of data how different cortical areas vary and how particular cell types vary in their abundance and proportion across these different areas and going back to this idea of comparing across species again as i pointed out earlier in the talk these cell types are much more sparse in human than they are in mouse and we can start to understand how that variation looks across a set of species so here we're comparing to macaque and mouse and we are looking at the abundance of layer 5 et types using again these standard markers that are common across all the different species so fam84b for example and what we can see is that in human again these cells are very very sparse and in macaque they're in a kind of middle range about 10 percent of the excitatory neurons in layer 5 are et and in mouse they're over 25 percent so we can see this range of abundance of neurons and so we can start to understand again what this might mean for the function of different cortical areas across species and so just to wrap up i wanted to point out that a number of these data sets are on our website the middle temporal gyrus taxonomy that i showed you as well as the cross cortical area and primary motor cortex data sets are all available for you to explore and to view on the website and i will be talking about those later today in the tutorial session as well and so with that i wanted to say thank you to well to everyone these studies were obviously a massive undertaking by many different people whose names don't all fit on this slide and thank you to paul allen of course for his vision
encouragement and support and parts of the research that i presented today were funded by the national institute of mental health thanks thank you rebecca for this wonderful talk well we have time for a couple of questions and i was just wondering if i could maybe start and kick us off so for a very long time the general view of the cerebral cortex has been that cerebral cortex is serially homologous across cortical areas but your data seems to suggest that in humans there is actually a lot of molecular diversity within a very specifically defined cell type so i wonder do you have speculations about how much variation there really is across areas and could that inform us about how cerebral cortex actually functions yeah i mean that's a great question i think we're a little limited in the data that we have currently for human where it's very much like snapshots of areas of cortex that we tried in the initial sampling to spread across the cortical sheet to capture that range of variation i think without a really dense sampling across cortex our understanding of that is going to be limited and that is something that we are currently undertaking as one of our projects to go across cortex and sample more frequently and try and fill in some of those blanks great so in the interest of time we'll have to take the rest of the questions in writing thank you so much this is really inspiring and wonderful work thank you rebecca thanks and now we'll take a few minutes break so seven minutes well six minutes see you at ten past 10 o'clock pacific time we're good yup we will return at 10 after in whatever time zone you are in and we have two more speakers and then a reminder everyone we do have the tutorial with rebecca it's gonna be in a separate zoom link or youtube link depending on which platform you're joining us
on you can find the link to that on the same page with the login information where you got the link to join us here right now or you can just go to alleninstitute.org youtube.com allen institute and it will be in the list of videos there we will return at 10 after and our speakers are answering questions by typing those responses in q a so please do keep them coming we're gonna get started in just another couple minutes here we have two more speakers for the day in the main symposium and then we have the tutorial where rebecca is going to walk you through all of the practical considerations of how to use the transcriptomics data from the allen cell types database so shreejoy and fenna are going to be here in this same zoom link or on the same youtube stream but the tutorial starting at 11 o'clock pacific time that's 50 minutes from now in whatever time zone you are in is going to be in a separate link there is no pre-registration required you don't have to sign up for that tutorial in advance it is included in your symposium registration so please use the separate link you can find that on the page that you used to join us here today or you can just find it on our youtube channel and a reminder for those of you who are going to be joining us tomorrow and wednesday for the remaining talks and tutorials in this event those are going to be separate zoom and youtube links as well and you can find those all on that same login page if you're having trouble please let us know and we're happy to help you out all right we are just about at time to get started again so i'm going to hand it back to tom all right tom take us away you're on mute oh sorry about that we've only been doing it for nine months so great so should we start great thanks welcome back everyone after the break hope you got a chance to stretch your legs and refresh so now it's time for a speaker who is not actually a member of the allen institute for
brain science but he utilizes the beautiful data sets that have been generated by the allen institute and his name is shreejoy tripathy and he's an independent scientist at the centre for addiction and mental health and also an assistant professor at the department of psychiatry at the university of toronto and he will be presenting a talk entitled identifying the transcriptomic signatures of cell type specific electrophysiological heterogeneity using publicly available patch-seq datasets we look forward to your talk shreejoy take it away thanks tom and thanks to everyone for organizing this great symposium it's really an honor to get to speak at the symposium and as tom mentioned my lab uses the allen institute resources heavily there would be no tripathy lab without the institute resources so we're very grateful to them so let me see if i can figure out how to control the slides okay cool okay so we've heard a lot about transcriptomics data and single cell transcriptomics data and i think the question that the field is quickly approaching if we're not already there is how do we draw functional insights from this emerging wealth of data so here in this slide just as a motivation for my talk this is data that we've reprocessed and analyzed that rebecca just talked about so this is from the multi-region data set that the institute has collected of single nucleus rna sequencing data from the different regions of the human cortex and this is just one interesting tidbit from this data that i'll walk you through so here we've organized the cells so these are pvalb like cells in the pvalb subclass so parvalbumin interneurons and on the y axis we're plotting parvalbumin expression in pvalb interneurons and so what we see is this really paradoxical finding where pvalb cells in cingulate
gyrus don't really express that much parvalbumin even though they're in the parvalbumin subclass but cells in the visual cortex express a lot of parvalbumin and so the question is what does this mean is there something functionally different about parvalbumin expressing interneurons in the more frontal regions relative to neurons in more sensory regions so this is something that i want you to keep thinking about as i go through my talk so i'll be talking about our analyses of patch-seq data sets so rohan and basilica talked about patch-seq quite a bit briefly it's a method for collecting single cell transcriptomics following electrophysiology characterization if you fill the neurons with a dye you can also recover the morphology the technique is relatively new the first publications with it came out just a few years ago as rohan already mentioned i was part of an effort to write a review paper on patch-seq we did that in collaboration with a number of people who developed this method and who have since perfected it including people at the allen institute like kristen hadley here and so i really like this review i think we put a lot of work into writing this review and if you're interested at all in implementing patch-seq in your labs i would strongly suggest you start with reading this review i'm biased though so in the paper we talk about applications of patch-seq and to me it seems like a lot of the efforts so far have been using patch-seq to help annotate existing single cell transcriptomic atlases so for example if you have transcriptomically defined cell types like t types then you can use patch-seq and say well here's the electrophysiology signature of a cell that has been annotated to this t type and you can go through and you can say well here are the
electrophysiology and morphology features of each cluster of cells and so that to date has been one of the primary uses of patch-seq and one thing i want to talk about today is how we can begin to infer function from these data sets so rohan touched on this a bit in his talk and i'll extend on this a bit i'm specifically focusing on how we can use patch-seq to derive novel gene function relationships so basically the idea is that given that you might see systematic variation among cells and cell types say for example in their functional characteristics like their electrophysiology features shown here in the schematic can we use the fact that we can sample gene expression from cells and cell types and relate the systematic variation we see among functional characteristics to the levels of gene expression in the same types of cells so i want to admit that i've been working on this for some time so trying to be hip and to use a common meme i thought i'd go through how it started how it's going to illustrate my progress to date on this problem so here's a picture of me from 2014 i had a lot more hair back then and then here's a picture of me and my lab now from last year so back in the day again not that long ago single cell transcriptomics wasn't really a thing patch-seq wasn't a thing and in my efforts to work on this project the best we could do is we could try to gain access to gene expression and electrophysiology data by literally curating this data from the literature so there was a time when i had a team of 10 undergraduates reading neuroscience papers and curating electrophysiology information from the literature into a centralized database called neuroelectro we've had a
similar effort for gene expression data sampled from pooled cell microarray data sets and so with this data the best we could hope to do is get information at the cell type level because it was basically impossible to sample gene expression and electrophysiology data from the same exact cell at the time and so we gathered data from multiple cell types altogether where each cell type is shown here by a different color and then we would perform these correlation analyses where we relate continuous variation among electrophysiology properties shown here in the schematic with continuous variation in quantitative gene expression shown here and so we would say across these different cell types this gene is correlated with variation in this electrophysiology property and so here i've pointed to two papers where we pursued this approach and so the key insight from this work is that most of the signals that you would get out of this analysis are largely reflective of gross cell type differences so we would get out marker genes and so by staring at these gene lists what we reasoned is that this analysis is pulling out genes that are more reflective of gross differences in cell types and they're likely not to be the genes that are driving causal differences in the cell's functional characteristics say for example ion channel genes and so today by the efforts of people at the allen institute and the tolias lab and others there's now massive scale patch-seq data sets so rather than patch-seq data sets from say tens of cells we're talking thousands of cells so this is a figure from the recent patch-seq data set from the allen institute with the first author nathan gouwens et al where they present a data set of around 4,000 interneurons sampled from the visual cortex
where each cell has the gene expression the electrophysiology and some of the cells here have morphology and then there's a companion or sister data set led by the tolias lab that was recently published in nature with 1500 cells that encompasses both excitatory and inhibitory cells and so with these data sets what we can do just as basilica and others have mentioned is perform these correlative analyses at different levels of the cell type taxonomy and hierarchy so before we could do nothing but look at differences across cell types now here we can look within cell types as well so specifically we can perform our analyses at different cell type resolutions like at the class and subclass level as well as looking at different examples of cells from the same cluster or t type so here's some examples of the different resolutions of the cell type taxonomy and here's what that would look like as well in terms of grouping cells into classes subclasses and clusters and so here i'll tell you a bit about a project in progress so this is work led by a phd student in my lab under christine niggum and she's assisted on aspects of this by alex hogarth he's an md student who assists on this project in between delivering babies and so here this is the same figure that i showed before but because we have patch-seq and because we have multiple examples of cells from the same t type we can perform the analysis at different cell type resolutions we can look at across cell type differences as well as focusing on cells of the same t type and do the analysis at the within cell type resolution and so here that's shown by the different colors so these are all
the same color and so we can look at the analysis within cell types oh i should also mention that it's funny so rohan talked earlier and this is an example of the beauty of the allen institute data sets where you can do so many versions of the same analysis on the same data set and come to different conclusions which is a beauty of publicly accessible data that i wanted to emphasize here so how do we actually do this analysis of looking at these correlations at different cell type resolutions so here we're using the magic of mixed effects models and so let me just quickly explain that so basically we're capturing the fact that we can analyze different cell type resolutions by using different grouping factors so we can do analysis at the class level by just using standard linear regression modeling where we're trying to model the relationship between electrophysiology and gene expression and as well at the subclass and cluster level by having grouping factors which we model as random effects so we have random intercepts for the different grouping factors so we can have random intercepts for the subclass and random intercepts for the cluster level and then by doing that we're removing some of the differences due to gross cell type differences we're trying to keep that from influencing these correlations so here's an example of what this looks like so let me walk through it very very slowly so on the y-axis here we're showing one electrophysiology property so here we're showing ap or action potential width and so that's defined as the width of the action potential at half max for the action potential at rheobase and so that's been quantified
from raw electrophysiology data from each of the 4,000 cells in this data set from gouwens et al and so each cell is a dot and so the y-axis is electrophysiology and then on the x-axis we're showing gene expression as measured through single cell rna sequencing and sampled through patch-seq and here we're just showing one gene this is kcnc1 and this codes for a potassium channel and so we're basically showing the same data three times and we have lines capturing the different cell type resolutions so at the class level we see that there's a negative correlation between this electrophysiology property action potential width and this gene when we're looking across all the cell types and then we can do the same thing as well by looking at the subclass level where we have a different line for each of the four subclasses vip pvalb lamp5 and sst and then same as well at the cluster level where we have one line per t-type and so across the three we're showing the fit beta coefficient which quantifies the association between this electrophysiology feature and this gene's expression and we see that across these three resolutions this beta coefficient is always negative and so in general the slope of the lines is going in the negative direction one thing i want to point out is that generally when you go from class to subclass to cluster the associations become less extreme and that's because we're removing a lot of the variability that's due to the gross differences in cell type which is what we're hoping for so this is an example of one electrophysiology property and one gene they're measuring tens of thousands of genes in these data sets and you can basically measure as many electrophysiology properties as you can quantify given that the raw data
are publicly available and so here this is basically showing the net result of this analysis where for every pair of genes and electrophysiology properties we do this linear mixed effects analysis and then we can just quantify how many genes are significant at some statistical threshold so here we're looking at the number of genes that are significant at an fdr of less than 0.1 not bonferroni and so the key point i want to point out is that when you go from the class to subclass to cluster resolutions there's a lot fewer genes that we see as statistically associated at the subclass and cluster levels compared to the class level so here we're talking thousands of genes and here we're talking tens to low hundreds and as i mentioned this project is still in progress but the hope is that while there's fewer genes at these resolutions they're more likely to be causally related to the electrophysiology properties in question where the electrophysiology properties are shown here by the different bars so for example this purple bar is input resistance or this is the membrane time constant so just very quickly i really like this as a method for helping better understand gene function relationships and how we can quantify them within cell types so the next steps for this project for us before we start writing this up is we need to provide some assessment of which cell type resolution we think is best which one we think is more likely to lead to causal relationships or
more likely gene function relationships and then as i mentioned there's other patch-seq data sets that we can use to help bolster these findings based on the analysis i just showed you from the allen institute data set so we're really interested to use other ones like this other one i mentioned from the tolias lab to compare our findings to so before i end i just want to get back to this example i mentioned so is it possible that we can use this analysis of mouse patch-seq data to make inferences about gene function relationships using data from other species so as i mentioned this is from human postmortem cortex from single nucleus rna sequencing data and so can we try to understand what's happening here in the context of parvalbumin expression in parvalbumin neurons in different regions of the human neocortex and so what we can do is we can use this analysis that i just showed you of this patch-seq data set from mouse visual cortex and we can ask is pvalb correlated with any of these electrophysiology features and so we see that yes indeed it is correlated with action potential width across these visual cortex neurons and it's correlated with this electrophysiology feature called fi slope so this is the frequency current relationship that captures how frequently cells can fire when you give them the same current injection stimulus and we see that it's positively correlated with this feature it's negatively correlated with this feature so by and large this is some evidence showing that the level of parvalbumin expression in a cell is modestly correlated with aspects of greater excitability even among cells of the same t type so again that's from the mouse visual cortex but sort of like
taking that insight that we can learn from that data we might infer that when we see this relationship with greater parvalbumin expression from more anterior to posterior regions of the human cortex this might indicate that there are greater levels of electrical excitability among parvalbumin cells in the visual cortex relative to cingulate and middle temporal gyrus so this is a hypothesis that we can draw from analyzing these data sets so i just want to mention that it's going to be very challenging if not impossible to assess the electrical excitability of cells that are not in the middle temporal gyrus in the context of the human cortex but what we can do is we can start looking at cells in other species so for example i just wanted to give a shout out to this recent neuronex collaboration that my lab is part of where we're going to start looking at cells in different species of non-human primates using different modalities like single cell transcriptomics and in vitro and in vivo physiology and behavior and try to understand how these regional gradients of cell types are different across different regions of the human neocortex sorry of the non-human primate neocortex and so this is work done in collaboration with a number of groups including fenna krienen whom you'll hear from in the talk directly after mine and so before i end i just wanted to mention that my lab is hiring extensively we've been very lucky with grants and so we're hiring for a couple research staff positions and we're also always looking for applications from postdocs and graduate students i think we have a lot of fun in my lab and we have fun even despite the challenges of the pandemic okay so yeah i just wanted to end by
acknowledging my funding sources and acknowledging the great data sets produced by the allen institute and elsewhere as well as the efforts of the people in my lab that underground especially who led the project that i talked about today okay okay so great fantastic talk thank you so much for this shreejoy so we'll jump straight into questions over here so there is a question from stephen smith dear shreejoy how many different potassium channel genes do you typically see co-expressed in individual cells or sodium or calcium channels for that matter what do you think this portends for electrophysiological prediction that's a good question stephen i would say most of them but that's not a very good answer yeah most or many of them but on the other hand channels tend to be expressed at lower levels so you're more likely actually i don't really have a good answer for you so many potassium channels or the other channels are co-expressed and i could give you a very precise answer in minutes but that would take a bit of analysis that i don't have time to do right this moment great and maybe a very quick question from me are you aware of anyone doing patch-seq on various types of gene knockout mice or animals that you could incorporate to really validate the predictions from your model or are you interested in doing it i'm certainly interested in that and i'm aware of no one who's specifically doing that but that doesn't mean that it's not being done i think it's only a matter of time for those experiments to be done in a very rigorous way one thing that i'm struck by with the patch-seq method is that relative to other methods in single cell transcriptomics it hasn't quite been adopted as widely as i would like think about 10x genomics it's become like the workhorse method of molecular
neuroscience and patch-seq has yet to be as incorporated by you know standard patch clamp electrophysiology labs and so i'm very interested and very excited to like work with patch clamp electrophysiology labs to incorporate patch-seq that's great thank you so much shreejoy this is really inspiring um in the interest of time i'm gonna have to uh we're gonna have to move on but thanks again so much for this and um maybe we can set up fenna with her slides so it is my distinct pleasure here to introduce fenna krienen who is a postdoc currently in steve mccarroll's lab at harvard medical school um she's also a fellow member of the next generation leaders council and so it is an extra privilege to uh introduce her talk which is entitled innovations in primate brain cell types um so over to you fenna thank you so much tom thanks everyone for uh tuning in and to the organizers for inviting me this has been so fun so far and um all of the sessions this morning and shreejoy have teed up this observation that i'm going to start with which is just so uncontroversial and that is that it's a tremendously exciting time to be studying brain science uh but particularly i think when you have a comparative question in mind and in some sense we all have a comparative question in mind because we've relied so much historically on powerhouse animal models or in vitro models to try to access some aspects of brain function or brain phenotypes that may be relevant to humans but how do we actually know when we arrive at that and i would say that everything that you heard this morning from nick and from bosiljka and from uh rebecca and rohan um points to the tremendous value of being able to simply measure the same thing across species to simply measure in a high throughput and systematic way transcriptomic classes cell types in the human brain and i'm also going to say something that many people have said this morning already which is
this implicit intuition that we all share that something about evolutionary processes will lead us to expect that species that are more similar to our own will share more features in common in terms of their cell types and their gene expression programs but this is again for the first time we're able to actually make these apples to apples comparisons by having this kind of high throughput measurement i'm going to be focusing primarily on transcriptomic types for just this short talk but so much of what you heard about today is really opening that horizon to do comparisons across multiple modalities in interesting ways um and i um also in the spirit of the symposium uh theme we are collecting these data primarily in uh marmosets these are a small non-human primate and this is part of the brain initiative cell census network which itself is a huge and open data experiment in which um a consortium that many in allen are a part of centrally tom is a part of it bosiljka is a part of it all contributing different data types modalities species developmental stages towards making data available for the community to access almost in real time and so we are really happy to be a part of this consortium of course the allen has played a major role in convening it and herding us around as we try to make our data available and just another plug for the open data um consortium that shreejoy just plugged which is our neuronex where we're trying to really link across species brain regions and modalities insights that we can garner from this high throughput and systematic measurement to brain function in the case of our neuronex that would be working memory processes in the cortex so watch that space also as those data become available um okay so what i want to share with you today is really a comparative study of brain cell types and you've heard again many examples of this this morning in my case i acquired a single
nucleus rna sequencing profiles from multiple species from humans from macaques from marmosets as i mentioned also from mice and ferrets and we're really asking the question of how do brain cell types evolve or in what ways can we start to get traction on that question we collected many brain regions but for the purposes of the story i'll just focus on neocortex striatum and hippocampus and although we're again in an unbiased way acquiring all cell types i'm just going to distill what we've learned so far from interneurons of this larger data set which again is available so what do we know about interneurons already i mean we know a tremendous amount about them already morphologically functionally physiologically we know how exquisitely diverse they are and yet what has been a puzzle um from the single cell genomics landscape of available studies is that we tend to emphasize their conservation across species but also across neocortical regions and at least that's something that we've observed in our own data in mice and has been observed by others measuring and sampling interneurons across neocortical areas in mice so we think of them as quite literally a conservative place to start to look at the evolution of a given neuronal type and i should say the other reason why we're focusing on interneurons is more prosaic but it just happened to coincide with gord fishell who's a sort of an expert in interneurons relocating his lab up to um the broad institute and harvard where he convinced us to dwell a little bit longer on interneurons than we might otherwise have thought to do but whatever your cell type if you take a step back and ask how might brain cell types evolve we thought of a sort of naive list of a few possibilities and then nature actually gave us a fourth one so we thought of three of these and nature gave us a fourth you could change the abundance of the conserved type and you heard some examples of that this morning you
could change the molecular details of a conserved type again you've seen some examples here's a cool one that we didn't really think about prior to this but you can think about reallocating or moving a conserved cell type around to different locations or structures across species and that's a really interesting possibility and then the most dramatic example of course is inventing a novel type so in the interest of time i'm just going to be able to focus on two of these um themes but again we saw examples robust examples of all four of these processes okay so um let's just focus on this question of how often do you see changes in genetic programs of conserved types focusing just on neocortical interneurons one thing that we observed and others have observed as well is that there are basically four major categories and what is uh convenient about these across species is that uh the same four marker genes that we know and love well parvalbumin somatostatin they tend to be able to mark the same populations across these species and so that means that we can then examine the expression of other genes within these four canonical subclasses if you will and examine how much we see variation of other genes that are expressed within these core classes so here's an example here's an illustration of what we've seen here's a gene netrin g1 ntng1 which in mice is enriched in parvalbumin interneurons but in the primates comes to be uh quite a good marker for neurogliaform primarily neurogliaform type lamp5 positive interneurons so this is an example of a categorical shift of a gene that we know well across conserved types you can also examine these kinds of properties as relationships between species and so for example in neuregulin 1 nrg1 the expression of that gene within the canonical subclasses of interneurons in the cortex of humans is nicely predicted by the primates you can see those two graphs on the right but not so well within the non-primates
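The cross-species comparison described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not the speaker's actual analysis: the subclass labels beyond parvalbumin, somatostatin and lamp5 (here VIP) are an assumption, the expression values are made-up toy numbers chosen only to mimic the pattern described, and the function names are hypothetical. The idea is simply to correlate one gene's expression across the four interneuron subclasses in humans with the same gene's expression in each other species.

```python
from math import sqrt

# Four canonical neocortical interneuron subclasses (labels are an assumption;
# the talk names parvalbumin, somatostatin and lamp5 explicitly).
SUBCLASSES = ["PVALB", "SST", "VIP", "LAMP5"]

# Toy, made-up expression values for a single gene, one value per subclass.
# These are NOT measurements; they only illustrate the shape of the comparison.
toy_expression = {
    "human":    [1.0, 2.0, 3.0, 8.0],
    "macaque":  [1.2, 2.1, 2.8, 7.5],
    "marmoset": [0.9, 2.4, 3.1, 7.0],
    "mouse":    [6.0, 2.0, 3.0, 1.0],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_with_human(expression):
    """Correlate each non-human species' subclass profile with the human one."""
    human = expression["human"]
    return {
        species: pearson(human, values)
        for species, values in expression.items()
        if species != "human"
    }


if __name__ == "__main__":
    for species, r in correlation_with_human(toy_expression).items():
        print(f"{species}: r = {r:.2f}")
```

With real data the per-subclass values would come from aggregated single-nucleus expression profiles, and with only four subclasses a correlation is a coarse summary rather than a statistical test.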
and so this kind of distills this point that with these single examples neuregulin 1 and ntng1 genes that we think about as essential for normal neuron function or homeostasis or excitatory inhibitory balance they can yet exhibit some surprising flexibility in terms of their quantitative expression levels or categorical expression levels across species so we can look at this one example of a gene at a time but we can also look at this across the entire genome and one way we thought to focus on this is to think of a really strong test of how well quantitative expression levels of genes are conserved and we thought to see whether genes that are intolerant of loss of function in humans might be highly conserved or not across species in terms of their expression levels within these again conserved interneuron classes because you might think that if a gene is dosage sensitive in humans then it would also be highly constrained in its expression levels within the same cell types in the mouse as well and yet actually what we see is that there's a high degree of conservation amongst the primates so humans to macaques or humans to marmosets marmosets to macaques but not so well um in terms of the conservation from mice to primates and so here's the key point we see that evolution does constrain the quantitative expression levels of loss of function intolerant genes but it's particularly these shorter evolutionary distances amongst primates and i think that's really important for these really hard problems of trying to select an appropriate animal model say for a disease risk gene okay so i want to show one more example of this uh this theme of changing a genetic program of a conserved type and it sort of reflects some of the principles that shreejoy was talking about in his last talk which is when i told you a few slides ago that um we had expected to find a basic invariance of uh interneuron types by cortical area and that's something that we'd
certainly seen in mouse also something beautifully shown by bosiljka's paper a few years ago where she sampled two cortical regions and counted the number of differentially expressed genes in glutamatergic types and saw how much of a burden she had of differentially expressed genes in glutamatergic types but not so much so for matched gabaergic types by cortical location and we had sampled many cortical regions in the marmoset so the first thing that we could conclude is that there seems to be a higher degree of local customization of interneuron types in the primate cortex we take any two of these cortical regions you find large numbers of differentially expressed genes so maybe primates locally customize interneurons to a higher degree but it actually turned out to be more interesting than this because we'd collected so many regions we could simply ask if a gene varies from the back of the brain to the front across this anterior posterior dimension how is it expressed in the regions interposed and it turns out there are many genes that show what seems to be a spatial logic that is they have graded expression by virtue of where their cortical location is along this major anterior posterior axis so here are a few examples where the gene is highly expressed in prefrontal cortex and not so much in v1 but then there's a sort of graded expression in the areas interposed we've validated a few of these with single molecule fish showing for example that ass1 which is highly expressed in the frontal pole and not so much in parvalbumin interneurons in the visual cortex has a graded expression pattern across the dorsal extent of the anterior posterior axis and so i think these are really interesting questions again to start to ask from a developmental perspective do interneurons encode local details about their cortical location in the same way that excitatory cells do when might this arise or are they responding to different cues and why do um mice apparently lack this
or is it a much more modest phenotype in a mouse that we haven't appreciated yet okay so i just want to turn to the uh the other theme that i wanted to highlight for this short talk which is this possibility of a novel type and this is something that we um that we sort of counterintuitively turned to the striatum to look at the striatum as you know is part of a set of brain nuclei which are deeply conserved called the basal ganglia and it's been emphasized in other studies that there seems to be a conserved repertoire of basic cell types uh in mammals all the way back to primitive forms of fish and so you might expect if anything to find less innovation in a structure like the striatum relative to the more recently evolved or so at least we think um structure like the neocortex in mammals and so it was sort of a surprise to us that we could find first all of the major interneuron um subtypes of the striatum in both mouse and marmoset um but then we also observed another type in the marmoset striatum we see that it prominently expresses tac3 a neuropeptide and so we tend to call them the tac3 striatal type but we couldn't find a cognate of this type anywhere else in the mouse brain that we've sampled so far nor in other regions of the marmoset brain it expresses a novel combination of transcription factors and neuropeptides that we don't see in similar combinations elsewhere and what is interesting about it is that it's sort of transcriptomically situated between known and conserved classes the th and parvalbumin types on the one hand or the somatostatin subtypes of the striatum on the other hand and so we can infer that it probably also originates from the mge the medial ganglionic eminence but this is something that we still need to confirm and then finally what we had to do next was sort of convince ourselves that this isn't just some odd property of the marmoset striatum that marmosets don't just gain this
selectively or conversely that laboratory mice don't lose this population and so by sampling the ferret striatum and the human striatum we could then confirm that this seems to be a gain in the lineage leading to modern primates and also that this type makes up some thirty percent of striatal interneurons in primates okay so in this short um summary i just want to share or review that we found examples of changes in abundances of conserved types in molecular composition i showed you a few examples of that in allocation across space or across structures um and then finally the gain or loss of the novel type and i just want to end by saying uh one of the things that is so valuable about the allen institute resources as has been mentioned of course many times already is that it's an open resource that we can all use and learn from one practical way that i use this is a sort of method that other people have borrowed from extensively but you download their data and compare it to yours right so there are many algorithms for doing this i downloaded for instance rebecca and trygve and ed lein's human mtg data integrated it with my own marmoset neocortical interneurons and tried to draw some conclusions about how well these cell types match across species of course we have our own human data and our own other species but this allows us to really make these inferences also across labs and across technologies and so forth and one of the features and benefits of doing this that many of the talks before me have foreshadowed is the possibility that over time as we start to sample other features other modalities in these cell types and other species and as the allen makes this available that we can start to make these predictions or transfer features or modalities to data sets for which the feature hasn't been measured and so that might be morphology or electrophysiology i think that's a very valuable approach but the other one that i wanted
to highlight that i find particularly valuable about the allen is the way that they are investing in some serious consideration about nomenclature and transfer of the idea of how to define and call and label a cell type and what do we mean when we call a type a certain thing and how well can we transfer those labels across species so i think that's also a very active space that the allen is invested in that we all benefit from with this open conversation of data access and accessibility okay so i'll just wrap up and say um we've learned a lot about how brain cell types evolved in this little study um but there's so many other questions we might ask how many other innovations might we find in non-interneurons and a lot of the allen work now is exploring that space across wider taxa and wider space of the brain cell type taxonomy but i think we've already learned a couple of principles that i'll just close with and the first one is that i think we can expect that novel cell types are going to be exceptionally rare will probably be the exception rather than the rule the other is that our favorite markers in a given cell type in a given species won't necessarily transfer over flawlessly to another species or cell type and the final thing that we've sort of arrived at from this collection of studies that are emerging now is that um evolutionary proximity tends to be a good predictor but by no means the only predictor of cell type conservation or innovation and i think as we sample more we'll expect to find more and more exceptions to that rule in really interesting ways so with that i'll just thank you all so much as well as my collaborators and take any questions that you might have thanks great thank you fenna so much for presenting this was a very inspiring talk um so let's see if there are some questions trickling in there we go so the question is from youtube actually um dr krienen how will you predict the evolution of the brain in
this new era of technology um where a lot of work is done by humans that perhaps is sort of you know how much of the you know large-scale data infrastructure do we need to collect and uh you know to be able to sort of really understand what were the main drivers of evolutionary change so let me make sure i understand the question because i mean this is a provocative question on many levels so one interpretation of this question is um are we going to see evolution of the human race as technology advances and we can sort of tune off uh in terms of doing some of the work that machines can do and the other is how much of what we can do in terms of learning about fundamental principles of brain organization can be ultimately unsupervised if you will and not sort of privy or susceptible to the whims of our own um our own biases our own understanding of how the brain works um so these are two very different questions i'm not quite sure which one you're referring to but i think they're really fascinating questions that we do have to grapple with i mean for one we tend to view species as sort of static objects at this point in time that um that everything that has accumulated over the course of evolution sort of stops at the time of measurement but of course each species that we're measuring and sampling from is also still experiencing in some cases rapid evolutionary change and we can see that in terms of measuring acceleration and variation across genomes across individuals and in response to particular niches so i think these are active spaces and debates for exploration and something that we'll have to grapple with uh over time as we collect these and other data types and other species as well yeah that's great and i also had a question you know what are the species that you think would be worth adding in the future you know do you have any favorite uh sort of species that you would
really like to know um you know their interneuron diversity and you know i'm just curious on your thoughts what are some of the main research directions that you'd like to see uh being pursued in the future yeah well i think that um that nick and rebecca and ed are betting on a strong horse by first exploring within a narrow range of the great ape clade how much evolutionary innovation do we actually see in a human brain relative to others of course um nick showed that beautiful image of just the size and complexity of different brains along the primate lineage and marmosets are these tiny lissencephalic guys at the far end and then you have this massively convoluted human neocortex and the question is you know what explains that phenotypic variation and how much do we see core conserved principles um at play when we look at the evolution of brain cell types within 45 million years versus seven million years um so but i actually think that we could have chosen any range of five to seven species in any clade and we'd probably find something massively exciting and interesting in terms of variation in brain cell type evolution at shorter or longer distances in terms of outgroups relative to us because we always tend to be a bit human-centric in our interest of how cell types evolve um i think it would be really wonderful to kind of think more closely about uh certain species or clades that have experienced particular uh evolutions in the sensory modalities that they utilize or particular niches that they occupy so i'd love to know what's going on with megabats and microbats i'd love to have an outgroup comparison where we're really taking on uh or marsupials where we're really taking on key branches in the evolution of mammals that may lead us to find that we have gains and losses for instance in many things that we sort of think of as core principles or features of a mammalian cortex
remember mice and humans diverged what 90 95 million years ago that leaves hundreds of millions of years to be accounted for just within the mammalian radiation alone so i think it will be fascinating to sample a bit more broadly absolutely and do you think that any of those studies could also inform something about disease susceptibility so there is a question you know whether mouse or marmosets might be more suitable as an experimental model for studies of alzheimer's disease that's right and that question was asked in another way with uh with nick this morning about drug discovery so absolutely i think there are certain cases where when you do find an evolutionary divergence it makes a lot of sense to go closer to the human if that is the goal if the goal is to model a uniquely human disorder and i would argue that many psychiatric diseases probably fall in that category then we really want to convince ourselves that the basic cell type arsenal and expression programs are conserved across species as much as possible so i think there will be some key examples where you see a failure of an animal model that we've relied on say a mouse model to model a particular disorder alzheimer's was raised alzheimer's of course is an interesting one because um just the basic longevity of different mammals can vary so much that you may not be able to observe some of the degeneration phenotypes that we see in humans in shorter lived species um like mice or rodents and i think there's also the question of um how much are we helping ourselves and how much are we harming ourselves by using massively inbred model organisms so for example a lot of our non-human primates of course are still quite outbred and have a lot of uh genetic diversity and that can be i think a boon ultimately if we're able to do the measurements in a systematic way um to convince ourselves that any uh manipulation or drug
that we're studying on a complex genetic background is going to give us an answer that is more translatable to the human context fantastic thank you so much fenna thank you once again for a very very inspiring presentation so with that we'd like to again thank you for joining for the first day of the open for neuroscience symposium um at the top of the hour so in three minutes here on pacific time there will be a tutorial on allen cell types database transcriptomics that will be led by rebecca hodge and please return and join us tomorrow for the second day of our three-day event tomorrow's topic will focus on the allen brain observatory and the recording of this program will be available on youtube as well as the recording of today's tutorial for those of you who cannot attend all of it um so please follow the links and um and yes thank you everyone for your attention and let me thank again the speakers for their fantastic presentations
Info
Channel: Allen Institute
Views: 26,256
Rating: 4.8657718 out of 5
Id: oWPktWADlr4
Length: 180min 19sec (10819 seconds)
Published: Mon Mar 08 2021