TEDxWaterloo - Miriah Meyer - Information Visualization for Scientific Discovery

Video Statistics and Information

Captions
So one of the defining characteristics of this moment in our history is that we are generating enormous amounts of information about absolutely anything and everything that can be measured: about our social networks, our governments, our bodies, and our universe. And from all this information we're really hoping to answer some of our most pressing questions, answers that we typically tend to seek by reducing these massive, complex datasets down to smaller, more manageable amounts of information. But this reduction strips away much of the richness of that original data.

Take, for example, the popular college rankings that are published each year in U.S. News & World Report. Now, these rankings are based on loads of information that's collected from colleges and universities all across America, information like test scores of incoming freshmen, student-to-teacher ratios, and even professors' salaries. Now, all this information is fed into a numerical model that then spits out a rank-ordered list of schools.

A couple of weeks ago I was reading a piece in The New Yorker by Malcolm Gladwell, who's an author that was actually raised here in the Waterloo region. In his piece, Gladwell was taking a critical look at this ranking system, and in particular he pointed out two universities that were given nearly identical scores. One of these is a large public university with a nationally ranked football team, situated in rural Pennsylvania, while the other is a small private religious school in the heart of Manhattan with separate campuses for men and women. Now, by most measures the educational experiences of students at these two universities couldn't be more different, and yet this ranking system rated those two schools as nearly the same. And the problem is really that the numerical model that created these rankings inherently takes a specific ideological standpoint about which aspects of that collected data are most important.

Now, in the sciences today, access to cheap, fast technology has led to unprecedented amounts of information and absolutely breathtaking advances in knowledge. In biology, the transition to a data-intensive science has been absolutely stunning. Just two weeks ago we celebrated the 10-year anniversary of the sequencing of the human genome, a feat that itself took 10 years, at a cost of 3 billion dollars. But now, for 5,000 bucks, you can sequence the genome in less than a day.

But biologists aren't just interested in DNA. They're conducting experiments and collecting data about loads of other things, like the chemical reactions that occur in our cells, which genes are turned on and off under different experimental conditions, and massive amounts of information related to clinical and long-term outcomes. In short, they're creating large, complex datasets.

Now, this flood of information has fundamentally tied biology to statistics and has given rise to the new field of computational biology. Experts working in this field apply sophisticated statistical methods to biological data in order to understand how our bodies work at a molecular level. But just like that college ranking system, these statistics are stripping away much of the richness of that original data.

A simple example of this is Anscombe's quartet, which was created by the statistician F. J. Anscombe in the 1970s. Now, these four sets of numbers have the exact same mean, correlation, and variance, so according to these simple statistical measures they are equivalent. And yet when we look at this data, we see very different stories: we see a weak correlation, outliers, and a nonlinear relationship. We can see the information that was hidden in those statistical measures. Now, Anscombe created this quartet to illustrate the importance of visualizing data before analyzing it.

Now, the power of visualization comes from harnessing our perceptual system in order to free up our cognition for higher-level tasks. So, for example, looking at this string of letters, if I were to ask you to count the
number of times you see the letter V, how hard do you have to work? But if I instead show it to you like this, the answer is immediately obvious: our eyes are just drawn to those red letters.

But with advances in computing power, we're no longer limited to static images, and creating visualization tools that allow people to explore their data can provide an immense amount of sense-making. Now, in my own research, I develop interactive visualization systems that allow scientists to interact with their data and to make discoveries that might otherwise be hidden in the results of statistical methods. I've found that by just providing interactivity, I can increase my collaborators' understanding by orders of magnitude.

I'm going to illustrate this with some visualizations from a group I work with at the Harvard Medical School who study fruit flies. Now, these images are how this group was looking at their data when I first started working with them. On the left, each of the data points plotted in that black square represents a cell in a fruit fly embryo, and on the right we see information about which genes are turned on and off in each one of those cells. Now, these two views are linked together with shape and color, so, for example, the blue circles are linked to the column of data with the label "B circle". But what the scientists couldn't do from these static images is actually know which strip of data in that column corresponded to a specific blue circle on the left.

So the first prototype that I developed for them took the visual conventions that they were using and added the ability to select a single cell and see that cell's gene information on the right. Using this prototype, this group was able to explore their data on a cell-by-cell basis for the very first time, and they gained numerous insights into the computational methods that they were using. Now, I've worked with this group for two years, and the final tool I developed for them, shown here, is called MulteeSum. From that
initial prototype, I applied known visualization principles to the encoding and also created a general framework that allowed them to explore computational results along with the underlying raw data. MulteeSum is now one of the primary analysis tools used by this group as they start to untangle some of the mysteries around fruit flies, mysteries that have implications for our understanding of human disease.

Now, one thing I want to stress is that I'm an engineer, and I take a systematic approach to visualization design. My designs are meant to present information in a clear, accurate, and intuitive way to enable rich and complex data analysis. And as an engineer, I rely on principles and rules to build things, and it turns out we know a lot about how to design visual representations in a principled way. A lot of the early visualization research focused on fundamental visual encoding channels. So, for example, here I'm showing you the basic channels we have for encoding numbers. Now, these early researchers also conducted controlled laboratory experiments to understand which types of channels are easiest for us to interpret. It turns out that color is the hardest channel for us to read numbers from, while spatial encodings are easier.

I'm going to drive this point home, starting here. This visualization is called a heat map, and it is perhaps one of the most widely used visualizations in biology today. In this image, quantitative values are encoded with color, where green indicates low values and red high values. Now, each one of these seven strips is encoding a value that changes over time. So, looking at this image, can you tell which of these strips contain peaks or valleys, or even which ones are changing over time? If instead we look at this data in a similar way but using a spatial encoding, where values are now encoded as position along the vertical axis, the nuanced characteristics of the data are much clearer. Translating changes in position is more natural than translating
changes in color.

But visualization isn't just a set of techniques; it's a process. It's a process that has distinct stages that help guide the development of visualization tools, and the process that I use emphasizes the need to work closely with my users to ensure that the designs I create for them are effective in helping them answer their specific questions. Now, reality is always much, much messier than theory, and this is probably a more honest portrayal of how I actually work.

But in all of this mess, there's one particularly critical step, and that's "translate". This step is about translating the language of biology into the language of information visualization, and it's absolutely critical to get this step right, because no amount of brilliant design can overcome designing for the wrong thing. For me, this is the most challenging step and where I spend most of my time. To get my translations correct, I work hard to get into the heads of my collaborators and to really see the data as they do. I spend most of my days in biology labs across Boston, and I've even learned a few experimental techniques along the way. And while I wouldn't trust myself to pipette anything of importance, these experiences have really helped me to better understand the intuitions of my collaborators, intuitions that I then feed back into my designs.

Now, what specifically do my collaborators get from these visualizations? Well, they're able to find errors and noise in their data, they get new ideas for more informative statistical methods, and they make discoveries that lead to new hypotheses and experiments.

I'm gonna briefly tell you about the experiences of a computational biologist I work with who's developing algorithms to compare the genomes of different species. We worked together to design a tool called MizBee to help him explore his results. So what I'm showing you here is the very first dataset he loaded into MizBee. What he told me when he saw this was that he was really surprised and
very disappointed. Now, he was disappointed because he actually had no idea his algorithm was producing data with so much noise, data that was so cluttered and messy. So he spent a couple of weeks tweaking parameters in his algorithm, and he was able to get this far. At this point he decided to take a completely different approach: he developed a brand-new algorithm, and that algorithm gave him this dataset. Just last year he published a paper on this algorithm, and he's released the software to the scientific community. I asked him how long it would have taken him to make this breakthrough using the methods he was using to look at the data prior to MizBee. He told me he didn't think he would have even thought to try a new approach, as he had no idea just how messy that original data was. So currently he's using this algorithm along with MizBee to try to understand the genomic origins of species adaptation.

Now, from these collaborations I've come to learn that interdisciplinary research is absolutely critical for scientific discovery today. I've also come to appreciate just how hard it is to do. It requires skills that aren't taught in scientific and engineering domains; it requires things like empathy and curiosity, trustworthiness, and a willingness to learn about a new field.

Now, I absolutely love what I do because it lets me do a little bit of a lot of things. Visualization is a young and vibrant field that links together computing, design, science, and humanism, and it is vital to our understanding of biology, of science, and of the whole world around us. This is the future of discovery. Thank you.
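
Note on the Anscombe's quartet example in the captions: the point that summary statistics can hide structure can be checked numerically. The following is a minimal Python sketch, not something shown in the talk. The data values are the commonly published quartet, and the names QUARTET, pearson, summarize and SUMMARIES are made up here for illustration. In practice the four sets agree on these summaries to roughly two decimal places rather than exactly.

```python
from math import sqrt
from statistics import mean, variance

# Commonly published values of Anscombe's quartet (assumed here, not taken from the talk).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand to avoid version-specific helpers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(xs, ys):
    """Return the simple summary statistics mentioned in the talk for one dataset."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "var_x": variance(xs),  # sample variance
        "var_y": variance(ys),
        "r": pearson(xs, ys),
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

if __name__ == "__main__":
    # Print one rounded summary per dataset so they can be compared side by side.
    for name, stats in SUMMARIES.items():
        print(name, {key: round(value, 2) for key, value in stats.items()})
```

Drawing each of the four pairs as a scatter plot is what reveals the differences these summaries hide, which is the argument the talk makes for looking at data before analyzing it.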
Info
Channel: TEDx Talks
Views: 19,271
Rating: 4.931818 out of 5
Keywords: research, biology, mathematics, tedx talks, TEDx, science, Miriah Meyer, design, ted x, computation, ted, discovery, TEDxWaterloo, ted talks, ted talk, medicine, genomics, data, tedx talk, education, proteinomics, visualization, tedx
Id: Sua0xDCf8MA
Length: 12min 26sec (746 seconds)
Published: Thu Apr 07 2011