MATLAB for Analyzing and Visualizing Geospatial Data | Master Class with Loren Shure

Captions
Hello everybody, thank you so much for joining me today, and welcome to today's master class on MATLAB for analyzing and visualizing geoscience data. My name is Loren Shure and I've been at MathWorks over 30 years. Some of you may know that my background is actually in geophysics, and today I'm going to try to share some of my knowledge, as I do in this whole sequence of classes. In addition, you might think about looking for useful MATLAB tips on the blog I maintain, the Art of MATLAB, on the MathWorks website. Now let's go on to the seminar content.

So today I want to show you how I can build an interactive map containing information about geospatial events. Then I want to get into the topic of how I handle large amounts of data, because, let's be honest, that happens to us in the geosciences a lot. And then a little bit about the community resources that are available.

For the first demo, I'm trying to research earthquakes from around the world, and these are going to be recent ones. Let's suppose I want to find out, around the world for the past 30 days, what the earthquakes were that were magnitude 4.5 and above, in other words, ones that may be felt a little more broadly than just locally. Then I also want to think about where those earthquake epicenters are in connection to tectonic plate boundaries, to see what's going on. So I'm going to come over to MATLAB now and show you how I would explore this.

Here's my MATLAB desktop, and I am going to take a minute out to say something, because one of the things you see here is that I am using an editor, but I am not using traditional MATLAB code, in other words code in a file whose name ends in .m. You'll see this document ends in .mlx, and I've made my editor the size of my MATLAB desktop here. You'll notice that there's
a Live Editor tab here, and that's because in this case I can come into this document and not just put code and comments, but I can put formatted text and all kinds of fancier things in there, controls and tasks, and I might even want to put some mathematics in there. Now, I don't have any mathematics specifically to share with you today, but I want to show you what might happen. Again, I can put in a picture, I can put in a hyperlink, as I've done here to show you where you can find information about the feeds that I'm getting from the US Geological Survey, and I can put equations in. I can use TeX if I want, but I'm not so good at it; it didn't exist when I was finishing my PhD, just a few years ago of course. But I can use the equation editor, and I might want to put in any sort of mathematical symbol from the keyboard, but also the Greek alphabet and other kinds of symbols that we may know.

Whoops, sorry about that, let me come back to here. I have compound mathematical symbols I might want to look at, and I might want to be able to put in a matrix. So in fact I'm going to put in a matrix here. The way I put things in is I can either point and click, I can say theta here, or I can say x, and if I know the LaTeX, which happens to be up arrow there, I can put it in. And I can put in any one of these complex ones or compound ones. I'm going to take the nth root, and we'll do the third root of x cubed. No, let me go back, let's do 3x to the 17th minus 3.
Okay, so I can put in mathematics, and the reason that's so crucial is because this document I'm creating is code, but it's not just code. It's a tool that you can use for collaborating. You can, for example, share some code with someone else, maybe with your MATLAB Drive, and go back and forth and edit it. You may want to just show the output of all this. I know I haven't shown you how to run anything here yet, but suppose I do run something. When I get all done, I can choose to share my MATLAB live script either with someone else who has MATLAB, and just give them the live script, or I can come here and export it to PDF, Word, HTML, or LaTeX. So it's a really nice way to share the results that I have with other people. That's one part of collaboration. There are other ways to collaborate too, of course, and maybe I'll point them out as we go along. But if you can put the equations in, people can then check the code to make sure the equations match the code you write, and so on.

But I don't want to do that. What I want to do here is get the last 30 days of data from the USGS for earthquakes that are magnitude 4.5 and above. Just in case I'm not online, I have this extra "in case I'm not online" statement: I have some data saved. What you're going to see is I'm going to take this section. Notice this section had regular text, it had titles and subtitles and normal text, and it also has code right here. In order to run this, I can come to the Live Editor and just say run the section. Now it is running the section, and when it's done I'll show you that in my workspace I now have some data.

Now what did I actually do? Well, I used webread. If you haven't heard of webread, it might be interesting to find out what that is, so let's get the help on webread. I'm opening up the MATLAB documentation here. Let me make it a little bit bigger.
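A minimal sketch of the read-with-offline-fallback step just described, not the code shown on screen. It assumes the USGS "magnitude 4.5+, past 30 days" GeoJSON summary feed URL below; the cache file name `quakes_cached.mat` is hypothetical.

```matlab
% Assumed feed URL (USGS GeoJSON summary feed, M4.5+ for the past 30 days).
feedURL   = 'https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/4.5_month.geojson';
cacheFile = 'quakes_cached.mat';            % hypothetical local copy for offline use
opts      = weboptions('ContentType', 'json', 'Timeout', 15);

try
    data = webread(feedURL, opts);          % JSON is decoded into a MATLAB struct
    save(cacheFile, 'data');                % keep a copy in case we are offline later
catch
    if isfile(cacheFile)
        s = load(cacheFile);                % fall back to the previously saved data
        data = s.data;
    else
        data = struct([]);                  % nothing available; check isempty(data) downstream
    end
end
```

`websave` would be an alternative if the goal is only to keep the raw file on disk rather than the decoded struct.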
I know you're perhaps looking at this on a small screen. If you have not used a lot of our documentation recently, I really urge you to look at it, because we've spent a lot of effort trying to make it better and better. webread says that we can read content from a RESTful web service. There are two or three things that I love about our documentation pages, and these are not in any particular order. First, examples. If there's an example, read image from a website, whatever, I can just look at that particular example, then maybe copy and paste the code and put my own URL in, and then I'm ready to go on with my project because I've been able to get data.

Another thing that's extremely useful is looking at the See Also section. See Also will tell you many things: when the function was introduced into MATLAB, and other functions that you might find interesting. So as well as webread we have websave, and if I open websave, what you'll see is that instead of reading the data, it will allow me to save it on my hard drive, so that if I become disconnected later I can still analyze that chunk of data. You'll also notice there was a webwrite, and then some topics in the documentation that may help me.

Okay, so I'm going to put that aside right now. I just read in the data, and it's reading it from a JSON feed, basically. I'm going to take the information that's in the data and separate it into two pieces. I'm going to take the feature with all its properties and put it into the information about the earthquake, and the location information, which is the geometry, I'm going to put in a separate variable for the moment. So I've got some new variables here, and what I want to do is take the information that's in this structure. Remember, everything that's in this structure is really all going to be the same format, so
for every quake it's going to have something about the epicenter, something about where it was, whether or not it induced a tsunami, and things like that. So it's going to be a very regular format, and because of that I can convert this structure to a table, and that's what I'm going to do.

Tables are a relatively new construct, though they're not that new anymore. If I take a look at quake table, let's open it. Quake table is a table, and you can see there are 451 events, and it says it's by 25, so I have 25 columns here: the date and time (and of course I can make this column, whoops, wider so you can see it), magnitude, a place, time zones if needed (but since everything is in, I think, UTC, it doesn't matter), whether or not it was felt, and a bunch of information about it, the magnitude type, how it was derived, and so on.

So that's what a table is. It's basically an m-by-n array, just like a regular MATLAB array, only notice that each column needs to be uniform within itself, but each column doesn't have to be the same type as the next one. If I just come back here really briefly, you'll see some of these columns, like tsunami and felt, were either 0 or 1, but the network is a string in quotes. So I can have strings and numbers together in the same data structure, if you will.

When I get the time out, though, boy, this time looks really strange, and that's because it's POSIX time. So what I'm going to do is just convert the time from POSIX time to time that we understand, and I'm going to show it to you in the table. Now this is time that we know how to read, so we can see the different dates and times here, and you can see the magnitude, the place, and so on. MATLAB will have all the columns, and you'll see that I can scroll to them in this document and see what's going on. Tables are super, super useful.
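The steps described so far (splitting properties from geometry, converting the structure to a table, and converting POSIX time) can be sketched as follows. This is an illustration on made-up data, not the code from the talk: the `features` struct stands in for the decoded feed, its field names follow the GeoJSON layout as an assumption, and dividing by 1000 assumes the feed stores time in milliseconds.

```matlab
% Synthetic stand-in for the decoded feed (all values are made up).
features(1).properties = struct('mag', 5.1, 'place', 'example place A', ...
                                'time', 1600000000000, 'tsunami', 0);
features(1).geometry   = struct('coordinates', [-24.0; -0.9; 10]);
features(2).properties = struct('mag', 6.9, 'place', 'example place B', ...
                                'time', 1600464000000, 'tsunami', 0);
features(2).geometry   = struct('coordinates', [120.5; 14.2; 35]);

% Separate the per-event information from the location information.
quakes = [features.properties];      % 1-by-N struct array, one element per event
locs   = [features.geometry];        % kept aside for later use

% Every element has the same fields, so the struct array maps cleanly to a table.
quakeTable = struct2table(quakes);

% Convert POSIX time (assumed milliseconds since 1970-01-01 UTC) to datetime.
quakeTable.time = datetime(quakeTable.time / 1000, ...
                           'ConvertFrom', 'posixtime', 'TimeZone', 'UTC');
disp(quakeTable)
```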
I can find out more information by finding out what variables are in the table, and so we can look to see what they are. I can take my earthquake coordinates from the other variable that I put them in, the location variable here, and get from that the longitude and latitude. I'm going to take the longitude, latitude, and depth and place them into my quake table as well. My quake table, if we look, says it has 25 columns. When I do this, I'm adding three, so it now has three more columns. The final ones, if we look here, are, whoops, my longitude, latitude, and depth.

Okay, so let's just take a quick look at one of the columns, just the magnitude. Notice I can get to the magnitude by name; I don't need to know whether it's column number 5 or 17 or whatever, although if you know what the number is, you can use that instead. These magnitudes are not an ordered list, because I think the list is probably sorted by time. But what I can do now is choose to sort the table for my month by magnitude, and MATLAB will keep everything else in each row together. Now you see the top five magnitudes go from 6.1 to 6.9, so a couple of pretty big earthquakes this past month, though not the biggest we've ever seen.

The biggest one right now is the last one. So I'm just using regular MATLAB indexing into my table, and I'm saying, tell me what's in the last row of the table. Then I'm saying I want to know what's in these four columns, but I don't necessarily know what their numbers are, so I ask for the place, the latitude, longitude, and depth by name. When I run this, what you see is that it happened on the 18th of September, 9 in the evening, on the central Mid-Atlantic Ridge, latitude about zero degrees, so pretty near the equator, longitude a little bit west of the Greenwich meridian, and fairly
shallow, 10 kilometers. I think it's in kilometers.

What I can do if I want is shift the dates and subtract a calendar month to make sure that we have a month's worth of data. In fact it's going to trim a little bit of data off. I had a little bit more before, and it trims one data point off, but that doesn't matter for what we're doing.

Now I want to find out the smallest and the biggest magnitude earthquake, because I'd like to make a plot, and I'd like the plot to be meaningful. So I'm going to add color to help me understand which of the earthquakes has a large magnitude and which might be small. First I want to show you the colormap I'm going to use. I'm going to come here, and instead of using this blue area and my mouse to run it, I could either run a section here, or, you'll see, I can use Run and Advance. You'll see I already added that to my quick access toolbar, so if I choose to give more space to the editor, I still have access to Run and Advance. So I'm going to use Run and Advance on that section, and when I do, it has advanced to the next one.

I want to show what the colormap looks like, and here it is. You'll see it goes from a dark blue up to yellow. A few years ago, in 2014, we changed the default colormap, because our original one, way back when, went from red through the spectrum and back to red again, which meant that if you saw something that was red in your plot, was it a small value or a big value? You didn't know. We also know that there are many people who have different kinds of vision issues, various kinds of color blindness for example, so we wanted something that you could see equivalently: if we convert this to grayscale, it would look like it was going from dark to light in grayscale as well. So that's why we
chose that new colormap.

Then I'm going to take the names from the title in the quake table, and I'm going to convert my quake table into point locations that we can map. So I've got these geopoints, and now, instead of plotting in a regular MATLAB figure, I'm going to use a web map to plot the earthquakes. It's going to plot this in the web browser, and you can see it's got a base map here. It's just taking a moment to get all the data in there, and now each one of these little markers is the location of an earthquake that happened in the last month. I can click on them, because I put that information into my geopoints, into the data I'm plotting, and it tells me that this one's in Tanzania.

We actually centered this originally on the biggest one. It doesn't look centered anymore because I already moved things, but here's our 6.9 magnitude earthquake on the central Mid-Atlantic Ridge. It is slightly below the equator and a little bit west of the Greenwich meridian, and you can see there's a bunch of fracture zones in here and the mid-Atlantic rise as well. This is very handy if I want to superpose many different kinds of data.

So let me show you how this web map works. First of all, I can choose any one of a number of underlying base maps. I could use, for example, world imagery, or a National Geographic map. I'm going to come back to the ocean base map, because many of our earthquakes happen near ocean basins and that's helpful. You'll notice also that my quakes are in there, and if I wanted to I could turn them off and on. Then some other things: I could put world boundaries on, and I can also, just for grins, put on the NEXRAD radar for the U.S., and indeed I was told that we have a storm coming here in the Northeast later today. So I can
basically manipulate the information on the map, and you'll see I can also pan and zoom in if I want. Apparently there was another earthquake, not as big as our biggest 6.9 but a 6.2, earlier in the month as well, in Chile. Okay, so that's our web map.

Now I want to think about where these earthquakes are compared to the plate boundaries. (I'm in the wrong folder here, that's okay.) I found, from looking around, that there is a location at the University of Wisconsin, listed here, that has a data set with the latitude and longitude of the boundaries of the tectonic plates. Let me just open this. You'll see it has a little notation up here that marks the beginning of a boundary, then latitude, longitude, and it goes on for quite a while. If I scroll far enough, I just saw one, somewhere in here there's another plate boundary. So I've got these plates separated by a line that has some text, and I get all the different plates in there.

I would like to import that into MATLAB. In order to do that, I'd like to point out that if I have my toolstrip available and I'm on the Home tab, you might notice, if I zoom in, that there is something on the Home tab that says Import Data. So if I don't know how to read in this file, or any other kind of file, it's an opportunity for MATLAB to perhaps help me. Let me show you how I could do this. I'm going to take that all-boundaries text file, and MATLAB is going to do the best it can to understand the data. It thinks there are three columns because of something else that's in there, but I don't want the third column, so I'm going to unselect the third column and select just the two. I don't want to get it into a table; I want column vectors. And I'm going to ignore that
first line. I told you that we're going to be interrupted every now and then with these names for the different plates, abbreviations for the plates. I'm going to rename these variables latitude and longitude. Okay.

I'm just going to go to the end, and notice that MATLAB gives me a choice whether I want to exclude any rows that don't have the right data in them. Well, I don't, because I kind of want those place markers so I know where one plate ends and the next one begins. I certainly don't want to exclude columns that are missing some data, because then I'd exclude all of my data in this case. Or I can choose to replace any data that can't be imported as numbers with Not-a-Number, and that's what I'm choosing here. When I get all done, I can come here and say, "Good, MATLAB," and you'll see it just loaded in a bunch of data.

That's great, but suppose I want to run this as a report this month, and next month, and the month after. For our purposes, within, say, a year or two, those plate boundaries are not really moving, so I don't want to go through this whole selection process of trying to read it in through here each time. What I can choose, with the drop-down for Import Selection, is to generate a script, or a function, or a live script. I'm going to choose to generate a function right now, and you'll see this is the MATLAB code. It is a little after 11 a.m. here on October 7th, so it just generated this code. It says it's importfile1; you give it a file name, and optionally some lines to read, and then you can just read whatever you want. This is the MATLAB code that got generated. I'm going to ignore it for right now, because I already saved a version of this, and I saved it in a file called import
plates, and so I'm going to import the plates. You'll notice that I'm also going to import some coastline data, and then I'm going to plot this using the Mapping Toolbox, also on a world map. This world map is going to have the coastlines on it; you can see the coastlines of North America and South America and Africa and Antarctica and so on. Then on top of that I also have the plate boundaries, the latitude and longitude here, and that's what is in blue. So we can kind of see, here's the big North American plate and this is the big African plate. They look a little bit like the continents themselves, but of course they're not exact, because the boundaries of the coastlines are not the same as the boundaries of the plates. There are places where they are coincident, but it's just not always true.

If I want to, I can find the end of that first plate, because remember, I put that Not-a-Number, those NaNs, in there. So I can find the first NaN and plot, on top of this plot, in red, the first plate, which in this case happens to be the African plate. So it's a really nice way for me to be able to pull through the data and focus our attention on one region or another as we go.

Okay, and I'm going to take the boundaries, the latitude and longitude that we just read in, and make that into a geopoint. Geopoints are something that I'm using when I use that web browser, and so now I'm going to draw on top of our web map the plate boundaries as well. I won't bother with the coastlines, because we can see them fairly easily. Now, one thing you'll notice here is that there seem to be earthquakes mostly, but not only, near plate boundaries. This is not surprising; that's where a lot of earthquakes occur. I'm going to use my scroll wheel and zoom out a little bit right now, and you will notice there are
some locations, for example up here in the middle of Spain, where something happened. It was 4.5, I don't know what it was. Does it tell me here? It says it was an earthquake. And we have things, for example, up in the Himalayas that are at plate boundaries or near plate boundaries, but they're certainly not near edges of continents or anything like that. So you can go along here and find out much more information by zooming in, getting the information, or overlaying extra information that you would like to see on there. Notice that when I did that, not only did I put in the locations, but I used that same color scheme, so that the yellow ones are higher magnitude earthquakes than the orange ones, which are higher than the green ones, and so on. So we get a clue about what's going on by choosing how to portray the data on top of the map.

I've been doing this in a live script, and like I said, when I'm all done I can come to the Live Editor and do my Save As. Let's do an export to Word, and sure, I'll put it there. We've got a Word document, not a live document. You will see I didn't take my silly math out of there. I still get links that I can use, and it incorporates some of the plots. It doesn't know how to incorporate the web map plot here, but if I wanted to, I could capture those separately and put them in. I had captured them on an earlier day, which is why there are versions of them in there. So I then get to share that with other people however it makes sense.

So I hope you see that in MATLAB I can read in my data, I can do some analysis on the data, and I can make this very transparent by sharing the code with everyone, or sharing the document without the code if I don't want to. It becomes reproducible, because other people can run the same
code. I'm going to come back to the slides for a minute here and say what I did. You saw me load earthquake data from the web. Instead of having to download, earthquake by earthquake, all the information for all the 450 or so earthquakes in the last month, I was able to use webread to read that feed and bring the information into MATLAB, so that I could then pull it apart from the structured data that it was. I pulled it apart into a table so that we could sort it and visualize it and do other work with it. I then loaded in plate boundaries that we found on the web, too, and I showed you how to generate the code that would let us do that. I also loaded in coastline data that we ship as part of a demo in MATLAB, and then we were able to superpose all of those together on various kinds of maps. I was using MATLAB and the Mapping Toolbox for that.

There are a lot of other relevant toolboxes for the geosciences. It really depends on what you're doing. If you're doing any sort of mapping, of course, Mapping Toolbox might be helpful. If you're doing a lot of imagery, you might find that in addition to Mapping Toolbox, possibly Image Acquisition Toolbox but certainly Image Processing Toolbox might be helpful. Statistics and Machine Learning Toolbox might be helpful; more and more people are trying out machine learning techniques to analyze their data these days, so that would be a potential avenue for you. And Optimization Toolbox, and a lot of math-oriented toolboxes that you might take advantage of.

Then there are other places that you can go to look for information. We keep a page; this one is from, you can see, the MATLAB 2014 documentation, so it was a long time ago. But if I search here in the MATLAB documentation for scientific, whoops, format, here we go, here's our page that has the different kinds of scientific data
sets, and you'll see that we have NetCDF and the different kinds of HDF files, band-interleaved files, CDF files, and so on. There's just a lot of information that you can get by looking through here and finding out about the different kinds of data sets. In addition to the kinds of data that we can read automatically in MATLAB, other people have written capabilities for reading in other kinds of data sets. For example, the IRIS group has written irisFetch. I actually have a link for it, I think, in the next demo I'm going to do; I might not. NOAA has sources for data. So there are a lot of sources for data, and then tools for reading the data. irisFetch is meant to be able to read data that they archive on the IRIS website. EarthCube is very interested in making sure you can get your data into everywhere, including MATLAB. Same with NGDC and EarthScope and so on.

Now I want to come to the next topic, which is: okay, it's fine, I can use this to understand a small amount of data, but what if I need to analyze a data set that is much larger? I have access to a historical seismic data set, and it's data that started in the early 1900s. So some of the data is quite old, and it's messy, and they didn't collect all the information back then that we collect now. So there are issues with the data set itself, and there are also potentially issues with the quality of the instruments that collected the data back then compared to data that gets collected now. What I'd like to do is see if there's any real value to this data set, and try to see what's happened over time. It's really quite interesting, because there's a lot of great information you can get from it.

So I'm going to get rid of this particular demo. I'm going to come here on my quick access toolbar, and you'll notice
that I have some favorites. This is maybe my favorite favorite, which I call ccc: clear clears my workspace, clc clears the Command Window, and close all force closes any figure windows that might be open. I don't really have much open, because I was running in a live script, where the output mostly goes into the document itself. But I'm going to do that now, and I'll show you that there's nothing in my workspace. I'm going to close the Variables editor, because I don't think we need it right now. I'm going to come over here and click on the tab for this file, and I'm going to right-click because I want to change to that folder, so that I can get access to the data set that's there. I also don't need my web map anymore, so I'm going to let it go away as well, and I don't need that anymore from the Import Tool.

Okay, and now I want to come over to my big data set with the historical earthquakes. I'm going to open this CSV file that we got with the historical data set. Let me zoom in a bit. That's a little bit more than I want to zoom; I want you to be able to see more. There's a whole bunch of comments at the top, and it says where to get the information from. Then eventually, at line 57, we see labels for some columns, and then a whole bunch of data. You can see my scroll bar is still near the top, so there's a fair amount of data here, and I would like to load that into MATLAB.

Now, what does this data consist of? Let me go back up towards the top again and show you. It consists of a lot of columns. It's going to give me a date and time all together, I believe. Let's just see. Yeah, date and time together, a latitude, a longitude, and a depth where we have one, so it's going to give me the hypocenter for the earthquake. It's going to give me semi-major and minor axes and the strike, maybe, and it's
going to tell me what it thinks the quality of the information is, for the depth in this case, and the uncertainty, and the magnitude and the quality of it and its uncertainty. Notice these qualities are C. If I scroll down, and I'm going to go a little bit here, you'll notice that some of the qualities are A and B as well, so not just C. I might say, you know what, I don't want to monkey around with any of the data unless it's considered to be quality A, so I'm going to want to filter a lot of the information out here. If we keep going, you'll see that there's a moment that I might get, and a lot of other information here.

So I'd like to read this into MATLAB, but I don't need all the columns and I don't need all the rows. If I do read them all in, it may be so large that it would be cumbersome at best, and at worst it might not even work. So what I'm going to do, instead of just opening that file, is say: make this a datastore. A datastore is just a reference to the file, and I'm telling it something about the file here. I'm going to say use comma as the delimiter, and the number of header lines is 56, because it wasn't until line 57 that we got the headers for the catalog.

If I run this, you'll see it gives me a warning, because it doesn't like the names of some of the columns. Some of those columns had funny things in their names, like spaces or underscores or things like that, and so we just modified the names to make them valid MATLAB names. It gives me information about the file, what folder it's in, whether or not I can rename the variables, and lots of other information. So if you find that it's misconstruing something that you thought it should read a certain way, then before you go and commit to anything, you can change what it does.
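A minimal sketch of creating a datastore with a comma delimiter and skipped header lines, as described above. It is not the code from the talk: it writes a tiny made-up CSV to a temporary folder so it can run on its own, and the file name, column names, and values are all hypothetical. The toy file has 2 comment lines where the real catalog was described as having 56.

```matlab
% Write a tiny synthetic catalog (made-up values) so the example is self-contained.
csvFile = fullfile(tempdir, 'toy_catalog.csv');     % hypothetical file name
fid = fopen(csvFile, 'w');
fprintf(fid, '# comment line 1\n# comment line 2\n');
fprintf(fid, 'date,lat,lon,depth,q,mag\n');
fprintf(fid, '2001-01-01,10.0,20.0,15.0,C,5.0\n');
fprintf(fid, '2002-02-02,-30.0,40.0,25.0,A,6.0\n');
fclose(fid);

% The datastore is only a reference to the file; nothing is fully read yet.
ds = datastore(csvFile, 'Type', 'tabulartext', ...
               'Delimiter', ',', ...
               'NumHeaderLines', 2);                % the talk used 56 for the real file

p = preview(ds);          % first few rows only, returned as an in-memory table
disp(p)
disp(ds.VariableNames)    % names as MATLAB interpreted them
```

Inspecting `ds.VariableNames` and `ds.TextscanFormats` before reading anything is one way to check whether the file is being interpreted as intended.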
Here's what the preview shows me. It shows me the first eight rows of the table, so I get the first eight date-times for the different earthquakes. I can also see that it has latitude, longitude, and all those different columns, all the way out here. Many of these at this point are not filled in, so by default, if you're going to read them in and there's nothing there, we put a NaN in so that we know that the data are missing there.

Now, I don't want all that information, as I said. Again, I'm just showing you a preview. I would like to tell it what subset of information I'd like to read, so I'd like to select the variables. I'm going to get the columns that are date, latitude, longitude, depth, the quality for the depth, the magnitude, the quality for the magnitude, the moment, and the moment author. I'm telling it, for column one, what format I want it to use to understand the year and date and everything: it's going to be four digits of year, a hyphen, months, days, hours, minutes, seconds, decimal seconds, and capital D says to MATLAB, make it a date.

Then you'll see for columns, one, two, three, four, five, wait a second, five, seven, and nine. Five is q1, q2 is seven, and for the last one I say I want the author to be a string. That's fine, but you'll notice that for q1 and q2 I'm saying capital C. If you happen to know formatting from reading files, whether it's in Python or C++ or Java, you're used to lowercase c for a character. That's not what this is. This is categorical, because when I'm reading in these quality measures, I know I'm reading in A or B or C, and so I don't need to keep storing A or B or C as text. Imagine that it wasn't a short string like that; suppose it was excellent, fair, and good, or excellent, fair, and miserable, or something like that. I don't need to store each of those strings
literally millions of times. I can keep the categorization in mind, and without having to store the full text of every single instance of the string, I get that same information in a categorical. It means that reading in the data takes up a lot less space.

So let me come here and select our variables, and now let's take a preview. The nice thing is, look at the preview: those C's still look like C's, so we know it's quality C, and it's still very readable. When you get a categorical variable, it knows about the categories it represents, and like I said, if instead of A, B, C we had excellent, fair, miserable, we would see miserable there right now for C.

Okay, now I still may be in a position where I can't load all the data in at once. I don't know, because I haven't tried to read all the data in yet; I'm just previewing it here, and it might be that there's more than I can comfortably chew on with my local machine. So what I'm going to do next is say, let's assume that data set is tall. That's just pointing to the datastore, ds, right now. What tall allows me to do is take my datastore and say, if it's tall, you don't have to read it in all at once; you're going to read it a chunk at a time. What I want to get is just the earthquakes that have the first quality measure, q_1, equal to A. Notice I don't have to do a strcmp; I can just say equals, and then for the ones that are true I just index in, and I get the earthquakes that are high quality. MATLAB doesn't even run this whole section of code between the tall and the gather until it hits the gather, so it's not running a lot of code, but it has to make two passes: first it needs to find out which ones are acceptable, which ones had the value A, and then it goes through a second time and does the indexing for them.
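The datastore, variable selection, and tall/gather steps described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code from the demo: the file toy_catalog.csv, its two-line preamble, the column names, the data values, and the variable name eqQual are all hypothetical stand-ins (the catalog in the talk has 56 preamble lines and far more columns and rows).

```matlab
% Write a tiny stand-in catalog so the sketch runs anywhere.
% File name, columns, and values are hypothetical, not the real catalog.
fname = fullfile(tempdir, 'toy_catalog.csv');
rows = {
    'toy earthquake catalog'
    'second preamble line'
    'date,lat,lon,depth,q1,mag,q2'
    '1914-03-01 10:02:03.50,38.1,142.9,35.0,A,7.1,B'
    '1964-03-28 03:36:14.00,61.0,-147.7,25.0,A,9.2,A'
    '2010-02-27 06:34:11.00,-36.1,-72.9,23.0,C,8.8,C'
    '2011-03-11 05:46:24.00,38.3,142.4,29.0,A,9.0,A'
    };
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', rows{:});
fclose(fid);

% A datastore is only a reference to the file; nothing is read yet.
% The toy file has 2 preamble lines (the talk's file had 56).
ds = tabularTextDatastore(fname, 'Delimiter', ',', 'NumHeaderLines', 2);

% Read only the columns of interest, and say how to interpret them.
ds.SelectedVariableNames = {'date', 'lat', 'lon', 'depth', 'q1', 'mag', 'q2'};
ds.SelectedFormats{1} = '%{yyyy-MM-dd HH:mm:ss.SS}D';  % capital D: datetime
ds.SelectedFormats{5} = '%C';                          % capital C: categorical
ds.SelectedFormats{7} = '%C';
preview(ds)   % shows the first few rows without loading the whole file

% Assumption for this sketch: run tall calculations in the client session
% rather than on a parallel pool.
mapreducer(0);

% Tall operations are deferred until gather is called.
tt = tall(ds);
isQualityA = tt.q1 == 'A';            % categorical comparison, no strcmp
eqQual = gather(tt(isQualityA, :));   % evaluation happens here
```

With a file this small the tall machinery is unnecessary; the point of the sketch is only that the filtering code is the same shape it would be for an in-memory table.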
And so you can see it took us about 10 seconds to do that operation. But now I have my eq qual here, and if we look in my workspace, I have 13,000 of them, so however many rows there were in that other table, I've got a lot fewer now.

Okay, I'm going to take my table, which had that date and time as its first column, and convert it to a timetable. The reason I'm doing this is that the time is important, but it's not measured data in the same way that the epicenter is, or the quality, the moment, and all that sort of stuff. So it's important and I need it, but this will allow me to do some other things that I'll show you in a minute. If I turn my table into a timetable, there are certain things we let you do with time that make it very convenient to work with this way.

So now I'm going to take my earthquakes that have high-quality q1 and plot them here. You'll see I'm not plotting them on a map, so it doesn't quite look like a map, but you can kind of see what is probably the beginning of the Ring of Fire on the west coast, you can come down and see the trench off the coast of South America, and you can see part of the Australian plate here. But this is not a map projection, so it's not really the best way to show some of this.

Before I show it on the map, though, let me find out what the minimum and maximum times are. Our earliest high-quality earthquake was later than the first earthquake in that table: it's in 1914, and the most recent one was in December 2011.
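The timetable conversion and the time-aware operations it enables can be sketched as below. This is only an illustration, assuming a small in-memory table with hypothetical variable names (when, lat, lon, mag) and placeholder values standing in for the filtered catalog; it is not the demo's code.

```matlab
% Placeholder rows standing in for the filtered catalog (hypothetical values).
when = datetime({'1914-03-01 10:02:03'; '1964-03-28 03:36:14'; ...
                 '2010-02-27 06:34:11'; '2011-03-11 05:46:24'}, ...
                'InputFormat', 'yyyy-MM-dd HH:mm:ss');
lat = [38.1; 61.0; -36.1; 38.3];
lon = [142.9; -147.7; -72.9; 142.4];
mag = [7.1; 9.2; 8.8; 9.0];
eqQual = table(when, lat, lon, mag);

% The first datetime variable becomes the row times of the timetable.
eqTT = table2timetable(eqQual);
times = eqTT.Properties.RowTimes;

% Earliest and latest events.
firstQuake = min(times);
lastQuake = max(times);

% Time-aware checks: sorted by time? sampled at a regular interval?
sortedByTime = issorted(times);
regular = isregular(eqTT);

% Select rows by time range instead of by row number.
recent = eqTT(timerange(datetime(2010, 1, 1), datetime(2012, 1, 1)), :);

% Plain longitude/latitude scatter (not a map projection).
plot(eqTT.lon, eqTT.lat, '.');
xlabel('longitude');
ylabel('latitude');
```

Keeping the time as row times rather than as an ordinary column is what lets timerange and isregular work directly on the timetable.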
Okay, so now let's make a histogram, and we're just going to count them by year. Here is the histogram. I'm going to minimize my toolstrip so you can see this better. So here I have a histogram by year of the number of earthquakes per year. Now, if I were with you in the room, I would ask people what they were worried about when they see a plot like this. The non-geoscientists, or some of them, are often a little bit alarmed at first, because it seems that there's a growing number of earthquakes over time, more and more of them. Well, that's not true. What's true is that we're getting better measurements over time, and we have more instruments over time, so we're able to see more, and better, and smaller earthquakes: very few early on and many more now.

Now I might want to see what the median time between high-quality quakes is, and I can do that. I can either do it in terms of hours, minutes, and seconds for the median, the max, and the min, or I can convert it into just a big number of seconds for everything. I think this format is a little bit easier to understand. My median time between high-quality earthquakes is just under a day. The maximum, this number of days, is 5,400 days, so that's about 15 years, 14 or 15 years, and we sometimes have earthquakes as close together as just a couple of seconds. It would be interesting to look at the ones that are very close in time: are they near one another or not? In other words, are they part of a swarm, or perhaps aftershocks of something? These are the kind of questions you can begin to ask when you have the date information in there.

Okay, now I'm going to check whether the data are sorted and whether they're regularly sampled. Well, they are sorted by date, that's good, but they're not regularly sampled, so I don't have one every 10
minutes or every hour or whatever, but we kind of knew that. What I can do is find the latest ones by picking a time range: I'm going to get the last 30 of these, or rather the last 29, by taking the time range from the 29th-from-last one to the end. Here I get a subset, and when I put it in here (whoops, excuse me, I have to keep my mouse within this) you can see that I can scroll through the table and find out what's going on. I can see the most recent ones, and I can also see, and I don't know why you'd ask this particular question, which ones were within a few minutes of 10 a.m. There were a few. Maybe you want to find out what happened near your house at that time, and whether anyone could feel it. I don't know.

Okay, but let me come back and look at the high-quality earthquakes versus time again. Now, instead of looking at them binned up, I'm going to look at individual ones, and I'm looking at the magnitude here. What you see is that we have a lot more lower-magnitude earthquakes showing up now than previously. That's because we have much better instrument coverage of the world, and the instruments that are out there are of much higher quality in terms of the measurements they take. It's not because the statistics have changed; I don't believe they have generally changed that much. And if we look, we can see there were some big earthquakes in 2010, 2011, 2000-something, I don't know if that's six or so, 1964,
maybe another big one, so there are some big ones back there, and we can begin to delve into these if we want. But before I do that: I can look by year, and I can look by month as well. So we could ask, is there any seasonality to earthquakes? Here is what I get when I bin things by month, irrespective of year and location. Now, this isn't really seasonality, because of course when it's spring in the northern hemisphere it's autumn in the southern hemisphere and vice versa. What you can see is that February has the lowest count, but it also has the lowest number of days, so there's nothing compelling in here to make me want to look at this any further.

What I haven't shown you, though, is that while I'm in here with any of these plots, I can start doing things like zooming in, for example on a couple of decades. When it zooms in, it offers to show me the code, so if I always want that view, I can put the dates and times right in there and it will make the plot like that the next time. So MATLAB will let you explore, and then it will actually help you create the code that you need from it, in much the same way that the Import Tool allowed me to create code from that import: it made a script so that I could import that data, or data like it, again without having to revisit the Import Tool itself.

Now, when I was creating this demo, which was a few years ago, my manager happened to walk into my office. He and I share a love of Japan. He had only seen up to literally this last plot, by month, and he said, well, what happens in Japan? And I said, I'm glad you asked. So I then took a subset of the data. I'm going to set some limits for a latitude-longitude box around Japan; I found a study where they did some analysis of earthquakes in that region. And what I'm
going to do is pull out those earthquakes that lie between the two latitudes and the two longitudes, and those are going to be my earthquakes for Japan. I'm also going to convert the depth so that it's negative values now.

I can find the largest earthquake here by just taking the largest magnitude. When I do, because I'm indexing using smooth parentheses and I had a table before, I get a table now. It's a table with just one row, but that's okay, it's still a table. We find out it's March 11, 2011, along with its latitude and longitude and so on, and its magnitude was 9.

Indeed, if we plot it: I'm going to plot the earthquakes with geobubble onto a map of our latitude-longitude area, and you can see we have the Japan region, and the biggest earthquake is in here. You can't tell, because it's hidden underneath this mass of earthquake bubbles that are there right now. So that's one way I can visualize the earthquakes near Japan.

But then I can actually put them on a map like the one I was using from Mapping Toolbox before. I'm going to load in some topography, basically an image of the topography of Japan, and create a world map, but just for the latitudes and longitudes that we wanted. I'm going to overlay a colormap and plot the earthquakes from Japan on this. The earthquakes are being plotted in yellow, and the colorbar here shows you elevation above sea level and depth below sea level. These are just the epicenters right now, and you can see we have a large number of larger dots in an arc here. We also have them on the other side of Japan, but we have them in the arc parallel to this trench, the Japan Trench. Well, that's kind of interesting.

I can take my values here, which are in latitude and longitude, and I
can convert them into local coordinates. Using the reference ellipsoid, I can go from latitude and longitude into kilometers east and north, plus the depth, so I can make a plot now that's more localized. Here I'm just showing a 2D plot, north versus east, and here's our biggest one. Now, this is north versus east, excuse me, this isn't depth; this is just our biggest epicenter here. If I want to see the depth, I can change to a 3D view, and when I do, I can see these same epicenters, now as hypocenters. If I wish, I can rotate this and see it from any other view I want. I don't want to change that, so in fact I'm going to run this again just to get it back to the view we wanted, because what I want to do is pop this out.

The reason I want to pop this out is that I want to share with you a diagram that someone else made, from their journal article. They drew the Japan region, and they have a cross section as a function of depth, going down 200 kilometers. They show the trench going down, and then where the different earthquakes are. There's a whole bunch of shallow earthquakes in the crust, then you see a bunch of earthquakes on the top of the slab going down, and a bunch of other earthquakes going down. If we compare this side by side, you see that the historical data we looked at very much mirrors this same behavior. So I've kind of verified that the historical data we have from this particular collection is rich enough that it can support the same analysis, even
though the data is a different data set than the one used by these people. I'm going to come back to my PowerPoint now.

So what I just showed you is that I loaded a subset of data from a very large comma-separated value file. Remember, the data were messy, both because there was stuff missing and because some of the data was deemed relatively lower quality. I was able to show you how to read in just the portion we wanted, the high-quality stuff, and just the columns we wanted: instead of reading all 28 columns, we read about six or eight, I think. From those data we were able to plot the epicenters and do some exploration by looking at histograms by year and by month, we were able to look at the data very near Japan, and finally we were able to validate this data against data that had been published in the research article listed there. I was using MATLAB and Mapping Toolbox, although remember I was not using the web map this time to help me out.

I want to talk a little bit more about big data, because what I did there might have felt a little bit mysterious. One of the things we've tried to do is make accessing big data very much like accessing the data you would otherwise access, with a small number of changes, while basically writing the code you want to write. You'll see here I have some code that says readtable with a file, and I'm converting it to a timetable and doing some stuff, computing some means and standard deviations. Now suppose I have a whole bunch of files: maybe I don't have just pump data, but pump data 1, 2, and so on. What I'm going to do is make a datastore out of my collection of files and say, make that a tall data set, so that it will be processed one by one, but I won't have to manage that; the datastore will manage that. And when
I'm all done collecting the information, I gather it back, in fact the mean and standard deviation. In the meantime, the table, the timetable, all the code outside these highlighted blocks is exactly the same as the code you wrote beforehand for your one data file. So we've tried to make it very easy and efficient for you to go from a situation where you're working with a small data set, trying to understand it and make sure you're doing the correct analysis, to saying, okay, I'm ready to do the full thing now. You don't have to write and rewrite much code; you get to try it very easily by adding a small number of commands.

Tall is relatively new. It's a data type for data that doesn't necessarily fit into memory at once, and let me tell you how it works. What we do is break the data into chunks. What I didn't tell you is that I am actually running my MATLAB with a parallel pool. This parallel pool is using four workers, because I have four cores on my laptop, so it's going to work on four chunks of the data set at a time, then march down the list of chunks and gather whatever information we're trying to gather, and aggregate it when it's all done. The nice thing is that many other things understand the tall data type, including a lot of the machine learning techniques in Statistics and Machine Learning Toolbox, for example, and a lot of other functions in MATLAB.

Furthermore, if I have access through a parallel server to a cluster, or maybe the cloud, the chunks don't have to be living on my four cores here. If I connect my session to the cluster or the cloud, it will do the same thing by taking chunks not just on my local machine but
on the cluster or the cloud, and it will process things in the same way that we're doing here. So things immediately scale up to a larger size once you're ready to tackle the large data set.

Okay, I'm not going to really go through this example right now; I think I've talked about tall arrays enough. But one thing that's nice is that you can also visualize tall arrays even if the data's not all loaded at once. It can load in some, and once it's loaded in as much as you want, you can do things like say, I want to see a little more detail, and zoom in. If you're ready to, you can then start panning, and the panning will potentially cause more processing, getting more of the data in as you go. There's more and more support for different kinds of tall visualizations over time, and you can look that up in the documentation and in the release notes. We also have some material on working with tall arrays and customizing them if you can't use tall arrays straight out of the box the way we have them, and you can also export some of these capabilities very easily too.

I'm going to finish now with a little bit on community resources. I hope you know that there's a big, active community on MATLAB Central. There's MATLAB Answers, where people ask and answer questions all the time. It's very active: the mean time to an answer is something slightly over an hour, I think, the last time I looked. There's our File Exchange, where people donate code and other people take code and use it. We have a bunch of blogs, including my blog. We have Cody, and we have ThingSpeak, where you can work with Internet of Things devices that are connected and explore the data online.

There's also a global community in terms of the geosciences, and I'm just pointing to a few of the things. There's GMT for MATLAB users; some people prefer to plot their maps with that rather than with Mapping Toolbox.
There's GISMO for doing seismic analysis, there's irisFetch, which I mentioned, and there's also a SERC resource page for educators. For any of you who are teaching geosciences, there are a lot of good teaching modules, examples to show, and problem sets for people to mull over there.

So I hope you enjoyed this today, and I hope you learned a lot. I hope you learned that we have a lot of support for your work in geosciences from beginning to end: getting your data in, doing all the math and visualization and mapping sorts of things you need to do, whether it's on a large data set or not, and connecting you to the community. So I would like to thank you for watching. We're going to put links that you might find interesting in the description below on the YouTube page. I want to point out two in particular. One of these is MATLAB EXPO 2020, which is available on demand and features technical talks covering a broad range of topics where MathWorks products play a central role. And if you're interested in Simulink, you might be interested to learn about the Simulink Student Challenge, where if you put in the winning entry you may win a thousand dollars.

And I'm going to go over the questions. There's a question: Juan asks, if you have time components that are serially dependent, is it possible to use tall arrays? I don't understand the question, I'm sorry. If you could take a moment and ask it maybe a little bit differently, I'd be happy to try it out. Let me know if there's any update on that.

In the meantime, I want to remind you, finally, to subscribe to the MathWorks channel to get reminders for the next events. I'm doing another six of these in the next six weeks, and they'll remain online so you can watch them again if you need to. Okay, and one of the questions is, can you import map extents with good zoom resolution? We have some ways; you can look in the
documentation. We have some ways of importing other kinds of overlays, if you will, for the maps that are higher resolution, so you can do that, and the documentation describes how. Great question, thank you for asking that. And with that, I think we're finished. Thanks so much.
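The single-file-to-many-files workflow from the big-data part of the talk (a datastore over a collection of files, made tall, with the mean and standard deviation gathered at the end) can be sketched as follows. This is a hedged illustration only: the folder toy_pump_data, the file names pumpData1.csv and pumpData2.csv, the variable flow, and the values are all hypothetical, and the sketch forces serial execution rather than using a parallel pool.

```matlab
% Create two tiny stand-in files (hypothetical names and values).
folder = fullfile(tempdir, 'toy_pump_data');
if ~exist(folder, 'dir')
    mkdir(folder);
end
writetable(table([1; 2; 3], 'VariableNames', {'flow'}), ...
    fullfile(folder, 'pumpData1.csv'));
writetable(table([4; 5], 'VariableNames', {'flow'}), ...
    fullfile(folder, 'pumpData2.csv'));

% One datastore covers the whole collection of files.
ds = tabularTextDatastore(fullfile(folder, 'pumpData*.csv'));

% Assumption for this sketch: evaluate in the client session, no pool.
mapreducer(0);

% The analysis lines look the same as they would for an in-memory table.
tt = tall(ds);
meanFlow = mean(tt.flow);
stdFlow = std(tt.flow);

% Nothing has been computed yet; gather triggers the evaluation of both
% results together.
[meanFlow, stdFlow] = gather(meanFlow, stdFlow);
```

Only the datastore, tall, and gather lines differ from the one-file version; the mean and std calls are unchanged, which is the point the talk makes about moving from a small data set to the full one.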
Info
Channel: MATLAB
Views: 7,026
Rating: 5 out of 5
Keywords: work from home live, matlab, simulink, mathworks, matlab tutorial, using matlab remotely, working from home matlab, time series data in matlab, how to plot time series data in matlab, time-series, time-series data
Id: 0xC0ZO_zaYs
Length: 64min 18sec (3858 seconds)
Published: Wed Oct 07 2020