Is It Possible to Learn The Language of Planets?

Video Statistics and Information

Captions
Five years ago, I walked into the Magic Johnson Theater in Harlem and watched the science fiction film Arrival. I left the theater stunned by the unique portrayal of the aliens and their language, and by the idea of linguists using patterns and information theory to decode complex data. It gave me an idea. We can't learn to speak an alien language yet, because we don't have any data. But could we learn to speak planet? Could the powerful, emerging tools of computational linguistics that we find in our phones and smart speakers make sense of the growing array of exoplanet data that is pouring in? Could we birth planetary linguistics?

Thanks to support from our donors and the Data Science Institute here at Columbia University, we were able to perform the first investigation of this frankly outrageous research idea. It's the kind of thing that conventional funding bodies would never support, but we're really excited that we get to tell you about some results from this effort today. In particular, I'm going to hand you over to Dr. Emily Sandford, who led this research; she did this at Columbia but has now moved to Cambridge University for her postdoc. Now, this is a complex idea, but Dr. Sandford is a phenomenally gifted science communicator, so stay with us, and you will learn about something that I guarantee you have never seen before.

Hi everyone, I'm Emily Sandford, and I'm here to tell you about a new project done in collaboration with David Kipping and our collaborator Michael Collins. As we've discussed on this channel before, we are at a very interesting turning point in exoplanet science, where we now know of enough individual exoplanets that we can begin to talk about them as a population. At the individual level, we can ask questions like: what is the mass of this particular planet? But at the population level, we can ask questions like: how common are light planets versus heavy planets?

More than that, though: beyond individual planets, or even the population of exoplanets, what we now know of are many, many interesting examples of planetary systems. These are analogous to the solar system: imagine we have the Sun, Mercury, Venus, all the way on out to Neptune, in their particular order, on their particular orbits. When we look at a planetary system like the solar system, we can ask new questions. We can ask things like: how many planets does this particular system have; what is its multiplicity? And when we look at the population of planetary systems, we can ask questions like: how common are two-planet systems, versus three-planet systems, versus eight-planet systems like the solar system? If you're interested in this, you should check out our previous video about multiplicities.

The question at the heart of this new project is: what information belongs to the arrangement of planets in their planetary systems? Often in science, we're interested in breaking down whatever we're studying into its smallest constituent components. What is a star like as an individual? What is a planet like? What is a rock like? What is an atom like? But equally often, we're confronted with the truth that systems are more than the sum of their parts. The classic example of this is the difference between diamond and graphite. These are two substances that are both made of carbon atoms, but they're very different substances, and they behave very differently. There's nothing diamond-y in a single carbon atom, any more than there is graphite-ness in a single carbon atom. These properties belong to the arrangement of the components in the material, rather than to the components themselves.
This is called emergent behavior. What if planetary systems are the same way? As an example, in our own solar system we have four inner planets (Mercury, Venus, Earth, and Mars) that are small and rocky, and then next out we have the gas giants, beginning with Jupiter and Saturn. It's thought that Jupiter and Saturn formed first, before the inner planets, and that after they formed, there was a period of time where they sank inward towards the Sun. As they sank in, they ate up all the material in the protoplanetary disk outside of about Earth's present-day orbit, which left very little material behind to form the inner rocky planets. In this way, the large sizes of the gas giants Jupiter and Saturn are causally connected to the small sizes of the inner planets, and by noting both of these facts, you can start to infer things about the formation and the dynamical history of the solar system.

Note that all of that information belongs to the arrangement, or the ordering, or the configuration of the planets in the solar system, not to any of the planets individually. It's all in the relative sizes and locations of the gas giants versus the inner terrestrials. In this way, the solar system is like a diamond or a rod of graphite: the whole is more than the sum of its parts. However, there is one important difference between diamond and graphite on the one hand and planetary systems on the other. We're in the strange position of discovering the constituent parts of planetary systems piecemeal, slowly, one at a time, and we know that our information about planetary systems is incomplete; we know that there are planets we have not yet discovered. Still, it's an interesting question to start asking: how do we study the emergent relationships between the constituent planets in planetary systems?

This is really not trivial. How do you design an experiment that investigates, that studies, that tests the intangible relationships between the tangible components of the system that you're actually observing? It's also hard because planetary systems are quite diverse. We have systems like the solar system or Kepler-90 that have eight planets each, and then we have systems where a single hot Jupiter orbits a giant star. They're qualitatively different; it's apples to oranges.

There are a couple of approaches that various groups have devised. The first is to take all of the bodies in the planetary system, quantify all of the pairwise relationships (that is, the relationships between every possible pair of objects in the system), and then study those. This is easy enough to do for low-multiplicity systems, but it gets unwieldy pretty quickly for larger systems. It's the same exact problem as: how many handshakes does it take until everybody at the dinner party has shaken everybody else's hand? An eight-planet system like the solar system is really nine bodies if you include the Sun, which means you would need 8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 = 36 pairwise relationships to characterize just one system. That's a lot.
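As a quick illustration of how fast that pairwise bookkeeping grows, here is a minimal sketch in Python; the function is just the handshake count, n choose 2, and the names are ours for illustration:

```python
from itertools import combinations

def n_pairwise(n_bodies: int) -> int:
    """Number of unique pairs among n bodies: n choose 2."""
    return n_bodies * (n_bodies - 1) // 2

# The solar system: the Sun plus its eight planets.
bodies = ["Sun", "Mercury", "Venus", "Earth", "Mars",
          "Jupiter", "Saturn", "Uranus", "Neptune"]

print(n_pairwise(len(bodies)))             # 36 pairwise relationships
print(list(combinations(bodies, 2))[:3])   # the first few pairs

# The count grows roughly as n^2 / 2 with multiplicity.
for n in (3, 5, 9, 20):
    print(n, "bodies ->", n_pairwise(n), "pairs")
```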
Another way you could think of, proposed in a really excellent paper last year by Gregory Gilbert and Daniel Fabrycky, would be to think only in terms of quantities that capture something essential about the entire planetary system in just one number. One such quantity is the multiplicity: if you have a two-planet system, then that single number, multiplicity equals two, describes the entire system, and it's very easy to compare it to another system where, for example, the multiplicity might equal five. Gilbert and Fabrycky define seven numbers like this, including the multiplicity; the characteristic spacing of planets in the system; and the monotonicity, which describes how size-ordered the planets are. In this way, they represent each planetary system, in all of its complexity, as a single point in seven-dimensional space, and then they can study the entire population of planetary systems as the distribution of points in that seven-dimensional space.

So that's the background; that's what's been done before. In this new project, the one I'll be telling you about today with David and Mike, we take a brand new approach, inspired by the study of linguistics. The fundamental analogy we're drawing in this work is between a planetary system, made of a star and its planets in their particular order, and a sentence, made of words in their particular order. We're going to be using a brand new breakthrough in a long-standing linguistics problem called part-of-speech tagging. First I'll explain what part-of-speech tagging is; then I'll explain this new linguistic breakthrough; and finally I'll explain how we apply it to planetary systems.

First: what is part-of-speech tagging? It's the problem of taking a sequence of words, say a random snippet of Wikipedia, and labeling each word by its part of speech: nouns are people, places, and things; verbs are action words; and so on. By doing this, we're essentially labeling each word by the grammatical role or function it plays in its sentence.

Why is part-of-speech tagging useful, and why is it an interesting problem? First, it's closely related to the problem of text prediction: predicting the next word in a sequence based on what came before. There are a lot of direct applications of word prediction, things like autocomplete or text prediction on your smartphone, as well as applications that are slightly farther afield, like computer speech recognition. How can part-of-speech tagging help with speech recognition? If the computer is able to use part-of-speech information to make an intelligent prediction for what should come next, based on what it has parsed already, then its prediction is a lot less vulnerable to noise or garbling.

As an example: I don't know how many of you have ever had the pleasure of riding the New York City subway during rush hour, but the system breaks down all the time, and that means there are constantly service changes and interruptions that need to be communicated from the conductors to the passengers. On the older subway cars, the PA systems are a garbled, staticky mess, and it can be very difficult to understand what the conductors are saying. But if you can anticipate the form of the announcement, you're much less vulnerable to confusion. If you can just about hear them say "this train will be running express to...", then you know that the next word in the sentence, the form of the next word, the part of speech if you will, will be the name of a subway station farther down the line. Then, when you hear them say "...5th Street," you can make a pretty good guess: oh, 125th Street. This is your brain using rudimentary part-of-speech tagging to avoid confusion in real time.

A second reason that part-of-speech tagging is interesting is that we can approach it in a way that gets at exactly the sort of intangible relationship between a word and its surrounding context that we're interested in when it comes to studying planets in the context of their planetary systems.
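As a concrete taste of the task itself, here is a minimal sketch using the off-the-shelf tagger in the NLTK library. To be clear, this is not the method used in the research described here, and the resource names in the download calls can vary between NLTK versions:

```python
# Off-the-shelf part-of-speech tagging with NLTK (pip install nltk).
import nltk

nltk.download("punkt")                       # tokenizer data
nltk.download("averaged_perceptron_tagger")  # tagger model

sentence = "This train will be running express to 125th Street."
tokens = nltk.word_tokenize(sentence)
print(nltk.pos_tag(tokens))
# Expect something like [('This', 'DT'), ('train', 'NN'), ('will', 'MD'),
# ('be', 'VB'), ('running', 'VBG'), ...]; exact tags may vary.
```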
In fact, some of the most successful approaches to part-of-speech tagging exploit exactly that information: the information that belongs to the relationship between a word and its surrounding context. The first of these approaches was proposed in a 1992 paper by Peter F. Brown and colleagues at IBM, and it has since become known as Brown clustering. Brown clustering operates on the principle that similar words (similar in the sense that they behave the same way in their sentences) appear in similar contexts. To take an example from the original paper, consider the words "Wednesday" and "Friday," which behave very alike. You will frequently encounter sentences like "on Wednesday we're going for a walk," or "last Wednesday we went to the movies," or "let's schedule it for Wednesday afternoon." If you substitute the word "Friday" into any of those sentences, nothing is amiss, right? They behave exactly the same way, and exactly the original sense of the sentence is maintained. Of course, there are exceptions where they behave differently: you never hear, for example, "thank God it's Wednesday," or "Wednesday the 13th."

So how do you translate that observation, that similar words appear in similar contexts, into actual groupings of similarly behaved words, or part-of-speech categories? Brown and collaborators started with a huge quantity of English text as their training data: 366 million words from a variety of different sources. This text was made up of roughly 260,000 unique vocabulary words, and if that sounds high for the number of unique words in the English language, it is; remember that their training text was full of names, proper nouns, and even typos. Their goal was to assign these 260,000 vocabulary words to meaningful part-of-speech classes, where each class contains words that behave similarly.

Here's their big insight: if you look at neighboring pairs of words from the sample text (these are also called bigrams), then the appropriate class assignments, or part-of-speech categories, of the two neighboring words will be related to each other; they will not be random. From the subway example earlier: if our sample text includes sentences like "the next stop on this train will be 125th Street," then our sample text will include bigrams of the form "125th Street," or "Bedford Avenue," or "Astor Place." In part-of-speech terms, or class-assignment terms, all of those bigrams have exactly the same form, which is street name followed by street synonym. And our sample text will also include lots of bigrams like "to 125th," "to Bedford," "to Astor," and all of those have the same form as well: preposition followed by street name. What this means is that if you were to encounter a sentence fragment, something like "express to ___ Avenue," then you could use this part-of-speech information to infer that the form, or the category, of the word likely to fill in that blank is a street name. Even more abstractly, if I gave you a sequence of class assignments, preposition, blank, street synonym, you could probably infer that a street name is likely to go in that blank.

Mathematically, what this means is that the class assignments of neighboring words have a mutual information that is greater than zero. Mutual information is exactly what it sounds like: you have two measurements, and if you know the one, how much can you infer about the other? If I roll two dice and tell you the outcome of the first roll, that won't allow you to guess anything meaningful about the outcome of the second roll; that means there is zero mutual information between the two dice rolls.
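Since mutual information does so much work in what follows, here is a minimal sketch of the calculation, using the standard definition: I(X;Y) is the sum over all (x, y) of p(x, y) * log2[ p(x, y) / (p(x) p(y)) ]. The dice come out to zero; the little class-bigram table, whose numbers are entirely made up for illustration, comes out positive:

```python
import numpy as np

def mutual_information(joint: np.ndarray) -> float:
    """I(X;Y) in bits, given a joint probability table p(x, y)."""
    px = joint.sum(axis=1, keepdims=True)   # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)   # marginal p(y)
    nz = joint > 0                          # skip log(0) terms
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

# Two fair dice rolled independently: zero mutual information.
dice = np.full((6, 6), 1 / 36)
print(mutual_information(dice))   # 0.0

# A made-up joint table over the classes of neighboring words,
# rows = class of first word, columns = class of second word.
# Knowing the first class tells you something about the second,
# so the mutual information is greater than zero.
toy = np.array([[0.02, 0.38],
                [0.40, 0.20]])
print(mutual_information(toy))    # > 0
```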
So, for Brown and colleagues, who were seeking the appropriate class assignments, or part-of-speech categories, for their 260,000 vocabulary words, the name of the game was to assign the words to classes such that the mutual information between the class assignments of neighboring words was maximized. That's not trivial to actually do. You can partition 260,000 words into an arbitrary number of classes, and that means the number of combinations you'd potentially have to test is essentially infinite. In order to make this computationally tractable, they started by assigning each of the 260,000 words to its own unique class, and then calculated the mutual information over those 260,000 classes. It's not great, because you're leaving a lot of information on the table by not grouping similar words together, but it's not terrible either, because there actually are English bigrams where, if you know the one, you can be reasonably certain of the other; examples include "mumbo jumbo" or "helter skelter."

But you can obviously do better. What they do is start merging classes together, one at a time, to gradually improve things step by step. At each step, they calculate which merge preserves the most mutual information, they perform that merge, and then they recalculate. This is called a greedy algorithm, because it takes the best possible action at every step. It's not guaranteed to converge to the globally best solution, that is, the best class assignments out of all of the near-infinite possibilities. Nevertheless, you can learn a ton from the class assignments that this approach produces. (A toy sketch of this merge loop follows below.)

I'm going to run through a few examples of the clusters it finds. I will leave the interpretation to you, but try to think about the sentence structures, and the bigrams that must have appeared in the training set, to enable the identification of these clusters of similarly behaved words. One cluster: Friday, Monday, Thursday, Wednesday, Tuesday, Saturday, Sunday, weekends, Sundays, Saturdays. Another: feet, miles, pounds, degrees, inches, barrels, tons, acres, meters, bytes. Another: down, backwards, ashore, sideways, southward, northward, overboard, aloft, downwards, adrift. And one more: "that," alongside the typos "tha" (t-h-a) and "theat" (t-h-e-a-t); that one's a bit of a cautionary tale.

Before moving on, I just want to explicitly point out one of the great advantages of this approach, which is that it's unsupervised. What I mean by that is that the algorithm was able to identify these classes, these clusters of similarly behaved words, all on its own, purely by maximizing the mutual information. It did not need to see a huge library of words painstakingly labeled with the correct part of speech by some poor grad student in the linguistics department. That's a big advantage for us, seeking to apply similar machinery to exoplanets: we're interested in what types or classes of planets and planetary systems there might be, but we don't have a library of millions of correctly labeled training planets. We don't even know what the appropriate labels might be.
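To make the greedy merging concrete, here is a toy, brute-force sketch of the idea. The real 1992 algorithm uses much cleverer bookkeeping to cope with 260,000 words; the tiny corpus, the target class count, and the exhaustive search over merges here are all just for illustration:

```python
# Toy Brown-style clustering: greedily merge word classes so as to
# preserve as much neighboring-class mutual information as possible.
from collections import Counter
from itertools import combinations
import math

corpus = ("on wednesday we walk . on friday we walk . "
          "last wednesday we ran . last friday we ran .").split()
bigrams = Counter(zip(corpus, corpus[1:]))
total = sum(bigrams.values())

def merged(assign, a, b):
    """Class assignment after merging class b into class a."""
    return {w: (a if c == b else c) for w, c in assign.items()}

def avg_mi(assign):
    """Mutual information between the classes of neighboring words."""
    joint = Counter()
    for (w1, w2), n in bigrams.items():
        joint[(assign[w1], assign[w2])] += n / total
    left, right = Counter(), Counter()
    for (c1, c2), p in joint.items():
        left[c1] += p
        right[c2] += p
    return sum(p * math.log2(p / (left[c1] * right[c2]))
               for (c1, c2), p in joint.items())

# Start with one class per word, then greedily merge down to four classes,
# always choosing the merge that loses the least mutual information.
assign = {w: i for i, w in enumerate(sorted(set(corpus)))}
while len(set(assign.values())) > 4:
    classes = sorted(set(assign.values()))
    _, a, b = max((avg_mi(merged(assign, a, b)), a, b)
                  for a, b in combinations(classes, 2))
    assign = merged(assign, a, b)

groups = {}
for w, c in assign.items():
    groups.setdefault(c, []).append(w)
# Expect similarly behaved words, e.g. "wednesday"/"friday", to group.
print(list(groups.values()))
```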
Next up: why is this technique from 1992 relevant to us now? It's because of two recent papers, one from David McAllester in 2018, and one from Karl Stratos in 2019. Together, these papers revisit the calculation of the mutual information between a word and its context, which, if you recall, Brown and collaborators were not able to globally optimize.

In his paper, David McAllester proposes a reframing of part-of-speech tagging. Imagine that instead of trying to partition a vocabulary into optimal parts of speech, we're doing a kind of prediction problem. Say we're shown a word with no context; take the word "raven," for example. We want to predict what part of speech this word belongs to. In other words, we want to make a statement like: there's a 92 percent chance that this word, "raven," is a noun; there's a six percent chance that it's an adjective, as in "raven hair"; and there's a two percent chance that it's a verb, as in the present-tense form of "ravening." Meanwhile, imagine that we're shown the context, minus the word itself, and we want to make a similar prediction. Say we're shown the fragment "a ___ is a bird belonging to the corvid family," and we want to predict the part of speech of the missing word. Based on our knowledge of English grammar, we actually can: we can say there's a 100 percent chance that it's a noun, a zero percent chance that it's an adjective, and a zero percent chance that it's a verb.

What we have now, with the way David McAllester has set this up, are two separate predictions for the class membership of this word, based on two separate sources of information. You have these two numerical predictions, and you can very easily calculate the mutual information between them. So that's David McAllester's work. Karl Stratos then takes this setup and uses it empirically to demonstrate that you can get state-of-the-art performance on real part-of-speech problems. He trains two neural networks: one of them sees the word itself, and the other sees the surrounding context. Both of them make predictions about the part-of-speech category that the word belongs to, and they're rewarded for agreeing with each other. If you're interested in more about neural networks, I did a video on those over at the Cool Worlds Classroom channel recently.

The one major downside of this approach is that we have to decide in advance how many classes, or part-of-speech categories, the networks are going to attempt to divide the data into. Of course, you can always experiment with different numbers and then decide after the fact which classes seem most meaningful. The big upside, for us astronomers, is that this setup is very generalizable: instead of words, we can feed in planets.

Which brings me, finally, to this latest project with David and Mike. Instead of working with a big corpus of English text, we're working with the population of exoplanets discovered by the Kepler space telescope as of when we started the project: 4,286 planets, organized into 3,277 planetary systems. By the standards of computational linguistics, this is a tiny, tiny data set; even back in 1992, Brown and collaborators were working with hundreds of millions of words.

To analyze these planets, we use a setup very similar to Karl Stratos's. We train two neural networks: one of them sees a target planet, and the other sees the surrounding context. These are called the target network and the context network. We define the context to mean the host star of the planetary system, plus the neighbor planets surrounding the target planet: two planets to the interior, and two planets to the exterior. If Earth were our target planet, then the target network would see Earth, and the context network would see the Sun, plus Mercury, Venus, Mars, and Jupiter. If there aren't enough neighbor planets, that's okay; the context network just gets fed a placeholder blank instead. So if Venus were our target planet, the context network would see the Sun, blank, Mercury, Earth, Mars. (A simplified sketch of this two-network setup follows below.)
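Here is a highly simplified sketch of this two-network setup as we apply it to planets, written with PyTorch; the input encodings, two numbers per object, are described just below. To be clear about what is assumed: the layer sizes and class count are invented, and the bare "reward agreement" cross-entropy here only captures the flavor of the training. Stratos's actual objective maximizes a variational lower bound on the mutual information; a pure agreement loss like this one would collapse everything into a single class without extra regularization.

```python
import torch
import torch.nn as nn

N_CLASSES = 5     # chosen in advance, as discussed above
TARGET_DIM = 2    # planet: (radius, period)
CONTEXT_DIM = 10  # star (2 numbers) + four neighbor planets (2 each)

target_net = nn.Sequential(nn.Linear(TARGET_DIM, 32), nn.ReLU(),
                           nn.Linear(32, N_CLASSES))
context_net = nn.Sequential(nn.Linear(CONTEXT_DIM, 32), nn.ReLU(),
                            nn.Linear(32, N_CLASSES))
opt = torch.optim.Adam(list(target_net.parameters()) +
                       list(context_net.parameters()), lr=1e-3)

def agreement_loss(target_x, context_x):
    """Cross-entropy between the two class predictions: small when the
    target net and the context net agree about a planet's class.
    (The real objective also has to prevent collapse to one class.)"""
    p_target = target_net(target_x).softmax(dim=-1)
    logp_context = context_net(context_x).log_softmax(dim=-1)
    return -(p_target * logp_context).sum(dim=-1).mean()

# One toy optimization step on random stand-in data.
target_x = torch.randn(64, TARGET_DIM)
context_x = torch.randn(64, CONTEXT_DIM)  # missing neighbors -> placeholders
loss = agreement_loss(target_x, context_x)
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```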
When I say "sees" here, as in "the target network sees" or "the context network sees," what exactly does that mean? Each object, star or planet, gets summarized as two numbers to feed into the neural network. Each star gets encoded as its temperature and its surface gravity (which describes how compact or how puffy the star is). Each planet gets encoded as its radius and its orbital period. So, for each example planet in our training set, the target network receives as input two numbers: the radius and the period of that planet itself. The context network receives at most ten numbers as input: the temperature and the surface gravity of the host star, plus the radii and the periods of the two inner neighbor planets and the two outer neighbor planets.

So that's the setup, inspired by Karl Stratos's work. What do we do with it? We actually first consider a different problem from the part-of-speech tagging. We ask: can the context network, which, again, does not see the radius and the period of the target planet, nevertheless make a prediction for that radius and period based solely on the surrounding contextual information in the system? If it can, that would be very cool. One application: if we were to discover a planetary system with a suspicious gap between observed planets, we might suspect that there's a planet yet to be discovered in that gap, and if there's contextual information that can help us predict what it will be like, then we'll be able to narrow our search for it.

More specifically, what we want to know is: can the context network make a prediction for the period and radius of the target planet that is better than random? What does "random" mean here? For our random comparison, we draw a whole bunch of training planets from our data set whose periods place them in between the inner neighbor and the outer neighbor of the target planet. That's our random sample. We take the average of all of the periods of those planets, and of the radii of those planets, and that's the really basic, naive prediction for what our target planet's period and radius might be. Fingers crossed, hopefully our context network can exploit contextual information to do a little bit better than that. (A sketch of this baseline follows below.)

And it can. The predictions of the trained context network have a mean absolute error that's about half that of the predictions of the really naive random sample, and what that means is that the context network has learned something beyond just "this target planet should fit in the gap between its neighbors."

A couple of interesting notes about this result. First of all, the higher the multiplicity of the system, that is, the more planets there are in it, the better the context network's predictions are, for both the periods and the radii of the planets in that system. In other words, the more contextual information there is to exploit, the better the context network performs, and that makes sense. As a related note, the context network's predictions for the one-planet systems are generally quite poor; they're not really any better than the random draws. That's interesting, because it's not as if there's no contextual information to exploit in the one-planet systems. There is; it's just limited to the information about the host star. So we can infer that the host-star information is less important than the neighbor-planet information for making good predictions.
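Here is a minimal sketch of that naive baseline, assuming a simple table of planets; the column names and the numbers are invented for illustration:

```python
import pandas as pd

# A stand-in catalog: host system, period (days), radius (Earth radii).
planets = pd.DataFrame({
    "system": ["A", "A", "A", "B", "B", "C"],
    "period": [5.0, 12.0, 30.0, 3.0, 9.0, 400.0],
    "radius": [1.1, 2.0, 2.4, 0.8, 1.5, 11.0],
})

def naive_prediction(inner_period: float, outer_period: float):
    """Average period and radius of all training planets whose periods
    fall between the target's inner and outer neighbors."""
    between = planets[(planets.period > inner_period) &
                      (planets.period < outer_period)]
    return between.period.mean(), between.radius.mean()

# A target planet sandwiched between neighbors at 5 and 30 days:
print(naive_prediction(5.0, 30.0))
```

In the study described above, the trained context network's predictions had roughly half the mean absolute error of this kind of naive in-the-gap average.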
There is, however, one exception: one subset of single-planet systems for which the network is able to make very good radius predictions. These are, universally, giant planets orbiting giant stars. So what is going on with these? Why is the network able to do so well? The answer is selection bias: the only planets that Kepler can reliably detect around big, bright giant stars are big, giant planets. As a result, the network has learned that if it sees a big, bright giant star, that is, a star with low surface gravity, then it must predict a big radius for the planet around it, because big planets are associated with big stars in the data set. It's cool that the network has picked up on that association.

The next thing we do with our two-network setup is something a lot closer to the part-of-speech tagging problem we started with. What happens if we ask our networks to act exactly like Karl Stratos's part-of-speech tagging networks? That is, what if we (1) decide on a number of classes into which we'd like to split our Kepler planets; (2) train our networks by asking them to loop through all the planets in our data set and evaluate the probability that each planet belongs to each class, rewarding them when they agree; and (3) look at the classes that result, and see if the networks have picked up on anything interesting?

We have to be quite careful throughout this process, because we have to tell the networks how many classes to divide the planets into, but what we really want are classes that are physically meaningful, and not just an artifact of our particular modeling choices. What we do to minimize that possibility is to run the training many times over, for different choices of the number of classes. For one training run, we ask the networks to divide the planets into two categories; for the next training run, three; and so on, all the way up to ten. Then we look at which groups of planets, and which groups of planetary systems, always get stuck together regardless of that choice. These are our robust classes.
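Here is a miniature sketch of that robustness bookkeeping. The per-run class labels below are random stand-ins, so the "robust" set will usually come out empty; in the real analysis, the labels for each run come from the trained networks:

```python
import numpy as np

rng = np.random.default_rng(0)
n_planets = 12

# One clustering run per class count k = 2..10 (labels faked here).
runs = [rng.integers(0, k, size=n_planets) for k in range(2, 11)]

# co[i, j] = fraction of runs in which planets i and j share a class.
co = np.zeros((n_planets, n_planets))
for labels in runs:
    co += labels[:, None] == labels[None, :]
co /= len(runs)

# Pairs that are (almost) always co-assigned form the robust groups.
robust_pairs = np.argwhere((co > 0.9) & ~np.eye(n_planets, dtype=bool))
print(robust_pairs)
```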
Among the planets, we end up with seven categories, which we can see if we plot the planets in the radius-versus-period plane. I'm going to give these categories names, just to be able to talk about them, but these names are purely descriptive; I don't mean to associate these groupings with categories of planet identified by any other method. Moving approximately counterclockwise through the plane: first we have the hot sub-Neptunes, which have very short periods, less than about 10 days. Then we have the short-period sub-Neptunes, with periods between 10 and 25 days, and the short-period Neptunes, with similar periods but a larger size. Then come the intermediate-period sub-Neptunes, with periods between 25 and 40 days; the long-period sub-Neptunes, with periods greater than 40 days; then the long-period giant planets; and finally, the hot Jupiters. These seven groups are analogous to parts of speech.

Now, do we really believe that these seven groups constitute the seven fundamental types of planets out there in the galaxy? No. This is a very small sample of planets, and it is subject to some of the vagaries of our modeling. It is the case that the groups closer to the edges of this plane, the hot sub-Neptunes, the long-period sub-Neptunes, and the long-period giant planets, tend to be the most robust, the most stable to perturbations of our network training. The boundaries between the groups in the middle of the plane tend to move around a bit more.

If we look instead at the stars, in the surface-gravity-versus-temperature plane, we identify four groups. And if our groups of planets were analogous to parts of speech, then our groups of stars, or rather the groups of planetary systems they host, will be analogous to types of sentences. The first group are the systems orbiting giant stars, the big, bright, puffy, low-surface-gravity stars. These systems preferentially consist of lonely giant planets, which, again, is a consequence of selection bias: it's very difficult to detect anything smaller than a giant planet orbiting a giant star. The next two groups are the cool dwarf systems and the warm dwarf systems, which are systems orbiting stars with high surface gravity and cool or warm temperatures, respectively. These are preferentially low-multiplicity systems, with one or two planets; the reason for this will become clear in a moment.

Among them, though, and overlapping with both the cool dwarf systems and the warm dwarf systems, is another group of stars, and these are stubbornly classified together no matter how we mess with our modeling. What systems are these? It turns out that these are the hosts of compact multi-planet systems. This is our fourth and final group, and it's the only group where membership is determined not by any intrinsic characteristic of the host star, but rather by the architecture of the surrounding planetary system. These are high-multiplicity systems made up of short-period, small planets.

We also end up with a ton of fascinating planetary systems that are just not easily lumped in with any of these four groups. These are the leftovers; we call them the indeterminate systems. Almost all of the really interesting high-multiplicity systems are in the indeterminate group, which explains why only the low-multiplicity systems are left over to be easily lumped in with the cool dwarf and the warm dwarf systems. For example, if the solar system were in our sample, it would be indeterminate. This is really clear evidence that we just don't have enough data yet, that we don't know of enough planetary systems yet, to resolve the indeterminate group into more meaningful classes of planetary system.

So there we have it: a first exploratory step into the grammar of planetary systems. We've done the best we can with what is still a very small data set, and it will be fascinating to see where this goes as we discover more planetary systems, which will be soon. The TESS planets are still rolling in, and we have discoveries from the Gaia satellite to look forward to, as well as the upcoming Roman Space Telescope and the PLATO mission. In fact, the number of known exoplanets seems to be increasing exponentially with time; this has become known as Mamajek's law. So stay tuned: I think this method shows a lot of promise for picking up on subtle patterns in planetary systems. That's all for now. Ask your questions below, and subscribe to see more updates from Cool Worlds.

[To a purring cat] You could stop making noise... sorry, you cannot be purring into the microphone. Coming back up? Yeah.

Thank you so much for watching, everybody. This research was supported by many of you watching right now, donors to the Cool Worlds Lab. In fact, I want to thank our latest donor, Andrew Shawn. Thank you so much for your support. If you too want to help us out, then click the link up above, where you get access to special perks as a donor to the Cool Worlds Lab. So until the next video: see you around the galaxy.
Info
Channel: Cool Worlds
Views: 92,608
Keywords: Astronomy, Astrophysics, Exoplanets, Cool Worlds, Kipping
Id: QAcHikVKF5A
Length: 30min 36sec (1836 seconds)
Published: Fri Jul 23 2021