The Biggest Ideas in the Universe | 20. Entropy and Information

Reddit Comments

In the Q&A video, Sean argues for the PH (past hypothesis) on the basis that the BB (Boltzmann brain) argument shows that if ~PH, we would observe ourselves to be in a far more likely fluctuated world (e.g. we wouldn't see stars and galaxies outside our island), and since we do not observe this, ~PH is falsified. However, Sean's argument neglects to consider that the correct conditional probability should include the condition that we would choose to ask ourselves about the likelihood of the PH in the first place. In other words, we should consider the reference class of BB observers that are fooled into believing that they aren't in a fluctuated world, and that reference class is far larger than the class with the same experiences that is consistent with the PH.

👍 1 · u/ididnoteatyourcat · Aug 09 2020 · replies
Captions
Hello everyone, welcome to The Biggest Ideas in the Universe. I'm your host, Sean Carroll, and today we're finally getting to an idea many of you have been hoping for: entropy. In fact, entropy and information. Information turns out to be closely related to entropy, but related in a subtle and confusing way, so it's worth talking about in its own right.

Entropy is something we've already mentioned. It came up in the episode on the nature of time, because, as we'll discuss today, the fact that entropy increases is responsible for the arrow of time, for the fact that the past and the future are so different in our everyday lives; we'll fill in some of the gaps from that episode here. We also talked about entropy in the last video, on probability and randomness, because the way entropy enters the history of physics in the nineteenth century involves the appreciation that we don't know everything about the universe and have to talk about probabilities, and people realized that was at the heart of how we should think about entropy. We even gave a formula for entropy, in one of Boltzmann's various ways of writing it down.

One reason (not the only reason) entropy is hard is that there are many different definitions of it, and none of them is wrong; they all serve a purpose and apply to different sets of circumstances. It's funny: entropy plays a role in the second law of thermodynamics, which can be roughly stated as "entropy increases, but doesn't decrease, in closed systems," yet the second law was invented by Sadi Carnot before the word "entropy" existed, so he formulated it without the word. The word was suggested by Clausius, who reformulated the second law in terms of entropy, but he only had a definition in the macroscopic world of thermodynamics; the microscopic definition in terms of atoms and molecules wasn't there. That finally came along with Boltzmann and others, and it turns out there's more than one definition. That's what makes it complicated.

Here's how we're going to think about entropy today: entropy is a way of characterizing our ignorance about a system. That's what we talked about last time — probability enters the clockwork Newtonian laws of physics because we are not Laplace's demon; we don't know the positions and velocities of everything, and yet we can still do physics and make progress. Entropy quantifies, given what we do know — the temperature, the energy, the total number of particles, some observable macroscopic features — how much we don't know: how much the macroscopic information leaves out of the ultimate microscopic specification of the state. It's a measure of our ignorance, in some sense. And just saying those words, you should already be suspicious, because entropy is supposed to play a role in the second law of thermodynamics.
That's a very important law in physics. People like Arthur Eddington famously said that you might get away with violating charge conservation or conservation of energy, but if your theory violates the second law of thermodynamics, it is obviously wrong. So it's a beloved law; but if the way we state what entropy is makes it sound personal and subjective — how much I don't know, compared to somebody else — then what about an alien or a robot? Could they know more? Would the entropy be different for them? This is one of the reasons the concept of entropy has been so contentious over the years. We're not going to get into that very deeply, but I want you to know it was one of the issues.

In the spirit of not keeping surprises to the end: yes, when we talk about entropy there is something we know and something we don't, but that's not arbitrary. It's not that some people know some things and others know other things with no way of reconciling them. We have ways of observing systems. We observe a cup of coffee; we can see its temperature, where the cream is, where the coffee is. What we choose to count as macroscopically observable is not arbitrary; it is what we have access to given the laws of physics. There are no robots that look at cups of coffee and see all of the atoms — there isn't enough information coming out in the photons from all the atoms. There might be robots that see more than we do, which is fair and worth keeping in mind, but it's not completely up for grabs. Roughly speaking, any alien or human or other animal sees the same sort of size and shape and temperature for something, and those determine how much we can infer about the microscopic state. That's what the entropy is telling us.

So let's get into some more specific notions. I think I'm going to define entropy four different times over the course of this video. First, two different definitions, both of which live in the context of classical statistical mechanics. We're in the 1870s: we've become convinced that atoms are what matter is made of, and we're thinking about the statistics of those atoms. Classical mechanics lives in phase space, as we talked about last time: the set of positions x_i and momenta p_i.

One way of thinking about entropy — and the history here is fascinating; I know a lot about it, but not nearly enough — is the definition engraved on Ludwig Boltzmann's tombstone in Vienna. Apparently, number one, he never actually wrote down the definition of entropy that is engraved on his tombstone, and number two, it wasn't even originally part of the tombstone; I'm informed it was added later, and Max Planck more than anyone else was responsible for that way of writing the entropy. Nevertheless we call it Boltzmann's definition, and he certainly had ideas like this in mind. The idea is to coarse-grain. That's the important move we're going to make here.
What coarse-graining means is this: there are many, many particles in a macroscopic system, and I don't see every single particle's position and velocity, but there are some things I do see — the size, the temperature, the macroscopically observable features. Given what I do see, there are many different microscopic arrangements of atoms and molecules that would look exactly that way. So I coarse-grain by collecting all the different microscopic arrangements — all the different points of phase space — into a single macrostate if they look observationally the same from our macroscopic point of view.

So: macrostates are what we will call sets of microstates, and a microstate is just a point in phase space, because a point in phase space tells you exactly what's going on in the system. "Microscopic" here doesn't mean small; it means exact, complete — if I give you the microstate, I'm giving you the position and velocity of every single particle. Macrostates, then, are sets of points in phase space that "look the same." I'll put "look the same" in quotes, because you can go home and argue with your loved ones about what that means; it might depend on what you're able to look at. But once you've decided what you can look at, the coarse-graining is well defined.

The picture is something like a bullseye: here is phase space, the space of all possible microscopic states of the system, and it is coarse-grained into macrostates — regions of phase space, some small, some big. It's a fascinating question what the right way to coarse-grain is, what the right way to divide phase space into macrostates is, and what properties you naturally get. I think this is an area where there's a lot we don't know, a lot of work that has not yet been done; a lot of things are assumed to be true even though we haven't really investigated them carefully yet.

Now pick out one particular macrostate. The macrostate is a whole region, because a macrostate is a set of microstates, and Boltzmann says: that region of phase space has a volume. The entropy — what we'll call the Boltzmann entropy, even though he had other definitions — is

S = k log W,

where W is the volume of the macrostate we're pointing at. There's an integral you could do over that region of phase space to figure out how many states are in the macrostate. Once you define the macrostates, this is a very nice definition, because it's precise: even if I knew exactly what the microstate was, I could still assign an entropy to it. This entropy is not subjective, in the sense that once we agree on what the observables are, and therefore what the macrostates are, everyone agrees on what the entropy is. It's not directly about your knowledge. For every microstate you can ask: what macrostate is it in? Count the volume of that macrostate — that's W — take the logarithm, multiply by Boltzmann's constant k, and you get the entropy.
We talked a little bit about logarithms last time. The logarithm is the mathematical operation that undoes exponentiation: log(e^x) = x and e^(log x) = x, so the exponential and the logarithm are inverses of each other. If we plot log x versus x (I should have done this last time, sorry about that): what number do you have to raise e to in order to get 1? Well, e^0 = 1 — any number to the zeroth power is 1 — so log 1 had better be 0. From there the logarithm keeps growing forever, but ever more slowly; it's famously a slowly growing function, slower than linear. Logarithms appear all the time in physics. If you listened to the podcast I did with Scott Aaronson, there's a lot of discussion about the growth rates of different functions, because he's a computational complexity theorist: that's how you count how hard a problem is to solve, how many steps the computation takes as a function of the size of the input — does it go as the log of the input size, linearly, quadratically, exponentially? All of that is about how fast functions grow asymptotically, and logarithms grow very slowly.

Now, why do we want the entropy to be the logarithm of the volume? The short version: the idea of entropy was already there. It preceded kinetic theory, as we call it — the idea that gases are made of atoms — or statistical mechanics, as the modern version is called. So the new definition had to map onto the pre-existing one, and a key feature of the old entropy was that if you have two systems and bring them together, the entropy of the total is just the entropy of one plus the entropy of the other. But if you have two systems made of atoms and bring them together so the atoms can mix, the phase space grows enormously — multiplicatively, which is to say exponentially — because there's a lot more room for all those atoms to be. If you define entropy as the logarithm of the volume rather than the volume itself, you maintain the feature that the entropy of the combination is the sum of the two entropies. It has other nice features too, but that is historically why Boltzmann did it; a minimal numerical sketch of the additivity is below.

So this is a nice definition: give me the microstate and I can specify exactly what the entropy is. It's subjective only to the extent that you need to pick your coarse-graining, to pick what counts as a macrostate, but once you do that, you're done.
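Just to make the additivity concrete, here is a minimal numerical sketch; the phase-space volumes are made-up numbers of my own, and I'm setting k = 1 as in the lecture.

```python
import math

# Two non-interacting systems with (made-up) macrostate volumes W1 and W2.
# Side by side, the joint macrostate has volume W1 * W2, so only the
# logarithm of the volume adds the way entropy is supposed to add.
W1, W2 = 1e30, 1e40
S1, S2 = math.log(W1), math.log(W2)     # Boltzmann entropy with k = 1

print(S1 + S2)               # 161.18...
print(math.log(W1 * W2))     # 161.18...  log(W1*W2) = log W1 + log W2
```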
Let me contrast that with a different way of thinking about entropy, usually called the Gibbs entropy. Josiah Willard Gibbs was an American chemist and physicist, and this definition is actually closer to the one we used last time, and very close to definitions that Boltzmann and Maxwell also used — all these people were talking to each other. But at the end of the day you need a label, so when I call S = k log W the Boltzmann entropy, don't think it's the only thing Boltzmann ever thought about; I'm labeling it that because it's on his tombstone, which is as good a reason as any.

The Gibbs entropy comes from, philosophically, a very different place. The Boltzmann entropy comes from coarse-graining: before you even tell me the state of the system, you tell me what I can observe about it, and you use that to chunk phase space up into macrostates. The Gibbs entropy doesn't use coarse-graining at all. It just says: there's something I know about the system and something I don't, and the way that knowledge is conveyed is a probability distribution. That's why we had to do probability first, in the last video.

So imagine we have a probability distribution on phase space. I'm going to simplify my life by using a single letter x to denote a point in phase space, rather than writing x and p every time; last time, for the Maxwell-Boltzmann distribution, we cared about the difference between positions and momenta, but now we just care about where we are in phase space. So we have a probability distribution p(x), where x is a point in phase space — that's just notation, nothing sneaky or physical. Then the entropy, close to but not exactly what we talked about in the last video, is

S = -k ∫ p(x) log p(x) dx.

That's the Gibbs definition of entropy, although, as I said, Boltzmann and Maxwell and others also used it. There should be a factor of Boltzmann's constant k in front; from now on I'm setting k = 1, as I promised in the last video.

This entropy is a single number that we don't attach to individual microstates. In the Boltzmann way of doing things, we could take a microstate, ask what macrostate it's in, and call that the entropy of that particular configuration. The philosophy of the Gibbs approach is instead directly about how much we claim to know about the system — the probability distribution over places in phase space. It is absolutely tied to knowledge: if you know the exact microstate, so your probability distribution is what we call a delta function — infinitely big at one point in phase space and zero everywhere else — then you can work out from this formula that the entropy is zero. A zero-entropy state is one where there's nothing you don't know; the formula is telling you how much you don't know about the system.

Let's do an example in one dimension, since it becomes difficult to draw pictures in more than one. Here's all of phase space, labeled by x, and here are different possible probability distributions p(x).
A high-entropy probability distribution is one where you don't know very much, which is to say you assign probability to lots of different points in phase space — a broad, flat curve, say a constant; you can go home, plug a constant into the formula, and calculate the entropy. Contrast that with a sharply concentrated curve. There is a constraint: the integral of p(x) has to equal 1 — remember, that was one of Kolmogorov's axioms, that summing the probabilities of all the possibilities gives 1 — so both curves have the same total area under them. The question is whether the probability is concentrated in some small region and tiny elsewhere, or spread out all over the place. The concentrated ones are low entropy: low entropy means the system is probably in some particular region. High entropy means I really have very little idea where the system is.

You can already see a relationship between entropy and information emerging: the higher the entropy, the less information you have about the system; in a low-entropy state you know something about it.

And just so you're happy we're not making things up, this is compatible with Boltzmann, in the following sense. Suppose you know your system is in some particular macrostate — you've coarse-grained phase space and you know which macrostate you're in — but you know nothing else. Then the natural thing is to assign a uniform probability everywhere inside that macrostate and zero outside. In that case the two formulas coincide. In the one-dimensional example: here are my coarse-grained macrostates, I know I'm in this one, so I put a uniform p(x) over it, normalized so the integral is 1, and then S = log W also equals -∫ p log p dx. So they're compatible in that sense: this particular kind of p(x), uniform over one macrostate, gives you the same entropy either way (the little numerical check below makes that concrete). But operationally they are different; they play slightly different roles and they evolve in different ways — the Boltzmann entropy and the Gibbs entropy.
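Here is a quick sanity check of that agreement, in a discrete toy phase space of my own invention (k = 1 throughout):

```python
import numpy as np

# A discrete "phase space" of 100 cells; one macrostate occupies W = 20 cells.
W = 20
p_uniform = np.zeros(100)
p_uniform[:W] = 1.0 / W                # uniform over the macrostate, zero elsewhere

def gibbs_entropy(p):
    p = p[p > 0]                       # treat 0 * log 0 as 0
    return -np.sum(p * np.log(p))      # Gibbs formula with k = 1

print(gibbs_entropy(p_uniform))        # 2.9957...
print(np.log(W))                       # 2.9957...  -> same as S = log W

# A delta-function-like distribution (we know the exact cell) has zero entropy:
p_peaked = np.zeros(100)
p_peaked[0] = 1.0
print(gibbs_entropy(p_peaked))         # 0 (NumPy may show -0.0)
```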
Let me give you an example of how they differ. The second law of thermodynamics — which is where we're trying to get — says that the entropy of a closed system either increases or remains the same over time. You've heard that before. Now that we have two different definitions of entropy, let's ask whether you would expect that sentence to be true.

For the Gibbs entropy, you don't just expect it to be true; you get a much stronger statement, which is that S remains exactly constant for a closed system. The Gibbs entropy is literally a measure of how much you know about the microstate of the system, and even though you're not Laplace's demon and don't know the exact microstate, you have some probability distribution, and you can use the equations of motion to evolve that distribution from one time to another. If the system is truly closed, then dS/dt = 0. That's a true statement, but it's not the second law of thermodynamics. It's just the statement that you never forget anything about the system: if the Gibbs entropy went up, it would mean you knew less and less about the system, but in principle, if you knew the distribution in the past and you know the laws of physics, you know just as much now as you did then. So it's not surprising that the Gibbs entropy doesn't change for a closed system.

What you can do, however, is modify the situation a little. Remember, in the last video we talked about being handed a completely isolated box of gas by a mysterious stranger, and we said that for many purposes it's more physically illuminating to imagine a box of gas coupled to a heat bath: the box is embedded in an environment at some temperature T, with energy very gradually exchanged between the gas inside and the outside world. Then you get a new rule for the Gibbs entropy. Even if you started off knowing a lot — a low-entropy state inside the box — the gas is now being jostled by the molecules of the heat bath; even though molecules don't come and go, momentum and energy and information do, and you can't keep track of all that exchange. So the entropy can go up.

Sadly, the entropy can also go down in this situation. You do know something: the temperature. If you remember the Maxwell-Boltzmann distribution, the temperature appears directly in the equilibrium probability distribution, and it's not surprising that at higher temperature there's more room for the molecules to be doing all sorts of things. So for a box with a fixed number of particles, if you raise the temperature this Gibbs entropy goes up, and if you lower it the entropy goes down — the molecules settle into a smaller region of phase space as they get colder. Take a warm bottle of champagne and put it in the refrigerator: its entropy goes down.

So we're still looking for the correct way to state the second law in this setting, and scientists do know how to do it. The answer is that there is a law saying the change in the entropy is greater than or equal to the heat exchanged with the outside world divided by the temperature at which it is exchanged — dS ≥ dQ/T, the Clausius inequality. (The rough numbers below show how the champagne bottle's entropy can drop while the total entropy still goes up.)
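To see how the champagne example is consistent with that inequality, here's a rough back-of-the-envelope sketch; the mass, temperatures, and heat capacity are made-up illustrative numbers, not from the lecture, and I'm ignoring the refrigeration cycle itself.

```python
import math

# Cool a ~1.2 kg, water-like bottle of champagne from 295 K to 278 K,
# dumping its heat into fridge contents sitting near 276 K.
m, c = 1.2, 4000.0                     # kg, J/(kg K)  (rough water-like value)
T_hot, T_cold, T_fridge = 295.0, 278.0, 276.0

dS_bottle = m * c * math.log(T_cold / T_hot)   # ~ -285 J/K: the bottle's entropy DROPS
Q = m * c * (T_hot - T_cold)                   # ~ 81,600 J of heat given off
dS_fridge = Q / T_fridge                       # ~ +296 J/K, treating the fridge
                                               #   contents as a reservoir at fixed T

print(dS_bottle, dS_fridge, dS_bottle + dS_fridge)
# the sum is positive: the bottle gets more ordered only because the
# surroundings' entropy goes up by even more
```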
If the system and the heat bath it's embedded in are at essentially the same temperature, then you recover the second law: the entropy just goes up. But if you want physically applicable laws, you have to let the entropy of a system go down when it's not a closed system. This isn't a rule for closed systems; it's a rule for systems coupled to a heat bath. Philosophically — morally — it's still the second law at work, just in a more general situation where heat can be exchanged. That's the slightly subtle status of the second law in the Gibbs formulation.

I should have said this earlier, so let me say it now: contrasting the Gibbs formulation with the Boltzmann formulation, you might conclude that since the second law is not as obvious in the Gibbs form, the Boltzmann form is better. That's fine — but if you actually want to study the evolution over time of a system that is not closed, that is open (and most systems in the world do interact with the rest of the world), it's much easier to solve the equations in the Gibbs formulation than in the Boltzmann formulation. The probability distribution can change gradually over time and the entropy gradually increases, whereas in the Boltzmann formulation the entropy suddenly jumps as you move from one macrostate to another, which is harder to handle numerically: if you're in some macrostate, are you near its boundary? When is the entropy going to jump? So in practice we use formulas like the Gibbs one much more often in real fluid-dynamics or statistical-mechanics calculations.

But the Boltzmann formulation gives you a much cleaner idea of why there is such a thing as the second law of thermodynamics, and why you would be tempted to write it down for closed systems. Why? It's kind of obvious: if you have this coarse-grained structure on phase space and you start the system in a low-entropy macrostate and just let it evolve however it wants, it's going to wander from a low-entropy macrostate to a high-entropy macrostate for the simple reason that there is a lot more volume in the high-entropy macrostates — a lot more ways for the system to be high entropy than to be low entropy. In fact my figure doesn't come anywhere close to doing justice to the real world, where the high-entropy macrostates are enormously larger than the low-entropy ones. From the figure it looks like you'd probably move to higher entropy, but maybe not; in a real system with many, many particles, the probability of moving toward higher entropy, starting from a low-entropy state, is overwhelmingly large. The little counting exercise below gives a feel for how lopsided the macrostates are.
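A toy model of my own, just to show the flavor of the counting: N particles, each of which can sit in the left or right half of a box, with the macrostate defined by how many are on the left.

```python
from math import comb, log

# Macrostate "n particles on the left" contains W(n) = C(N, n) microstates,
# so with k = 1 its Boltzmann entropy is S(n) = log C(N, n).
N = 100
print(log(comb(N, 50)))     # ~66.8: the evenly-spread macrostate
print(log(comb(N, 10)))     # ~30.5: a very lopsided macrostate

# Fraction of ALL 2^N microstates sitting in the "roughly even" macrostates:
near_even = sum(comb(N, n) for n in range(40, 61))
print(near_even / 2**N)     # ~0.96 already at N = 100, and -> 1 as N grows
```

With N of order 10^23 instead of 100, the "roughly even" macrostates don't just dominate; essentially all of phase space is in them, which is why a low-entropy starting state is overwhelmingly likely to drift toward higher entropy.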
Even though that sounds straightforward and convincing, and in the modern twenty-first-century way of looking at things you're probably not bothered by it, in the nineteenth century people were bothered. Many of them thought the second law of thermodynamics was an independent law, full stop. The project of Maxwell and Boltzmann and Gibbs and the rest was to reconcile the clockwork, classical-mechanical world of Newton and Laplace with this time-ordered, probabilistic world of randomness and statistical mechanics. What Boltzmann argued was that we should think of the second law as something that is almost always going to happen — not always in the strict mathematical sense, but overwhelmingly likely, and you can quantify exactly how likely. People didn't like that. They said: no, it's a law, like conservation of energy is a law; you don't say energy is usually conserved, you say it's conserved, full stop. So they were bothered by Boltzmann's formulation and tried to argue with him. Let me mention a couple of the objections, because people are actually still arguing — we know the second law is true, we know atoms exist, but exactly the right way to formulate and think about these laws is still a little bit contentious.

First, Loschmidt. Loschmidt was a professor who was actually one of Boltzmann's teachers and mentors, so he was on Boltzmann's side; they were friends and talked to each other. But he didn't like Boltzmann's formulation of the second law, at least at first, and he came up with what is called the reversibility objection. It's a pretty simple point — one of those cases where being in the right place at the right time gets your name on something simple. He said: look at this picture again (here's a smaller version of it). The underlying microscopic theory is Newtonian classical mechanics, and a feature of Newtonian classical mechanics is time reversibility. There is no arrow of time in Newtonian mechanics. For every trajectory going one way, there is another trajectory going the other way in time; typically, all you have to do is take the momenta to minus the momenta. So for every trajectory in phase space with dS/dt > 0, there is a reversed one with dS/dt < 0 — that's why it's called the reversibility objection. In particular, if you start in a low-entropy state and count the trajectories that go out from it in various ways, there is an exactly equal number of trajectories running from high entropy to low entropy. In the space of all trajectories, all the things the system could possibly be doing, it is equally likely that you would find the system increasing in entropy or decreasing in entropy. And that's not wrong; Loschmidt did not make a mistake.

Boltzmann argued against it, and it's funny — I wrote a book about entropy and the arrow of time called From Eternity to Here, which you're welcome to buy and read for many of these historical details — Boltzmann was always convinced he was right, but the reason he thought he was right changed with time; even within a single paper he would offer different responses. We've now settled, more or less, on what the right response is, and it's this: there is a symmetry in Newtonian mechanics between going one way in time and going the other way — that's time reversibility.
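Just to make Loschmidt's point vivid, here's a minimal sketch of my own: a toy system evolved with a time-reversible integrator, run forward, momenta flipped, and run forward again — it retraces its steps (up to floating-point roundoff).

```python
import numpy as np

rng = np.random.default_rng(0)
x, p = rng.normal(size=50), rng.normal(size=50)   # 50 particles in a harmonic trap
x0, p0 = x.copy(), p.copy()

def evolve(x, p, steps=2000, dt=0.001):
    # leapfrog integration for force = -x; the update rule is time-reversible
    for _ in range(steps):
        p = p - 0.5 * dt * x
        x = x + dt * p
        p = p - 0.5 * dt * x
    return x, p

x, p = evolve(x, p)        # run forward in time
x, p = evolve(x, -p)       # flip every momentum and run forward again
print(np.max(np.abs(x - x0)))     # essentially zero: back where it started
print(np.max(np.abs(p + p0)))     # momenta are the reverse of the originals
```

Nothing in the dynamics prefers the entropy-increasing direction; for every history there is the momentum-reversed history.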
The coarse-graining does not break that symmetry, and neither does the dynamics. We need something else to break the symmetry if we want to claim an explanation for the arrow of time. Boltzmann thought he was simply deriving the fact that entropy has to go up, and that's not true; he did not derive it, because Loschmidt was right — in the space of all trajectories, entropy could just as well go down. What you need is a boundary condition: an initial condition for the universe. You break the time-reversal symmetry with a boundary condition. This is one of the ideas Boltzmann himself put forward — he called it "proposition A" or something like that — the idea that at some time in the past the universe started in a very low-entropy state. That's it. It's just an extra assumption; it's not derivable from anything we've said so far.

David Albert has labeled this the past hypothesis. David Albert was one of the guests on the Mindscape podcast — we talked about quantum mechanics — but one of his most influential works is a book called Time and Chance, where he really lays out the logic of this situation. You need to be able to say that the universe "began" (I'm putting "began" in quotes, and I'll explain why) in a low-entropy state. By that simple stroke of the pen you eliminate all the bad trajectories that are decreasing in entropy; you just say you start there, and if you start there, then indeed you are very likely to increase in entropy.

Why the quotes around "began"? Because what you really need is that at some point in the history of the universe, not too long ago, the entropy was much lower than it is today — where "not too long ago" means much shorter than the equilibration time of the universe, the time it would take to go from low entropy to high entropy. And this is empirically true: when we look at the universe, the past near the Big Bang, 14 billion years ago, had a very low entropy.

In fact, a good question for historians of science, or for science fiction writers, is: why didn't Boltzmann invent the Big Bang theory? He was confronted with the reversibility objection in the 1870s, pooh-poohed it, and moved on, but he revisited the problem for various reasons in the 1890s — that's when he wrote his most in-depth explorations of this question — and he did suggest that the universe had some special condition at early times, but he never said that would be the beginning of the universe. To be fair, in the 1870s we didn't know about general relativity, expanding space, the expansion of the universe, or singularities, so it would have been extremely bold for Boltzmann to say there was a first moment in the history of the universe. But it wouldn't have been crazy. Look at the sky: stars are shining, and Boltzmann knew better than anyone that that can't go on forever — shining stars increase the entropy of the universe, and they will eventually use up their fuel and stop shining. So the idea that the universe we observe today could exist forever in that kind of configuration is not at all natural, and it would have been very natural to say, even before general relativity was invented, "I think the universe began a few billion years ago."
That would have been a bold move, and Boltzmann or someone else could have made it, but no one did; even after Einstein came along, people like Friedmann really had to convince him that there probably was a Big Bang at the beginning.

Let me also talk about the fact that the entropy of the early universe was low. It was — but there are people, even modern cosmologists, who don't appreciate this, and it's a little bit weird. The reason is this: take the box of gas we were talking about, in thermal equilibrium, as high-entropy as it can be, and look inside; it looks like a black body, giving off radiation according to the black-body curve we talked about long ago with Planck and quantum mechanics. The early universe was a black body — it gives off the cosmic microwave background radiation we observe in our telescopes today — and a box of gas in its maximum-entropy state looks like a black body, with the same distribution inside. So people say: aha, the early universe was high entropy, just like a thermal box of gas. That is completely false, and the reason it's false is that inside the box of gas, gravity is not important — not the gravity of the Earth pulling the gas down, but the mutual gravity of the particles inside the box. Gravity is a very weak force; the particles bump into each other, but their gravitational interactions are completely negligible in that regime. When you can ignore the gravity between the particles, it is certainly true that a smooth distribution of gas at more or less the same temperature is as high-entropy as you can get. But that was not what the early universe was like. The early universe was incredibly densely packed with stuff, and the gravitational pull between different parts of the universe was really important. Once you turn on gravity, there is a way to get much, much higher entropy than a thermal distribution of gas: just make a black hole.

So consider the entropy of the stuff in the universe. What I mean is: take the amount of universe we can observe today, fix that volume of comoving space, and let it shrink into the past, so we're keeping track of the same stuff as the universe expands or contracts — roughly the same number of particles (some particles leave the volume, but an equal number come in). Drawing time going up, the universe used to be smaller, and this is a comoving patch with a certain number of particles. We know roughly how many particles there are in our observable universe, and we can calculate their entropy; the answer is around 10^88. A big number, but not the biggest possible number. Compare the entropy of a single black hole of a million solar masses. We talked in the gravity Q&A video about black hole thermodynamics: Stephen Hawking, Jacob Bekenstein, and others showed that black holes have entropy, and we even gave the formula — proportional to the area of the horizon, and therefore roughly proportional to the mass squared of the black hole. The centers of big galaxies like the Milky Way have black holes millions of times the mass of the Sun; a single black hole a million times the mass of the Sun has an entropy of around 10^90, and the black hole at the center of the Milky Way is about 4 million solar masses. So it is easy to make a single black hole with far more entropy than the entire observable universe had. In fact, we think the maximum entropy you could make out of all the stuff in the observable universe is around 10^123. So 10^88 is nothing — incredibly tiny compared to what the early universe could have had as an entropy.
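You can check the black-hole numbers yourself with the standard Bekenstein-Hawking formula S = k c³ A / (4 G ħ), where A is the horizon area; the sketch below (constants and masses chosen by me) lands in the same ballpark as the figures quoted here.

```python
import math

G, hbar, c = 6.674e-11, 1.055e-34, 2.998e8     # SI units
M_sun = 1.989e30                               # kg

def bh_entropy_over_k(M):
    # For a Schwarzschild hole, A = 16 pi G^2 M^2 / c^4, so S/k = 4 pi G M^2 / (hbar c)
    return 4 * math.pi * G * M**2 / (hbar * c)

for M in (1e6 * M_sun, 4e6 * M_sun):
    print(f"{M / M_sun:.0e} solar masses: S/k ~ 10^{math.log10(bh_entropy_over_k(M)):.0f}")
# ~10^89 and ~10^90 -- the same ballpark as the lecture's numbers, and vastly
# more than the ~10^88 carried by all the ordinary stuff in the observable universe
```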
That is very compatible with the past hypothesis: the universe began in a low-entropy state. And indeed, the history of the universe since the Big Bang is one in which the radiation cools but the matter becomes lumpier — matter pulls itself together under gravity, and once gravity is important, a smooth distribution is not the highest-entropy distribution. (Footnote: eventually, once the universe gets dilute enough, the highest-entropy state is smooth once again, and that's what we expect to happen — the black holes will eat up the mass of the universe, then gradually evaporate, and we'll be left with nothing but a thin gruel of particles, the true highest-entropy state.) So once gravity is in the game, the behavior of entropy is a little weird: when things are densely packed and gravity matters, high entropy means lumpy; when gravity is not important, high entropy means smooth; when things have diluted away again, high entropy means smooth again. Weird, but okay — we've figured it out. This is stuff we actually do understand.

What we don't understand is why the early universe had a low entropy. We have no agreed-upon theory — and I say that very carefully, because most working cosmologists don't even try to come up with a theory that dynamically explains why the early universe had low entropy. I've tried: I wrote a paper in 2004 with Jennifer Chen in which we proposed a model with baby universes and quantum nucleation of spacetime that would dynamically explain why the universe had low entropy. Not that our model is that great, but we have almost no competition; people are just not trying, and I think more people should. Our model has good aspects and might point the way toward the right one, but there's probably still work to be done before we understand this. Anyway, that's speculative, and in the Biggest Ideas series we're here to talk about the ideas we think are right.

So that was Loschmidt's reversibility objection. There is another, closely related objection called Zermelo's objection. Zermelo became a famous guy; at the time, circa 1895, he was, I think, a postdoc or graduate student working with Max Planck, and he went on to become very famous as a mathematician — Zermelo-Fraenkel set theory is something you will read about if you become a pure mathematician. So he was more a mathematician than a physicist. He read about Boltzmann's work, and he also read about a theorem by Henri Poincaré, the French physicist and mathematician, about recurrence in physical systems.
Poincaré was interested in things like the solar system. Imagine just the solar system, with the planets not interacting with each other — all the planets circling the Sun, so the whole arrangement of planets looks different at every moment of time. What Poincaré proved is that if you wait long enough, any given configuration of planets around the Sun will appear again. The planets all move at different rates, taking different amounts of time to go around the Sun, but if you wait long enough, whatever arrangement they started in, they will get back to it. That's the Poincaré recurrence theorem, and it led Zermelo to what is called the recurrence objection.

Zermelo, embedded in that German tradition, also felt that the second law of thermodynamics should be a law, and his objection was very simple. Poincaré's theorem says that a closed, bounded classical system returns to any configuration it reaches. ("Closed" and "bounded" mean two different things: closed means no interactions with the outside world; bounded means there's only a finite range of things the system can do — in the solar system, only a finite set of places each planet can be.) There might be configurations you never reach, but if you reach a configuration at one time, you will come back to it again and again, infinitely often, if you live in an eternal universe. It might take a very long time — typically it does; the Poincaré recurrence time is very long — but it will eventually happen.

So Zermelo says — and he's a mathematician, remember, so he's thinking of entropy as a function of time — that a function can't be both recurrent, coming back to values it had in the past, and monotonic, changing in the same direction for all time. If the universe is a closed, bounded system, its microstate should come back to the same configuration again and again, so the entropy may go up, but it has to come back down to where it started: that's recurrence. But the second law, dS/dt ≥ 0, says entropy should be monotonic. Poincaré's theorem says entropy should be recurrent, and you can't have both. Zermelo's conclusion was that the whole idea of Boltzmann and Maxwell — that gases can be thought of as microscopic systems of particles obeying Newton's laws — is just wrong. He thought the second law really was a law; he didn't think physical systems actually recur, he thought their entropy just increases, and he said Boltzmann had not succeeded in explaining why not.

Boltzmann at this point, in the 1890s, was kind of a big deal — a famous guy — and he responded pretty snootily. He said something like, "I suppose I should be happy that people in Germany pay attention to my ideas at all, but it's sad to see they completely fail to understand them." Then he goes on to explain why the recurrence objection isn't valid — but, to be honest, he's not very convincing about it.
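The recurrence theorem itself is easy to see in toy examples; here's a tiny sketch of my own, with made-up periods, showing both the guarantee of recurrence and how quickly the recurrence time blows up.

```python
from math import lcm

# Independent "planets", planet i returning to its starting position every
# T_i steps: the whole configuration recurs after lcm(T_1, ..., T_N) steps.
periods = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(lcm(*periods))        # 6,469,693,230 steps for just ten planets

# The same point for any finite, closed system whose dynamics is a fixed
# permutation of its states: iterate it and you must come back to the start.
perm = [3, 0, 4, 1, 2]                  # dynamics on 5 "states"
state, steps = list(range(5)), 0
while True:
    state, steps = [state[i] for i in perm], steps + 1
    if state == list(range(5)):
        break
print(steps)                             # 6: recurrence after the permutation's order
```

Recurrence is guaranteed, but the recurrence time grows ferociously with the size of the system; part of Boltzmann's own reply was indeed that for a macroscopic amount of gas it dwarfs any timescale we care about.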
Now, there's a simple way out: just say the universe is not bounded. It's not a finite-sized system; it can expand and change forever. That's what Jennifer Chen and I took advantage of in our solution — there's a big universe out there. Maybe that's the perfectly obvious thing to think, but Boltzmann tried to be a little more clever. He said: well, maybe entropy really is recurrent; maybe that is the actual behavior of entropy over time. And then he invented a new principle — he didn't call it this, but we would now call it the anthropic principle. Boltzmann appreciated, as he should have, being Boltzmann, that for life to exist you need entropy to be increasing. Everything we do that is embedded in the arrow of time relies on the fact that entropy is increasing: we take in low-entropy food and expel heat and waste into the environment in order to stay alive.

So Boltzmann said: imagine a really, really big universe, infinitely big in both space and time, with a uniform distribution of matter on average. Again, he didn't know about general relativity, or that the universe should expand or contract; he had a fixed Newtonian spacetime full of matter. He also didn't account for gravitational instabilities — he knew about Newtonian gravity, but he didn't realize that this kind of configuration would be unstable. Just imagine it's like a box of gas, but really, really big. Then, he says, if you plot the entropy of some region of space as a function of time in an eternal universe, there is some maximum entropy, and that's where the universe will live almost all the time: close to its equilibrium state. But — being Boltzmann, he knew this — there will be fluctuations downward in entropy. The real plot isn't a flat line; it mostly sits up near the maximum with little deviations, and occasionally, if you wait long enough, a big deviation. And this is perfectly compatible with recurrence: there can be another big deviation later.

Then he says: in the regions where the entropy is high and you're in equilibrium, life cannot exist, so — by the anthropic principle, though he didn't call it that — I will never find myself in those stretches of the universe. Whereas when the entropy dips down and springs back up, stars and planets and life can exist. So maybe we live there: in the aftermath of an occasional, unlikely, random downward fluctuation in entropy. That was one of the ideas he put forward (he also had the past-hypothesis idea). He was a little embarrassed by it — he originally attributed it to his old assistant, Dr. Schuetz, though later he stopped giving anyone else credit when he mentioned it.

It doesn't work. This idea of Boltzmann's is just bad, and as far as I know Boltzmann himself never pinpointed why. It was Arthur Eddington, circa 1915, who really understood why it couldn't possibly work, and the answer is what we now call Boltzmann brains. Neither Boltzmann nor Eddington ever called them that; I believe the term was coined by Andy Albrecht and Lorenzo Sorbo in the 2000s, fairly recently.
The problem is this. Sure, if you really do have an infinite universe that lasts forever, with fluctuations downward and back up, you will get occasional fluctuations that create a galaxy, or even a whole universe. But far more often you'll get tiny fluctuations — maybe one that just makes a little brain (I don't know how to draw a brain). There will be an enormously larger number of small fluctuations than big ones, and the fluctuation you need to make a person, or a planet, is enormously more rare than the fluctuation you need to make a single brain that lives just long enough to look around, think "huh, thermal equilibrium," and then die. Technically, the probability of a downward fluctuation in entropy is proportional to e^(-ΔS), where ΔS is the size of the fluctuation, so large fluctuations are exponentially more rare. If you want to fluctuate into a person, you roughly need a fluctuation that makes a brain and, separately, one that makes the body to go with it; and if you want not just a person but a planet and a solar system, you need separate, independent fluctuations for all of those things. It is just vastly easier to get a fluctuation into one of them than into all of them at once.

So, Eddington pointed out, with overwhelming probability the observers in Boltzmann's randomly fluctuating multiverse would find themselves to be single, lonely observers with nothing else anywhere around. They would not find themselves, as we do, to be thermodynamically sensible creatures living in the aftermath of a low-entropy Big Bang. And this objection is correct. I've written a paper explaining that there's a little bit of philosophy to be done here about exactly why the argument works — you might want to say, "we look around and we're not Boltzmann brains, so even if most observers are Boltzmann brains, I'm not one of them, and it doesn't bother me." You can't say that, though the reason is complicated and I'm not going to go into it. The conclusion is the same: we cannot live in a randomly fluctuating universe that lasts eternally like this.
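To get a feel for how lopsided "exponentially more rare" is, here is a rough comparison of log-probabilities; the two entropy costs are invented, order-of-magnitude placeholders of mine, not numbers from the lecture.

```python
import math

# P(fluctuation) ~ exp(-Delta S), so compare exponents; the probabilities
# themselves would underflow any floating-point representation.
dS_brain    = 1e50     # hypothetical entropy cost of fluctuating one lone brain
dS_universe = 1e88     # cost of fluctuating something like our observable universe

log10_ratio = (dS_universe - dS_brain) / math.log(10)
print(f"lone brains are favored by a factor of about 10^({log10_ratio:.2e})")
# -> 10^(4.34e+87): the *exponent* of the ratio is itself an astronomically large number
```

However you choose the placeholders, as long as a brain costs far less entropy than a universe, lonely-brain observers outnumber observers like us by a margin whose exponent is itself astronomical.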
Better solutions than the randomly fluctuating universe are these. One: the universe is finite — finite in time, in particular. If the size of the universe as a function of time has a Big Bang in the past and a Big Crunch in the future, then there's simply not enough time for all these recurrences to happen, and everything makes perfect sense. The problem is that in 1998 we discovered the universe is accelerating, not decelerating, so it doesn't look like it's going to crunch in a finite time. The other possibility: the universe is not bounded — in particular, not bounded in entropy. The thing that did all the work in the fluctuating scenario was the existence of a maximum entropy, near which the universe would spend most of its time. If instead there is no maximum-entropy state, the plot of entropy versus time can increase toward both the past and the future from some middle point. People living on one side will say the past lies in that direction; people living on the other side will point the other way. In the far past and far future of such a universe, everything is completely symmetric — statistically equivalent, anyway. That's the kind of thing I suggested in 2004 with Jennifer Chen. We don't know which of these is true; the universe being finite in time is still a perfectly possible solution to this problem. But again, people don't think about it enough; I think this is a huge problem for modern cosmology that people are not spending nearly enough time on. That's my two cents, anyway.

Okay. There you go: that's the reason we have an arrow of time in the universe, and that's what the arrow of time is — at least the thermodynamic arrow of time, as we call it, the entropy-increasing arrow. I want to claim much more, though: I want to claim that all of the arrows of time are due to the thermodynamic arrow. This is, again, a contentious claim, but it's correct; you should believe me on this one. Not everyone does, but they will, sometime. The claim is that the second law underlies all the arrows of time, by which I mean things like: the biological arrow — we get older, a little creakier, as time goes on; the macrobiological arrow — evolution happens toward the future; the psychological arrow — you remember the past but not the future; the causal arrow — what I do now can affect the future but not the past; the electromagnetic arrow — turn on a light bulb and light streams away from it, but light never converges onto a bulb as you switch it on. The claim is that all of these imbalances, these asymmetries between past and future, are ultimately because of the second law of thermodynamics. As I said, this is contentious, and I'm not going to completely explain it, but I will give you the basic reason it's true, and that reason is the past hypothesis. If you believe the statistical, microscopic view that Boltzmann and Maxwell and their friends put forward, then the past hypothesis — low entropy at some time 14 billion years ago — is the only thing that breaks the symmetry between past and future, and therefore it had better explain all of the arrows of time.

Here's roughly how that works. Think of time going upward; what I'm plotting against it is not space but the state of the world — phase space, roughly, all the possible ways the world can be. We have some information about the current state of the world, some of it directly, some by inference: call it our current macrostate. As Loschmidt would correctly tell us, if all you know is the current macrostate and the microscopic laws of physics, then the trajectories leading from our current macrostate toward the future and the trajectories leading into it from the past are precisely equal in number. There is no difference between the number of ways the microstate can evolve into the future and the number of ways it could have come from the past, because the underlying laws are reversible; information is conserved in the evolution. So, so far, there is no difference between past and future. The difference appears when we impose a past boundary condition — the past hypothesis — that the past had low entropy.
so it is a feature of most of all these other trajectories here i give them another color these have higher entropy than the current macro state and that's exactly because of you know this picture we drew way up here if you start anywhere that is not maximum entropy most of the trajectories coming in and most of the trajectories going out are both higher entropy places that you come from and go to so that's true in the future but it's also true in the past so there's many ways for entropy to increase toward the future there are many ways for entropy to have been larger and to get us into our current macro state the symmetry is broken by the idea there's a past hypothesis that says that early universe had low entropy and that means that given our current macro state and the past hypothesis you can figure out things about the trajectory that got us there and those things are called memories or records of the past if you're on a beach and you see a footprint you say to yourself well i bet there was a foot i bet that someone walked on the beach and left an impression of course you're correct about that but it doesn't follow simply from the fundamental laws of physics in the space of all possible ways for that footprint to get there most of them have the footprint randomly fluctuating into existence okay the reason why in your brain you're convinced that the footprint is evidence for a previous foot being there is because secretly you're conditionalizing your probabilities on the past being low entropy given that the past was much much lower entropy then the easiest way to get footprints on the beach is to have a foot stand on them not to wait around a gajillion years for a random fluctuation so that's true for all the ways in which we remember things and by memory i don't necessarily just mean a physical memory in our brain i mean any kind of record a history book a photograph a fossil footprint any of those things they all make sense because we're assuming a low entropy past same thing is true for why we can cause things toward the future but not the past the rough idea is that the past seems fixed and settled because of the low entropy past hypothesis the future seems more up for grabs and affectable because there's no future hypothesis the entropy just grows as far as we know and that's why you know we can talk about free will in a sensible way because we can't talk about free will toward the past but we can talk about it toward the future because you know if you were laplace's demon there'd be no free will right you would just have a microscopic world and it would obey the laws of physics and the past and future would be equal but in the real world we don't have access to all that microscopic information we can say things about the past and future but there's an imbalance about what we can say so we act as if in the macroscopic world we are able to influence what happens that's the sensible way to talk about big human scale things that's why it makes sense to talk about choices and responsibility and blame and all that stuff that we normally like to talk about with human beings okay that was one important aspect of uh the past hypothesis and the increase in entropy here's another one life i talked about the fact that boltzmann recognized that uh you need an increase in entropy for living beings to exist let me just mention that one more time with a little bit more specificity you know here we are on earth here's earth and we have living creatures so here's a person on earth here is you 
know a plant a flower and to some extent as you will learn i keep advertising my podcast here but i just released a podcast a couple weeks ago about the definition of life and one of the parts of the definition of life is homeostasis you can maintain your integrity as a creature as a physical object and that seems like a very natural thing we all know in the back of our minds we need to breathe we need to drink we need to eat in order for that to happen but it's really only possible because we are embedded in a very low entropy out of equilibrium environment in particular we get light from the sun so here's the sun radiating its light to us and if you ask someone well what good is the sun to us here on earth they might say well the sun gives us energy and we use energy to do things and that's true in a sense but it's not the complete story it's not even the most important part of the story because the truth is the earth radiates out into space almost exactly the same amount of energy as it gets from the sun historically exactly the same amount of energy though these days we have global climate change which is warming up the earth a little bit we're radiating away a little bit less of the energy than we get from the sun but roughly speaking the same amount of energy so the point is not that we get energy because we give the same amount of energy back the point is that we get energy in a low entropy form we get energy in a concentrated form that is very useful to us so for every single photon of visible light that we get from the sun we radiate out roughly 20 photons of infrared light this is not something you can calculate from first principles this is a true fact about the earth and its atmosphere we can radiate away infrared light we absorb visible light so that means that we take a certain amount of energy from the sun and infrared light per photon has less energy what we're saying here is that on average for every one photon we get from the sun we radiate back 20 photons with 1 20th of the energy each but the number of photons is proportional to the entropy the energy per photon has nothing to do with it so we've increased the entropy of that energy we get from the sun by a factor of 20.
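(a back-of-the-envelope check, not from the lecture: assuming a typical thermal photon carries energy of order k_B times the temperature of the radiation, and taking sunlight at roughly 5800 kelvin and earth's infrared glow at roughly 290 kelvin, the factor of 20 falls right out. the temperatures and the order-one prefactor are my assumptions.)

```python
# rough check of the "factor of 20" claim
# assumption: a typical thermal photon carries energy of order k_B * T,
# and the entropy of thermal radiation is proportional to the number of photons

k_B = 1.380649e-23          # boltzmann constant, J/K

T_sun   = 5800.0            # approximate temperature of sunlight, K (assumed)
T_earth = 290.0             # approximate temperature of earth's infrared glow, K (assumed)

E_visible  = k_B * T_sun    # typical energy of an incoming visible photon, up to an O(1) factor
E_infrared = k_B * T_earth  # typical energy of an outgoing infrared photon

photons_out_per_photon_in = E_visible / E_infrared
print(photons_out_per_photon_in)   # ~20: same energy back out, carried by ~20x as many photons
```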
we've drawn down the orderliness of the universe made it more disorderly and in the process we stay alive so there's photosynthesis in the plants photosynthesis takes that energy from the sun it locks it up into a sugar or something like that we animals eat the sugar our bodies use it to make atp send the atp molecules all throughout our system as little batteries and they release their energy to maintain ourselves move our muscles make our brain work all that stuff it's amazing to think about how much science has figured out about what's really going on here and the role of entropy and the arrow of time in the whole thing it's all because we are very far from equilibrium if the whole sky were at the temperature of the sun we'd get a lot more energy than we do now right but it wouldn't be very useful we here on earth would quickly come to the temperature of the sun to thermal equilibrium and then we would all die but also if the sky were completely dark if the sun were not there then we here on earth would quickly come to the temperature of the night sky equilibrate and quickly die we stay out of equilibrium we can be living creatures because we get low entropy energy use it up and then expel it back in a much higher entropy form again lots of science to be done here that we don't yet know the answer to so this is ongoing research that we're trying to understand okay i'm going to keep talking because this is my favorite topic entropy there's a million things to say it's worth many many lectures you can buy the book if you want to hear more but i'm going to give you some of the highlights because i've got to get to information right i promised you entropy and information so let's get there let's talk about maxwell's demon so this takes us back to the 1870s james clerk maxwell was trying to understand this idea that you could understand entropy as a formula based on the molecules which seemed weird to a lot of people and maxwell came up with what he thought seemed like a counterexample a situation where you could violate the second law of thermodynamics without doing work again you can always put a bottle of champagne in the refrigerator and let its entropy go down but guess what the back of the refrigerator is expelling a lot of heat the overall entropy of the universe is going up even if the entropy of that bottle of champagne is going down that's why you cannot cool your apartment by turning on a refrigerator and opening the door because it will always be expelling more heat at the back than it is cooling in the front because of the second law of thermodynamics okay but aside from that maxwell is saying that without increasing entropy anywhere he can come up with a way to decrease entropy in an isolated system so you all know the situation probably here is a box of gas and there is a barrier in between with a hole in the barrier and there's a demon standing on top my picture of a demon looks exactly like a person except that he has little horns and the demon has access to a little paddle which he can raise up and down so he can move the paddle in front of the hole to either let particles through or prevent them from going through and the important thing is it's all frictionless there's no air resistance or anything like that so lifting the paddle up and down does a net total of zero work the demon is not adding to the energy of the system
nor is the demon taking away energy from the system he's just changing the boundary conditions of the box in an apparently innocuous harmless way and the idea is you have atoms molecules flying around in here and in thermal equilibrium both sides of the box would have the same temperature the same number of slow-moving molecules and fast-moving molecules but maxwell's idea was that the demon could look into the box without disturbing the molecules he could see when a high-energy molecule was approaching his little paddle and if so he would let it through from one side to the other and in the other direction he would let the low energy molecules through and by doing that he would create hot gas on one side of the box and cool gas on the other which is a lower entropy configuration so maxwell's worry is that the entropy of the universe is spontaneously going down without entropy being created anywhere else now everyone in the world has thought ever since maxwell came up with this that clearly somehow the demon is creating entropy and the quest was just to figure out how exactly the actions of the demon are creating entropy in the universe and it took decades and decades it wasn't really until the 1980s believe it or not that people really settled on an answer which most people accept now but still not everyone accepts it and it comes from rolf landauer and charlie bennett and their essential insight is the relationship between information and entropy so the demon needs to be able to know whether a certain molecule is high energy or low energy right so the demon needs to record that information somehow there needs to be an information recording device that the demon has access to and they make an argument that it essentially has to be finite sized otherwise you're sort of giving the demon an infinite resource to play with and if it has a finite sized recording device the demon must record information and eventually if the whole thing goes on for a long time erase that information okay it is that process of erasing information in his record-keeping device in his little notebook that increases the entropy of the universe because one thing you should keep in the back of your mind is that the increase of entropy has a close relationship with irreversibility remember when we said that the gibbs entropy formula would just stay constant if you had perfect laws of physics and no one looking inside the box and you were not embedded in a heat bath or anything like that information is conserved everything is reversible once things are not reversible that's when entropy begins to increase and the act of erasing something in your notebook is not reversible the whole point of erasing is you don't know what it was anymore so you've erased a zero or a one and afterward you no longer know whether it was a zero or a one so you cannot reverse that and they argue that this creates entropy in fact erasing one bit of information creates an entropy delta s equals k log 2 where log 2 is the natural logarithm of the number two so that's an exact calculable amount of entropy that you create in the universe by erasing a single bit of information and doing it requires energy e equals k t log 2 where t is the temperature of the surroundings that you're in and these ideas are known as landauer's principle i'm not doing justice to all the subtleties here because i just want to get the basic idea on your plate that we can begin to see the relationship between entropy and information
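(a quick numerical aside, not from the lecture: plugging numbers into landauer's formula, with room temperature surroundings as an assumed value, gives a feel for how small the cost of erasing one bit actually is.)

```python
import math

# landauer's principle: erasing one bit creates entropy delta_S = k * ln(2)
# and costs at least E = k * T * ln(2) of energy dissipated into surroundings at temperature T
k_B = 1.380649e-23      # boltzmann constant, J/K
T = 300.0               # room temperature surroundings, K (assumed value)

delta_S = k_B * math.log(2)
E_min = k_B * T * math.log(2)

print(delta_S)   # ~9.57e-24 J/K of entropy per erased bit
print(E_min)     # ~2.87e-21 J minimum energy cost per erased bit at 300 K
```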
basically the demon needs to have information and losing that information counts as increasing the entropy of the universe in a way that relates directly to things like temperature and energy this is the idea that is trotted out when people want to say information is physical okay knowing things can be used to do work can be used to manipulate the world and in fact these days people are building tiny little maxwell demons it's not that maxwell's demon is impossible it's just that you need to have some way for it to have information and eventually that information will be erased thus increasing the entropy of the universe so the point is that you can build a little machine that using the information it has about its environment can seem to do work it can convert information into work and by work we mean physical energy that you move from one place to another like you can move a piston okay or arguably there's a connection here with what molecules do inside your body we talked about the atp and other things going on inside your body and again this is cutting edge stuff that we don't completely understand but there seems to be a relationship between the use of information and the growth of entropy and complexity in your body that's another very promising research area for the youngsters to go into if you're interested in this stuff but i just like it because it's a fun way that we start going from physical stuff molecules and things moving around to what seems like a slightly more abstract realm of information and knowledge so let's go down that rabbit hole a little bit further more into the direction of information because you've all heard of claude shannon claude shannon was the pioneer of information theory he worked at bell labs and he was really interested in down to earth stuff he was interested in communications sending signals along wires this whole field of statistical mechanics and thermodynamics and entropy has a lot of connections with tangible real world stuff sadi carnot who first formulated a version of the second law of thermodynamics his motivation was not to understand fundamental physics his motivation was to build a better steam engine it was the early 1800s he was french he was upset that the french were not doing as well at building steam engines as the british were and so he tried to do better and he proved that you could never do a perfect thing you could never build a perfect steam engine every steam engine has to dissipate and be irreversible in some way so likewise shannon really literally was interested in the most efficient way to send signals over wires knowing that there would be errors there are going to be mistakes you never get the signal and interpret it completely so you have to be clever you have to think about how best most safely most efficiently most reliably to send your signal over a transatlantic cable or something like that so here is his setup and it's going to sound completely different than what we've been talking about so far but then we'll see how it joins back up the setup is you want to send symbols and by symbols you might mean letters in an alphabet or you might mean words or if you're more mathy you might mean just zeros and ones morse code or something like that and you want to send them over a channel so a channel
could be a wire but again what we're doing here is taking the tangible problem i want to send a telegraph signal over a wire and turning it into a formal question so i'm sending something called symbols over something called a channel and these all have mathematical realizations and i receive a whole bunch of symbols i receive many messages over time if you read a book if you read an english language text you will notice that not all words appear with the same frequency there are common words and uncommon words there are common letters and uncommon letters letters like e and s appear a lot letters like q and x appear less frequently okay so you can just count the frequency you can be kind of frequentist about it and say well what is the probability of the next symbol that i'm going to receive being a certain symbol versus something else some are more probable than others so let p of x be the probability of receiving symbol x i don't mean the letter x i mean whatever symbol x stands for here then what shannon says is some signals some messages some communications that you get convey more information than others by virtue of the fact that they are more surprising than others okay and so he relates the information content to what we sometimes call the surprisal i call it that because i think it's a very cool word and the formula is not that hard the information content associated with receiving symbol x is just the logarithm of one over p of x okay so the more probable the symbol the lower the information content remember the logarithm has this shape it goes up as the number that it's a logarithm of goes up and let me tell you another fact about logarithms if you don't know it the logarithm of a to the power b equals b times the logarithm of a it's just a math fact you can check for yourself so one over p of x is p of x to the minus one so this is just minus the logarithm of p of x and this is called the surprisal okay the lower the probability of a symbol reaching you the more information it contains so the way to think about it is someone says oh the sun rose in the east this morning okay you're not that surprised you go okay good what else would it have done but if someone says the sun rose in the west this morning if they were a reliable witness you'd be extremely surprised and that would convey a lot more information to you so in some naive way the statement the sun rose in the east and the statement the sun rose in the west both convey one bit of information did the sun rise in one direction or the other direction right but the bits are not created equal this is shannon's great insight the bits depend on the context in which you expect to receive certain bits of information with certain probabilities if every morning you get a report the sun rose in the east or the west the probability of it being in the east is really really high you are not surprised when you get that not much information is conveyed
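(a minimal sketch in python, not from the lecture: the surprisal of a symbol, measured in nats since we're using the natural logarithm, with made-up probabilities for the east/west report.)

```python
import math

def surprisal(p):
    # information content of receiving a symbol that had probability p
    return math.log(1.0 / p)        # equivalently -math.log(p)

# made-up probabilities for the morning report
p_east = 0.9999    # the sun rose in the east: expected, low information
p_west = 0.0001    # the sun rose in the west: shocking, high information

print(surprisal(p_east))   # ~0.0001 nats, almost no information
print(surprisal(p_west))   # ~9.2 nats, a lot of information
```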
so shannon then goes on to say all right in different kinds of alphabets different kinds of sets of symbols different kinds of languages we might expect on average to be surprised and get information frequently or infrequently okay so he defines the average surprisal which is just the sum over all possible symbols that you might receive of the probability of getting that symbol p of x times the surprisal the information content i of x right so this is a formula that says some symbols are going to happen a lot let's calculate the surprisal there and some symbols are going to happen rarely let's calculate the surprisal there and get the average okay the average weighted by the likelihood of actually getting those symbols so guess what i is minus log p okay so we define this to be h of the set of symbols x and it is minus the sum over x of the probability of x times the log of the probability of x shannon derived this formula and he didn't even know about statistical mechanics he didn't know about boltzmann and gibbs he didn't know that there's already a formula exactly like this this is exactly what we called before gibbs's formula for the entropy okay so basically the way to think about what shannon has done is to say look i have different kinds of alphabets different kinds of sets of symbols different kinds of messages i might receive x are the different symbols i receive and there's a different probability for getting different messages x if the probability function looks like this then basically what you're saying is almost all the time you're getting this one symbol so even though you're getting a symbol it's not telling you very much you're not conveying that much information because it's the same symbol you're getting all the time so this is a low information alphabet it is exactly equivalent to a low entropy configuration in phase space whereas something like this now you're saying well sometimes you get this symbol sometimes you get that symbol every single time you get a symbol you're learning something because there's no one kind of symbol that is overwhelmingly likely so this kind of symbol set is high information so if you care about sending signals over wires you want a high entropy high information content encoding for your messages so that you actually convey information all the time now you might wonder you might say well okay sure if i have this peaked distribution most of the time i'm getting a low information content message but on the other hand when i do get an unlikely low probability message doesn't it convey so much information that it wins out well the answer is no it does not win out the fact that the information is larger is swamped by the fact that the probability is even lower and you can sort of feel that here because remember the logarithm grows slowly so the logarithm is going to lose the battle against the actual probability so what counts for whether your alphabet is high information conveying or low information conveying is the probability of the most probable symbols that's what really matters so this is now called the shannon entropy or the expected information content and there's a story which many of you probably have heard that shannon was talking to john von neumann of all people and said i have this new quantity that i want to define what should i call it and von neumann purportedly says i'm not sure if it's true he said well you should call it the entropy for two reasons number one is because there's already a formula that looks like that which is called the entropy and the other is nobody knows what entropy means so in conversation you can always be sure to win because no one knows what you're talking about and yeah it's a little bit unfair but there's a sense of truth to that
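(another small sketch, not from the lecture: the shannon entropy of a peaked alphabet versus a spread-out one, again in nats. the specific probabilities are made up.)

```python
import math

def shannon_entropy(probs):
    # H = - sum_x p(x) * log p(x), in nats (natural log)
    return -sum(p * math.log(p) for p in probs if p > 0)

# a peaked alphabet: one symbol dominates, so a typical message tells you little
peaked  = [0.97, 0.01, 0.01, 0.01]
# a spread-out alphabet: every symbol is a genuine surprise
uniform = [0.25, 0.25, 0.25, 0.25]

print(shannon_entropy(peaked))    # ~0.17 nats, low entropy, low average information
print(shannon_entropy(uniform))   # ~1.39 nats (= log 4), the maximum for four symbols
```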
now what i want to mention about this is that this is all about information this is clearly about information these x's are not points in phase space they're symbols in an alphabet but there's a little bit of a mismatch between how a communication theorist would talk about information or care about information versus how a physicist would okay so shannon is saying that low entropy corresponds to low information high entropy corresponds to high information that's the opposite of what we said before right when we talked about the knowledge that we have the information we have about a system given a certain probability distribution if the probability distribution is very peaked then we know a lot about the system we know more or less what microstate it's in whereas if the probability distribution is all spread out we don't know anything about the system it could be anything so physicists tend to associate high entropy with low information content because they're thinking of the entropy of a distribution on phase space and the information that they have in mind is where are you in phase space but a communication theorist like shannon has a completely different notion this is mathematically formally the same but what they mean by high information content is how much information is conveyed by any one element of that alphabet and if the probabilities are all spread out equally a lot of information is being conveyed by getting any one symbol in that alphabet in other words by sampling the system one symbol at a time you learn more in a high entropy configuration than in a low entropy configuration so i just want to let you know that because while everyone agrees there's a close relationship between entropy and information different people from different subfields think the relationship goes in opposite ways so just keep your wits about you they're both perfectly sensible ways of thinking but they're different ways of thinking depending on what you're thinking about okay one more thing which is because this is the biggest ideas in the universe we would be remiss if we did not talk about quantum mechanics because we love quantum mechanics and there is a separate notion there i told you i was going to need four definitions of entropy so i gave you the boltzmann definition with coarse graining i gave you the gibbs definition minus the sum of p log p for a probability distribution in phase space i gave you shannon's definition which looks formally exactly the same as gibbs's but it means something different and now i'm going to give you a fourth definition in the context of quantum mechanics i'm going to tell you there is a definition without writing down the formula because it requires extra notation to get there but this comes from john von neumann the great hungarian mathematical physicist who came up with a way to understand that quantum mechanics leads to a kind of entropy that is new or at least it's an origin of entropy over and above what you used to have so remember if you think about the gibbs entropy you have some probability distribution and the entropy is really telling you how much you don't know about the microstate right so what von neumann points out is that because of entanglement you can have a situation where you know everything there is to know about the state of a composite system but there's a sense in which you don't know the states of the subsystems that are entangled because there is no such thing as the state of the subsystem
so this is entanglement entropy or sometimes just called von neumann entropy and we go back to alice and bob and their entangled spins so imagine that alice and bob's spins have a wave function psi that is one over the square root of two times both spins up so that's alice's spin up and bob's spin up plus one over the square root of two times both spins down for alice and for bob and you remember the story right if alice measures her spin we instantly know what bob's spin is but bob doesn't know that so for bob it's still a 50 50 chance et cetera this is a 50 50 chance of both spins being up or both spins being down were you to observe either one of them so here's the situation i told you the wave function and remember if you go back to our discussion of the uncertainty principle the point is that there's no more information to give you about the physical setup of the system if you're a hidden variables person maybe there's more information but here in wave function land this is the entire quantum state of the system we're not leaving anything out so you might say in some sense this is a situation with zero entropy you know the microstate this is the microstate of the system and that's true that is a perfectly valid way of calculating the entropy for this system but you might also want to say well what is the entropy of alice's spin and what i tried to teach you to say back when we talked about entanglement is that there is no wave function for alice's spin because alice's spin is entangled with bob's spin so alice's spin does not by itself have a wave function there are not separate wave functions for the different pieces of the universe there's one wave function the wave function of the universe and it's entangled in this way okay but what von neumann points out is that observationally when it comes to actually calculating anything you might want to observe about alice's spin anything alice might want to observe about her spin we can ignore bob's spin and replace the wave function of the composite system with an uncertain wave function and this is a little subtle so follow along here we usually talk about the wave function of a spin being uncertain with respect to what spin you are going to get if you were to measure it but now we're doing a different thing we're saying we're uncertain about the wave function itself that's a whole other level of uncertainty and so we call this a pure state up here and now we're going to consider mixed states where we don't even know what the wave function is and so observationally it's as if alice's spin is described by a mixed quantum state where the probability that alice has a spin with a wave function that is 100 percent spin up is 0.5 and the probability that alice has a spin with a wave function that is completely spin down is also 0.5 so it's like having a probability distribution on phase space right when we had a probability distribution on phase space we said well there is some microscopic state of the system we just don't know what it is so we have some probability distribution this is trickier because there is no such thing as the quantum wave function of alice's spin it's entangled with bob's but for everything you might possibly want to calculate about what you're going to observe when you do things to alice's spin you would get the same answer if you started in this mixed state
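(a small numerical sketch of this, not from the lecture, assuming numpy is available: build the entangled state, trace out bob's spin, and evaluate the von neumann entropy formula that's written down next. the entropy comes out to log 2, exactly one bit's worth.)

```python
import numpy as np

# the entangled state (|up,up> + |down,down>)/sqrt(2) for alice and bob
up, down = np.array([1.0, 0.0]), np.array([0.0, 1.0])
psi = (np.kron(up, up) + np.kron(down, down)) / np.sqrt(2)

# density matrix of the full two-spin system: a pure state, so its entropy is zero
rho_full = np.outer(psi, psi.conj())

# partial trace over bob's spin gives alice's reduced density matrix
rho_alice = np.einsum('ajbj->ab', rho_full.reshape(2, 2, 2, 2))
print(rho_alice)            # [[0.5, 0], [0, 0.5]]: a 50/50 mixture of up and down

# von neumann entropy S = -Tr(rho log rho), computed from the eigenvalues of rho
evals = np.linalg.eigvalsh(rho_alice)
S = -sum(p * np.log(p) for p in evals if p > 1e-12)
print(S)                    # ~0.693 = log 2, one bit's worth of entanglement entropy
```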
you can even make this formal there's a procedure that von neumann describes of tracing over bob's spin to get this mixed state and these are described in the literature by density matrices we're not going to go into what that is but as usual that's lingo if you want to look it up density matrices or density operators so what's going on here here's the crucial thing you have two systems a and b and classically a has a state and b has a state you might not know them and if you don't know them if there's some probability distribution there's an entropy quantum mechanically there's a state for both a and b together it's entangled and let's imagine you know it exactly but it is as if the subsystem a has a state and you don't know it therefore you can talk about the entropy of subsystem a okay so the density matrices are called rho the greek letter rho is given to the density matrix and the von neumann entropy i told you i wasn't going to write it down but i'm writing it down the von neumann entropy of rho is minus the trace of rho log rho and the trace means the sum of the diagonal elements of a matrix so there you go it's the same formula the trace is kind of like the sum or the integral and rho is kind of like the probability distribution so i'm being less specific than i've been in the past but you see the same structure appearing the conceptual difference here is that this is still true even if you know the complete state of the universe so quantum mechanics forces some entropy on you if you are choosing to consider subsystems of larger systems because they can be entangled let me give you an example of a subsystem of a larger system here's space and here's a region of space okay let's imagine this is just the quantum field theory vacuum so there's no particles or anything like that but as we talked about even though there are no particles there's entanglement between the region inside and the region outside and therefore you can calculate the density matrix of this region if you call this region r there's a density matrix rho sub r and that density matrix has an entropy regions of empty space in quantum field theory have entropy even if you know that the whole state is the vacuum now technically speaking the entropy is infinite because it's quantum field theory but maybe you regularize it and there's a cutoff and you make it a finite number so that's the usual renormalization story the effective field theory story we think that in some sense you can think about black hole entropy this way because you notice that even though the entropy is infinite it's not crazy that the entropy is proportional to the area of the region that you've drawn because the number of entangled degrees of freedom is proportional to that area also so mark srednicki and other people pointed this out and made an argument that maybe we should think of black hole entropy as entanglement entropy between what's inside the event horizon and what's outside the answer seems to be that black hole entropy turns out to be harder than that but in a very real sense it does seem to be entanglement entropy it certainly seems to be entanglement entropy of something just not quite of degrees of freedom in the quantum field theory vacuum anyway so this is yet another formula for entropy another way that entropy arises and this is sort of less optional than the gibbs or the boltzmann entropy there's
nothing that came in here about coarse graining or ignorance or anything like that it's a built-in feature of the universe so it's almost like quantum mechanics is saying you have to deal with entropy whether you like it or not that's a good lesson for all of us to take to heart
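(an editorial recap, collecting the four definitions that appeared in this lecture in one consistent notation. here k is boltzmann's constant, W is the number of microstates in a coarse-grained macrostate, p is a probability distribution, either over phase space or over symbols, and rho is the density matrix of a subsystem. the choice of natural logarithms throughout is mine.)

```latex
\begin{align*}
S_{\text{Boltzmann}}   &= k \log W \\
S_{\text{Gibbs}}       &= -k \sum_x p(x) \log p(x) \\
H_{\text{Shannon}}     &= -\sum_x p(x) \log p(x) \\
S_{\text{von Neumann}} &= -\mathrm{Tr}\,\big(\rho \log \rho\big)
\end{align*}
```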
Info
Channel: Sean Carroll
Views: 112,027
Rating: 4.9068098 out of 5
Id: rBPPOI5UIe0
Length: 98min 42sec (5922 seconds)
Published: Tue Aug 04 2020