The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies (Paper Explained)

Video Statistics and Information

Captions
All right, today we're going to find out why AI is much better at governing people, why poor people really should pay more taxes, and how Donald Trump is just a normal human. All right, we'll dive into it. We're looking at the AI Economist by Salesforce Research. Salesforce Research has created a simulated world environment where they can place agents, and the agents can move around, collect resources, trade those resources, and use those resources to build houses, and that will earn them coins. Each agent wants to maximize its own coins. But there's also the government, and the government can set taxes, so they collect money from everyone and they redistribute it. The goal now is going to be that the AI handles both the agents and the taxes, and we want to maximize the social welfare of the entire population. All right, that's the goal.

The paper is called "The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies" by Stephan Zheng, Alexander Trott, and other people from Salesforce Research and Harvard University. As I said, this is a simulated environment, and it works like this: there is a 2D plane, kind of like a game playing field, and in this game there are agents. Here you can see the agents. There are always four agents. Where, oh, down here. What are you doing in the corner? Come on, be productive. The agents are in this world and they have certain actions at their disposal. First of all, they can move around: they can move down, left, right, and so on. Whenever they walk past a resource tile, they collect the resource. This is stone and this is wood, so there are two kinds of resources. And then the last action the agents have is building a house: one wood and one stone will create one house, and the house gives you coins. So this is a house, and that will give you coins, but how many coins you get is different from agent to agent, and
this represents the agents' different skill levels. This is an abstraction, and the economic theory behind it is that one of the main drivers of income inequality is that people are skilled differently, and therefore a higher-skilled worker is able to convert one unit of labor into more money than a lower-skilled worker. This is represented here by the fact that maybe if this agent builds the house, they'll get 50 coins, but if this agent here builds the same house, they'll only get 10 coins. So we'll call this one a high-skilled worker and this one a low-skilled worker.

Now, the last thing, sorry, I said "last thing" before, but the very last thing the agents can do is trade. If one agent has too many resources and the other one has not enough, they can trade those resources among each other for coins. So once you build a house and collect some coins, you can then either go and collect more resources, or you can use those coins to buy resources off of other people. This guy, this is unlucky: no coins, no houses, and no resources. Look at them. Oh yeah, you also can't move across the water here; you can only move on the grass. You also can't move through a house, which gives you some interesting abilities, because you can just build a house right here. And you can't move over other players. So the rules are pretty simple, and the goal is for the agents to maximize the number of coins they get in a thousand steps. The number H here is 1,000, which is the number of steps the agents can take before the game is over and it restarts again. Each agent is using reinforcement learning to learn how to achieve the maximum number of coins. The policy is, of course, going to be different depending on whether that is a high- or a low-skilled worker.

The catch here is that outside of this there is the government. Let's draw this big house with the flag of our
fictitious nation, which is like this. That's the flag. The government will observe what's happening here and will issue taxes, so it will issue a tax distribution. How do you imagine that? Imagine the government says something like this: for the first 10 coins you earn, you owe us 5 percent of that; for the next 10 coins, so from 10 to 20, you owe us 10 percent; and so on. If you earn even more, you owe us more and more percent of those extra coins. This is what you might know as a progressive tax schedule: the more you earn, the more you pay, percentage-wise, on that extra earned money. This is what you might be used to, but there are other tax schedules, and the exact histogram you see, the exact percentage for each amount of coins, that is the action of the government. So the government decides on the taxes, and the taxes are just collected from the income. If an agent earns these coins, it has to pay taxes to the government, and the government will redistribute all the taxes it has collected equally among the population. So if you pay a lot, you might lose through this process, and if you pay just a little in taxes, you might gain through this process.

So that's it, that is the basic premise of the game. The agents are using reinforcement learning, and I believe the novelty of this paper is that the government is now also using reinforcement learning to determine the optimal tax policy. There is this inner loop here, and there is this outer game where the government also tries to maximize its objective with RL. And what does the government try to maximize? Good question. It is a measure that's called social welfare. They have this way down in the paper: social welfare in this paper consists of two things. First of all, economic productivity, which basically just means how many coins anyone has produced. It doesn't matter who, just the total amount of coins produced. The second one is
income equality, and this is related to the Gini index. If you plot the cumulative distribution of wealth, a fully equal society would be a straight line, because 50 percent of the people would have 50 percent of the money, and so on. But almost all real societies have something like this, where 50 percent of the people might have 10 percent of the money and the other 50 percent of the people have the other 90 percent. The measure of inequality is this area here. This is called the Gini index, and one minus this area is what this paper has as an equality measure. So the higher this number, the more equal the society is in terms of its income distribution. Now, what is actually optimized for here is this thing: equality times productivity. You want both to be high, your income equality and your productivity. There's a trade-off here, of course, and you can have multiple ways to trade that off, and that will give you different things. They call this the social welfare function, and that's the thing the government RL agent optimizes for.

So you can see here already the free market: it's the most productive, it produces the most coins, because free market means no taxes. If you have no taxes, people are basically encouraged to earn more money, because they don't have to pay taxes on it. As soon as you tax them, they're less encouraged to earn more money. Therefore, if you have no taxes, the most coins will be earned in total, but the equality suffers, so the equality is the lowest among the systems considered. If you compare that to the AI Economist, the AI Economist achieves the highest social welfare and the highest equality, but it doesn't suffer as much in productivity as the other systems here. The baseline systems are, first of all, the US federal system. This is not particularly tied to the US; this is basically most of the systems that you currently have in the world, the progressive tax
system, and the Saez formula, which I believe is an economic-theory-based system and gives a regressive tax schedule. You can see them down here: the US federal schedule is progressive, which means the more you earn, the more you pay percentage-wise, while the Saez formula is regressive, which generally means the more you earn, the less you pay. I believe this was derived under some assumptions to be the optimal tax distribution. And the AI Economist, we'll come to this in a second.

Let's actually look at one of these games first and how it plays out. The cool thing is that they have pretty flashy animations, so you can look at how one of these games turns out. This is a free market game, and you can see the agents moving around, collecting things, building houses. You might notice that one of the agents, namely agent one, is just building all of the houses and generally being a dick, being in everyone's face and building things everywhere, and the other ones don't, or just very few, like the light blue one on the bottom left builds some houses. On the right you can see how the distribution of wealth is structured, and you see agent one ends up with most of the wealth. The size of the circle, I think, is the total productivity, so you can see this grows over time, mainly because agent one becomes so rich.

If you analyze what's happening here, yeah, they have a graph up here, it is very interesting. This is the same kind of game. Agent one here is this orange dot, and agents two, three, and four are these dots here. This graph is coin from trading, so how much money they win or lose from trading. The green bars are trading wood and the brown bars are trading stone. So you see agent number four, which is the lowest skilled (the skill is just determined at the beginning of the episode), will just
make all of its coins basically by selling wood, agent three will make all of its coins by selling stone, agent two will collect both and sell both, and agent one will just spend money in trading. So you have a specialization here. Agent one, which is the highest-skilled one, will buy resources in order to build more houses, because it clearly profits from building lots and lots of houses. So it will use that money to buy more resources rather than go collecting them, while all the other ones basically forgo building houses: they just collect the resources and trade them away to agent one. That's more profitable for them than building houses themselves. So you see this kind of specialization emerging in these games, which I find pretty cool, that you see a really stark division of labor emerging from just this very small set of rules.

You can analyze this game in different ways. They have a few more plots where it becomes quite apparent that these agents specialize. You see here resources collected, sorry about that, resources collected for the lowest-skill and the highest-skill laborers. The lowest-skilled ones mainly collect resources, while the highest-skilled labor mainly goes for building things. It doesn't collect resources, but its net income from building is really high, while everyone else just doesn't build at all. All right, so we have a division of labor emerging.

Now, this was a free market. Let's actually compare the different algorithms. If you look at social welfare, this thing here, equality times productivity, you can see that over the training progress the AI Economist will outperform all of the other systems. So it will outperform the free market, the US federal tax system, and the Saez formula if trained for long enough, which is to be expected,
right? If you put RL onto a cost function, it will then optimize that cost function. But it's pretty cool to see that there's a lot of headroom here over what we currently have.

Now let's look at some of these strategies it comes up with. What do these games look like where the AI has imposed different tax strategies? This is with the Saez strategy. You see here again this inequality emerging, with the yellow player here building most of the houses. With the AI Economist, again there is inequality, but you can see in the distribution that agent one only ends up with about half of the wealth, whereas if you compare this to the free market here, agent one ends up with like two-thirds of the wealth. This is the game we saw before. So there is not that much of a difference qualitatively, but there is in the end result.

All right, let's look at what these policies actually come up with. What is the tax policy that the AI comes up with? This tax policy outperforms on this social welfare metric, and this is very interesting. First of all, you see that it's zigzag, it's like down, up, down, up, which is already weird. The first very weird thing is the spike at the very bottom. So that thing here, what's that? Those are the poorest people in your society, and you're taxing them the highest. Just imagine this: you're here, downtrodden by life, abandoned by society, you have no money, no house, no nothing, and you're just trying to get a job, you're just getting a little bit of money so you can buy a cheeseburger, and then the government comes: "Give us that money, come on." So basically these are the poor, and the poor in this system are just, F the poor.

Now, the reason why this happens is pretty clear: you want to encourage people to go here, to earn more money. It's not like the government makes any money from the poor people, independently
of how high it taxes them, but it is basically an incentive structure to make them move over to the somewhat more productive population, because here it's kind of assumed that even the lowest-skilled ones can move over a bit if you just tax them enough at the low brackets. You just have to realize that it is so hard, I believe it is almost impossible, to encapsulate what we really want in a system into a cost function to be optimized. It is so incredibly hard, and you see that here. Of course it is going to result in a better social outcome, but it just doesn't feel right to tax the poor at, what, 60 percent? Okay, so F the poor.

And then you get to this level right here, and interestingly, if you earn even more, you'll be taxed high again. We're kind of used to that: you earn little, you pay little; you earn more, you pay more. But then comes this entire valley here. What's up with that? Like, WTF. This is now of course the same reasoning as you have with the Saez formula: here is where the rich people are, and you want to tax them less so that they are more productive and generate more coins, and even though you tax them less percentage-wise, they will end up paying more money in absolute terms, because you basically encourage them to produce more. So that is, I guess, the reasoning behind this.

But you have to recognize what's happening here. What are we optimizing? We're optimizing this productivity times equality. And what do we get? You get two big valleys of attraction, one here and one here, and that means that this algorithm favors a two-class society. I believe this is partially due to the limitations of this simulation: the fact that you only have four agents, the fact that you can only do two things, either collect or build. It encourages a
two-class society, this specialization that you saw. So you say these here are the money makers and these here are the collectors, and it is very hard to move from one group to the other, because if you earn more coins as a collector, you're here, and you're really discouraged here. If you move, you want to move all the way over here. Now, for the people that are already over here, if they earn an extra coin, that doesn't bother them too much, so they're very encouraged to earn more money. But the poorer people on this side are basically discouraged from earning more money, because the system needs them to stay at that collector level. So the system encourages the two-class society because we have not built a measure for social mobility into the cost function, and therefore the AI doesn't care that the poor people will stay poor and the rich people will stay rich. It just knows that this is the best outcome for society overall, given the cost function that we had. Again, this just doesn't seem fair to us. What we want is for someone to be able to make it over here even if they start out from the bottom, and so we'd have to build that in. So we have a system that is F the poor, with no social mobility.

And then look at what's happening at the end. This is beautiful. Very rich people, these are the money makers. This is the Monopoly guy, top hat, monocle-wearing, Scrooge McDuck bathing in coins. This is where the government makes its money. The discrepancy is really stunning, because you could also argue: hey, why don't we apply the same reasoning as we applied here and here? Is it not the case that if you tax the rich people lower, they'll pay more money, and so on? I believe, again, this might just be a result of how the simulation is set up. So
we'll move away quickly and we'll come back to this. Here is what I find particularly interesting about this paper, and it just confuses the heck out of me: it is a double periodic game, so it's an inner-outer loop game. What do I mean by that? They have these episodes. Here is the start and here is the end, and they subdivide this into, as we said, 1,000 steps. So an agent is here and it can do step, step, step, step, step, and it can perform these actions. There are 1,000 steps here and the agent just tries to collect as much coin as possible. This is your classic RL problem. But they also divide this into 10 of what they call periods, and I'm just going to draw maybe four periods. This thing here they call one period, whereas the whole thing is an episode. The purpose of the period is that at the beginning of each period the government can impose a new tax schedule. So the government doesn't only fix the taxes once; it can change the taxes over the course of the episode.

Now this is where I just don't see why. Now you're formulating the tax-giving objective as sequential decision making. It's like the government saying, "Well, today we have high taxes, but tomorrow we have low taxes, and the day after that we have high taxes again," and it just doesn't make sense for any government to do this. What you should do is set taxes once at the beginning of the episode, see how that turns out, and then try to optimize your tax schedule, because we're only ever looking at how the taxes are at the end. The things that we've examined are just the last taxes that the AI has issued. We don't know the dynamics of what happens in between. What the AI does in between might actually be super wild. I just don't see the framing as a sequential decision problem, and I believe this is just an over-engineered thing because someone wanted a reason, and here is the architecture, right, you
see, someone wanted a reason to put an LSTM in there. Someone is thinking, "Well, RL, that means sequential decisions," and so on. RL in this outer loop, the way I propose it, would just be a one-step-per-episode decision, which is a bandit problem, and as we all know, bandits are boring. So they didn't want this to be a bandit problem, they wanted it to be a sequential problem, and that's why they made this period thing, which I find dumb.

Another factor here, and I'm going to tell you how this relates to the weird "rich people are taxed high" thing: look at this, it's a CNN, an MLP, an LSTM, and an MLP, and the agent as well. I can tell you right now the CNN has two layers, two, and the LSTM has like 128 units in its hidden state. So these are tiny, tiny models, and it is not model-based RL, it's model-free RL, proximal policy optimization. The ability of these agents or the planner to learn anything substantial here is, I believe, just not that great. I believe that these are rather dumb agents. You can see the tax rates given by the planner are fed into the agent model, but I don't think that the agent, given such a small model, can actually adjust to these inputs, because you have to do some pretty good logic to determine from these tax brackets how you should act.

What I think is happening is that the agent is just kind of aware of its skill level, and through its rewards it's trying to maximize its future rewards. Then when the government changes the tax rate, I am almost positive it will not directly change its response to that. It will kind of observe that something's happening in the world and then maybe adjust its overall strategy a little bit, but not in that particular instance; it will be delayed, or it will be an overall strategy. And this might be one of the reasons why the tax brackets here might be screwed up, because who says, if I were this AI,
what I could do is, in periods one through nine, I make the taxes really low for the rich people, so I just encourage everyone to make more money. Like, come on, become more productive, and I get the benefits of that. And then in the last period, I just freaking jack up that final tax bracket. It's like, "You have lots of money, give it to me." Then you just redistribute what you got there to the poor people in the very last period, and thereby you achieve your goal under this social welfare function. But of course this is not sustainable, because all the rich people would just be kind of screwed through that and move down again. But it's the end of the episode, so what are they going to do? So I think the way this is framed, the fact that there are just two different ways to get coins, and the periodic nature of the outer loop, all might lead to something that slowly becomes more and more uninterpretable. Still cool, though.

All right, the final thing: they do this with humans. Yes, real humans. They let humans try it, and they have this interface here, and the humans behave quite differently from the AI. There are a few different things in how the humans act. But look at this here: "AI Economist", this is what the agents do. This "AI Economist" is the tax strategy, so they just take these developed tax strategies and let the humans be the agents. You want to observe how the agents act and whether or not the tax strategies also work when it's real humans acting in this environment and not RL agents. So compare this to how the humans act. The humans just build their houses in neat little packets or straight lines or stuff like this. I just find it to be very funny.

Now, there are some things lacking in the human environment which I find really important. First of all, they have no cost for moving, which I guess is minor, but second of all, they have no trade, and I think that just
kills the whole experiment, because now of course what you're going to get is that the wealth is just going to be proportional to how many coins you get per house, which is different for each agent. So to me that is now a pointless experiment if you can't trade, because the outcome is just predictable. I don't think that the human behavior changes in response to the different tax brackets. I think however they can make money, they'll make money: they'll build more houses until it becomes unprofitable, and that's it. So I don't see the value of these experiments, even though they show that, again, the AI Economist outperforms the other tax strategies in this equality-times-productivity metric and also in another metric that they measure.

The second problem I have is that for the human experiments they take this distribution here. They say this is one of the distributions that the AI came up with, but you notice the lack of the F-the-poor part and the lack of this big spike here for the rich people, which I find are the two features of the other distribution. So I think there's quite a bit of variance in what this AI comes up with, or maybe it's just because this is periodic. But this is really confusing, because they show and discuss that other distribution, and now all of a sudden they say, "Well, we use this distribution that was also created by our AI," and it seems to be qualitatively quite different.

In any case, let's look at how the humans behave under the different strategies. Under the Saez formula, you'll see that the light blue person here is kind of spreading out a bit, probably playing correctly. Everyone else is just neatly building their houses. Look, humans are so territorial, and most of them kind of stay in their little corner, and they're like, "This is my corridor, I'm going to build my houses here in a nice thing." And under the AI Economist, again you don't really see a different
thing just because the taxes are different. The qualitative behavior is quite the same; it's just building straight lines. Here I think the difference is more between the humans. I think it's not always the same humans, and you kind of see that the humans clearly haven't really trained or discovered the optimal strategy. They're just doing something, and what you're seeing is just a result of the taxation, not different behavior.

And this here, this is the best. Okay, watch the human on the bottom right. First they do something, and then they're just walling off the other players. This is the best: "I'm going to build a big, beautiful wall, and I'm going to have the orange guy pay for it." It's Donald Trump in the game. Amazing. And look, at the end they actually manage to lock in the other players so they can't move anymore. Donald Trump wins. Amazing. Though actually the yellow player appears to win economy-wise, but what do you want with lots of money if you can't move? So again, I find these human experiments to be rather pointless here, because you disable trade and you don't train the humans to find a good strategy.

All right, but all in all, I find the entire paper to be pretty cool. Code is going to be released, they promise, and they have checked that they have no ethical problems. Of course, I invite you to check out the paper. If you like content like this, please subscribe, share, and leave a comment with what you think. Thank you so much for listening, and bye-bye.
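
The mechanics described in the captions (a bracketed tax on income, equal redistribution of the collected taxes, and a social welfare score of equality times productivity) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's code: the function names, the bracket boundaries and rates, and the example incomes are all hypothetical, and equality is taken as plainly one minus the Gini index, as in the video, whereas the paper may normalize it differently.

```python
from typing import List, Sequence, Tuple

# (upper bound of the bracket in coins, marginal tax rate); values are hypothetical.
Bracket = Tuple[float, float]


def tax_owed(income: float, brackets: Sequence[Bracket]) -> float:
    """Marginal tax: each rate applies only to the coins inside its bracket."""
    owed = 0.0
    lower = 0.0
    for upper, rate in brackets:
        if income <= lower:
            break
        owed += (min(income, upper) - lower) * rate
        lower = upper
    return owed


def redistribute(incomes: Sequence[float], brackets: Sequence[Bracket]) -> List[float]:
    """Collect taxes from everyone, then hand the total back in equal shares."""
    taxes = [tax_owed(x, brackets) for x in incomes]
    share = sum(taxes) / len(incomes)
    return [x - t + share for x, t in zip(incomes, taxes)]


def gini(values: Sequence[float]) -> float:
    """Gini index of non-negative values: 0 means everyone holds the same amount."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    ranked = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * ranked / (n * total) - (n + 1) / n


def social_welfare(coins: Sequence[float]) -> float:
    """Equality times productivity, with equality taken as 1 - Gini (an assumption)."""
    return (1.0 - gini(coins)) * sum(coins)


if __name__ == "__main__":
    # Hypothetical progressive schedule: 5% up to 10 coins, 10% up to 20, 15% above.
    progressive = [(10.0, 0.05), (20.0, 0.10), (float("inf"), 0.15)]
    pre_tax = [5.0, 10.0, 20.0, 65.0]
    post_tax = redistribute(pre_tax, progressive)
    print("post-tax coins:", post_tax)
    print("welfare before:", social_welfare(pre_tax))
    print("welfare after: ", social_welfare(post_tax))
```

Note that this sketch is static: it holds pre-tax income fixed, so redistribution can only raise the score. In the simulation the agents react to the schedule, which is where the trade-off between equality and productivity comes from.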
Info
Channel: Yannic Kilcher
Views: 4,367
Rating: 4.9024391 out of 5
Keywords: deep learning, reinforcement learning, society, gini index, welfare, taxes, brackets, progressive, regressive, us, poor, rich, equality, redistribution, outer loop, world, resources, labor, trade, neural networks, ppo
Id: F5aaXrIMWyU
Length: 35min 5sec (2105 seconds)
Published: Thu Apr 30 2020