Tidy Tuesday live screencast: Analyzing African-American history in R

Captions
Hi, I'm Dave Robinson, and welcome to another one of my screencasts, where I'll be using R in RStudio to analyze data that I haven't looked at before. As usual, the data comes from the Tidy Tuesday project, a great project run by the R for Data Science online learning community. And as usual I'll be sharing this live, so I'll be watching for comments.

This week we're going to be looking at African-American history. A quick note on this data before we dive into it: a lot of the datasets I look at in these weekly screencasts are what you'd describe as fun. I've looked at cocktails, I've looked at Animal Crossing reviews, I've looked at a lot of things that are interesting and that we can have a lot of fun with. This is decidedly not a fun dataset. It's about an extremely serious topic, the largest horror in American history. The numbers behind this data represent a lot of loss of life, a lot of suffering, and a lot of injustices that reverberate even today. So we're going to be going through it, but we're going to try to be careful about jumping to too many conclusions. This is very important data, and we're going to do our best to treat it respectfully and with the seriousness that it deserves. Having said that, it's data that we can learn a lot from, including about the legacy of slavery today.

Now YouTube's telling me that it's having a little bit of trouble buffering. Could someone tell me if things in the screencast are having a hard time? Is the video loading, is it buffering, is it freezing? One second while I try to fix this. All right, I've got a few internet issues. Can people tell me if it freezes? I've been having some internet issues, and I'm deciding how to go about this. Please tell me if it happens again; be sure to bring it up in the chat, because I might have things I can do to fix it.

Okay, so the story here is that we're going to be looking at four data sets. These represent some data about the history of the slave trade and then, more generally, African-American history. I usually don't look at the data at all in advance of the screencast. This time I didn't look at the data itself, but I did read through a lot of material, and I recommend that you do too, and I did take a look at what the four data sets were.

First we have slave_routes.csv: 36,000 transatlantic voyages, wow, and a lot of information on the trade moving twelve and a half million slaves from Africa to the Americas. We have census data, from 1790 to 1990, so most of American history. We have a data set of African names that looks like it's specifically about Africans who were freed during the slave trade, on the way to America. And there's a data set on Black Past; I haven't looked into that one yet. So I'm going to read these in, and then we'll take a look at them one by one and see what kinds of visualizations we can do. Again, I'm having some internet issues; let me know if it freezes up, please.

So I'm going to use the tidytuesdayR package. Where's my cursor? All right, I'm going to do library(tidytuesdayR) and library(tidyverse), and tuesdata is here. Here's our data for today, here's our help page, and I'm going to start with the slave routes data set; that was the first one mentioned.

All right, it's freezing again. I'm going to try moving rooms and we'll see if it gets better. If it doesn't, I might try one or two other things. One second. All right, let me know if it gets any worse.
All right. Here's the data set on slave routes: 36,000 routes. Do we have the number of slaves arrived? That's what I'm curious about. If we say summarize, sum of n_slaves_arrived... it looks like some of them are missing. All right, we have a total of five million, the way they're shown here. I wonder what the distribution of those looks like. I'm actually going to save this as its own data set.

All right, so we have a distribution of what is more or less ship size, it looks like. We have a ship name, the port of origin, the place of purchase. Now that I see we have ships, we can count ship_name. Okay, a lot of them are missing, but we can see some of the larger ships. And I wonder, with these ships, what's the typical size of a slave ship? It looks like the mode was about 200.

I wonder if I say ship_name is fct_lump() of ship_name, and filter for not is.na() of ship_name. I can actually filter to remove the others. This is a trick I like when I want the twelve ships with the most data: say the lumped name is not "Other". Now that I've done that, I can visualize the size of the ship based on the ship name: n_slaves_arrived, geom_boxplot(), and I'm going to coord_flip() that.

One thing that's curious to me is that I expected the ship sizes to be a little more centered; that is, I expected, say, the San Antonio to be more centered around one number. It looks like that's not necessarily the case. It could be that these are different ships with the same name. So if I reorder the ship name by n_slaves_arrived... that step is not working. It's a factor; I don't know why I'm not seeing these ordered. I expected it to reorder by the median, which would result in a clear
difference in the data. Oh, I know why: it's because I didn't do filter not is.na() of n_slaves_arrived, so the reordering doesn't know what to do with the missing ones. Yes.

Somebody asked, and I think they're right: can I just flip these two, use ship name on the y-axis, and drop the coord_flip()? Yes, I can. I've done that with geom_col() but I haven't done it with geom_boxplot(), so thank you; I hadn't thought of that trick. So I can just put n_slaves_arrived on the x-axis and ship name on the y-axis.

One thing we see is the variation within each ship; it could be different ships, but I'm not sure. We also see again the overall distribution: the mode was around 200 to 250. That's pretty remarkable.

Rick wonders about ship variation over time. I agree; I'm curious about the ships, and I'm also curious about the ports. I'm curious about a lot of things in the distribution over time. So let's start with size over time... actually, I don't even know what the time distribution is. I believe the slave trade was outlawed in something like 1808 in the United States, but this might not just be the United States. Yes, we actually see there were ships in the 1600s, I shouldn't say a lot, there's a peak of them, but there's just an enormous amount in the second half of the 18th century, from 1750 onward.

To get a bit of a sense of this, it's sometimes useful to add some historical context. So we can add a y-intercept, nope, an x-intercept: 1865 is the end of the Civil War, and I'm going to use Independence Day, 1776, even though the Declaration of Independence was not quite the start of the Civil War, correction, of the Revolutionary War. So the story is about the slave trade in terms of number of ships, and this is in terms of all ports of arrival.
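The lump, filter, and reorder steps described above can be sketched on a small made-up table. This is only an illustration under stated assumptions: the column names ship_name and n_slaves_arrived follow the spoken description, the ship names and counts are invented, and fct_drop() is added here so the unused "Other" level does not interfere with the reordering.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Invented stand-in for the slave routes table (names and values are made up)
toy_routes <- tibble(
  ship_name = c("Alpha", "Alpha", "Alpha", "Beta", "Beta", "Beta",
                "Gamma", NA, "Delta"),
  n_slaves_arrived = c(400, 380, 420, 200, 250, NA, 150, 300, 100)
)

top_ships <- toy_routes %>%
  # Drop missing values first, otherwise the median-based reorder has nothing to sort on
  filter(!is.na(ship_name), !is.na(n_slaves_arrived)) %>%
  # Keep the 2 most frequent ship names, lump the rest into "Other"
  mutate(ship_lumped = fct_lump(ship_name, 2)) %>%
  filter(ship_lumped != "Other") %>%
  # Order ships by their median number arrived (fct_reorder's default summary)
  mutate(ship_lumped = fct_reorder(fct_drop(ship_lumped), n_slaves_arrived))

# Numeric variable on x and the factor on y gives horizontal boxes without coord_flip()
p <- ggplot(top_ships, aes(n_slaves_arrived, ship_lumped)) +
  geom_boxplot()
```

In the screencast the lump keeps twelve ships; two is used here only because the toy table is tiny.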
We're not sure about some of these. Actually, let's take a quick look at ports of arrival. All right, many of these are not in the United States. So it could have been slaves that ended up staying in countries like Cuba, or it could have been that they continued onward to an American destination. But there is Charleston, which is in the US. We also have port of origin. I don't have country here, which is one thing I'm really curious about: how the slave trade changed over time by country. Some of them we know: Liverpool and London are in England, and Havana is in Cuba. It's interesting to see a port of origin and then an arrival in Barbados; it hits a lot of the Caribbean. Okay, so I don't have a good feel for what I would do for this yet, but this is a start. What I'm looking at is number of ships over time, and this to me is already really interesting. The slave trade hit its peak around the time of the Revolutionary War. Let's add a title: "Number of slave ships over time". We don't know that this data set is complete; that's another issue.

We might also want to look at, and this was an idea from Mike, the rise and fall of different ports of origin over time. Let's take a look at that. Let's mutate port_origin to be fct_lump() of port_origin by 12, and count port_origin. I'm going to count along with the decade, so I'm going to say port_origin and decade, where decade is, we've used this trick before, 10 times year, truncated division, 10. I've got a bug: it's not year, it's year_arrival. All right, yes. Then we plot by decade, with color equals port_origin. So we try looking at how the ports changed in their frequency over time. All right, so many ships; I might want to facet it.
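The decade trick mentioned above (10 times year, truncated division by 10) can be written out as follows. This is a minimal sketch on invented rows; the column names port_origin and year_arrival follow the spoken description, and the values are made up.

```r
library(dplyr)
library(ggplot2)

# Invented voyages, only to show the decade bucketing
toy_voyages <- tibble(
  port_origin  = c("Liverpool", "Liverpool", "London", "Liverpool", "London"),
  year_arrival = c(1794, 1799, 1801, 1806, 1809)
)

by_decade <- toy_voyages %>%
  # %/% is integer (truncated) division, so 1794 %/% 10 is 179 and the decade is 1790
  mutate(decade = 10 * (year_arrival %/% 10)) %>%
  count(port_origin, decade)

# One panel per port on a shared scale, with the redundant legend removed
p <- ggplot(by_decade, aes(decade, n, color = port_origin)) +
  geom_line() +
  facet_wrap(~ port_origin) +
  theme(legend.position = "none")
```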
I might also want to filter for not is.na() of port_origin, and I think I want to facet by the port_origin, leaving them on the same scale. Oh yes, all right. And then, if I do "Other", I'd prefer more like 15, and I'm going to get rid of the legend. So those were a few tricks to try to make this a slightly more readable plot. I'm also going to change the theme.

All right, somebody has a question that I think is a really good one. This is from Andrew: 1807 was the official British act to stop the slave trade; is there a decrease in the data from British ports? And indeed, if you take a look, you see London and Liverpool were really common slave ports and dropped right off after the 1800 decade. So those are two notable points. The problem again is I can't cluster them. Is Bristol in England? My geography is absolutely terrible. Bristol is in England, and Bordeaux is in France.

I'm going to actually reorder this. We lump it, and then we say port_origin is fct_reorder(), reorder by n, reorder by the sum. I can give it anything; give it year_arrival. Oh, actually, what if I reorder by median year of arrival? I think that might be more interesting than the other things I was looking at. Yes, so I'm reordering it by year of arrival. Again, I wish I could lump some of these together, but I'm going to turn this into 25, including "Other". Let's zoom in.

These are the 25 most common slave ports. I show them on the same scale intentionally because I do want to compare ones that were enormous slave ports, like Liverpool, which I did not know was one of the largest slave ports, to ones that are at least relatively small, though really any slave ship is one too many. Charleston is curious for popping up to have a lot in just one decade. Huh, that's interesting. You know, I can turn
this into a histogram; this is kind of a histogram, yeah. So what we see is that there's no question there are trends over time. I could have done these as a boxplot, similarly, but I like this a bit more, where I see the up and then the down.

Somebody asked for a way to quickly fix long facet names that extend past the box. There are two approaches. Take, for example, "Southeast Brazil, port unspecified". One approach is called string truncate, str_trunc(). I'd say port_origin equals str_trunc() of port_origin by, I don't know how many letters, but let's say 15. Now anything longer than 15 letters, check this out, will get a dot-dot-dot instead. 15 is too short. If I try 30... 30 is too many; look, even here "Southeast Brazil, port unspecified" sticks out. Maybe 25 is the right length. The idea is that we can truncate it to make it a little smaller.

The alternative is that we change the size of the facet label. So I think it's theme, element... text... I don't remember what the text option is. Searching "ggplot2 facet wrap text size": there's a name of a theme option that I can't remember at all. Ah, wait: strip.text, yes. So what I do is say strip.text equals element_text(), size equals, and I can choose a font size here. This is way too small. So that's the other option I have. All right, so that's just an answer to that, but you almost always want to adjust it a little bit anyway, to make it a size that's readable and informative.

All right, so these are the 25. A question I got is about showing this in gganimate. The way we would do it in gganimate would be a bar plot that shows spikes and decreases over time. I'm not going to do that, because I think this actually does communicate a lot by itself and lets us study it a little bit more. But you're right, that's definitely a different way to visualize the
same data. There's a big difference in 1807. I think the slave trade was outlawed in the US in, I think, 1808? No, it was 1807, and in England too. All right, so the slave trade, not slavery, was abolished in both England and the US in 1807. So one thing we can do there, as we did earlier, is add a geom_vline().

One thing we can see from this is that there are ports, in general Brazilian ports, like Rio de Janeiro and Southeast Brazil, and also Havana, that became ports of origin a lot more frequently... I shouldn't assume that; they appear as ports of origin a lot more frequently after the acts in 1807. It's curious to me that New York spikes after that. I really wonder what that was about.

One thing I want to point out about this data is that every single row, every observation, is a tragedy. Every one represents a tremendous amount of suffering and injustice. So if I filter for New York and arrange... Sometimes it's less about digging into individual points and more about looking at the trends, but here there is something really important about asking what the last ships were that originated from New York. We take a look at those, and each one has a story. It could be that it's mislabeled data, but we do see here, and this does surprise me, that it's during the Civil War, and New York was of course a Union state. So let's look up this story. 1863... I'm not seeing anything there. I'm going to try another one of these. If I try searching by the name, is that going to help? I was trying to Google this; I can see a result about a slave trader, but I'm not getting a good understanding of any of these New York ships. I think it might be worth digging into.
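Two of the plotting tricks discussed in this part of the screencast, shortening long facet labels and marking 1807 with a reference line, can be sketched together. The table below is invented and the width of 25 and font size of 8 are just the kind of values one would tune by eye; only str_trunc(), geom_vline(), and the strip.text theme element come from the discussion.

```r
library(dplyr)
library(stringr)
library(ggplot2)

# Invented decade counts for two ports, one with a long name
toy_counts <- tibble(
  port_origin = rep(c("Liverpool", "Southeast Brazil, port unspecified"), each = 3),
  decade = rep(c(1790, 1800, 1810), times = 2),
  n = c(30, 25, 1, 2, 4, 12)
)

labelled <- toy_counts %>%
  # Width counts the trailing "..." too, so no label exceeds 25 characters
  mutate(port_short = str_trunc(port_origin, 25))

p <- ggplot(labelled, aes(decade, n)) +
  geom_line() +
  # Dashed reference line at the 1807 abolition of the slave trade
  geom_vline(xintercept = 1807, linetype = "dashed") +
  facet_wrap(~ port_short) +
  # Alternative to truncating: shrink the facet label text
  theme(strip.text = element_text(size = 8))
```

In practice one would usually pick either truncation or a smaller strip font, then adjust until the labels fit.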
One more question that I got was: why use different colors? This is a trick. Using different colors here doesn't add any additional information; it's a trick I use a lot by habit that I learned from Julia Silge. It adds a bit of color to a graph that otherwise would just be plain lines. It doesn't add all that much here; maybe black and white makes more sense.

So this does communicate a few things. It tells us there's a geographic shift in the slave trade ports that happened in 1807, when the slave trade was outlawed in both the US and England.

All right, other questions we have. One question is: do ships change in origin? That's a natural question. Let's try looking at individual slave ships. I'll include the same filter I did earlier, where I say filter fct_lump() of ship_name, 12, is not equal to "Other", to look at just twelve ships. Then, for starters, count ship_name and port_origin, and what I'll then do is group_by() ship_name and mutate percent. I'm curious: do most ships have one port that they mostly come from? Then I sort and take the top, and I'm grouped, for that matter, so the top one by percent. Okay, so this is each ship's most common port of origin. I'm going to arrange by descending percent, just to get a feel for this.

A couple of ships had a majority of voyages leave from a particular port, but others left from a variety of ports. For instance, a ship name like Africa left from no port for more than twenty percent of its journeys. So some had particular ports; some may have had particular destinations as well. Let's look at the port... what was the other one called? port_arrival. So these are the twelve most common journeys, and, for instance, this one went from Bahia to another port, I guess, Bahia something.
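The "most common port of origin per ship" calculation described above can be sketched like this. The rows are invented, and slice_max() is used here as the current dplyr equivalent of the top_n() call that the spoken description suggests; that substitution is an assumption of this sketch.

```r
library(dplyr)

# Invented voyages: one ship tied to a single port, one spread across many
toy_ships <- tibble(
  ship_name   = c("Alpha", "Alpha", "Alpha", "Alpha",
                  "Beta", "Beta", "Beta", "Beta"),
  port_origin = c("Liverpool", "Liverpool", "Liverpool", "London",
                  "Bahia", "Lisbon", "London", "Bristol")
)

top_port <- toy_ships %>%
  count(ship_name, port_origin) %>%
  group_by(ship_name) %>%
  # Share of each ship's voyages that left from each port
  mutate(percent = n / sum(n)) %>%
  # Keep the single most common port per ship (first one wins on ties)
  slice_max(percent, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(desc(percent))
```

A high percent means the ship name is tied to one port; a low one, as with the ship called Africa in the screencast, suggests either a roaming ship or several ships sharing a name.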
So this ship looks like it was intra-regional.

A question someone has is: is there a practical way to summarize and plot the itineraries on a map? Not on a map, at least not without a lot of work, because I don't have longitude and latitude here. I don't think I have it in any of the tuesdata data sets. If I look through here, I had census, I had names... let's see. Okay, I just wanted to read through some of this; some of this context is really notable. I can't map the itineraries: I don't have longitude and latitude, and matching these to countries is going to take a little work, because, as you can tell, some of them are states, some are countries, some are regions. It could be done with a little bit of time.

The closest that I would do, given time constraints, and we're halfway through because I started at 5:30, is count the overall connections. And I think Eric has a good idea here, which is a network diagram, so I think we're going to try that. We're going to count port_origin and port_arrival, and I'm not looking at just one ship, with sort = TRUE: what are the most common starts and destinations? I'm confused as to how so many originated and arrived in the same city. These two make a little more sense, because maybe it's two ports within the same region. That's puzzling to me, but I'm going to go ahead and drop those and look at ones that go between ports; that will also drop the NAs while we're at it. Then we'll take a look at what the most common ones are. This is just in terms of raw number of ships.

What I'm going to do is turn this into a network. When I do networks, I use ggraph. I don't need another library here. I can say ggraph, and then I say here are the 30 most common connections,
and I put that into ggraph(). I like this layout. Then I throw in geom_edge_link() and geom_node_point(), and we have connections. It's not what we need yet; we really need the text: geom_node_text() with label equals name. Let's do repel equals TRUE so they're all readable. Let's also set a seed. I've done this a lot, including in a lot of screencasts, so we can build this pretty quickly.

But notice that these are directional, so there are two things we'd like to do. First, we'd like to say the size of the link equals n. It does not like my edge width... okay, yep. The second is that I want to make them arrows, which takes a little bit of a step, where I say arrow is grid::arrow(). Do I need library(grid)? It won't hurt anything. I'm going to do arrow with type equals "closed". These are too large; I'm going to make the length a unit(), I think that's how we specify it. I want that to be the length of the arrowhead. Yeah, that looks pretty good.

The other thing I need to do is probably to set the size scale: scale_size_continuous, no, scale_edge_size_continuous, and say range is 1 to 3, or 0.5 to 3. I don't want the edges to be so thick; range is what controls how big the largest ones are. This is not working... it's edge_width, that's what I was going for. All right, so this gives a sense of some of the most common routes. There's always experimentation; we can add a little bit more information at a time.

Okay, this shows us what the common routes are. I'm going to fill in a little more information. I'm going to add a title, "Common slave routes", and "Size is number of ships". Not size; I need to say edge_width equals. I keep missing that; these are some tricks of ggraph.

So what we can see here is, for example, Southeast Brazil to Rio de Janeiro; those are within Brazil.
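The network plot assembled above can be sketched end to end on a handful of invented routes. This is a sketch under assumptions rather than the exact code from the screencast: the counts are made up, the "fr" layout and the arrow length are arbitrary choices, and the igraph conversion is written out explicitly.

```r
library(dplyr)
library(ggplot2)
library(ggraph)
library(grid)

# Invented route counts; the same-port row and the NA row are there to be dropped
toy_links <- tibble(
  port_origin  = c("Liverpool", "Liverpool", "Bristol", "Lisbon", "Bahia", NA),
  port_arrival = c("Kingston", "Charleston", "Charleston", "Bahia", "Bahia", "Kingston"),
  n = c(40, 15, 10, 25, 30, 5)
)

between_ports <- toy_links %>%
  # Comparing with != yields NA for missing ports, so filter() drops those rows too
  filter(port_origin != port_arrival)

# First two columns become the directed edges; n is kept as an edge attribute
g <- igraph::graph_from_data_frame(between_ports)

set.seed(2020)  # the layout is random, so fix the seed for a repeatable picture
p <- ggraph(g, layout = "fr") +
  geom_edge_link(aes(edge_width = n),
                 arrow = arrow(type = "closed", length = unit(0.1, "inches"))) +
  geom_node_point() +
  geom_node_text(aes(label = name), repel = TRUE) +
  scale_edge_width_continuous(range = c(0.5, 3)) +
  labs(title = "Common slave routes",
       subtitle = "Edge width is number of ships")
```

Note that the edge aesthetic is edge_width rather than size, which is the detail that trips things up in the screencast.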
That's a really common route. There are other routes just within a country. This looks to me like a cluster within the Caribbean. And Liverpool is really notable: we see Liverpool as an origin point that ends up in a lot of Caribbean countries, as well as Charleston in the Americas. Bristol is another English port, going to Charleston, going to St. Kitts. So generally a flow from England to the Americas is what this part of the data describes. Lisbon is in Portugal. I don't know where Pernambuco is.

Somebody asked how to interpret the relative position between these clusters. It's random; there's no reason this one got attached here. If I ran it again without resetting the seed, they'd end up in different locations. It's just a graph layout algorithm.

All right, Mike points out this really detailed documentation about how these are defined. I think that's a really important point. In particular, we shouldn't assume that these are anywhere close to all slave ships, or that this describes all slave ports or all origins; these just happen to be the ones that are included within this data set. And there's some really notable methodology and detail documented here. Yeah, this is a really, really good set. Normally, if I weren't analyzing on a time limit, I would definitely dig into each of these individually: things like geographic specifics, geographic data, and so on.

All right, so this is showing some of the patterns and some of the common paths that occurred. Somebody asked what set.seed() means. The answer is that every time I create this visualization, this network, it's going to look a little bit different. Look at where that big cluster falls. Wait, these aren't looking different. I thought that one looked different.
So Freetown moved from down here to up here. But if I set the seed first, that means that all random numbers generated afterward will be the same, so if I graph this multiple times it's going to appear the same way. That's also important for things like simulations: set the seed first. I often set a seed based on the current year.

All right, so that was looking a lot at slave routes. One thing that I think was really relevant in this description is that they estimate it's 10.7 million enslaved Africans, even though there are a lot of voyages with missing numbers of people. As I said earlier, if I take this and summarize the sum of n_slaves_arrived with na.rm = TRUE, I get a total of five million, but this is misleading because a lot of the data is missing. If I call that total_recorded and then say percent missing is the mean of is.na(), half the data is missing. So what that means is that, ideally, instead of looking at the total, I would take an estimated total: the mean of this number, with na.rm = TRUE, multiplied by n(). If there were no missing data it wouldn't make a difference. We actually get a much larger number if we take the mean. All right, there are a couple of other details in here; I'm not getting exactly the same number, and I'm not going to dig too much more into this data, but I want to note that as an estimated total.

I can also quickly group this. I'm going to say n_ships, and let's group this by port of origin, and this gives a sense of, for Liverpool, an estimate, just based on assuming that
the data was missing at random, which is a strong assumption: it estimates about 1.3 million enslaved Africans left Liverpool over the course of this data set. That's really remarkable. I think England was one of the corners of what they called the triangle trade. As I said, it's not fun data; it's also very sobering data, to think about the tremendous size of it and how much suffering was passing through what are today first-world countries.

Okay, so that was taking a look through slave routes. Let's move on and look at one of the other datasets. I'm curious about a lot of these. Let's take a look at the census: tuesdata$census. Is each observation one year? Let's find out. No, there are duplications because of regions. Okay, so we can do a couple of visualizations to get a sense of white, black, free, and enslaved.

Let's get a little bit of context here, because this is ignoring a lot of the racial makeup. My first question is whether white and black are only a percentage of the total, I'm guessing. So let's first take a look at the context: categories reflected social usage rather than an attempt to define race biologically or genetically. From 1790 to 1850 the only categories recorded were white and black. I'm curious what is unaccounted for: if I say total minus white minus black, and I arrange by year, I just want to get a sense of it. Okay, up to and including 1850 there's no other; there's only white and black, and that adds up to the total. So it's already erasing some racial history, certainly. I'm not sure I should call it "other", but I'm going to add it, and I'll call this other.
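The adjustment for missing counts described above (recorded total, share missing, and mean times number of voyages) can be sketched on made-up numbers. The values are invented, and the estimate rests on the same strong assumption named in the screencast: that counts are missing at random.

```r
library(dplyr)

# Invented voyage counts, two of six missing
toy_arrivals <- tibble(n_slaves_arrived = c(200, NA, 300, NA, 250, 250))

arrival_summary <- toy_arrivals %>%
  summarize(
    # Sum of recorded values only; understates the total when rows are missing
    total_recorded  = sum(n_slaves_arrived, na.rm = TRUE),
    # Fraction of voyages with no recorded count
    pct_missing     = mean(is.na(n_slaves_arrived)),
    # Mean of recorded voyages scaled up to all voyages (missing-at-random assumption)
    estimated_total = mean(n_slaves_arrived, na.rm = TRUE) * n()
  )
```

With no missing rows the two totals coincide; the gap between them grows with the share of missing voyages. Adding group_by(port_origin) before summarize() would give the per-port version.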
Then we have a total, Northeast, South, and then we have divisions. I only see Midwest pop up here; okay, I guess there wasn't really a Midwest early on, and regions kind of appear over time.

I'm thinking about ways that I can visualize this data. We can visualize it by breakdowns, or we can visualize it by totals. Let's start by visualizing it by totals. Let's see: white, black, black_free, black_slaves. We're interested today in looking at the history of slavery in America, so I'm going to break out black_free and black_slaves as separate categories for this one. What I'm going to do is gather(), with racial_category and population, and I'm going to do that for white, black_free, black_slaves, and other. So what I just did is I gathered, I pivoted, this table so that it includes multiple rows, one for each racial category. I do that because it makes it a little easier to visualize.

I'm going to start with just the overall region, and the first thing we'll do is show population as a bar plot. So this is the population by census over time. It's curious to me that I'm not seeing later data right now. Is that because the region name changed? No, I only have data up to 1870; I thought I had up to 1990. All right, I only have data up to 1870. Slavery was abolished in the 1860s, so we can still get a lot out of this data. All right, so I'm going to use these regions. This is showing the pre-Civil War United States and what the breakdown was. And it starts in 1870, so that presumably was the first year there was a census... oh no, sorry, I take that back: it started in 1790. There was a United States at that point, and yes, the
Constitutional Convention had been held... was that when there was, or did it precede it? I'm not sure. But the story is there was a census. Okay, so what we can do then is fill by racial category and get a breakdown here. What we then see is that "other" is naturally very rare in this dataset. It's really remarkable. Wow. So the population grew; first I was looking at the total, the census population. A couple of things to note from here: it looks like there was almost no population in the free black category; it did grow over time, but I can't tell whether it grew proportionally.

So what we do then is add a subtitle: US census, 1790 to 1870; no "other" category existed before, what did we say it was, 1860. All right, so that's one visualization, of the total population. The last thing I would do to make this more readable is say labels is comma, and I need library(scales) for that. Whoops, that made all these sections collapse. Not sure... I need to say library(scales). All right, so then this is showing the population and how it broke down, and then there was the Emancipation Proclamation and the 13th Amendment in the 1860s, which led to everyone being in the free category. All right, thanks; good point: I could use Command-Option-Shift-O to uncollapse it, but I didn't know that at the time. Thank you.

All right, so that was one way of visualizing this. The other way we could visualize it is as a percentage. And let's see, I'm going to call this census_gathered. The other thing we can do here is break it down by region. So let's start with region is not equal to total, and is.na() of division, so let's not yet break it down by division, and let's facet by region. Wow. So this does get it across. The West, naturally, was still barely being settled; it wasn't even part of
America until the 1850s, and it was still being settled at the time. But one thing that's very clear here is how much population growth there was among southern blacks. I don't know when slavery was abolished in the various northern states, but at least from this it looks like it was relatively rare by the mid-1800s. Not so in the Midwest, at least it looks like; but both of these still overwhelmingly fell into the white category. What this is really communicating is how by the 1860s the southern population was perhaps a third enslaved. So this really is a remarkable display. If I try breaking this down by division, I'm going to do a little bit of reordering. I'll say division is fct_reorder. I usually want it descending here, so I'll say reorder the division by the population, the sum of it, descending; I can actually do negative population, and that puts the largest regions up at the top. All right, so that does get across that it was mostly South Atlantic and East South Central where the enslaved black population was growing. And the other way I can visualize this is as... so, the last thing I could try is to clean these labels up, to make this look a little better. I'll say racial category is str_replace... yeah, I don't know, I'll skip it. I'm just saying that here I've still got these underscores. All right, the other thing that I'm going to do here is turn this into percentages. So this is one visualization; I'm going to do one other one. I'm just going to copy it, paste it, and say group by division here and do percent is population over sum of population. I guess this is the percent within each racial category. Change the labels to percent. So this is if we want to focus on something different: this is where we don't focus on the totals
and we instead focus on the makeup. This is, I think, actually called a spinogram. And you know what, this does get across that the enslaved population, as a percent of the total, was somewhat consistent from the American Revolution to the mid-19th century: it was consistent in the South Atlantic and growing in East South Central, maybe shrinking a little bit in West South Central. I don't have these on a map. We can also see here that slavery became rarer in the Middle Atlantic, East North Central, and New England, though every sliver here was real: there absolutely were slave plantations even in, say, New York in the 1700s. And of course that's not to say that even when slavery was abolished there, there were universal civil rights; very far from it. So these are two ways we have of breaking down the census data. All right, I think that's what we can get from the census data for the most part. And I got one request, which is looking at the African names dataset. So we're going to do one more dataset: that's tidytuesday, African names. What we're seeing here is name, gender, age. These are freed... I'm going to check the description again: courts around the Atlantic basins condemned over two thousand vessels for engaging in the traffic and recorded the details of captives found on board, including their African names. So it's an African names dataset, not slave names. This is absolutely interesting. If we then look at which are the most common, this is a dataset of 91 thousand names, and we can see some of the most common across men and women. One request that I got was to look at this as a word cloud. We can do that. I don't usually make word clouds, but I think it is somewhat evocative here. So I'm going to try the wordcloud package, and I'll say name counts. I'm going to look only at the most common hundred names, and I do that in the next step. In name counts I'm keeping
gender because I'm curious whether we could try coloring by that. What I'll do is say wordcloud of name counts name and name counts n. I've used the wordcloud package before, and it looks like this. All right, so then we see that, you know, some were too common to be plotted. Oh no, did I do a filter? Let me see. Yeah, I need to do a filter. Hmm, it says one could not be fit on the page, but it does look like it's there. There's a ggwordcloud package, isn't there? I'm curious. All right, I'd like to use that; I really do prefer using ggplot2 for these. The only thing popping up is a name that could be missing, or that could be a miscoding; I'm not sure. Let's take a look: library ggwordcloud. OK, somebody suggested wordcloud2. Is that a thing? There's a wordcloud2 package? Oh cool, that's interesting; I might take a look at that. I'll start by using ggwordcloud, so geom_text_wordcloud. OK, then what I would do is say head 20, for example, and say ggplot, text equals n... wait, let me look up geom_text_wordcloud_area. All right, I'm learning to use this as I go; I actually haven't used it before, I just kind of figured that I might want it. Is it size? Yes. OK, that actually makes less sense. So what I do here is say label equals name, size equals n, and geom_text_wordcloud. All right, and try 50, and try 100. So this is a simpler kind of word cloud; it just looks a little different than the other one. What I'm going to do then is add in color equals gender. It's odd to me that I'm not seeing a legend. I think that could be a bug with ggwordcloud; not sure. All right, so this at least was a word cloud, and I've broken it down by gender. There are some common names that apparently occur for both men and women, like
[name unclear]. There are other names that occur... we could lump together man and boy, and woman and girl. This could be a bug, or it could be a real name. So with this we've taken a look at some common African names. This is a way to do a word cloud; I did get that request. What I actually prefer doing in general is, let's do some quick recoding: say gender is fct_recode of gender, and replace boy with man and girl with woman. There could be changes in names over time, and therefore by age, but I wouldn't expect it. I'm interested in both the age and the gender here, but we might want to look only at names and gender, and answer the question: is [name unclear] used multiple times because it appears to be both a male and a female name in Africa? All right, so the next thing to do is say group by gender, top_n 20 by n, and plot the name. I need to do one extra thing from there to actually show something. If I reorder name by n and plot it with a facet wrap and free scales, they're not going to be in the same order, because some names appear in both graphs. It's odd to me... I'm going to say filter not is.na gender. There are only five names? Boy being in this list is surprising to me. Are there only five female names? Oh, I had 100 here; that was my mistake. The reason they're in different orders is that some appear in both facets. There's a fix for this: reorder_within from the tidytext package. We reorder name by n within gender. And I'd title this common African names by gender. When I do this it actually takes one extra step, scale_y_reordered. And that didn't work. Name equals reorder_within name, n, and gender. Why is scale_y_reordered not working for me? Oh, is it possible... oh, I did the counts after I did the
reorder. Oops. All right, yes, I did the counts before I did the recoding; that was a bug of mine. So what I do is do the recoding before the count. Some rearrangement... yeah, all right, there we go. So these are common male and female names among freed African slaves. I definitely think [unclear] would be a miscoding issue. All right, so I think this is really notable, because this part of history was largely erased once slaves arrived and took on slave names. And here it is easy to see some of the original names. We do see mixes: Joe could have been people that were given this name, or Joe could be an African name; I'm not sure. All right, and that's looking at common names. There's a lot in this data; we didn't have a chance to dig deeper into this freed-slaves data. I think it would have been interesting to look at changes in name frequency over time, or for that matter changes in gender over time. This is not a random sample, these freed slaves, but it could give a sense of the slave gender breakdown over time. There's also country of origin. There's so much that can be done with this data. I really encourage you both to analyze the data yourself and to read a lot of the background information that is provided in the tidytuesday README, a lot of the links, documentation, and so on, as well as some good causes to donate to. So then this concludes this screencast. As I said, it's not enjoyable data, but it is really important, and we only barely scratched the surface of a really enormous amount of history. The last thing I encourage you to look into: Juneteenth is a holiday coming up this Friday that celebrates when the Emancipation Proclamation became widely known within the South, so the day that slavery was largely ended in America. I think it's a really good date to reflect on
this history and to donate to worthy causes, some of which have links here. So thanks so much for joining. We made a couple of graphs, and we learned a lot about the breakdown of the slave trade, about the breakdown of the population, and a little bit about some common names. Hopefully you learned a little bit about the tidyverse as usual as well. All right, so thank you so much for joining. I hope you learned a lot; I certainly did. Black lives matter, and I'll see you next week.
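A minimal R sketch of two steps described in the screencast: pivoting the census table to one row per racial category and computing each category's share, then recoding gender before counting names and ordering names within each facet. This is not the code typed on screen. The toy tables, their made-up numbers, the placeholder names, and the column names (`black_free`, `black_slaves`, `other`) are assumptions for illustration only; `pivot_longer()` is swapped in for the `gather()` call mentioned in the captions, and `case_when()` for `fct_recode()`.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(tidytext)

# Toy stand-in for the census table. The numbers are made up and are NOT
# real census figures; column names are assumed.
census_toy <- tribble(
  ~division,        ~year, ~white, ~black_free, ~black_slaves, ~other,
  "South Atlantic",  1790,    600,          20,           380,      0,
  "South Atlantic",  1860,   1500,         100,           900,      0,
  "New England",     1790,    480,          15,             5,      0,
  "New England",     1860,   1980,          20,             0,      0
)

# One row per division, year and racial category.
census_gathered <- census_toy %>%
  pivot_longer(c(white, black_free, black_slaves, other),
               names_to = "racial_category",
               values_to = "population")

# Share of each category within a division and year. Grouping by both
# columns (an assumption about the intended grouping) makes each bar sum to 1.
census_percent <- census_gathered %>%
  group_by(division, year) %>%
  mutate(percent = population / sum(population)) %>%
  ungroup()

census_plot <- census_percent %>%
  ggplot(aes(factor(year), percent, fill = racial_category)) +
  geom_col() +
  facet_wrap(~ division) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Year", y = "% of population", fill = "Racial category")

# Toy stand-in for the names table, with placeholder names.
names_toy <- tribble(
  ~name,    ~gender,
  "name_a", "Man",
  "name_a", "Boy",
  "name_a", "Woman",
  "name_b", "Girl",
  "name_b", "Woman",
  "name_c", "Man"
)

# Recode BEFORE counting, so Boy/Man and Girl/Woman are tallied together.
name_counts <- names_toy %>%
  mutate(gender = case_when(
    gender == "Boy"  ~ "Man",
    gender == "Girl" ~ "Woman",
    TRUE             ~ gender
  )) %>%
  count(name, gender, sort = TRUE)

# reorder_within() plus scale_y_reordered() sorts names separately in each
# facet, which matters when a name appears under both genders.
names_plot <- name_counts %>%
  group_by(gender) %>%
  slice_max(n, n = 20) %>%
  ungroup() %>%
  mutate(name = reorder_within(name, n, gender)) %>%
  ggplot(aes(n, name)) +
  geom_col() +
  scale_y_reordered() +
  facet_wrap(~ gender, scales = "free_y") +
  labs(x = "Count", y = NULL)
```

Printing `census_plot` or `names_plot` draws the charts. Counting first and recoding afterwards would leave separate rows for the recoded labels, which is the ordering mistake the screencast runs into.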
Info
Channel: David Robinson
Views: 2,001
Rating: 4.8888888 out of 5
Id: 2L-jA-Me3zg
Length: 61min 0sec (3660 seconds)
Published: Tue Jun 16 2020