Y'all Are Nerds (According to Math)

Video Statistics and Information

Captions
What you're seeing here is not a supernova or some weird virus or even an abstract representation of my ever-inflating ego. It's you, all of you, or at least a representative sample of all of you, built from your channels and the YouTubers you like to watch. You see, recently I passed a bit of a YouTube subscriber milestone, and I know it's common at this point for a content creator to tell you a little bit about themselves, maybe endear you to my personality, or use the word "we" to refer to things that I did to establish a connection with you, the dear viewer. But I figured if the occasion is to celebrate the number of subscribers, then this video shouldn't be about me, it should be about you. So to celebrate our accomplishments, I wanted to get to know you a little better.

So that's what this monstrosity is: it's a network built from a random sample of my subscribers and who they're subscribed to. After all, who you watch on YouTube in some way reflects who you are. So in order to get to know you a little better, I wanted to look at the relationships between the things that you watch, and luckily for me, network theory is the math of how you do that. So today we'll go into how to build this network, which is probably the biggest network I've ever had to work with, and we'll see what it can tell us, which is mainly that y'all are a bunch of nerds, but multifaceted nerds with diverse interests and hobbies. And of course, you are not representative of the wider YouTube audience, so using the tools we establish, we'll also build that wider YouTube network and see how it's structured into communities, how we can find those communities, and what we can learn from that. But in order to do all that, we first have to figure out how we'll build that network.

[Music]

Conceptually, the network we're exploring today is fairly straightforward. We'll compile a list of my subscribers, and for each of you on that list we'll also fetch a list of the other channels you're subscribed to. From that information we'll construct a network. However, every time I've shared this project with someone, their first response has been, "You can do that?" And, well, yeah, I can. Anyone can, but only if your profile is set to public. So I guess if that's something you don't want, then this is my PSA for you to change that.

But knowing that we can doesn't answer how we can do that, because that's not as simple. We need to query a lot of subscribers, meaning we need an automated approach, which means I had to learn how to use an API. An API, or application programming interface, is just some code provided by Google that lets us ask their servers for information, such as subscribers. Before making this video I really had no experience with APIs. While that would have made a great segue to a sponsor segment if I had one, I don't, and I'm not going to pretend I'm now an expert.

Conceptually, what we're asking the API to do is nothing difficult. First, we ask Google's servers who is subscribed to Not David. This gives us a bunch of nodes representing all of you, my subscribers. As we'll soon see, these numbers are about to balloon very quickly, so I limited this step to about 2,000 subscribers. We'll also draw an arrow from you to me to indicate the direction of the subscription and form the first layer of the network. Network researchers actually have a special name for this kind of network: it's called an ego network, because I have a large ego. I mean, because the network is generated from my perspective.

In the second step, for each of those initial subscribers we retrieved, we can use the API a second time to return who else they're subscribed to, and just as before we'll use the arrows to indicate the direction of the subscription. Technically, because of these arrows, we could call this a directed ego network. Though I guess, now that I think of it, you could argue that this is actually an abstract representation of my ever-inflating ego. Hm. Anyway, after the second step the network really explodes. From those initial 2,000 subscribers, the final network has 268,000 channels and just under a million links connecting those channels. This network was so large that all my previous tools broke, and figuring out how to even animate this ended up becoming the largest bottleneck in the video.

But regardless of how big it is, what we're really interested in are the channels that many of you subscribe to. It's these highly connected nodes that are really going to teach us a little bit about you and your general preferences, like, for example, which channels do you all watch the most on average? Having all this data from the API, I couldn't help but want to know a little bit about how some of my favorite channels are represented within this network. I've learned that 9% of you probably read reviews about toy spiders on Amazon, 133% of you get a little too excited when you hear this chord, 2% of you know that aliens are likely to have independently invented Tetris, 18% of you use powder-based detergents, and lastly, 7% of you probably have sins you need to confess. It's always fun, in a self-aggrandizing way, to see that people like the channels that you like, and it was at this point that I really started to think my audience is a bunch of really cool people.

But what about the channels with the very highest number of incoming arrows, the ones you subscribe to the most? Taking the top 10 channels, we find the following. So we see that about 60% of you are subscribed to Veritasium, and Vsauce, and Tom Scott, 3Blue1Brown, and Steve Mould, amongst others. Notice, though, that this doesn't really say anything about my channel's, uh, quote-unquote compatibility with another's. As an example, you lot subscribe to 3Blue1Brown and Mark Rober almost equally, about 45% each, but Mark Rober has five times the number of total subscribers, so you could argue that the compatibility of my channel with Mark's channel is lower than with 3Blue1Brown's. Now, compatibility is not important to today's analysis, so we won't be going down that road, but I do encourage you to think about how you would account for it. There's no one answer here, but to give you a hint of why I think it's an interesting question, I'll just say that I don't think knowing the total number of subscribers is enough, or maybe even helpful. If you disagree, though, let me know.

One last thing we should look at before we move to the next section. We've looked at the channels with the highest number of subscribers, but what if we look at the distribution of subscribers across all channels, big and small? We want to make what's called the in-degree distribution, where in-degree is just another name for incoming arrows. If we plot the number of subscribers that a channel has versus the number of channels that have that many subscribers, and let's put it in log-log scale for the fun of it, we'll find that... oh, it's just a straight line. Well, that's not very exciting. All right, never mind, moving on.

At the start of the video I said y'all are a bunch of nerds, but I don't think we have enough evidence to suggest that definitively, at least not yet. That's because our network tells us the likelihood of you subscribing to channel A or channel B, but not the likelihood of subscribing to both channel A and B. It's a propensity to subscribe to multiple channels that makes someone a nerd. Fortunately, a co-subscriber network is exactly what we need to test for this. To make a co-subscriber network, or more often called a co-citation network, we'll simply link any two channels that share a subscriber and then get rid of the original directed links. The more subscribers two channels share in common, the stronger that link is, and that's really useful: the stronger the links are, the more similar we can assume those channels are. We're deriving information about relationships simply from the structure of the network.

Now, making the co-subscriber network is actually kind of easy. For you math heads out there, a co-citation network is just a product of the adjacency matrix with its own transpose, which is to say we can literally do it in one line of code. But just because it's easy to write doesn't mean it's easy to run on a computer. You see, as I mentioned previously, the network we've been working with had just under a million links, which was definitely at the limit of what my computer could render. This new network, though, it has 62.5 million links. That's a lot. But where are all these extra links coming from? It all goes back to how a co-subscriber network is made and how many channels you subscribe to. Suppose we have a subscriber that watches n channels. Then each of these channels need to be wired together, and this formula is the amount of links you need to introduce in order to do that. And that's actually a huge amount. For example, if a subscriber subscribes to, say, 1,000 channels, then we have to add 500,000 links.

Now you might be thinking, who in the world is subscribing to 1,000 channels? And my answer to you is, apparently, a shocking number of you. In fact, about 10% of you. And like, what? I mean, please don't get me wrong: if these are real accounts and you're one of these individuals watching right now, then the only thing I can say to you is I'm genuinely honored that out of everything you choose to watch, you'd watch my video. But it is a little suspicious, right? I just can't get it out of my head that these are potentially bots, and that's really not good. Bots don't leave comments, smash that like button, or ring the bell for nominally instant notifications, so if a sizable portion of the channel is not engaging with the content you produce, then that hurts your channel. Of course, if you'd like to help my channel instead of hurting it, then consider checking out my Pat... Now, because I can't prove these are bots, I decided to leave them in. Instead, to account for them, I've kept only the 1% strongest links of the co-subscriber network, which will hopefully reduce the influence of any spurious connections. So finally, here it is: the Not David co-subscriber network.
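The co-subscriber construction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the video: the `subscriptions` data, the function names `co_subscriber_links` and `strongest_links`, and the 40% cutoff used on the toy data are all hypothetical. Counting channel pairs per subscriber gives the same off-diagonal weights as multiplying the subscriber-to-channel adjacency matrix by its own transpose, and `math.comb` shows the n(n-1)/2 growth the captions refer to.

```python
from collections import Counter
from itertools import combinations
from math import comb

# Hypothetical toy data: subscriber -> set of channels they subscribe to.
subscriptions = {
    "viewer_a": {"Veritasium", "Vsauce", "3Blue1Brown"},
    "viewer_b": {"Veritasium", "Vsauce"},
    "viewer_c": {"Veritasium", "3Blue1Brown", "Polygon"},
}


def co_subscriber_links(subscriptions):
    """Count, for every pair of channels, how many subscribers they share."""
    weights = Counter()
    for channels in subscriptions.values():
        # A subscriber with n channels contributes n * (n - 1) / 2 pairs.
        for pair in combinations(sorted(channels), 2):
            weights[pair] += 1
    return weights


def strongest_links(weights, fraction):
    """Keep the top `fraction` of links by weight (always at least one)."""
    keep = max(1, int(len(weights) * fraction))
    return dict(weights.most_common(keep))


links = co_subscriber_links(subscriptions)
print(links[("Veritasium", "Vsauce")])  # 2 shared subscribers
print(comb(1000, 2))                    # 499500 pairs from one 1,000-channel subscriber
print(strongest_links(links, 0.4))      # the two links with weight 2
```

On a network the size of the one in the video, a sparse matrix product would likely be the more practical route; the explicit loop is used here only because it makes the pair counting visible.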
As you can tell, this network is rather busy. To make this more digestible, let's just focus on a couple of select channels. For example, if you subscribe to Super Eyepatch Wolf, whose video on Riverdale I've watched way more times than is healthy, and highlight only the nodes directly co-subscribed to them, we can see that you're more likely to co-subscribe to channels like Jacob Geller, hbomberguy, Summoning Salt, fellow Alberton, Folding Ideas, and LegalEagle. Or let's say you subscribe to Polygon, then you're also more likely to subscribe to Brian David Gilbert, which makes sense given Brian used to be on Polygon. Or let's say you got into vintage Victorian fashion YouTube, because 2020 sure was a year, then presumably you're also more likely to watch other vintage Victorian fashion YouTubers, which is reflected in this network as well.

Obviously I could keep going, but hopefully you get the point, which is that you are in fact nerds. But there's more to it than that. Keep in mind there are plenty of science and technology channels that I can think of that did not make it to the 1% of co-subscribed channels. Instead there's music, art, fashion, movies, literature, and a bunch of other weirder things. Really weird things. So yeah, you're nerds, but you're not defined by your nerdiness. There's more to you than just that. And I guess, as much as it pains me to say, given this is my co-subscriber network, maybe that also hints that I too could potentially be a nerd. Oh god. Um, what I think is really cool, though, is that within this network we're getting a view of some really diverse channels, ones that you wouldn't normally categorize with the science community. It's like we're getting a glimpse out of our own community and into the wider YouTube network. So then, what does that wider network look like?

[Music]

If our original ego network was a connected community, like a small town, we can think of the wider YouTube network as the equivalent of zooming out and looking at the globe. And here that globe is, or at least a rather small sample of it. How I made this network is somewhat similar to the ego network we built initially. There are some nuances and caveats, which I've left in the description, but right away you can see that it's a much looser network. We can quantify that looseness using what's called the network diameter, which is defined as the longest shortest path between two nodes. In other words, if you find the shortest path between all pairs of nodes, the diameter is the longest of these. In this example the diameter is five, but if we add a single link between these two nodes, the diameter goes down to three. In the case of the ego network from the start, the diameter was also three, which given its size gives you an idea of just how compact the network is. The YouTube network, however, has a diameter of 11, which is why the network looks so loose. And hopefully that makes intuitive sense: we're no longer looking at data from a community of people with very similar interests, but rather a much broader group of people with much less connecting them, hence a bigger diameter.

But if somewhere here is our community, it begs the question: can we find it, and what other communities are represented here? Perhaps one of the biggest areas of study within network theory is tackling exactly this problem: community detection. If we look at a simple example, the idea of communities in a network is at least visually intuitive. Here we have three sets of densely connected nodes forming three distinct communities, identified by color. But I've drawn this in such a way so as to make the structure obvious. If I randomly arrange the nodes, which is much more representative of a real network, it might not be clear what the communities are. So whatever algorithm we use for community detection shouldn't depend on how we drew the network.

More rigorously, we can define communities as divisions of a network such that the number of links going between different communities is low, while the number of links confined to the same community is high. So imagine if, in the original example, we had actually guessed that the community structure was something like this. Through the magic of rendering two of them, we can see that the incorrect guess not only has more links going between communities, it also has less links going between members of the same community. In this simple example we know there exists a different partitioning that better captures the community structure, but as you can imagine, this is much more complex in real-world networks. One popular method of community detection is called the Louvain algorithm, which is just a smart way of going through the network and minimizing these inter-community links.

Applying the Louvain algorithm to the YouTube network, you can see that it's discovered several communities. But how do we actually check to see if this classification is meaningful? Of course, we could just go through the communities channel by channel, but even for this relatively small sample of YouTube that would be next to impossible. Instead, what we can do is take all the usernames belonging to a given category and make a word cloud showing the frequency of the most commonly occurring words. So if we highlight the beige community here, we'll find the most reoccurring words are gaming, retro, and barbecue. I don't know about that last one. The teal community is characterized by words like garden, vegan, and life. The red channels are music, audio, and sound, while the singularly focused orange group is gaming, game, and games. Blue is admittedly more of a wild card, with animation, podcast, and James. And finally, the purple group is science, lab, and physics, and this is presumably where we would be if our channel was popular enough to have been picked up by the API. Maybe one day.

So I think it's fair to say that the Louvain algorithm is working. Using the structure of the network, the algorithm is identifying communities of channels that are highly related, which tells us that YouTube is a highly modular website. And I hope from this you're starting to get an idea of how these tools can be applied well beyond this YouTube network. For example, how regions of the brain are actually just communities of neurons that respond in similar ways to certain stimuli, and how these communities can be uncovered using approaches like the Louvain algorithm. You can go one step beyond that and ask, how do these communities change as a person grows older, or learns a new skill, or becomes sick, and so on and so on. That's why I love network theory: the same math can be applied to something fun and silly like Pokémon, or something as important as the brain and your quality of life. One last thing, though. If you had to take a guess, what do you think was the word that occurred the most often across all the YouTube channels I had? I'll give you a hint: it isn't a word per se, it's just four letters.

Here's a dirty secret about creativity: much of it is just seizing on connections you don't really feel responsible for. Maybe you're trying to think of a way to explain pant statistics while watching videos about speedrunning, and an analogy hits you like a bus. Maybe you remember the Pokémon theme song ends with "you teach me and I teach you" and think this could be a good way to end your Pokémon-themed education video, even if almost no one notices it. These things aren't planned out, and there is a large aspect of luck associated with it, but it also doesn't occur in a vacuum. The more wells we can draw inspiration from, the more we stack the deck in our favor and the easier making these connections becomes. If there was a single piece of advice that I could pass off to my undergraduate self, it would be to not tunnel vision into my studies. Go paint a picture, make a song, read something that's not a textbook, or even just watch a movie. There's a balance to these things, and this video was my attempt at showing those connections in a more literal way. And as much fun as learning about who you all are has been, I'm mainly thankful that, given all the other amazing channels you subscribe to, that you choose to watch what I can only describe as my attempts to show future employers I have marketable skills. I guess what I'm trying to say is: Google, if you're watching, please hire me. I know how to use your API now.
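The network diameter mentioned in the captions (the longest of all shortest paths) can be illustrated with a small breadth-first search. This is a minimal sketch under stated assumptions: the graph is undirected and connected, and the six-node chain built by the hypothetical helper `path_graph` is a stand-in chosen to reproduce the numbers quoted in the video (five, then three after adding one link), not the example actually shown on screen.

```python
from collections import deque


def eccentricity(adj, source):
    """Longest shortest-path distance from `source`, found by BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return max(dist.values())


def diameter(adj):
    """Diameter of a connected, undirected graph given as node -> neighbours."""
    return max(eccentricity(adj, node) for node in adj)


def path_graph(n):
    """Hypothetical helper: a chain 0 - 1 - ... - (n - 1)."""
    adj = {i: set() for i in range(n)}
    for i in range(n - 1):
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    return adj


chain = path_graph(6)
print(diameter(chain))  # 5: the two ends of the chain are five links apart

# Add a single link between the two end nodes, turning the chain into a ring.
chain[0].add(5)
chain[5].add(0)
print(diameter(chain))  # 3: the farthest node is now halfway around the ring
```

Running one search per node is fine for a toy graph; for networks with hundreds of thousands of nodes, a graph library or an approximate method would likely be needed.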
Info
Channel: Not David
Views: 306,207
Id: o879xRxmwmU
Length: 17min 42sec (1062 seconds)
Published: Sat Oct 21 2023