5 Ways to Organize Information For Small to Large Data Sets

Captions
Hey everybody, welcome to another edition of Reading Fun in the Sun. Today I am out with the puppies. We are going to go for one group and play some frisbee, but afterwards we're going to... oh, here he comes now, he won't leave me alone. All right, so we are going to be reviewing a book all about the beginnings of organization, and I'm going to go over the top ways to organize information as we go while also reviewing the book. This is a giveaway week, so if you are interested in receiving a free copy of this book, make sure you stick around till the end. All right, now I have to go and play with my dog. Let's get started.

Here is how this video is laid out. I am going to go through the top five ways to organize information, each of which will have three levels of granularity: micro, meso, and macro. I'm going to go through how much data falls into each of those, as well as putting examples on the screen for each type.

Now, I am going to start out with topical analysis, because the main book that is being reviewed has mostly to do with topical. However, the categorical or hierarchical forms of organizing can also be applied to geospatial, which is where, as well as temporal, which is when, but we will save those for later in the video.

Topical analysis is very common. It is all about the what: what is this research about, what does this person focus on, what are the top searches, those types of what questions. The micro level here would be what it is that you as an individual are focused on for your research, or the types of searches that you commonly do. The meso level would be maybe the knowledge flow between two different researchers, or, for instance, which articles people at MIT often cite from, let's say, Cornell University: that kind of exchange of information from a topical perspective. Now, the macro is very high level. This could be looking at all of the Google searches and which topics are most common there, or, out of your entire corpus
of information at your university or your business, what are the top things that you research or the top products that you sell. So the macro level of topical is looking at the base knowledge that one can draw from.

Now, the book calls this cabinet logic, what others would think of as categorical data. This is a form of organizing that has been done since before the Library of Alexandria, and it is one of the most common ways that people organize things. What is fascinating is that this book actually goes through how the filing cabinet brought order to chaos: it created tabs and folders, different indexes for those files and folders, and color coding and tabs on the folders. These all seem very rudimentary to us today, but if you think about how any database is organized, you will see that tabs look like the title of a table, and the data is in a cell, almost as if it's a card, so to speak.

In the 1900s, this was actually a marketing tactic: describing the filing cabinet as an automated way of organizing information. That is fascinating, because the way the book describes it, this counted as automation because humans no longer had to think about where they organized something. They could instead bank on the fact that it was done in alphabetical order, or by color coding, or by some other function that they didn't have to keep in mind while they were looking for information.

Another interesting thing the book goes over is how a lot of the folks who were first designing the filing cabinet were coming from an engineering or scientific background. They understood that any unit of information had different components, and those components could then be used to organize how that individual item related to other items. Sounds a whole lot like a network, doesn't it?

Now, I would say this book reads very much like an academic book that you would read in a classroom, but it's certainly worth a read if you're interested to see how database design, and even the way that we portray machine
learning and marketing for machine learning, has a lot to do with the beginnings of the filing cabinet.

Jumping from the book review, which was mostly dealing with topical (although any of the data organizational methods that we're talking about today can be done in a filing cabinet, and you can see the results in modern database design), we're now going to jump into the remaining four types of organization.

Method number two is temporal, which is when. At a micro level, this is potentially your health history: when did you have some surgery, when did you get diagnosed with something. That's at a micro, individual level. At a meso or local level, this could be looking at topical bursts over a period of time. So, let's say, during this outbreak period, where are those outbreaks happening and at which points in time? That's kind of mixing temporal and geospatial, but that's often the case. At a macro level, this could be looking across different time periods. So if you're trying to determine when scientific papers were coming out and getting cited, and then tracking how patents, for instance, on those same types of technologies were impacted by or followed the timing of those papers, that's the macro or global level.

Temporal is very useful for understanding trends, or how certain external factors are affecting a certain kind of data. Time series data, for instance, is very common in temporal analysis: understanding when there might be a failure in a system, or forecasting. Can you use the temporal data that you have today to predict a potential failure or a potential risk farther down the line?

Moving on to number three, which is geospatial, or where did it happen. Again, this kind of goes along with the first two that we've talked about; a lot of these can be mixed and matched. But this might be where something actually occurred. So if you're looking at the individual, micro perspective, you could potentially look at weather patterns: where
specifically did an earthquake happen? That's a specific location. But you could also look at the entire state of California, which is more of that meso level: where did all of those earthquakes happen, and what was their impact? And if you were looking at a global perspective for this geospatial analysis, you could look at where the tectonic plates are and where volcanic activity is most active, and understand how that relates to the earthquakes that are being tracked at the meso or micro levels.

In everyday practice, this has a lot to do with where certain buying patterns or search patterns happen. Maybe there is a very large soccer, or European football, tournament happening, and that's why there are a ton of searches going on in that region for football things. These are very common in the search world as well as the e-commerce world.

One of the most popular ways to do any kind of organization and analysis is statistical. This is looking at more of a profiling. If you're looking at unstructured data, this is very much NLP: you're doing statistics to understand what the common characteristics are in an article or in a social media post or thread. Those are the types of profiling that you will get with statistical analysis. As with anything that is purely statistically based, you want to make sure your data set is accurate and representative, and make sure you do this ethically, so do not forget that part.

If you are looking at the statistical aspect of organization, you might say to yourself, "Well, I don't know if I know anyone who does organization by statistics." But you do. It's actually one of the foundations behind all of the different types of organizational methods, because typically, if you think about it from a probabilities perspective, or a saturation or weighted perspective, you are going to have the most important things highlighted or retained in your data. So if you are looking at NLP, it would be something that maybe has the highest saturation, or
it could be something that has the highest F-score. So at a micro level, this could be looking at the highest saturation of terminology, or the clusters that are the most common, in one specific article. That would be the individual or micro level. If you are looking at what the most common searches are in your organization from, let's say, a certain geography or time period (see how those all go together?), that might be the local or meso level of analysis. When you're getting into the global population, you could be looking at not just what the top searches are or what the most important aspects of your product line are, but also what the sales trends are for other similar products, or for other organizations similar to yours. That statistical emphasis is where the macro, global level can manifest for this statistical profiling method. So statistical profiling is actually one of the foundational pieces of organization that we don't have to think about, but it has a lot to do with how we prioritize the information that we are organizing.

And last, but certainly not least, is of course network analysis. This can be with whom or with what. It could also have a really heavy emphasis on that profiling: what different network characteristics affect different networks in different ways. But this is basically how you take the first four types and mash them together. In a network analysis you could have the what; you could have people, where a specific person is connected to the specific articles that they have published; those articles could be published in specific geographic spaces or over a period of time; and each of those articles can have different citation levels. You see how all of those come together into a beautiful package?

Network analysis is, of course, if you watch this channel, one of my favorite types of analysis to do. So if you're looking at it from a micro level, you could be looking at you as an individual and how your impact
on the research community is focused. Or, if you're looking at insurance records or loans, you can look at that individual person and understand what their risk factors are if you are doing car insurance, or what their probability is of being able to repay a loan once given. Those all come together into that network analysis of probability and risk assessment.

From a meso or local perspective, if you know anything about insurance: if you live in an area where accidents are very common, that's a local analysis, and it's understanding how a network of people is impacting your risk assessment. You might not be a risky driver, but maybe everyone else in your location is. If you're from Massachusetts, you know what a Masshole is, because it's a very common saying around here.

All right, and then if you look at that macro or global level from a network perspective, I mean, this is so common, especially right now with medical point-of-care distribution, different access to types of medication, and network analysis of social media and how certain people are influencing one another. All of that kind of network analysis is very, very common today. Watch some of my other videos on knowledge graphs and how graph has really changed the name of the game in data science, because there's so much impact there.

But I hope this video has given you a little deep dive into the types of organizational methods and how you analyze them. All right, so with that, I want to thank you very much, and I'll catch you next time.
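
Editor's sketch (not part of the video): the five ways of organizing described in the captions (topical, temporal, geospatial, statistical, network) can be tried on a tiny data set in a few lines of Python. Everything below is an assumption made for illustration: the records, the field names (`topic`, `year`, `place`, `authors`), and the helper functions are hypothetical and do not come from the video or the book. Plain word counts stand in for the "saturation" the speaker mentions, which may well mean something more elaborate.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical records; every field name and value is invented for illustration.
RECORDS = [
    {"title": "Graph search basics", "topic": "search", "year": 2019,
     "place": "Boston", "authors": ["Ana", "Ben"]},
    {"title": "Graph ranking for search", "topic": "search", "year": 2020,
     "place": "Ithaca", "authors": ["Ben", "Cy"]},
    {"title": "Taxonomy design notes", "topic": "taxonomy", "year": 2020,
     "place": "Boston", "authors": ["Ana"]},
]


def group_by(records, key):
    """File each title under the value of one field, like a labeled folder."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["title"])
    return dict(groups)


def term_counts(records):
    """Statistical profile: count how often each word appears across titles."""
    counts = Counter()
    for record in records:
        counts.update(record["title"].lower().split())
    return counts


def coauthor_edges(records):
    """Network view: one edge per pair of people who share a record."""
    edges = Counter()
    for record in records:
        for pair in combinations(sorted(record["authors"]), 2):
            edges[pair] += 1
    return edges


by_topic = group_by(RECORDS, "topic")   # topical: the what
by_year = group_by(RECORDS, "year")     # temporal: the when
by_place = group_by(RECORDS, "place")   # geospatial: the where
top_terms = term_counts(RECORDS).most_common(2)  # statistical: most frequent words
edges = coauthor_edges(RECORDS)         # network: with whom

print(by_topic)
print(by_year)
print(by_place)
print(top_terms)
print(dict(edges))
```

The same `group_by` helper serves the topical, temporal, and geospatial views because each is just a different key on the same record, which loosely mirrors the point in the captions that a unit of information has components that can each be used to organize it.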
Info
Channel: Ashleigh Faith
Views: 122
Rating: 5 out of 5
Keywords: Ashleigh Faith, search engine optimization, knowledge graph, knowldge graph, ontology, how to make a knowledge graph, how to make a knowledge graph and ontology, how to make an ontology and knowledge graph, what is IA, what is metadata, what is taxonomy, how to make a taxonomy, how to organize data bases, top 5 ways to organize information, best way to organize big data, best model for big data, best model for taxonomy, best model for information architecture
Id: qkWKE-EJsJo
Length: 14min 23sec (863 seconds)
Published: Tue Sep 21 2021