July 2023 CACM: Data Science–A Systematic Treatment

Video Statistics and Information

Captions
Data is now recognized as being a fundamental ingredient of many of our daily activities, and if data is such an important asset, then one has to ask the question of how we leverage that asset. [Music]

There are a lot of academic departments, institutes, and programs that are being started, and these are not always well defined in terms of how they view data science. A lot of countries are creating initiatives around data. Many of these government initiatives are not distinguishing AI from data science. The practical consequence of that is that any investment they make in AI, they consider to have been made in data science as well, and that blocks the growth of the field to a certain extent.

I have a particular way of defining data science, and my definition can be summarized as a data-based approach to problem solving. Typically this is done by analyzing large volumes of usually multimodal data, and you extract knowledge and, hopefully, actionable insight from that data.

For me, there are four pillars of data science. Data engineering is everything that you do to data (collection, integration, cleaning, quality issues) before you start analyzing the data. So the old cliché of "garbage in, garbage out" holds in data science, and the objective of data engineering is to make sure that garbage never goes in. Data analytics is the use of machine learning and statistical techniques to analyze the data. Data protection is about security and privacy issues. And finally, there is what is typically called data ethics, or what I call data science ethics. Unfortunately, this is usually ignored in these projects, but we're dealing with data that contains sensitive information, so it should be front and center.

If we look at the pillars, two of them are really STEM topics, so technical topics. One of them, data protection, is partially technical and partially involves social sciences, and the ethics is all about humanities and social sciences. And that's just the core. When you consider the applications or deployment of data science projects, this could be environment, biological sciences, health, medicine, etc., and you need to bring in domain scientists from these fields. So when you consider all of those together, by definition data science is interdisciplinary.

Any data science project has a number of stages that you go through, and it needs to be recognized that it is a circular process. You basically start with a problem definition, and then you go to data preparation: quality issues, cleaning issues, etc. Then you worry about data storage and data access methods, how you integrate, what you do. Then the data is ready for analytics. You do analytics, and then you produce results. Now you have to deploy the entire process and monitor it, and then you go back and you do it again. So, for those who are into that, it is a dialectic process: you don't just repeat, you are always moving toward a better understanding of the field.

To me, a field never becomes respectable until it has a definition of its core, its methods, and its techniques and tools. We need a better answer to the question of "Who are you? What do you do?" We need a better answer than "Well, it depends on which one of us you ask." And if we can define our core mission and vision, and what our methods and tools are, then we can unleash the full potential of data science.
Info
Channel: Association for Computing Machinery (ACM)
Views: 355
Id: m9XecEc9yGw
Length: 4min 17sec (257 seconds)
Published: Fri Jun 23 2023