The average salary in the big data field? If you are a PhD, at Google or Amazon, your starting salary is... Hello, I am JoCoding. As I announced earlier in the community, we have the best big data expert in Korea with us today: Professor Cho Sung-joon, the author of the Big Data Career Guidebook. Nice to meet you. Hello. Nice to meet you. Yes, please introduce yourself briefly first. I teach students in the Department of Industrial Engineering at Seoul National University. I teach and conduct research on big data, data mining (which analyzes big data), AI, and machine learning. I majored in industrial engineering for my bachelor's degree and in computer science for my master's and PhD. For my PhD I specialized in AI and machine learning, and more specifically in neural networks. Since then, I have developed algorithms in the field and used machine learning to solve problems that many companies have. Recently, I have also become interested in teaching AI and big data not only to students in school but also to people who are already working in companies, so I do that training too. Korea's greatest big data expert! I prepared the interview questions based on the comments subscribers left on the community. Recently, a lot of people are saying that the big data field is promising. There have always been people preparing to become developers, but now more and more people are preparing for data jobs. Why did it become so hot? I think there are two reasons. First, it goes without saying, but there is a lot of data. It is literally big data. There is the social media data that all of us individuals are making. We are generating a lot of data on social media like Facebook, Naver, and Instagram. That is the social media side. Also, though we are not aware of it, there are sensors everywhere. The sensor we use the most is our cell phone. On my way here with my phone, even if I turn everything off, mute it, and don't answer calls, the phone keeps sending my current location to the operator.
It is saying 'I am here.' So in a way, it is a location sensor. In that way, data on where I go, 24 hours a day, keeps accumulating at telecommunication companies. So, first of all, there is a lot of data. That means opportunity. So there are a lot of opportunities. The second reason is this: if we look at human history, worldwide and not only in Korea, a considerable number of decisions were made by a very few wise people, people who were thought to be smart. I am not sure if they actually were smart. But they made decisions based on their experience and intuition. That has a lot of limitations. Data, on the other hand, is fact. Why should we rely on our intuition when we have the facts? Let's use the facts and make more objective decisions. So the second reason is necessity. One is opportunity, and the other is necessity. With these two combined, there is a lot of big data that can be easily analyzed, and many companies and public institutions need to analyze it. This is a promising field right now. Will it continue to be promising in the future? - What do you think?
- That is very important, especially for young people. Even if it is important now, it would be a problem if it no longer is five years from now. You can think of it this way: 'Will there be more or less data in the future?' What do you think? There is going to be a lot more. I think it will grow greatly. In a way, it is just the beginning. There will be more data and more sensors. People will use more social media. So you can think of it like this. In the past, decisions were 100% subjective and 0% objective. - Now, I think about 20% is objective and still about 80% subjective.
- Still very subjective. - That is right.
- Then, it is still at a very early stage. Yes, it is the early stage. I think it's at a very early stage. Then will it be promising for 10 or 20 years? I would say it will definitely last for more than 50 years. I heard that there are many big data careers: data scientists, engineers, and so on, and that the classification varies from company to company. How should we divide them? Yes. There are many different ways to classify them. My researchers and I discussed it for a very long time. They can be divided into six categories. First, there is the data engineer, who prepares data. Second, data scientists are the most famous. In a way, they are the most central. They extract insights from data. Third, there is the data researcher, who does research and development. These are the professionals. Next, data analysts are a little less specialized but also important. Then there are the citizen data scientists. Finally, there are the planners. These are the six categories. You may not be able to tell what they do just from the names. - Can you tell us about their roles with actual examples?
- Yes. First, the data engineer is literally the person who collects, stores, and manages data using an engineering background. To put it simply, they are people who are good at coding and have studied databases and cloud computing in depth, building on computer science. They are the data engineers. Second, the data scientist is the chef. They are the people who make nice food out of the ingredients that the data engineer prepared. They are the people who analyze. The analysis methods include AI, statistics, traditional visualization methods, and so on. So they are the people who solve very difficult and deep problems. The third type is the researcher. They learn the same things as data scientists. But most of them do PhDs, and rather than entering the field, they make the tools that people who actually analyze in the field can use. Researchers are people who work in the lab to keep developing new methods and to make them much better than the ones we're using right now. Now, for people who are not in computer science, statistics, or engineering, is there anything for them to do? There are a lot of jobs in the next three categories that can be done without a data scientist's deep expertise. The first is the analyst. Analysts are similar to data scientists, but their analysis methods are not very deep. People who look at sales and marketing data and analyze various things in Excel are actually analysts. The next one is the citizen data scientist. The name contains 'data scientist', right? Data scientists are expert chefs, in a way. Citizen data scientists are not expert chefs but amateur chefs. They already have jobs. They could be professors, YouTubers, or athletes. So they each have their main job, and this is a secondary one. They learn analysis as a secondary job, a kind of side persona. So even if they haven't learned as much as data scientists, they are a sort of power user.
They study far more deeply than analysts, though not as much as data scientists. People who have studied hard and reached that level play a very important role. That's because, even though how you analyze is important, what to analyze is more important. Citizen data scientists have actual jobs. They are experts in that field. Can this problem be solved? How would a new method differ from the existing one? How can I make objective decisions? They know these things better than anyone else. They are the people who can ask good questions. There are not many people like that in Korea right now. That is actually one of the reasons why the spread of big data in Korea is slowing down, and why corporate training and vocational training are important. The planner is the one who does the management. Who does management? There are management agencies. Yes. These days, idol groups are built around a new concept from the very beginning. That is planning and management. Oh! Big data needs to be planned out like that. If I want to create a certain value, what analysis results and data are needed? You have to work backwards like this. Then, as you said earlier, each job has a different role, so I think the required competencies must be different. What does each one require? I think there are four technical capabilities. First, to understand AI and machine learning, you need some mathematical foundation: the college math you learn in your freshman and sophomore years. That doesn't mean you have to major in mathematics. Second, you need to learn the content of AI and machine learning itself, such as data mining. Another is cloud computing. These days, we store all data in the cloud, so you need to know about cloud computing and databases. Then there is coding. The required level differs slightly across the six categories. Coding must be very important for data engineers. It is important for data scientists as well, but not as much as for data engineers. It's like that.
And there are two kinds of non-technical skills. One is domain knowledge. It must be very important for citizen data scientists and planners. The other important thing is communication skills. Can a data scientist understand what the citizen data scientist or the planner says? This is a question from kimbanggu. How do you decide on the domain? I wonder to what extent I have to study that field. Yes, first of all, I told you about six job categories. If you are watching this video, whether you should go for a technical job or a domain job has probably already been decided. If you are majoring in statistics, commerce, or engineering, I think it's better to take the technical track than the domain track. If you are currently studying economics, French, or business, what you are studying is your domain. Then, if a statistics expert or someone who majored in statistics is preparing to be a data scientist, can they do that without domain knowledge? Yes, they can. For example, if I am about to graduate with a computer engineering degree, suddenly studying economics is not the right move. I wouldn't achieve domain expertise that way. Those people should keep going the way they are. You will be given work in some field when you get a job. On the job, you are bound to meet a domain expert. When you do, treat it as a good opportunity to pick up a lot of domain knowledge from that person. When you join a specific company, you will naturally learn the domain, so you don't have to decide in advance. Yes. Also, there are some people who didn't major in the field but are starting to study big data or artificial intelligence. Will it be possible for them too? Yes, like I mentioned earlier, non-majors can do it as citizen data scientists. For those who are in school, almost all universities in Korea offer data science-related courses in computer science, statistics, or commerce. Take some classes, starting with the easy ones. The better you get, the better.
I don't think there is a limit. If I am better and more skillful tomorrow than today, and better again after a month, then I am doing well. Just keep improving. Among the additional questions: would it be more helpful to get a degree in a related major like computer engineering or statistics through the self-study degree program or the credit bank system? If you have that much time, it's not bad. For non-majors, I think it would be better to start working first, whether in sales, marketing, or HR for humanities majors, and learn little by little. I think it would be all right to focus more on your own major. So I don't think there is one right answer. You can do whatever you like. There will be various situations and environments to consider, so do as you wish. Among those who majored in liberal arts, are there cases where they actually became a citizen data scientist or, even beyond that, a planner? Yes. There are a lot. Currently, I teach big data as a general education course at Seoul National University. About half of the students there are liberal arts majors. When I asked them, most of them didn't know how to code. Still, I've seen cases where they learn and manage quite high-level analysis. And some even go straight on to graduate school with that. It seems many people have questions about graduate school, about whether they should go. To become a data scientist, you have to get a master's degree. If your situation allows, I think a doctorate is great too. Then how do people who didn't major in those fields go to graduate school to become citizen data scientists? I think they don't know whether they're supposed to go to a data science grad school. These days there are lots of grad school labs that do data analysis. So learning data science in one of those while going deeper into your own major could be one way. Another is just getting a job and learning data analysis there by trial and error.
It's a question of whether you're going to grad school because you think it will improve your chances of getting a job, or because you already have a job but want to study more to get past your limits. I think the latter is more desirable. By that point, it will have become clear whether you're going for data science or for your own major. So when in doubt, I feel it's better not to be too bold. What licenses are there in data science, and which ones matter for getting a job? I believe there is a data scientist license, but all I know is that such a thing exists. Oh. So it seems companies don't consider licenses when they hire. There's no need to care too much about licenses for now. Instead, there are things like this: the AWS certificate in data engineering? - Oh, cloud.
- Yes, in the cloud area, those certifications are licenses in a way. There are things like that. You take several classes and sit exams. When I was a data engineer at LG, - I was encouraged to get that certification, and the company really supported us in getting it.
- Oh, that's right. Just like you said, I think cloud licenses are quite well recognized. Yes. As for the programming languages mainly used for big data, there are R and Python. What are the pros and cons of each, and which does industry prefer? Recently, with the rise of AI and machine learning, the analysis tools we can use have diversified a lot. Whereas we used only statistical tools in the past, now we use machine learning a lot too. As a result, not only people who've majored in statistics but also those who've majored in computers
or machine learning can go into analysis. As for what tools they mainly use, statistics traditionally prefers R. A funny thing is that areas outside statistics don't use R, so I think only statistics uses it. Other areas all use Python. Computer engineering uses Python, of course. To compare the two languages, Python is more low-level: a detailed language in which we can control even the smallest things. R, on the other hand, is a much higher-level language. What this means is that R is much easier to learn than Python, while Python offers the user a lot more choices. Both are open source, so they're free. So for people who can't decide between the two, you would say Python offers a wider range of choices. That is what I would say. But the very fact that they're having a hard time deciding means they aren't statistics majors. So they end up using Python. What tools are there that are less widely used or less famous these days? There's a tool called SAS, traditionally used a lot in statistics. SAS is the name of the company, and it makes various tools. Lots of companies already use SAS tools. They range from ones that are easy to operate to ones that are not. Their only disadvantage is that they're expensive. But the company pays, not you, so use them if you can. And there's a visualization tool used in BI, business intelligence, called Tableau. It is used a lot as well. You don't need to code with it, and the same goes for the SAS tools. Just mouse clicks. Yes, just clicks. One more thing I would recommend is a software package called Spotfire. It's often compared with the BI tool Tableau, but it's fundamentally different. It isn't a tool for making a good picture; you analyze the picture itself. It's called visual analysis. Normally in analysis, we code, generate the results offline, and then look at them, whereas with this we keep changing the picture on screen, interactively.
Shall we look at it this way? Or that way? It's a tool for analyzing these things on the screen. It's very powerful. There is also comprehensive analysis software. Ones that don't need coding are RapidMiner and Orange. But these require payment. A free tool similar to these is KNIME. It was made at the University of Konstanz in Germany. Since it's made in Germany, the interface is a bit rough, but it's free. It's a great tool. Is it actually used a lot in the field? It's used a lot in the field, and I highly recommend it for people who can't afford the time to learn coding or who tried learning to code but couldn't get the hang of it. What kinds of experience can one build up in the data field? Well, there are lots of Kaggle contests, so you could enter those, or there are hackathons. There are also lots of competitions from companies and public organizations, right? Keep entering those, and maintain your personal GitHub. The details are in my book. - Right, I've read it,
- You can refer to that. It's really detailed. It's all explained, from GitHub to LinkedIn management. So those of you who want all the details, read the Big Data Career Guidebook. What are effective ways to study and prepare for a big data career? To be a data engineer, switch your major to computer engineering quickly, follow the curriculum, and get experience with AWS or Google, which you will actually use a lot after graduating. The same goes for a data scientist or researcher. As for an analyst, citizen data scientist, or planner, you have to understand from your own line of work what can be analyzed, so those people should keep learning new things about it. Would there be a difference between working domestically and overseas? For overseas employment, I think it's very important to use LinkedIn, since employers there are hard to reach otherwise. LinkedIn is universal, after all. Detailed methods are also explained in my book. I think this is the question the subscribers are all dying to ask. What is the average wage in the big data industry, and how much do the highest earners get paid? People don't tell me much about their wages. With a doctorate, you get a starting salary of 250 thousand dollars. - About 300 million won.
- Wow. That's how we would react, but there's a joke that 80% of the money goes to your landlord in San Francisco. You asked about the average or highest salary, but one thing I want to tell you is this: people say doctors get paid a lot, but some doctors own their own hospital while others are salaried, 'pay doctors' as they call them. I think it's the same in data science. If you're really interested in money, there's no need to think about salary. People who want to make big money with data science should start a business. I see. Become a data planner and start your own business. Such startups get a lot of government support these days. Then how do you plan? If you look at the big three of the entertainment industry, what did they all do when they were young? They were singers; they danced. They know the fundamentals of the line of work, so they can be CEOs and build young teams. It's the same in data science, so don't dwell too much on salary. You have to get to waters you can swim in as quickly as possible. I think it would be good to think of yourself as learning until you are capable of planning and starting your own business. The salary is just part of your studying. It's a period of studying. You've told us about six areas; which has the highest salary? It would be the data scientist. Or the citizen data scientist; I think it's one of the two. And a planner doesn't have a salary. Oh, a planner. They pay themselves. Is there a book you would recommend to those who are preparing for the big data field? If you're interested in careers, I recommend this book right in front of us, and there's also a book called 'A new language to understand the world, big data' which has lots of cases in it,
so I think it'll help you get a general overview. It's embarrassing for me to recommend only my own books, and there are lots of other good books too. One is a book that analyzes the process of matching many men and women. The duplicity of humans? And it analyzes what the clients really want. It's very fun, time flies by, and it won't feel like studying at all. It covers things you didn't even know could be analyzed. Finally, could you give some words of advice to our subscribers who are preparing for the big data field? Think back 100 years in Korea: most of us were farming, and people who could read were, what, 5% of the population? But if we had told the farmers to learn how to read, what would they have said? 'I'm a farmer. Why would I learn how to read?' In the 80s and 90s, with the appearance of computers, it was 'I'm not a computer engineer, why should I learn how to use computers?' Then 'Why should I learn about the internet?' and 'Why should I learn how to use a smartphone?' Every time a new technology or innovation turns up, people have said it wasn't their business. All the things I've mentioned have now become part of our lives, and we've come to a point where you can't live without knowing them, even if you're not an expert. With the advent of big data and AI, people think data science is new and only for a minority of experts, but it's not. I hope that after listening to what I've said, you can think of it as everyone's business. Everyone should know about this, and it will become just like using a computer or the internet. It's great to become an expert at this and make huge money, but you should also keep in mind that not doing it will be a big mistake. Second, data analysts are a very hot topic these days. As a researcher, I think people are trying to automate the process of data science analysis as well. It's becoming simpler and simpler. Think of when computers first came out: saving files with DOS was horrible.
But UIs and GUIs were created, and now four-year-olds who know nothing about computers can use smartphones. Big data is already heading toward that point. I think for the next couple of decades, people who are good at this kind of analysis will be very welcome, but if the process keeps getting simpler, the time needed to learn that analysis will shrink, and such a world will slowly arrive. The important point is planning ability. I've been emphasizing planning from the beginning. That capacity is very important and hard to acquire. The people who are able to do it are those who have knowledge of both the domain and analysis, and who know the market and the business. To be really successful, get experience in the field and build up your planning ability, and you'll have endless opportunities. Thank you for your good words. This has been Professor Cho Sung-joon. - Thank you.
- Thank you. Did you enjoy the interview? Leave a comment about it and we'll send 20 of you this book. If this video helped you, like, subscribe, and turn on notifications. Thank you.