Hi, I am jasonmel. In recent years, the wave of AI has swept across all walks of life, and AI seems to be replacing human jobs everywhere. It looks as if everything just needs AI applied to it, and no matter how hard the problem, it can be solved. So what exactly is AI? How does it work? And how can we imagine and apply it? Today, let's spend 13 minutes getting a quick grasp of this long-standing dream technology of mankind: AI.

This train is bound for the Future, stopping at Machine Learning and Deep Learning. jasonmel wishes you a pleasant journey.

AI, artificial intelligence, has long been a dream technology of mankind. As early as 1950, the genius polymath Alan Turing raised an interesting question in his paper "Computing Machinery and Intelligence": can machines think? That question opened up the new field of AI and sparked people's boundless imagination about it. According to Turing's idea, to judge whether a machine can think, it must pass a so-called "imitation game". Because this game is such a classic, it is also called the "Turing Test". In this test, a questioner C keeps asking questions of a machine A and a human B sitting in different rooms. As long as C cannot tell which of A and B is the computer and which is the human, we can claim that the machine in the room can think.

Since then, people have spent a long time on research and development, trying to build a machine or algorithm that can pass the Turing Test. In 1997, IBM's most advanced computer, Deep Blue, defeated the world chess champion. It looks impressive, but essentially the computer just searches through the possible moves and picks the most advantageous ones. To put it bluntly, it works like a GPS navigation system choosing the best route from all the known routes on the map. Facing a real world of unlimited possibilities, however, this kind of brute force obviously cannot handle most of the more complex situations in reality. To apply AI in daily life, we still need a more efficient approach, and the way humans accumulate wisdom is a good direction to look.

Human wisdom comes from experience, that is, from continuously learning and remembering lessons. Through constant trial and error, we adjust our perception of the outside world, so that the next time we encounter a similar situation, we can easily draw on past experience to judge and deal with an unknown future. At the same time, to greatly reduce what needs to be memorized and processed, people are also very good at classifying and labeling similar things, grouping a large amount of information into a few categories.

Applying the same concept, could we feed experience, that is, historical data, to machines to learn from, automatically find a model of the correlation between event features and outcomes, and generate a program that can predict future values or automatically classify and make decisions?

For predicting values, a very intuitive idea is to find a linear mathematical relationship between features and outcomes. For example, suppose that in a certain area a house of 10 pings (~33 square meters) sold for 10 million, and another house of 20 pings (~66 square meters) sold for 20 million. From this information we can reasonably infer that the relationship between house price and area is about 1 million per ping. (Can we still face young people with prices like these?)
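To make the house-price intuition above concrete, here is a minimal Python sketch. It uses only the two hypothetical transactions from the example; the `price_per_ping` variable and the 15-ping test case are made up for illustration, not real market data.

```python
# A minimal sketch of the house-price intuition above.
# The two sample transactions are the hypothetical ones from the example:
# a 10-ping house sold for 10 million and a 20-ping house sold for 20 million.

transactions = [
    (10, 10_000_000),  # (area in pings, price)
    (20, 20_000_000),
]

# Infer the simplest possible relationship: price is proportional to area.
price_per_ping = sum(price / area for area, price in transactions) / len(transactions)

# Use the inferred relationship to "predict" the price of an unseen 15-ping house.
print(f"Estimated price per ping: {price_per_ping:,.0f}")
print(f"Predicted price of a 15-ping house: {15 * price_per_ping:,.0f}")
```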
As more and more transaction records accumulate, we can also use techniques like gradient descent to find the regression line that best fits all the data, and obtain a model that predicts house prices from house areas. This is the so-called Linear Regression.

There are many methods for automatic classification; here we list a few well-known algorithms to get a sense of them. Similar to linear regression, for classification problems we can associate features with outcomes by projecting them onto a logistic curve between 0 and 1, where 0 represents one category and 1 represents the other. With a similar approach we obtain a model that maps any input to the appropriate class. This is the so-called Logistic Regression. (The next one)

A Decision Tree uses the relationship between features and class labels in the historical data to construct a tree full of "if this, then that" rules, producing a model in which different feature values fall into their corresponding classes. (The next one)

When dealing with the same problem, to avoid a single feature's importance being over-magnified and causing bias, we can randomly select subsets of the features to construct multiple decision trees and then let them vote on the final outcome. (The biggest secret of elections: the side with more votes wins, the side with fewer votes loses. It's that simple.) This usually gives a more comprehensive and accurate answer than a single decision tree. This is Random Forest. (The next one)

Taking a similar concept further, if we strategically build multiple decision trees one after another, each new tree correcting the errors of the previous ones so that important features effectively carry more weight, we get an even more accurate forest of trees. This is Gradient Boosted Decision Trees, or GBDT. (The next one)

K-Nearest Neighbors, or KNN, relies directly on the existing historical data: for a new data point we want to predict, we find the K historical points whose features are closest, see which categories they belong to, and let them vote to determine the classification of the new point. (The biggest secret of elections: the side with more votes wins, the side with fewer votes loses. It's that simple.)
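As a rough illustration of the KNN idea just described, here is a minimal pure-Python sketch. The toy feature vectors, the "cat"/"dog" labels, and the `knn_predict` helper are all made-up assumptions; a real project would more likely reach for a library implementation.

```python
from collections import Counter
import math

# Toy labeled history: (feature vector, class label). Purely illustrative values.
history = [
    ((1.0, 1.2), "cat"),
    ((0.9, 1.0), "cat"),
    ((3.0, 3.5), "dog"),
    ((3.2, 3.0), "dog"),
    ((2.9, 3.1), "dog"),
]

def knn_predict(new_point, k=3):
    """Classify new_point by majority vote among its k nearest neighbors."""
    # Sort the historical points by Euclidean distance to the new point.
    by_distance = sorted(history, key=lambda item: math.dist(item[0], new_point))
    # Take the k closest and let them vote (more votes wins, as the joke goes).
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_predict((1.1, 1.1)))  # expected: "cat"
print(knn_predict((3.1, 3.2)))  # expected: "dog"
```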
(The next one) The Naïve Bayes Classifier assumes that the known features are independent and do not affect one another. Under that premise, we can use Bayes' theorem to compute the probability relationship between individual features and outcomes, and predict the probability that different combinations of feature values fall into different categories. (The next one)

A Support Vector Machine, or SVM, tries to find a dividing line between the different classes that keeps the boundary as far as possible from the closest data points, and classifies new data accordingly.

All the algorithms above assume the historical data comes with ground truths and try to find a model that captures the correlation between features and outcomes, so that new data can be fed through the same model to get a sensible prediction. But what if the data we have was never labeled? Is there a way to group it automatically? Yes: K-Means Clustering. First randomly select K center points from the data and assign each data point to its nearest center, forming K groups. Then treat the average of each group as the new K centers and assign the points again. Repeating this, the data eventually converges into K groups of points that are close to one another.

All of the algorithms so far build their models from historical data. What if there is no historical data at all? Reinforcement Learning, or RL, conceptually works without it: put the agent, that is, the model, directly into an environment, let it take a series of actions while observing the state of the environment, and let it receive reward or punishment feedback from the environment so the model is adjusted dynamically. After training, the model can automatically choose the actions that earn the most reward.

With so many dazzling machine learning algorithms, the first problem we face is which one to apply. For the choice of algorithm, we usually first split algorithms into two categories according to whether the historical data has ground truths, Supervised Learning or Unsupervised Learning, and then subdivide them by the kind of result they can achieve. Reinforcement learning, which needs no historical data, belongs in a category of its own. We also need to consider the characteristics and assumptions of each algorithm. For example, linear regression assumes a certain degree of linear relationship between features and outcomes; if the relationship is non-linear, it may not be very applicable. Likewise, the Naïve Bayes Classifier assumes the features are mutually independent; if there are dependencies between features, it may not be very applicable. And so on. On top of that there are many other factors, such as the size of the data and the trade-off between model performance and accuracy.
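To make the K-Means procedure described above concrete, here is a minimal pure-Python sketch. The handful of 2-D points, the fixed iteration count, and the `k_means` helper are all illustrative assumptions, not a production implementation.

```python
import random
import math

def k_means(points, k, iterations=10):
    """Bare-bones K-Means: pick k random centers, then alternate between
    assigning points to the nearest center and recomputing each center as
    the mean of its group."""
    centers = random.sample(points, k)
    for _ in range(iterations):
        # Assignment step: group each point with its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group.
        for i, group in enumerate(groups):
            if group:
                centers[i] = tuple(sum(dim) / len(group) for dim in zip(*group))
    return centers, groups

# Two visually obvious clusters of made-up 2-D points.
points = [(1, 1), (1.2, 0.8), (0.9, 1.1), (5, 5), (5.2, 4.9), (4.8, 5.1)]
centers, groups = k_means(points, k=2)
print(centers)  # two centers, one near (1, 1) and one near (5, 5)
```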
Some people even turn the choice of algorithm into an SOP-style cheat sheet to give people a clearer sense of direction. Even so, across the different types of problems, these methods seem suited only to relatively simple application scenarios and are still hard to apply to higher-level, more complex applications. Is this all machine learning can do?

While developing machine learning, humans, who are good at imitation, were also thinking about imitating their own brain neurons. Although the human brain is composed only of simple neurons, it generates intelligence through the interconnection of tens to hundreds of billions of them. So can we use the same concept to make machines simulate this universal, one-trick mechanism and generate intelligence? This idea opened up the field of Neural Networks, which later evolved into Deep Learning.

A brain neuron has many dendrites that receive action potentials from other neurons. These incoming action potentials are integrated inside the cell, and as soon as the potential exceeds a threshold, a chain reaction is triggered that passes this neuron's action potential on to subsequent neurons through its axon. In the same way, we can simulate the mechanism of a brain neuron with digital logic. We call it a perceptron: it takes m inputs x plus a bias, multiplies them by weights and sums them up, then passes the sum through an activation function, which simulates the potential threshold of a brain neuron, and finally outputs this node's degree of activation, which is passed on to the perceptrons of the next layer. Since most real-world problems do not have simple linear solutions, we usually choose a nonlinear activation function, such as sigmoid between 0 and 1, tanh between -1 and 1, the most commonly used ReLU, or other variants.

Once we connect many layers of perceptrons together, we have a deep learning model. To train it, we feed the data in one example at a time: first do forward propagation, then bring the output and the ground truth into a loss function to measure the difference between the two, and finally use an optimizer such as gradient descent to do back propagation, adjusting the weights in every perceptron so as to reduce the loss (a minimal code sketch of this loop appears below). As long as there is enough data, the difference between the model's outputs and the ground truths will gradually converge and shrink as the data flows through the model and the model corrects itself. Once that difference is small enough to be acceptable, we can say the model is trained and usable.

The concept sounds simple, but to achieve it we need a lot of data, a lot of computing power, and easy-to-use software. So after 2012, when these three conditions were finally met, deep learning blossomed and began to grow explosively.

In the field of computer vision we can use a Convolutional Neural Network, or CNN, which first applies small filters to extract the edges, shapes, and other characteristics of an image, then feeds these meaningful features into the deep learning model described above so that it can effectively identify objects in pictures or video. In this way, computers have surpassed humans in image recognition accuracy, and they continue to improve.

For imitating images or artistic styles, we can use a Generative Adversarial Network, or GAN, which pits two deep learning models against each other: a generator model that aspires to become a master forger produces fake data, and a discriminator model judges whether the data is real or fake. Once the generated fake data fools the discriminator model, the generator has succeeded. Face-swapping apps and AI-generated paintings are all applications of GANs.
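Here is the minimal sketch of the training loop promised above: a single perceptron with a sigmoid activation, trained by forward propagation, a squared-error loss, and gradient-descent weight updates. The toy data, learning rate, and epoch count are arbitrary assumptions chosen only to make the example run, not a recipe from the video.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy task (made up for illustration): output 1 when the two inputs sum to more than 1.
data = [((x1, x2), 1.0 if x1 + x2 > 1.0 else 0.0)
        for x1 in (0.0, 0.25, 0.5, 0.75, 1.0)
        for x2 in (0.0, 0.25, 0.5, 0.75, 1.0)]

# One perceptron: two weights plus a bias, initialized randomly.
w = [random.uniform(-1, 1), random.uniform(-1, 1)]
b = random.uniform(-1, 1)
learning_rate = 0.5

for epoch in range(2000):
    for (x1, x2), target in data:
        # Forward propagation: weighted sum plus bias, then the activation function.
        output = sigmoid(w[0] * x1 + w[1] * x2 + b)
        # Back propagation for a squared-error loss with a sigmoid output:
        # d(loss)/d(z) = (output - target) * output * (1 - output)
        grad = (output - target) * output * (1.0 - output)
        # Gradient-descent update of each weight and the bias to reduce the loss.
        w[0] -= learning_rate * grad * x1
        w[1] -= learning_rate * grad * x2
        b -= learning_rate * grad

print(round(sigmoid(w[0] * 0.9 + w[1] * 0.9 + b), 2))  # should end up close to 1
print(round(sigmoid(w[0] * 0.1 + w[1] * 0.1 + b), 2))  # should end up close to 0
```

A real deep learning model simply stacks many such nodes into layers and lets a framework compute all the gradients automatically, but the forward-then-backward rhythm is the same.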
For sequential data such as sound or text, the kind handled in Natural Language Processing, or NLP, we traditionally use a Recurrent Neural Network, or RNN, which passes the model's state from one training step to the next to give it a sequential short-term memory. An advanced version, Long Short-Term Memory, or LSTM, was designed to counter the way an RNN's memory of earlier inputs fades over long sequences. For similar problems, a more efficient solution called the Transformer was later proposed. Conceptually it uses the mechanism of attention to let the model focus directly on the important parts. This mechanism is not only suitable for natural language processing; it also delivers good results in computer vision. In 2020, GPT-3, a gigantic model with 175 billion parameters, was already able to automatically generate articles and code or answer questions with quality not inferior to that of humans. As model parameter counts keep growing exponentially, the practical impact of this type of model will be even more exciting.
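To give a flavor of the attention mechanism at the heart of the Transformer, here is a minimal NumPy sketch of scaled dot-product attention. The 3-token, 4-dimensional toy input and the `scaled_dot_product_attention` helper are made up for illustration; a real Transformer adds learned projections, multiple heads, and positional encodings on top of this core.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
    Each output row is a weighted mix of the value rows, where the weights
    say how much each query should attend to each key."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of every query to every key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights

# A made-up "sentence" of 3 tokens, each embedded in 4 dimensions.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))

# In a real Transformer, Q, K, and V come from learned linear projections of x;
# here we simply reuse x for all three to keep the sketch minimal.
output, weights = scaled_dot_product_attention(x, x, x)
print(weights.round(2))  # each row sums to 1: how much each token attends to the others
print(output.shape)      # (3, 4): one attention-mixed representation per token
```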
Beyond computer vision and natural language processing, deep learning has produced astonishing results in many other fields. In 2017, in the game of Go, where brute-force search cannot be applied, AlphaGo, which combines deep learning and reinforcement learning, defeated the world's No. 1 Go player, Ke Jie, 3:0. It shocked the world and showed that, through rapid self-learning, AI can surpass the wisdom mankind has accumulated over thousands of years in a specific field. In 2020, DeepMind, the team behind AlphaGo, used deep learning again, with AlphaFold, to crack the protein folding problem that had plagued biology for 50 years. This will help humans understand disease mechanisms, speed up new drug development, support agricultural production, and perhaps even use proteins to improve the earth's ecological environment.

Closer to everyday life, the development of self-driving cars is also impressive: as the accumulated mileage keeps growing and the technology matures, the accident rate has long been far lower than that of human drivers. In medicine, AI's diagnostic accuracy in certain specialties has also reached a level better than that of human doctors. (Doctors are not gods) (Doctors are human) As for unmanned stores and China's Skynet surveillance system, they are no longer such novel topics.

At this point, when we look back at Turing's question from 1950, can machines think, we may still not be able to give a clear answer. However, humanity now has far more accumulated technological achievements than before, and keeps moving closer to that dream. Current AI technology is like a child learning and growing: it can see, hear, and speak, and on specific problems it can make accurate judgments, even thinking outside the box and beyond what humans previously thought possible. However, once it encounters complicated issues of philosophy, emotion, or ethics, it is still far from competent.

Overall, humans and machines each have their own strengths. Humans are good at thinking and innovating, but are limited by physical stamina and occasionally make mistakes. Machines are good at memorizing and computing, and can give stable, high-quality answers to specific questions 24 hours a day, all year round. In this wave of AI, therefore, the ideal strategy is for humans and machines to cooperate fully, each playing to its strengths. People can gradually outsource relatively low-level, highly repetitive, trivial, and uninteresting work to machines, while the freed-up manpower is invested in more exploratory, research-oriented, creative, and interesting work. In this way, people will have more time and energy to realize their dreams, to think about the meaning of life, and to focus on solving important problems, raising the level of humanity as a whole.