What is machine learning, anyways?

A brief definition, history, and overview of how computers learn.

Definition of machine learning

Arthur Samuel, an American pioneer in the fields of computer gaming and artificial intelligence, coined the term “machine learning”. He defined it as the “field of study that gives computers the ability to learn without being explicitly programmed”. This definition emphasizes the ability of machines to improve their performance on a specific task over time, based on experience, without a human having to explicitly program their behavior.

A more formal, mathematical definition of machine learning involves concepts from statistics and computer science, focusing on the development of algorithms that can learn from and make predictions or decisions based on data. Here is a more detailed formalization:

Machine learning algorithms build a model based on input data (known as “training data”) to make predictions or decisions without being explicitly programmed to perform the task. These models can be understood in terms of functions, optimization, and probability.

Function: A machine learning model can be seen as a function f that maps input data X (features) to output data Y (targets), such that Y = f(X) + ϵ, where ϵ represents the error or noise in the predictions.

Optimization: Learning involves finding the function f that minimizes a loss function L(Y, f(X)), which measures the difference between the predicted outputs f(X) and the actual outputs Y over all the examples in the training data. This search for the best f is the optimization process.
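The optimization view can be sketched in a few lines of Python. This is a minimal illustration, not a production method: the model is a single slope y ≈ w·x, the loss is mean squared error, and the data, learning rate, and the helper name fit_slope are all assumptions made up for this example.

```python
def fit_slope(xs, ys, lr=0.01, steps=500):
    """Find the slope w that minimizes the mean squared loss
    L(w) = (1/n) * sum((y - w*x)^2) by gradient descent."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Derivative of the loss with respect to w
        grad = (-2.0 / n) * sum(x * (y - w * x) for x, y in zip(xs, ys))
        w -= lr * grad  # step downhill on the loss surface
    return w

# Noise-free data generated from y = 3x: the learned slope approaches 3.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]
w = fit_slope(xs, ys)
```

Each iteration moves w a small step in the direction that reduces the loss; with this clean data the procedure converges to the true slope.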

Probability: In a probabilistic view, machine learning involves estimating the conditional probability P(Y|X), which is the probability of Y given X. For classification, this might involve finding the most probable label for a given input, while for regression, it might involve predicting the expected value of Y given X.
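The probabilistic view can likewise be sketched with plain counting: estimating P(Y|X) from how often each label Y co-occurs with each input X. The weather/activity data and the helper name conditional_probs are illustrative assumptions, not part of any real dataset.

```python
from collections import Counter, defaultdict

def conditional_probs(pairs):
    """Estimate P(y | x) empirically as count(x, y) / count(x)."""
    joint = Counter(pairs)                    # counts of (x, y) pairs
    marginal = Counter(x for x, _ in pairs)   # counts of x alone
    probs = defaultdict(dict)
    for (x, y), c in joint.items():
        probs[x][y] = c / marginal[x]
    return probs

# Toy observations of (input, label) pairs
data = [("sunny", "walk"), ("sunny", "walk"), ("sunny", "stay"),
        ("rainy", "stay"), ("rainy", "stay"), ("rainy", "walk")]
p = conditional_probs(data)
# Classification then picks the most probable label given the input,
# e.g. "walk" given "sunny", since P(walk | sunny) = 2/3.
```

Real algorithms estimate these conditional distributions with far more structure (smoothing, parametric models, neural networks), but the underlying question is the same: given X, which Y is most probable?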

Therefore, a more formal definition could be: Machine learning seeks to learn a function f from a set of data, such that it can predict the output Y for new, unseen inputs X with minimal error, by minimizing a loss function L through an optimization process, often underpinned by probabilistic principles.

The history of artificial intelligence and machine learning

The history of machine learning is a fascinating journey that intertwines with the development of computers and artificial intelligence (AI). It begins in the mid-20th century, when the concept of “machine learning” was still nascent, and computational power was a fraction of what we have today. One of the earliest milestones in machine learning was in 1950, when Alan Turing introduced the Turing Test as a criterion of intelligence, proposing that a machine could be considered intelligent if it could mimic human responses under specific conditions. This period also saw Arthur Samuel define machine learning in 1959 as a “Field of study that gives computers the ability to learn without being explicitly programmed,” showcasing his work on a checkers-playing program that improved with experience.

The late 1950s through the 1970s witnessed the advent of fundamental algorithms that are still in use today. For instance, Frank Rosenblatt's perceptron, introduced in 1957, laid the groundwork for neural networks and deep learning. However, the initial enthusiasm for AI and machine learning faced significant challenges during the “AI winters” of the mid-1970s and late 1980s, when expectations did not meet reality, leading to a decrease in funding and interest in the field. Despite these setbacks, the foundation for future developments was laid during this time, with important theoretical work on algorithms and the limits of computation being carried out.

The resurgence of interest in machine learning began in the late 1990s and early 2000s, fueled by the increasing availability of digital data and significant improvements in computing power. This era saw the development of Support Vector Machines, decision trees, and ensemble methods like Random Forests, which provided powerful tools for data analysis and prediction. The real game-changer, however, came with the advent of deep learning in the 2010s, where neural networks, particularly those with many layers (deep neural networks), achieved remarkable success in tasks such as image and speech recognition. The victory of DeepMind’s AlphaGo over world champion Go player Lee Sedol in 2016 marked a watershed moment, demonstrating the potential of deep learning and machine learning not just as academic pursuits but as technologies with significant real-world impact.

Today, machine learning is an integral part of everyday life, powering search engines, recommendation systems, voice assistants, and more, reflecting the dramatic evolution of the field from its humble beginnings to its current status as a cornerstone of modern artificial intelligence.

Domains of ML: how can it be divided?

  • Supervised learning
  • Unsupervised learning
  • Reinforcement learning (RL)

Reality is more complex – the domains are often mixed in practice. Semi-supervised and self-supervised methods, for example, blur the line between supervised and unsupervised learning.
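The core distinction between the first two domains can be made concrete in code: supervised learning sees labeled pairs (x, y), while unsupervised learning sees only x and must find structure on its own. The one-dimensional data and the helper names below are made up for illustration; real libraries offer far more robust versions of both ideas.

```python
def nearest_centroid_classify(train, x):
    """Supervised: average each label's points, then assign x
    to the label whose mean is closest."""
    sums, counts = {}, {}
    for xi, label in train:
        sums[label] = sums.get(label, 0.0) + xi
        counts[label] = counts.get(label, 0) + 1
    centroids = {lbl: sums[lbl] / counts[lbl] for lbl in sums}
    return min(centroids, key=lambda lbl: abs(centroids[lbl] - x))

def two_means_cluster(xs, iters=20):
    """Unsupervised: split unlabeled points into two groups
    (a tiny 1-D k-means with k = 2)."""
    lo, hi = min(xs), max(xs)  # initial centers
    for _ in range(iters):
        a = [x for x in xs if abs(x - lo) <= abs(x - hi)]
        b = [x for x in xs if abs(x - lo) > abs(x - hi)]
        lo, hi = sum(a) / len(a), sum(b) / len(b)  # recenter
    return a, b

labeled = [(1.0, "low"), (1.5, "low"), (8.0, "high"), (9.0, "high")]
print(nearest_centroid_classify(labeled, 2.0))   # nearest to the "low" mean
print(two_means_cluster([1.0, 1.5, 8.0, 9.0]))  # recovers the two groups
```

Note that both functions consume the same numbers; only the presence or absence of labels changes which kind of learning is possible.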