Lv.2

Reinforcement Learning

A technique in which an AI learns through trial and error to take actions that maximize rewards.

In Simple Terms

Reinforcement learning is a mechanism by which an AI discovers the best approach through trial and error. For example, an AI playing a game learns that defeating an enemy and earning points is a "good" action. Conversely, if it makes a mistake and loses points, it adjusts its behavior so it does not repeat the same error next time. By cycling through successes and failures this way, the AI works out on its own the strategy that earns the most points.
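
To make the trial-and-error idea concrete, here is a minimal Python sketch of a setting simpler than a full game: a hypothetical row of slot machines (often called a "multi-armed bandit"), each paying 1 point with a hidden probability. It uses epsilon-greedy exploration, which is one common strategy among several. The function name, payout probabilities, and epsilon value are illustrative assumptions, not part of any particular system.

```python
import random

def play_slot_machines(payout_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which action pays best purely from trial and error.

    payout_probs: hypothetical chance that each action earns 1 point.
    The learner never reads these numbers; it only sees the points it receives.
    """
    rng = random.Random(seed)
    n = len(payout_probs)
    counts = [0] * n      # how many times each action was tried
    values = [0.0] * n    # running average of points earned per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)                         # explore: try something at random
        else:
            action = max(range(n), key=lambda a: values[a])   # exploit: best estimate so far
        reward = 1.0 if rng.random() < payout_probs[action] else 0.0
        counts[action] += 1
        # Incremental average: nudge the estimate toward the reward just observed.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

values, counts = play_slot_machines([0.2, 0.5, 0.8])
best = max(range(3), key=lambda a: values[a])
print("estimated values:", [round(v, 2) for v in values])
print("best action:", best)
```

Occasional random choices (the epsilon part) matter: without them, the learner could lock onto the first action that happened to pay out and never find out that another one pays more.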

Behind the Name

The word "reinforcement" comes from behavioral psychology, where it refers to strengthening a behavior by following it with a reward. The name captures the core mechanism: when an action earns a reward, that action is "reinforced" — making the AI more likely to choose it again in the future.
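
The "reinforcing" step itself can be sketched in a few lines of Python. This sketch assumes the AI picks actions with probabilities derived from numeric preferences (a softmax), and applies a gradient-bandit style update, one of several ways to implement the idea. The names `softmax` and `reinforce`, the baseline of 0, and the step size are hypothetical choices for illustration.

```python
import math

def softmax(prefs):
    """Turn action preferences into choice probabilities that sum to 1."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce(prefs, action, reward, baseline, step_size=0.5):
    """Gradient-bandit style update: if the reward beats the baseline,
    raise the chosen action's preference and lower the others."""
    probs = softmax(prefs)
    advantage = reward - baseline
    new_prefs = []
    for a, (h, p) in enumerate(zip(prefs, probs)):
        if a == action:
            new_prefs.append(h + step_size * advantage * (1 - p))
        else:
            new_prefs.append(h - step_size * advantage * p)
    return new_prefs

prefs = [0.0, 0.0, 0.0]          # no preference yet: each action has probability 1/3
before = softmax(prefs)[1]
prefs = reinforce(prefs, action=1, reward=1.0, baseline=0.0)
after = softmax(prefs)[1]
print(f"P(action 1): {before:.3f} -> {after:.3f}")
```

After one rewarded try, action 1 becomes more likely to be chosen again, which is exactly the sense of "reinforced" described above. A reward below the baseline would push its probability down instead.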

Take a Closer Look!

Reinforcement learning is a learning method in which an AI takes actions within an environment and aims to maximize the rewards it receives as a result.
The AI starts with no knowledge of what the correct actions are.
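
This cycle of observing the environment, acting, and receiving a reward is usually written as a loop. Below is a minimal Python sketch under stated assumptions: `LineWorld` is a hypothetical toy environment (positions 0 to 4 on a line, with 1 point for reaching position 4), and the `reset`/`step` method names follow a common convention but are not taken from any specific library. The agent here knows nothing yet, so it simply acts at random.

```python
import random

class LineWorld:
    """Hypothetical toy environment: positions 0..goal on a line."""
    def __init__(self, goal=4):
        self.goal = goal
        self.position = 0

    def reset(self):
        """Start a new episode at position 0 and return the starting state."""
        self.position = 0
        return self.position

    def step(self, action):
        """action is -1 (left) or +1 (right). Returns (state, reward, done)."""
        self.position = min(max(self.position + action, 0), self.goal)
        done = self.position == self.goal
        reward = 1.0 if done else 0.0
        return self.position, reward, done

def run_episode(env, choose_action, max_steps=100):
    """The basic loop: observe the state, act, receive a reward, repeat."""
    state = env.reset()
    total_reward = 0.0
    for t in range(max_steps):
        action = choose_action(state)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            return total_reward, t + 1
    return total_reward, max_steps

rng = random.Random(1)
random_agent = lambda state: rng.choice([-1, 1])  # no knowledge yet: acts at random
print(run_episode(LineWorld(), random_agent))
```

Learning algorithms plug into `choose_action` and use the rewards returned by `step` to improve it over many episodes.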

Instead of being told the right answers directly, the AI receives feedback — in the form of scores or points — that signals whether each action was good or bad.
Through repeated trial and error, it gradually learns which sequence of actions leads to higher rewards.
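
One classic algorithm that turns this kind of feedback into a learned sequence of actions is tabular Q-learning. The sketch below is a minimal Python illustration under assumed settings: a hypothetical five-cell corridor where only reaching the far end pays a point, so the agent must learn a whole sequence of moves from a reward that arrives only at the end. The learning rate `alpha`, discount `gamma`, and exploration rate `epsilon` are illustrative values, not recommendations.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical corridor: states 0..n_states-1,
    actions 0 (left) and 1 (right); reaching the last state pays 1 point."""
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]: estimated future reward
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy: mostly take the best-known action, sometimes explore
            # (and break ties at random).
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(state - 1, 0) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Move the estimate toward: reward now + discounted best value afterwards.
            future = 0.0 if next_state == goal else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state = next_state
    return q

q = q_learning()
policy = ["right" if qs[1] > qs[0] else "left" for qs in q[:-1]]
print(policy)
```

After training, every non-goal cell should prefer "right", and the learned values for moving right shrink by a factor of `gamma` per step away from the goal (1.0, 0.9, 0.81, 0.729 for cells 3 down to 0). The single reward at the end has been propagated back to the earlier moves that led to it.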

Unlike supervised learning, which trains on examples paired with their correct answers, reinforcement learning does not rely on a pre-existing set of correct answers. Because of this, the AI can sometimes discover unexpected strategies that even humans had not thought of.
It can also learn sophisticated decision-making, for example deliberately accepting a short-term disadvantage in order to maximize the total reward it collects in the end.
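
Reinforcement learning commonly formalizes "total reward in the end" as a discounted return, G = r_0 + γ·r_1 + γ²·r_2 + ..., where the discount factor γ (between 0 and 1) sets how much later rewards count compared with immediate ones. The small Python sketch below uses two hypothetical reward sequences and γ = 0.9, both chosen only for illustration, to show why accepting a short-term loss can be the better choice.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, with the reward at time t weighted by gamma**t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

grab_now = [1.0, 0.0, 0.0]     # small reward immediately, nothing afterwards
sacrifice = [-1.0, 0.0, 5.0]   # accept a loss now for a larger reward later

for name, rewards in [("grab_now", grab_now), ("sacrifice", sacrifice)]:
    print(name, round(discounted_return(rewards), 2))
```

With γ = 0.9 the "sacrifice" plan scores about 3.05 against 1.0 for "grab_now", so an agent maximizing the return prefers the early loss. With a much smaller γ such as 0.3, the later reward counts for little (-1 + 0.09 × 5 = -0.55), and grabbing the immediate point wins instead.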

This approach is used in fields that require complex judgment, such as training robots to walk, playing board games like Go, and developing autonomous driving technology.
A key strength is that even in situations where preparing all the correct answers in advance would be difficult, the AI can still discover better approaches on its own.

Category: AI, Data