Synthetic Data

Data artificially created by computers for various purposes such as AI training and data analysis

In Simple Terms

Synthetic data refers to data that isn't collected from the real world but is newly generated by computers or AI. A key feature is that it can be used in place of real personal information to train AI while keeping privacy risks low, and it can safely fill in hard-to-collect data like rare accident scenarios in autonomous driving. It's created in many forms — images, audio, text, and more — and is widely used as a way to advance AI development.

Behind the Name

The English term is "synthetic data." "Synthetic" means "combined" or "artificial." The word captures the idea that this data is created artificially through computer processing rather than gathered from the real world.

Take a Closer Look!

Synthetic data refers to data that isn't collected directly from the real world, but is newly generated by computer programs or AI.
A key characteristic is that it is created by mimicking the features and patterns of real-world data, or by using simulations to recreate specific situations.
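
As a rough illustration of "mimicking the features and patterns of real-world data," the sketch below fits a normal distribution to a small numeric column and samples new values from it. The height values, the function names, and the choice of a normal distribution are all assumptions made for this example; practical generators use far richer models.

```python
import random
import statistics


def fit_gaussian(values):
    """Estimate the mean and sample standard deviation of a numeric column."""
    return statistics.mean(values), statistics.stdev(values)


def generate_synthetic(values, n, seed=0):
    """Draw n new values from a normal distribution fitted to `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    mu, sigma = fit_gaussian(values)
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Hypothetical "real" measurements, made up for illustration only.
real_heights_cm = [158.2, 171.5, 165.0, 180.3, 162.7, 175.1, 169.4, 173.8]

# The synthetic values share the overall shape (center and spread) of the
# real column, but none of them is copied from an actual record.
synthetic_heights_cm = generate_synthetic(real_heights_cm, n=1000)
```

The synthetic column keeps the statistical pattern of the original while containing no original rows, which is the basic idea the paragraph above describes.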

One benefit of synthetic data is that it helps protect privacy.
For example, if hospital patient data or personal shopping histories were used directly to train AI, there would be a risk of someone's personal information being exposed.
By training on synthetic data instead, generated so that it can't be traced back to any individual, an AI model can still learn the same patterns while privacy risks are kept in check.

Another benefit is the ability to fill in data that's hard to collect in the real world.
To train a self-driving AI, you need a lot of data on situations like skidding on snowy roads or rare accidents — but repeatedly gathering that data in the real world is both dangerous and extremely difficult.
By generating synthetic data for those situations on a computer, it becomes possible to support AI training safely.
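
The simulation approach can be sketched in a few lines. The example below generates made-up snowy-road braking scenarios using the idealized stopping-distance formula v^2 / (2 * mu * g). The speed, friction, and obstacle-distance ranges are assumptions chosen for illustration, not measured values, and real driving simulators model far more than this.

```python
import random

G = 9.81  # gravitational acceleration in m/s^2


def braking_distance_m(speed_mps, friction):
    """Idealized stopping distance under constant deceleration: v^2 / (2 * mu * g)."""
    return speed_mps ** 2 / (2 * friction * G)


def simulate_snowy_scenarios(n, seed=0):
    """Generate n synthetic braking scenarios on a low-friction road."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    scenarios = []
    for _ in range(n):
        speed = rng.uniform(8.0, 25.0)       # m/s, assumed range
        friction = rng.uniform(0.15, 0.35)   # assumed range for a snowy surface
        obstacle = rng.uniform(20.0, 120.0)  # metres to an obstacle, assumed range
        stop = braking_distance_m(speed, friction)
        scenarios.append({
            "speed_mps": speed,
            "friction": friction,
            "obstacle_distance_m": obstacle,
            "braking_distance_m": stop,
            # Label: the car cannot stop before reaching the obstacle.
            "collision": stop > obstacle,
        })
    return scenarios


scenarios = simulate_snowy_scenarios(500)
```

Each generated record carries a label (`collision`), so hazardous cases can be produced in whatever quantity training needs without anyone driving on an icy road.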

Technologies for generating various types of data — images, audio, text, and more — continue to be developed.
In AI development and beyond, synthetic data is used as a way to supplement real data and recreate situations that are difficult to capture in practice.

Category: AI, Data