How AI Is Actually Built: From Sand to Data Centers

How AI Chips Are Made

from a grain of sand to billions of switches

The starting point: sand

The chips that run AI start as silicon, refined from ordinary quartz sand. The sand is purified until it's about 99.9999999% pure silicon ("nine nines"). The purified silicon is then melted and grown into a single giant crystal, a cylindrical ingot, by dipping a seed crystal into the melt and slowly pulling it up while rotating it (the Czochralski method). The atoms line up into one continuous crystal lattice.

The ingot is sliced into thin discs called wafers, which are polished mirror-smooth. A modern wafer is 300mm across (about 12 inches). Dozens to hundreds of chips, depending on their size, will be built on a single wafer at once.
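For a rough feel of how many chips fit on one wafer, here is a back-of-the-envelope sketch. It uses a common textbook approximation (wafer area divided by die area, minus a correction for partial dies lost at the round edge). The function name and the die sizes are illustrative assumptions, not figures from any real product.

```python
import math

def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Approximate count of whole dies that fit on a round wafer.

    First term: wafer area / die area.
    Second term: dies lost along the circular edge.
    This is a rule-of-thumb estimate, not a fab's actual layout.
    """
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)

# Hypothetical die sizes on a 300mm wafer
small_die = dies_per_wafer(300, 100)   # a modest 10mm x 10mm chip
large_die = dies_per_wafer(300, 800)   # a very large accelerator-class die
print(small_die, large_die)
```

Under these assumptions the small die fits hundreds of times per wafer, while the large one fits only dozens of times, which is one reason big AI chips are expensive.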

Building the transistors

A chip is, at heart, billions of microscopic switches called transistors. Each one can be "on" or "off": a 1 or a 0. Stack enough of them together in clever patterns and you get logic, memory, and arithmetic.
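To make "switches in clever patterns become arithmetic" concrete, here is a minimal sketch in Python. It starts from a single NAND gate (in CMOS hardware, a two-input NAND takes four transistors) and builds an adder out of nothing else. All function names are made up for this illustration; real chips are designed with hardware description languages, not Python.

```python
def nand(a: int, b: int) -> int:
    """The only primitive: output is 0 only when both inputs are 1."""
    return 0 if (a and b) else 1

# Every other gate below is wired purely from NANDs.
def and_(a: int, b: int) -> int:
    return nand(nand(a, b), nand(a, b))

def or_(a: int, b: int) -> int:
    return nand(nand(a, a), nand(b, b))

def xor(a: int, b: int) -> int:
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

def full_adder(a: int, b: int, carry_in: int):
    """Add three bits; return (sum_bit, carry_out)."""
    partial = xor(a, b)
    sum_bit = xor(partial, carry_in)
    carry_out = or_(and_(a, b), and_(partial, carry_in))
    return sum_bit, carry_out

def add_bits(x: int, y: int, width: int = 4) -> int:
    """Add two unsigned integers bit by bit; the final carry is dropped."""
    carry = 0
    result = 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result

print(add_bits(5, 6))
```

The same idea, repeated at vastly larger scale, gives multipliers, memory cells, and everything else on the chip.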

The way you build them is essentially printing with light, repeated dozens of times in layers. The core loop:

1. Coat the wafer with a light-sensitive chemical (photoresist).
2. Expose it to light projected through a stencil-like mask that holds the circuit pattern. This step is called photolithography. Where the light hits, the chemistry of the resist changes.
3. Etch away material to carve the pattern into the wafer.
4. Deposit new materials (insulators, metals) to build up structures.
5. Dope the silicon by implanting impurity atoms (like boron or phosphorus) that change how it conducts electricity. This is what actually makes a transistor work.

Think of it like building a city, block by block. Repeat the loop 50–100+ times, layer on layer, until you have a 3D maze of transistors and the copper "wiring" that connects them.

Why this is so hard: the size

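The smallest features on a leading-edge AI chip are only a few nanometers to a few tens of nanometers wide (the "nanometer" names of chip generations are marketing labels rather than literal measurements). For scale, a human hair is about 80,000–100,000 nanometers thick, and a silicon atom is about 0.2 nanometers. At the small end, you are drawing lines only a few dozen atoms wide.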

To print something that small, you need light with an extremely short wavelength. The state-of-the-art tool is EUV (extreme ultraviolet) lithography, which uses light at a 13.5nm wavelength. Making that light is almost absurd: droplets of molten tin are blasted with a high-power laser tens of thousands of times per second to create a plasma that emits EUV. These machines are made essentially by one company (ASML), cost on the order of a couple hundred million dollars each, and are among the most complex machines humans have ever built.

Yield, cutting, and packaging

Not every chip on the wafer works; a single dust speck or defect can kill one. The percentage that work is the yield, and it's a closely guarded number because it drives the entire economics of the industry.
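To see why yield drives the economics, here is a sketch using the simple Poisson yield model found in textbooks: the chance that a die has zero defects falls off exponentially with its area. The model choice and the defect density below are assumptions for illustration; real fabs use more detailed models and do not publish their numbers.

```python
import math

def poisson_yield(die_area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies expected to have zero killer defects.

    Assumes defects land randomly and independently (Poisson model).
    """
    return math.exp(-die_area_cm2 * defects_per_cm2)

DEFECT_DENSITY = 0.1  # hypothetical: defects per square centimeter

for area in (1.0, 4.0, 8.0):  # die areas in cm^2, small to very large
    y = poisson_yield(area, DEFECT_DENSITY)
    print(f"{area:.0f} cm^2 die: {y:.0%} expected to work")
```

The point of the sketch: at the same defect density, a bigger die is much more likely to catch a defect, so large AI chips lose a far bigger share of each wafer than small chips do.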

Working chips are cut from the wafer ("dicing"), then packaged: connected to a substrate with hundreds or thousands of electrical contacts and protected in a casing. Modern AI chips increasingly use advanced packaging, where multiple chips (or "chiplets") and stacks of memory are placed side by side or on top of each other and wired together with incredibly dense, short connections. This matters enormously for AI (more on why below).

Why AI uses special chips (GPUs and accelerators)

A normal CPU (the brain in your laptop) is built to do a few things very fast, one after another. It's a brilliant generalist. AI work is different: it's a colossal pile of simple math (mostly multiplication and addition) that can all be done at the same time.

A GPU (graphics processing unit) was originally designed to color millions of pixels simultaneously, so it has thousands of small cores doing math in parallel. That parallelism is exactly what AI needs, which is why GPUs became the workhorse of AI.

Newer chips go further with dedicated units:

Tensor Cores / matrix units: hardware built specifically to multiply matrices (the core operation of AI; see Section 02).
TPUs, NPUs, and custom ASICs: chips designed from scratch to do nothing but AI math, trading flexibility for raw efficiency.

A recurring theme: for AI, moving data is often more expensive (in time and energy) than the math itself. So much of modern chip design is about putting memory physically as close to the compute as possible: hence stacking ultra-fast memory (HBM, "high bandwidth memory") right next to the processor.
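The "moving data costs more than the math" point can be sketched with simple arithmetic: compare how long the math takes with how long it takes just to fetch the numbers from memory. The chip speeds below are hypothetical round numbers chosen for illustration, not the specs of any real product.

```python
def step_times(flops, bytes_moved, peak_flops_per_s, bytes_per_s):
    """Return (compute_seconds, memory_seconds) for one operation."""
    return flops / peak_flops_per_s, bytes_moved / bytes_per_s

# Hypothetical accelerator (illustrative numbers only)
PEAK_FLOPS = 1e15        # math operations per second
MEM_BANDWIDTH = 3e12     # bytes per second from memory

# One input vector times an n x n weight matrix, stored at 2 bytes per weight
n = 8192
flops = 2 * n * n            # one multiply and one add per weight
bytes_moved = 2 * n * n      # every weight must be read from memory once

compute_s, memory_s = step_times(flops, bytes_moved, PEAK_FLOPS, MEM_BANDWIDTH)
bottleneck = "memory" if memory_s > compute_s else "compute"
print(f"compute: {compute_s:.2e} s, memory: {memory_s:.2e} s -> {bottleneck}-bound")
```

With these assumed numbers the chip spends hundreds of times longer waiting for weights than multiplying them, which is the motivation for putting faster memory closer to the compute.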

How AI Actually Works

patterns, weights, and next-token prediction

The core idea: learning patterns from examples

Traditional software is rules written by a human: if X, then do Y. AI flips this. Instead of writing rules, you show the system millions of examples and let it adjust itself until it gets good at the task. Nobody hand-codes "what a cat looks like"; the system figures out the pattern.

The tool for this is the neural network.

What a neural network is

A neural network is a big stack of numbers and simple math, loosely inspired by brain neurons. The pieces:

Neurons (nodes): each takes in some numbers, combines them, and passes a result forward.
Weights: every connection has a number, its weight, that says how important that input is. The weights are the knowledge. A trained model is basically a giant collection of weights (billions of them).
Layers: neurons are arranged in layers. Data enters the first layer, gets transformed, passes to the next, and so on. "Deep learning" just means a network with many layers.

At each neuron the math is: multiply each input by its weight, add them up, and pass the sum through a simple activation function that decides whether (and how strongly) to "fire." Do this across billions of neurons and you can represent astonishingly complex patterns.
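The math of one neuron fits in a few lines. This is a minimal sketch: the sigmoid is just one common choice of activation function, and the bias term (a constant added to the sum) is a standard extra that the description above leaves out. Both are assumptions of this example.

```python
import math

def sigmoid(x: float) -> float:
    """Squash any number into the range 0..1 (how strongly to 'fire')."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias=0.0):
    """Multiply each input by its weight, add them up, apply the activation."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(total)

# Two inputs whose weighted contributions cancel: 1.0*0.5 + 2.0*(-0.25) = 0
print(neuron([1.0, 2.0], [0.5, -0.25]))
# A strongly positive weighted sum pushes the output toward 1
print(neuron([1.0, 2.0], [3.0, 2.0]))
```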

This is why AI is so multiplication-heavy: a layer's computation is essentially one enormous matrix multiplication. That's the operation the chips in Section 01 are racing to do faster.
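Here is a sketch of why a whole layer is one matrix multiplication: stack the examples as rows of one matrix, stack each neuron's weights as a column of another, and multiply. It is written in plain Python for readability (real systems use optimized libraries and hardware), and it uses ReLU, another common activation, as an assumption.

```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); both are lists of rows."""
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def dense_layer(batch, weights):
    """batch: one row per example. weights: one column per neuron.

    Applies ReLU (negative sums become 0) after the matrix multiplication.
    """
    return [[max(0.0, v) for v in row] for row in matmul(batch, weights)]

batch = [[1.0, 2.0],      # example 1: two input values
         [0.0, -1.0]]     # example 2
weights = [[1.0, -1.0, 0.5],    # weights from input 1 to each of 3 neurons
           [0.5, 2.0, -1.0]]    # weights from input 2 to each of 3 neurons

out = dense_layer(batch, weights)
print(out)
```

Every entry of the output can be computed independently of the others, which is exactly the kind of work thousands of parallel cores are good at.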

Training: how it learns

Training is the process of finding good weights. It goes like this:

1. Start with random weights (the model knows nothing).
2. Feed it an example and let it make a guess (this is the forward pass).
3. Compare the guess to the correct answer and measure how wrong it was: this is the loss.
4. Use calculus (backpropagation) to figure out how each weight contributed to the error, and nudge every weight slightly in the direction that reduces the error. This nudging is gradient descent.
5. Repeat billions of times with mountains of data.
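The training steps above can be sketched with the smallest possible "model": a single weight w that should learn the rule "output = 2 × input". With one weight, backpropagation reduces to a single derivative. The data, learning rate, and step count are arbitrary choices for this illustration.

```python
import random

def train(examples, learning_rate=0.05, epochs=200, seed=0):
    """Fit guess = w * x by gradient descent on squared error."""
    w = random.Random(seed).uniform(-1.0, 1.0)   # start with a random weight
    for _ in range(epochs):
        for x, target in examples:
            guess = w * x                 # forward pass: make a guess
            error = guess - target
            loss = error ** 2             # how wrong the guess was (shown for clarity)
            gradient = 2 * error * x      # d(loss)/dw: how w contributed to the error
            w -= learning_rate * gradient # nudge w in the direction that reduces loss
    return w

examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # inputs paired with correct answers
learned_w = train(examples)
print(learned_w)
```

Nobody tells the program that the answer is 2; the weight drifts there because every nudge reduces the error a little. A real model does the same thing for billions of weights at once.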

Slowly, the random numbers organize themselves into a structure that captures real patterns. Training is the expensive part: it can take thousands of chips running for weeks or months. Once trained, using the model (inference) is far cheaper per use, though serving it to millions of people adds up.

How modern language AI works: the Transformer

The breakthrough behind today's AI (including the large language models behind today's chatbots) is an architecture called the Transformer, introduced in 2017.

Its key trick is attention: when processing a piece of text, the model can look at all the other words at once and decide which ones are most relevant to understanding each word.

A quick example → in the sentence "the trophy didn't fit in the suitcase because it was too big," attention is how the model works out that "it" means the trophy, not the suitcase.
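Here is a minimal sketch of the attention calculation itself, in the "scaled dot-product" form used by the Transformer. Each word gets a query vector ("what am I looking for?"), a key vector ("what do I offer?"), and a value vector (its content). The tiny vectors below are hand-picked for illustration; in a real model they are produced by learned weights, and many attention "heads" run side by side.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Score the query against every key, then blend the values by those scores."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    blended = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, blended

# Hypothetical vectors: the query for "it" lines up better with "trophy"
words = ["trophy", "suitcase"]
keys = [[2.0, 0.0], [0.5, 0.5]]
values = [[1.0, 0.0], [0.0, 1.0]]
it_query = [1.0, 0.2]

weights, blended = attention(it_query, keys, values)
for word, weight in zip(words, weights):
    print(f"'it' attends to {word}: {weight:.2f}")
```

With these made-up vectors, "it" puts most of its attention on "trophy". In a trained model, that preference comes from patterns in the weights rather than from anyone choosing the numbers.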

How a language model like this actually generates text:

1. Text is chopped into tokens (chunks roughly the size of a word or word-piece) and each token is turned into a list of numbers (an embedding) that captures its meaning.
2. These numbers flow through many Transformer layers, with attention re-weighting them at each layer.
3. At the end, the model outputs a probability for every possible next token and picks one.
4. That token is added to the text, and the whole thing runs again to produce the next token. It writes one token at a time, each one informed by everything before it.
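The generation loop can be sketched with a toy stand-in for the model. The lookup table below is entirely made up, and unlike a real language model it looks only at the last token rather than everything before it. What the sketch does show faithfully is the loop: get probabilities, pick a token, append it, run again.

```python
import random

# Hypothetical "model": probabilities for the next token given the last one
TOY_MODEL = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.9, "<end>": 0.1},
    "dog": {"sat": 1.0},
    "sat": {"on": 0.8, "<end>": 0.2},
    "on":  {"a": 1.0},
    "a":   {"mat": 0.7, "cat": 0.3},
    "mat": {"<end>": 1.0},
}

def next_token_probs(tokens):
    """Stand-in for a real model's forward pass."""
    return TOY_MODEL[tokens[-1]]

def generate(prompt, max_new_tokens=10, rng=None):
    """Extend the prompt one token at a time.

    With rng=None, always take the most likely token (greedy).
    With an rng, sample in proportion to the probabilities.
    """
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        if rng is None:
            choice = max(probs, key=probs.get)
        else:
            choice = rng.choices(list(probs), weights=list(probs.values()))[0]
        if choice == "<end>":
            break
        tokens.append(choice)
    return tokens

print(" ".join(generate(["the"])))
print(" ".join(generate(["the"], rng=random.Random(1))))
```

Sampling instead of always taking the top choice is why the same prompt can produce different answers on different runs.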

A crucial, honest caveat: the model is fundamentally a very sophisticated next-token predictor. It has no database it looks things up in; all its "knowledge" lives in the patterns baked into its weights during training. This is why it can be fluent and useful but also why it can confidently state things that are wrong ("hallucinate"): it's predicting plausible continuations, not retrieving verified facts.

Why it needs so much compute

Three numbers multiply together to make AI hungry:

Parameters (weights): large models have hundreds of billions.
Data: trained on trillions of tokens.
Repetition: every parameter interacts with every chunk of data, many times over.

Multiply those and you get astronomical numbers of math operations, which is the whole reason the chips and data centers in Sections 01, 03, and 04 exist.
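To put a number on "astronomical", here is a sketch using a widely quoted rule of thumb that is not from this article: training takes roughly 6 math operations per parameter per training token. The model size, data size, chip count, and per-chip speed are all hypothetical round numbers.

```python
def training_flops(parameters: float, tokens: float) -> float:
    """Rule-of-thumb estimate: about 6 operations per parameter per token."""
    return 6 * parameters * tokens

def training_days(total_flops: float, chips: int, sustained_flops_per_chip: float) -> float:
    """Days needed if every chip sustains the given speed the whole time."""
    seconds = total_flops / (chips * sustained_flops_per_chip)
    return seconds / 86_400

# Hypothetical training run
PARAMETERS = 100e9          # 100 billion weights
TOKENS = 2e12               # 2 trillion training tokens
CHIPS = 4096
SUSTAINED_SPEED = 2e14      # operations per second actually achieved per chip

total = training_flops(PARAMETERS, TOKENS)
days = training_days(total, CHIPS, SUSTAINED_SPEED)
print(f"{total:.1e} operations, about {days:.0f} days on {CHIPS} chips")
```

Under these assumptions the run needs on the order of a million billion billion operations and keeps thousands of chips busy for weeks, consistent with the scale described above.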

Data Centers: Where AI Lives

one giant machine fighting power and heat

What a data center actually is

A data center is a purpose-built warehouse full of computers. AI data centers are a special, extreme version: thousands of GPU servers packed into racks, all wired together so they can work as one giant machine on a single training job.

The physical hierarchy:

A server holds several chips (e.g., 8 GPUs) plus memory and networking.
Servers stack into racks.
Racks fill rows, rows fill halls, halls fill the building.
A large AI cluster can contain tens of thousands of GPUs working in concert.

The three things a data center fights constantly

1. Networking. For training, thousands of chips must share their results with each other constantly and stay in sync. If the network is slow, expensive chips sit idle waiting. So AI data centers use extremely fast interconnects (technologies like InfiniBand or specialized high-speed Ethernet, plus chip-to-chip links like NVLink) to move data between chips at staggering rates. The network is as important as the chips.

2. Power. These facilities are enormous electricity consumers. Power is now the main bottleneck on how big AI can get, which is driving operators to build next to power plants, sign deals for dedicated generation (including nuclear), and obsess over efficiency.

For scale → a single large AI campus can draw as much power as a small city (hundreds of megawatts, with gigawatt-scale sites being planned).

3. Heat. Every watt of electricity that goes in comes out as heat, and a dense rack of AI chips produces a tremendous amount. If you don't remove it, the chips throttle or fry. Cooling approaches:

Air cooling: fans and air conditioning. Simple, but hitting its limits for dense AI hardware.
Liquid cooling: pumping coolant through cold plates bolted directly onto the chips. Far more effective and now standard for high-end AI.
Immersion cooling: submerging entire servers in a special non-conductive fluid. The most aggressive option.

A common efficiency metric is PUE (Power Usage Effectiveness): total facility power divided by the power delivered to the computing equipment itself. A perfect score is 1.0 (every watt reaches the computers); the most efficient modern data centers report values around 1.1, meaning only about a tenth extra goes to overhead like cooling.
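The PUE calculation is simple division, sketched here with made-up numbers for a hypothetical facility.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total power divided by computing power."""
    if it_equipment_kw <= 0:
        raise ValueError("computing power must be positive")
    return total_facility_kw / it_equipment_kw

# Hypothetical facility, all values in kilowatts
it_load = 100_000      # servers, storage, network gear
cooling = 15_000       # chillers, pumps, fans
other = 5_000          # lighting, power conversion losses, offices

score = pue(it_load + cooling + other, it_load)
overhead_share = (cooling + other) / (it_load + cooling + other)
print(f"PUE = {score:.2f}, overhead = {overhead_share:.0%} of total power")
```

In this example, a PUE of 1.2 means that for every 100 watts of computing, another 20 watts go to keeping the building running.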

Why location matters

Operators choose sites for cheap, abundant, reliable power; cool climates or water access to help with cooling; fast fiber-optic connectivity; and stable conditions (low risk of natural disaster, friendly regulation). This is why you see massive data centers in cold northern regions, near hydroelectric dams, or in the desert next to dedicated solar and gas.

The New Frontier: Computing With Light

moving data, and maybe math, with photons

This is the idea of using light, rather than electricity, to cut AI's energy use. It comes in two related but distinct flavors, and it's worth keeping them separate.

Flavor 1, happening now: using light to move data (silicon photonics)

Inside and between chips, information normally travels as electrical signals down copper wires. The problem: as you push more data faster, copper wastes energy as heat and degrades over distance. Moving data is already one of the biggest energy costs in AI (recall Section 01).

The fix is silicon photonics / optical interconnects: send the data as pulses of light through tiny waveguides or fiber instead of electricity through copper. Light is fast, generates far less heat over distance, and can carry enormous amounts of data (you can even send multiple data streams down one fiber at once, each on a different color/wavelength).

The newest twist is co-packaged optics: putting the light-based communication hardware right next to the processor in the same package, so signals convert to light almost immediately instead of traveling as electricity across the board first. For AI data centers, where the bottleneck is increasingly getting data between thousands of chips, this is a big deal: more bandwidth, less energy, less heat. This technology is real and rolling out.

Flavor 2, emerging: using light to do the math (optical/photonic computing)

The more radical idea is to perform the AI computation itself with light. The elegant part: the core AI operation is matrix multiplication (Section 02), and light naturally does math when it passes through optical components. Beams can be split, combined, and scaled in intensity by passing through materials: combining beams adds, and attenuating a beam multiplies. All of this happens at the speed of light, in parallel, with very little energy, and without the heat of switching billions of transistors.

A photonic computing chip aims to encode numbers into properties of light (like intensity or phase), pass them through an optical mesh that performs the multiplication-and-addition, and read out the result.
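As a toy illustration only, the sketch below simulates that encode, multiply, read-out pipeline in software. Inputs are light intensities between 0 and 1, each weight is the fraction of light an element lets through, a detector sums what arrives, and the readout has limited resolution. This is a deliberately simplified model invented for this example: real designs often work with phase and interference rather than plain dimming, and the bit depth is an arbitrary assumption.

```python
def quantize(value: float, bits: int, max_value: float) -> float:
    """Model a detector that can only distinguish 2**bits - 1 steps."""
    levels = 2 ** bits - 1
    clipped = min(max(value, 0.0), max_value)
    return round(clipped / max_value * levels) / levels * max_value

def optical_matvec(transmissions, intensities, readout_bits=4):
    """Simulated analog matrix-vector product.

    transmissions: rows of values in 0..1 (fraction of light passed through)
    intensities:   input beam brightness values in 0..1
    """
    max_sum = float(len(intensities))   # brightest possible detector reading
    results = []
    for row in transmissions:
        # Dimming each beam multiplies; merging beams on a detector adds.
        detected = sum(t * x for t, x in zip(row, intensities))
        results.append(quantize(detected, readout_bits, max_sum))
    return results

transmissions = [[0.5, 0.25], [1.0, 0.0]]
intensities = [0.8, 0.4]

exact = [sum(t * x for t, x in zip(row, intensities)) for row in transmissions]
approx = optical_matvec(transmissions, intensities, readout_bits=4)
print("exact: ", exact)
print("analog:", approx)
```

The small gap between the exact and simulated results hints at the precision problem: an analog readout can only resolve so many brightness levels, while digital chips can carry as many digits as needed.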

The promise: potentially orders of magnitude better speed and energy efficiency for the specific math AI relies on. The catch: it's still maturing. Hard problems include doing it with high precision, converting back and forth between electrical and optical (which costs energy and can erase the gains), handling memory, and manufacturing it reliably at scale. Several companies and labs are pursuing it; expect it first as a specialized accelerator working alongside conventional chips, not a wholesale replacement.

Why light matters for energy

The throughline of this whole article is energy. AI's growth is increasingly limited not by ideas but by power and heat. Light helps on both: it moves and (potentially) processes information using less energy and generating less heat than pushing electrons through metal. That's why "computing with light" is one of the most watched directions in the field: it attacks AI's single hardest constraint.

How It All Connects

every layer feeds the one above it

Read top to bottom, the stack is one continuous story:

Sand becomes silicon wafers, printed with light into chips holding billions of transistors (Section 01).
Those chips are built to do massive parallel multiplication, because that's the one operation a neural network needs billions of times to learn patterns from data (Section 02).
A single chip isn't enough, so thousands are wired together in data centers, where the real fights are networking, power, and heat (Section 03).
And the next leap aims to beat those limits by moving (and maybe doing) the computation with light instead of electricity (Section 04).

Every layer exists to feed the one above it. The whole machine is, ultimately, a way of turning electricity and math into something that recognizes patterns.

The Whole Story in 5 Steps

Sand to chips: purified silicon is printed with light into billions of nanometer-scale transistors.

Chips to math: GPUs and accelerators exist to do massive parallel multiplication, the one operation AI needs most.

Math to learning: a neural network nudges billions of weights, guess by guess, until real patterns emerge.

Learning to scale: training takes thousands of chips in data centers that fight networking, power, and heat.

Scale to light: photonics attacks the power-and-heat limit by moving (and maybe doing) the math with light.

If You Want to Go Deeper

Chips: look up how EUV lithography works, and what "chiplets" and "advanced packaging" mean.
AI: search the 2017 paper "Attention Is All You Need" (the Transformer), and learn what "backpropagation" and "gradient descent" do step by step.
Data centers: read about liquid vs. immersion cooling and why data center power demand is reshaping the electricity grid.
Light: look into "co-packaged optics" (near-term) and "analog photonic computing" (long-term).

Each of those is a rabbit hole worth falling into. Pick whichever part grabbed you most and start there.

Quick Glossary

Transistor: microscopic on/off switch; the basic unit of a chip.

Photolithography: printing circuit patterns onto a wafer using light.

EUV: extreme ultraviolet light used to print the smallest features.

Wafer / yield: the silicon disc chips are built on / the % that come out working.

GPU / TPU / ASIC: chips with many cores for parallel math; the engines of AI.

HBM: ultra-fast memory stacked right next to the processor.

Neural network: layered web of weighted connections that learns patterns.

Weights / parameters: the numbers that hold a model's learned knowledge.

Training vs. inference: teaching the model (expensive) vs. using it (cheaper).

Transformer / attention: the architecture behind modern language AI; lets it weigh which words matter to each other.

Token: a chunk of text the model reads and writes one at a time.

PUE: how efficiently a data center uses its power (1.0 is perfect).

Silicon photonics: moving data with light instead of copper.

Photonic computing: doing the math itself with light.
