Chips, brains, buildings, and light: a no-fluff walkthrough of the whole stack, from a grain of sand to a thinking machine, and where the technology is heading next.
from a grain of sand to billions of switches
The chips that run AI start as silicon, refined from ordinary quartz sand. The sand is purified until it's about 99.9999999% pure silicon ("nine nines"). The purified silicon is melted and grown into a single giant crystal, a cylindrical ingot, by dipping a seed crystal into the melt and slowly pulling it up while rotating it. The atoms line up into one continuous crystal lattice.
The ingot is sliced into thin discs called wafers, polished mirror-smooth. A modern wafer is 300mm across (about 12 inches). Hundreds of chips will be built on a single wafer at once.
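How many chips fit on a wafer depends heavily on how big each chip is. The sketch below uses a common textbook approximation (wafer area divided by die area, minus a correction for partial dies lost at the round edge). The die areas are hypothetical values chosen for illustration, not figures for any real product.

```python
import math

def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Rough count of whole dies on a round wafer (textbook approximation)."""
    radius = wafer_diameter_mm / 2
    # Gross count: wafer area divided by die area.
    gross = math.pi * radius ** 2 / die_area_mm2
    # Correction for partial dies that fall off the curved edge.
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)

# Hypothetical die sizes on a 300 mm wafer.
small_die_count = dies_per_wafer(300, 100)   # a modest 100 mm^2 chip
large_die_count = dies_per_wafer(300, 800)   # a very large 800 mm^2 chip
print(small_die_count, large_die_count)
```

Under these assumptions the small die gives roughly 640 candidates per wafer and the large one roughly 64, which is why very large AI chips are so much more expensive per unit.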
A chip is, at heart, billions of microscopic switches called transistors. Each one can be "on" or "off": a 1 or a 0. Stack enough of them together in clever patterns and you get logic, memory, and arithmetic.
The way you build them is essentially printing with light, repeated dozens of times in layers. The core loop:
The features on a leading-edge AI chip are a few nanometers wide. For scale, a human hair is about 80,000–100,000 nanometers thick, and a silicon atom is about 0.2 nanometers. You are drawing lines only a few dozen atoms wide.
To print something that small, you need light with an extremely short wavelength. The state-of-the-art tool is EUV (extreme ultraviolet) lithography, which uses light at a 13.5nm wavelength. Making that light is almost absurd: droplets of molten tin are blasted with a high-power laser tens of thousands of times per second to create a plasma that emits EUV. These machines are made essentially by one company (ASML), cost on the order of a couple hundred million dollars each, and are among the most complex machines humans have ever built.
Not every chip on the wafer works; a single dust speck or defect can kill one. The percentage that work is the yield, and it's a closely guarded number because it drives the entire economics of the industry.
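To see why yield drives the economics, here is a minimal sketch using the simple Poisson yield model, in which the chance that a die has zero killer defects falls off exponentially with die area. The defect density and die areas are made-up illustrative values; real fabs use more refined models and keep their real numbers private.

```python
import math

def poisson_yield(die_area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies expected to have zero killer defects (Poisson model)."""
    return math.exp(-die_area_cm2 * defects_per_cm2)

def good_dies(total_dies: int, die_area_cm2: float, defects_per_cm2: float) -> int:
    """Expected number of working dies out of total_dies candidates."""
    return int(total_dies * poisson_yield(die_area_cm2, defects_per_cm2))

# Hypothetical defect density of 0.1 defects per cm^2.
small_yield = poisson_yield(1.0, 0.1)   # 1 cm^2 die
large_yield = poisson_yield(8.0, 0.1)   # 8 cm^2 die
print(f"small die yield: {small_yield:.0%}, large die yield: {large_yield:.0%}")
```

The same defect density hurts a large die far more than a small one, because each die presents a bigger target for a stray defect.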
Working chips are cut from the wafer ("dicing"), then packaged: connected to a substrate with hundreds or thousands of electrical contacts and protected in a casing. Modern AI chips increasingly use advanced packaging, where multiple chips (or "chiplets") and stacks of memory are placed side by side or on top of each other and wired together with incredibly dense, short connections. This matters enormously for AI (more on why below).
A normal CPU (the brain in your laptop) is built to do a few things very fast, one after another. It's a brilliant generalist. AI work is different: it's a colossal pile of simple math (mostly multiplication and addition) that can all be done at the same time.
A GPU (graphics processing unit) was originally designed to color millions of pixels simultaneously, so it has thousands of small cores doing math in parallel. That parallelism is exactly what AI needs, which is why GPUs became the workhorse of AI.
Newer chips go further with dedicated units:
A recurring theme: for AI, moving data is often more expensive (in time and energy) than the math itself. As a result, much of modern chip design is about putting memory physically as close to the compute as possible, which is why ultra-fast memory (HBM, "high bandwidth memory") is stacked right next to the processor.
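One way to make "moving data costs more than the math" concrete is a roofline-style estimate: a step takes as long as the slower of its compute time and its memory-transfer time. The chip figures below (peak math rate and memory bandwidth) are hypothetical round numbers, not the specification of any real part.

```python
def bottleneck(flops, bytes_moved, peak_flops_per_s, bytes_per_s):
    """Return (seconds, limiter) for one step under a simple roofline model."""
    compute_s = flops / peak_flops_per_s      # time if only math mattered
    memory_s = bytes_moved / bytes_per_s      # time if only data movement mattered
    limiter = "memory" if memory_s > compute_s else "compute"
    return max(compute_s, memory_s), limiter

# Multiply an n-by-n weight matrix by one vector.
n = 8192
flops = 2 * n * n          # one multiply and one add per weight
bytes_moved = 2 * n * n    # assume 2 bytes per weight, each read once

# Hypothetical chip: 1e15 operations/s of math, 3e12 bytes/s of memory bandwidth.
seconds, limiter = bottleneck(flops, bytes_moved, 1e15, 3e12)
print(limiter, seconds)
```

With these assumed numbers the step is limited by memory, not math: the arithmetic units would sit idle most of the time waiting for weights to arrive.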
patterns, weights, and next-token prediction
Traditional software is rules written by a human: if X, then do Y. AI flips this. Instead of writing rules, you show the system millions of examples and let it adjust itself until it gets good at the task. Nobody hand-codes "what a cat looks like"; the system figures out the pattern.
The tool for this is the neural network.
A neural network is a big stack of numbers and simple math, loosely inspired by brain neurons. The pieces:
At each neuron the math is: multiply each input by its weight, add them up, and pass the sum through a simple activation function that decides whether (and how strongly) to "fire." Do this across billions of neurons and you can represent astonishingly complex patterns.
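The per-neuron math can be sketched in a few lines. ReLU is used here as the activation function; it is one common choice, picked for illustration, and the input and weight values are arbitrary.

```python
def neuron(inputs, weights):
    """Weighted sum of inputs passed through a ReLU activation."""
    total = sum(x * w for x, w in zip(inputs, weights))  # multiply and add
    return max(0.0, total)  # ReLU: stay silent below zero, pass the sum through above it

quiet = neuron([1.0, 2.0], [0.5, -1.0])   # weighted sum is -1.5, so the neuron stays at 0
active = neuron([1.0, 2.0], [0.5, 0.25])  # weighted sum is 1.0, so the neuron fires
print(quiet, active)
```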
This is why AI is so multiplication-heavy: a layer's computation is essentially one enormous matrix multiplication. That's the operation the chips in Section 01 are racing to do faster.
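To make the matrix view concrete: if each column of a weight matrix holds one neuron's weights, a single matrix multiplication computes every neuron's weighted sum at once. The sketch below uses plain Python lists with arbitrary values and leaves out the activation function.

```python
def matmul(a, b):
    """Multiply matrix a (rows x inner) by matrix b (inner x cols)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]

inputs = [[1.0, 2.0, 3.0]]   # one example with three input values
weights = [                  # three inputs feeding two neurons (one column each)
    [0.1, 0.4],
    [0.2, 0.5],
    [0.3, 0.6],
]
outputs = matmul(inputs, weights)   # one weighted sum per neuron
multiply_adds = len(inputs) * len(weights) * len(weights[0])
print(outputs, multiply_adds)
```

The number of multiply-adds is the product of the three dimensions, so it grows very quickly as layers get wider and batches get larger.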
Training is the process of finding good weights. It goes like this:
Slowly, the random numbers organize themselves into a structure that captures real patterns. Training is the expensive part: it can take thousands of chips running for weeks or months. Once trained, using the model (inference) is far cheaper per use, though serving it to millions of people adds up.
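A toy version of training, assuming the standard loop of guess, measure the error, and nudge the weight (gradient descent). The "model" is a single weight that should learn y = 3x. The data, learning rate, and step count are illustrative, and the weight starts at zero rather than a random value so the run is repeatable.

```python
def train(examples, learning_rate=0.05, steps=200):
    """Fit prediction = weight * x by gradient descent on squared error."""
    weight = 0.0  # fixed starting point; real training starts from random values
    for _ in range(steps):
        gradient = 0.0
        for x, target in examples:
            prediction = weight * x          # guess
            error = prediction - target      # how wrong was the guess?
            gradient += 2 * error * x        # slope of squared error w.r.t. the weight
        gradient /= len(examples)
        weight -= learning_rate * gradient   # nudge the weight downhill
    return weight

examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # hidden rule: y = 3x
learned = train(examples)
print(learned)
```

A real model repeats the same idea across billions of weights at once, which is where the weeks of compute go.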
The breakthrough behind today's AI (including the model that wrote what you're reading right now) is an architecture called the Transformer, introduced in 2017.
Its key trick is attention: when processing a piece of text, the model can look at all the other words at once and decide which ones are most relevant to understanding each word.
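Attention can be sketched as scaled dot-product attention: each word's query vector is compared against every word's key vector, the scores are turned into weights that sum to 1, and the output is a weighted blend of the value vectors. This is a minimal single-head version in plain Python with made-up two-dimensional vectors; real models add learned projections and many heads.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Relevance of every position to this query.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Blend the value vectors according to relevance.
        blended = [sum(w * v[j] for w, v in zip(weights, values))
                   for j in range(len(values[0]))]
        outputs.append(blended)
    return outputs

# One query that matches the first key more closely than the second.
result = attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]])
print(result)
```

The output leans toward the first value vector because the query matched the first key more strongly.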
How a language model like this actually generates text:
A crucial honest point: the model is fundamentally a very sophisticated next-token predictor. It has no database it looks things up in; all its "knowledge" lives in the patterns baked into its weights during training. This is why it can be fluent and useful but also why it can confidently state things that are wrong ("hallucinate"): it's predicting plausible continuations, not retrieving verified facts.
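The next-token loop can be illustrated with a deliberately tiny stand-in. Here a hand-written lookup table plays the role of the neural network (a real model computes these probabilities from its weights and the whole context), and the most likely token is always chosen. The table, the tokens, and the "<end>" marker are all hypothetical.

```python
# Toy "model": probability of the next token given only the previous token.
TOY_TABLE = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.9, "ran": 0.1},
    "sat": {"<end>": 1.0},
}

def next_token_probs(context, table):
    """Look up next-token probabilities for the last token in the context."""
    return table.get(context[-1], {"<end>": 1.0})

def generate(prompt, table, max_tokens=10):
    """Repeatedly pick the most likely next token and append it."""
    tokens = list(prompt)
    for _ in range(max_tokens):
        probs = next_token_probs(tokens, table)
        best = max(probs, key=probs.get)  # greedy choice: most plausible continuation
        if best == "<end>":
            break
        tokens.append(best)
    return tokens

print(generate(["the"], TOY_TABLE))  # ['the', 'cat', 'sat']
```

Nothing in the loop checks whether the output is true; it only follows the probabilities, which is the mechanism behind confident but wrong answers.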
Three numbers multiply together to make AI hungry:
Multiply those and you get astronomical numbers of math operations, which is the whole reason the chips and data centers in Sections 01, 03, and 04 exist.
one giant machine fighting power and heat
A data center is a purpose-built warehouse full of computers. AI data centers are a special, extreme version: thousands of GPU servers packed into racks, all wired together so they can work as one giant machine on a single training job.
The physical hierarchy:
1. Networking. For training, thousands of chips must share their results with each other constantly and stay in sync. If the network is slow, expensive chips sit idle waiting. So AI data centers use extremely fast interconnects (technologies like InfiniBand or specialized high-speed Ethernet, plus chip-to-chip links like NVLink) to move data between chips at staggering rates. The network is as important as the chips.
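A rough feel for why the network matters: the sketch below estimates how long it takes chips to combine their results using a ring all-reduce, one common scheme (the text above does not say which scheme is used). In a ring, each chip sends and receives about 2(N-1)/N times the payload. Latency is ignored, and the payload size and link speeds are hypothetical.

```python
def ring_allreduce_seconds(payload_bytes, num_chips, link_bytes_per_s):
    """Bandwidth-only time estimate for a ring all-reduce (latency ignored)."""
    if num_chips < 2:
        return 0.0  # nothing to exchange with a single chip
    traffic_per_chip = 2 * (num_chips - 1) / num_chips * payload_bytes
    return traffic_per_chip / link_bytes_per_s

payload = 10e9  # hypothetical 10 GB of results to synchronize
slow = ring_allreduce_seconds(payload, 1024, 12.5e9)   # assumed 12.5 GB/s link
fast = ring_allreduce_seconds(payload, 1024, 400e9)    # assumed 400 GB/s link
print(f"slow link: {slow:.2f} s, fast link: {fast:.3f} s per sync")
```

If a sync like this happens every training step, the difference between the two links is the difference between chips mostly computing and chips mostly waiting.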
2. Power. These facilities are enormous electricity consumers. Power is now the main bottleneck on how big AI can get, which is driving operators to build next to power plants, sign deals for dedicated generation (including nuclear), and obsess over efficiency.
3. Heat. Every watt of electricity that goes in comes out as heat, and a dense rack of AI chips produces a tremendous amount. If you don't remove it, the chips throttle or fry. Cooling approaches:
A common efficiency metric is PUE (Power Usage Effectiveness): total facility power divided by the power delivered to the computing equipment. A perfect score is 1.0, meaning every watt reaches the computers; good modern data centers get close to it, so very little goes to overhead like cooling and power conversion.
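The PUE calculation is simple enough to show directly. The power figures below are invented for illustration.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / computing power."""
    if it_equipment_kw <= 0:
        raise ValueError("computing power must be positive")
    return total_facility_kw / it_equipment_kw

def overhead_fraction(pue_value: float) -> float:
    """Share of total power spent on overhead (cooling, conversion, etc.)."""
    return 1 - 1 / pue_value

# Hypothetical facility: 1,200 kW drawn in total, 1,000 kW reaching the computers.
facility_pue = pue(1200, 1000)
print(facility_pue, round(overhead_fraction(facility_pue), 3))
```

For this made-up facility the PUE is 1.2, which means about one sixth of the electricity never reaches a chip.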
Operators choose sites for cheap, abundant, reliable power; cool climates or water access to help with cooling; fast fiber-optic connectivity; and stable conditions (low risk of natural disaster, friendly regulation). This is why you see massive data centers in cold northern regions, near hydroelectric dams, or in the desert next to dedicated solar and gas.
moving data, and maybe math, with photons
This is the "energy using light" idea you asked about. It comes in two related but distinct flavors, and it's worth keeping them separate.
Inside and between chips, information normally travels as electrical signals down copper wires. The problem: as you push more data faster, copper wastes energy as heat and degrades over distance. Moving data is already one of the biggest energy costs in AI (recall Section 01).
The fix is silicon photonics / optical interconnects: send the data as pulses of light through tiny waveguides or fiber instead of electricity through copper. Light is fast, generates far less heat over distance, and can carry enormous amounts of data (you can even send multiple data streams down one fiber at once, each on a different color/wavelength).
The newest twist is co-packaged optics: putting the light-based communication hardware right next to the processor in the same package, so signals convert to light almost immediately instead of traveling as electricity across the board first. For AI data centers, where the bottleneck is increasingly getting data between thousands of chips, this is a big deal: more bandwidth, less energy, less heat. This technology is real and rolling out.
The more radical idea is to perform the AI computation itself with light. The elegant part: the core AI operation is matrix multiplication (Section 02), and light naturally does math when it passes through optical components. Beams can be split, combined, and have their intensities scaled by passing through materials. Scaling a beam acts as multiplication and combining beams acts as addition, so the math happens as the light travels: at the speed of light, in parallel, with very little energy, and without the heat of switching billions of transistors.
A photonic computing chip aims to encode numbers into properties of light (like intensity or phase), pass them through an optical mesh that performs the multiplication-and-addition, and read out the result.
The promise: potentially orders of magnitude better speed and energy efficiency for the specific math AI relies on. The catch: it's still maturing. Hard problems include doing it with high precision, converting back and forth between electrical and optical (which costs energy and can erase the gains), handling memory, and manufacturing it reliably at scale. Several companies and labs are pursuing it; expect it first as a specialized accelerator working alongside conventional chips, not a wholesale replacement.
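The encode, pass through, and read out idea, along with the precision problem, can be mimicked in ordinary code. In this sketch input numbers are light intensities between 0 and 1, each weight is the fraction of light an element lets through, and a detector sums what arrives. The optional readout step rounds the detected value to a limited number of levels. This is a conceptual model with made-up values, not a simulation of any real photonic device.

```python
def quantize(value, bits, max_value):
    """Round a detected value to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    clipped = min(max(value, 0.0), max_value)
    return round(clipped / max_value * levels) / levels * max_value

def optical_matvec(transmissions, intensities, readout_bits=None):
    """Idealized optical matrix-vector product with optional coarse readout."""
    outputs = []
    for row in transmissions:
        # Each element scales a beam (multiply); the detector sums them (add).
        detected = sum(t * x for t, x in zip(row, intensities))
        if readout_bits is not None:
            # The largest possible sum is one full-intensity beam per input.
            detected = quantize(detected, readout_bits, max_value=len(intensities))
        outputs.append(detected)
    return outputs

weights = [[0.5, 0.25]]   # hypothetical transmission fractions
beams = [1.0, 0.5]        # hypothetical input intensities
exact = optical_matvec(weights, beams)[0]
coarse = optical_matvec(weights, beams, readout_bits=2)[0]
fine = optical_matvec(weights, beams, readout_bits=8)[0]
print(exact, coarse, fine)
```

The coarser the readout, the further the answer drifts from the exact product, which is one reason precision sits on the list of hard problems.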
The throughline of this whole article is energy. AI's growth is increasingly limited not by ideas but by power and heat. Light helps on both: it moves and (potentially) processes information using less energy and generating less heat than pushing electrons through metal. That's why "computing with light" is one of the most watched directions in the field: it attacks AI's single hardest constraint.
every layer feeds the one above it
Read top to bottom, the stack is one continuous story:
Sand to chips: purified silicon is printed with light into billions of nanometer-scale transistors.
Chips to math: GPUs and accelerators exist to do massive parallel multiplication, the one operation AI needs most.
Math to learning: a neural network nudges billions of weights, guess by guess, until real patterns emerge.
Learning to scale: training takes thousands of chips in data centers that fight networking, power, and heat.
Scale to light: photonics attacks the power-and-heat limit by moving (and maybe doing) the math with light.
Every layer exists to feed the one above it. The whole machine is, ultimately, a way of turning electricity and math into something that recognizes patterns.
Each of those is a rabbit hole worth falling into. Pick whichever part grabbed you most and start there.