From a grain of sand to a thinking machine, explained in plain language, with a picture for every idea.
It starts with sand. Really.
Every AI chip begins as sand, the same stuff as glass. We purify it into nearly perfect silicon, melt it, and slowly grow it into one giant, flawless crystal.
That crystal log gets sliced into thin discs called wafers, like slicing a salami. Dozens to hundreds of chips (depending on their size) will be built on each wafer at once.
Zoom into a chip and you find transistors: microscopic switches. Each one is either on (1) or off (0). That's it. Stack billions of these switches in clever patterns and you can do any math or logic in the world.
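To make "switches in clever patterns" concrete, here is a small sketch in Python. It assumes one basic building block, a NAND gate (a standard logic gate, not something named above), and wires copies of it together until they can add two numbers. The function names are made up for illustration; real chips do this in hardware, not in code.

```python
def nand(a: int, b: int) -> int:
    """The one basic 'switch pattern': output is 0 only when both inputs are 1."""
    return 0 if (a and b) else 1

# Every other gate below is built purely out of NAND gates.
def and_(a: int, b: int) -> int:
    return nand(nand(a, b), nand(a, b))

def or_(a: int, b: int) -> int:
    return nand(nand(a, a), nand(b, b))

def xor(a: int, b: int) -> int:
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Add three single bits; return (sum bit, carry bit)."""
    partial = xor(a, b)
    total = xor(partial, carry_in)
    carry_out = or_(and_(a, b), and_(partial, carry_in))
    return total, carry_out

def add(x: int, y: int, bits: int = 8) -> int:
    """Add two numbers one bit at a time, the way a simple adder circuit would."""
    result, carry = 0, 0
    for i in range(bits):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result  # anything beyond `bits` bits wraps around

print(add(19, 23))  # 42
```

Nothing in this sketch is smarter than an on/off decision, yet the result is arithmetic. Scale the same trick up by billions and you get a processor.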
Here's the surprising part. We don't carve chips by hand. We print them with light. Coat the wafer in a light-sensitive coating, shine light through a stencil that holds the circuit pattern, then wash and etch away the parts you don't want. Then do it again. And again: roughly 100 layers, stacked like a tiny city.
The lines are so small (the finest features are only tens of atoms across) that ordinary light is too "fat" to draw them. The cutting edge uses a special extreme-ultraviolet light, made by zapping tin droplets with a laser tens of thousands of times a second. The machine that does this is so complex that essentially one company on Earth can build it, and each one costs hundreds of millions of dollars.
Your laptop's main chip (the CPU) is like one genius worker who does tasks brilliantly, but mostly one at a time. AI doesn't need a genius. It needs a mountain of tiny, easy sums done all at once. So AI uses a GPU: thousands of small workers doing simple math in parallel.
One more quiet truth: moving the numbers around often costs more energy than the math itself. So chip designers stack super-fast memory right next to the processor to shorten the trip. Keep this in mind: it's why "light" shows up later (Part 4).
It isn't programmed with rules. It learns from examples.
Normal software is a list of human-written rules: if this, then that. AI flips that. Instead of writing rules, we show it millions of examples and let it tune itself until it gets good. Nobody ever codes "what a cat looks like"; the system works it out.
The tool for this is a neural network: layers of tiny math units called neurons. Each neuron does something simple: take some numbers in, multiply each by a weight (an "importance dial"), add them up, and pass the result on.
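Here is roughly what one neuron looks like as code. This is a minimal sketch: the `bias` term and the "never pass on a negative number" rule (known as ReLU) are common additions that the description above leaves out, so treat them as assumptions.

```python
def neuron(inputs: list[float], weights: list[float], bias: float = 0.0) -> float:
    """Multiply each input by its weight, add everything up, pass the result on."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Assumption: a ReLU activation, which passes on zero instead of a negative total.
    return max(0.0, total)

# Three inputs, three "importance dials".
print(neuron([1.0, 2.0, 4.0], [0.5, 0.25, -0.125]))  # 0.5
```

The third weight is negative, so that input pulls the total down: 0.5 + 0.5 - 0.5 = 0.5.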
Wire many neurons into layers: numbers go in one side, get reshaped layer by layer, and an answer comes out the other. "Deep learning" just means lots of layers.
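A tiny layered network can be sketched in a few lines. The weights below are hand-picked for illustration, not learned, and the names `layer`, `forward`, and `TINY_NETWORK` are hypothetical. The example answers "is exactly one of the two inputs switched on?", a question a single neuron of this kind cannot answer but two layers can.

```python
def relu(x: float) -> float:
    return max(0.0, x)

def layer(inputs, weight_rows, biases):
    """One layer: every neuron sees all the inputs and produces one number."""
    return [
        relu(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weight_rows, biases)
    ]

def forward(inputs, layers):
    """Numbers go in one side and get reshaped layer by layer."""
    values = inputs
    for weight_rows, biases in layers:
        values = layer(values, weight_rows, biases)
    return values

# Hand-picked weights (an assumption for this sketch, not a trained model).
TINY_NETWORK = [
    ([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]),  # hidden layer: 2 neurons
    ([[1.0, -2.0]], [0.0]),                   # output layer: 1 neuron
]

for pair in ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]):
    print(pair, "->", forward(pair, TINY_NETWORK))
```

The output is 1.0 when exactly one input is on and 0.0 otherwise. A deep network is the same loop with far more layers and far more neurons per layer.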
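How do the dials get the right values? Through training, which is basically trial and error at massive scale: start with random dial settings, feed in an example, compare the network's guess with the right answer, nudge every dial slightly in the direction that shrinks the error, and repeat millions of times.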
Slowly, those random dials organize themselves into settings that capture real patterns. Training a big model can take thousands of chips running for weeks. Once it's trained, using it (called inference) is much cheaper: each answer needs only quick passes through the network, with no dial tuning.
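Training can be shown at toy scale with a single dial. The sketch below assumes the simplest possible model (guess = dial × input) and the usual nudging rule, gradient descent; the function name, the learning rate, and the example data are all made up for illustration.

```python
import random

def train(examples, steps: int = 200, learning_rate: float = 0.05, seed: int = 0) -> float:
    """Tune one weight by trial and error until guesses match the targets."""
    rng = random.Random(seed)
    weight = rng.uniform(-1.0, 1.0)        # start with a random dial setting
    for _ in range(steps):
        for x, target in examples:
            guess = weight * x             # 1. make a guess
            error = guess - target         # 2. measure how wrong it was
            # 3. nudge the dial to shrink the error
            #    (error * x is the slope of 0.5 * error**2 with respect to the weight)
            weight -= learning_rate * error * x
    return weight

# The hidden rule behind these examples is "double the input".
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
print(round(train(examples), 4))  # 2.0
```

Nobody told the code that the rule was "times two"; it found the dial setting from examples alone. Real training does the same thing with billions of dials at once.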
Modern chatbots (including the one writing this) use a design called the Transformer. Its superpower is attention: when reading, the model looks at all the words at once and figures out which ones matter to each other.
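The core of attention fits in a short sketch. This version assumes the common "scaled dot-product" form and skips things a real Transformer adds (learned projections, multiple attention heads, many stacked layers). Each word is a short list of numbers, and the example vectors are invented for illustration.

```python
import math

def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into positive shares that add up to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """For each word, blend all the words' values by how well they match."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # How strongly does this word match every word (itself included)?
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        shares = softmax(scores)
        # Weighted blend of the value vectors.
        blended = [
            sum(share * v[j] for share, v in zip(shares, values))
            for j in range(len(values[0]))
        ]
        outputs.append(blended)
    return outputs

# Three made-up word vectors; each word attends to all three at once.
words = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
for row in attention(words, words, words):
    print([round(x, 3) for x in row])
```

The first two vectors are similar, so they pull toward each other more than toward the third. That is the "which words matter to each other" step, expressed as arithmetic.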
A chatbot writes one chunk at a time (each chunk is called a token, roughly a word or part of a word). It predicts the most likely next chunk, adds it, then predicts the next, building the sentence piece by piece.
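The piece-by-piece loop can be sketched with a toy stand-in for the model. Everything here is an assumption for illustration: the probability table is invented, it only looks at the previous token (a real model looks at everything written so far), and it always takes the single most likely option, whereas real chatbots often pick with some randomness.

```python
# Hypothetical "model": for each token, the chance of each possible next token.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "mat": 0.2},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"<end>": 1.0},
}

def generate(max_tokens: int = 10) -> list[str]:
    """Build a sentence one token at a time."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        # Unknown tokens simply end the sentence in this toy version.
        options = NEXT_TOKEN_PROBS.get(tokens[-1], {"<end>": 1.0})
        next_token = max(options, key=options.get)  # most likely next chunk
        if next_token == "<end>":
            break
        tokens.append(next_token)                   # add it, then go again
    return tokens[1:]

print(generate())  # ['the', 'cat', 'sat']
```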
And why does it need so much computing power? Multiply three big numbers together: billions of dials × trillions of example chunks × many repeats. That's an astronomical amount of math, which is exactly why we need the chips and buildings in Parts 1, 3, and 4.
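A back-of-envelope version of that multiplication is sketched below. It leans on a widely used rule of thumb that is not stated above, namely that training costs about 6 arithmetic operations per dial per example chunk. The model size, the amount of text, the chip speed, and the chip count are all hypothetical round numbers, not figures for any real system.

```python
def training_operations(dials: float, chunks: float, ops_per_dial_per_chunk: float = 6.0) -> float:
    """Rule-of-thumb total: operations ~ 6 x dials x example chunks (an assumption)."""
    return ops_per_dial_per_chunk * dials * chunks

def training_days(total_ops: float, chips: int, ops_per_second_per_chip: float) -> float:
    """How long the job takes if every chip stays busy the whole time."""
    seconds = total_ops / (chips * ops_per_second_per_chip)
    return seconds / 86_400  # seconds in a day

# Hypothetical inputs: 70 billion dials, 2 trillion chunks of text,
# 1,000 chips each sustaining 400 trillion operations per second.
ops = training_operations(dials=70e9, chunks=2e12)
days = training_days(ops, chips=1_000, ops_per_second_per_chip=4e14)

print(f"{ops:.1e} operations")
print(f"{days:.1f} days")  # roughly 24 days
```

Even with a thousand fast chips, the job runs for weeks. Waiting on the network or on memory only makes it longer.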
A warehouse full of computers, acting as one giant brain.
One chip isn't nearly enough. To train a big model you wire thousands of GPUs together so they behave like a single enormous machine. They live in data centers: big, humming warehouses built for exactly this.
1. Networking. The chips must share their results constantly to stay in sync. If the network is slow, hugely expensive chips just sit there waiting. So they're linked with ultra-fast connections.
2. Power. A big AI site can use as much electricity as a small city. Power is now the main limit on how big AI can grow. Operators are building next to power plants just to feed them.
3. Heat. Every watt of electricity turns into heat. Pack thousands of chips together and they'll cook themselves. Cooling goes from fans → liquid piped onto the chips → fully dunking servers in special non-conductive fluid.
This is why data centers get built in cold regions, near water, and next to cheap power: geography is part of the engineering.
This is the "energy using light" idea, and there are two versions of it.
Remember from Part 1: moving data wastes a lot of energy and heat. Inside computers, data normally travels as electricity through copper wires, and copper heats up while the signal weakens over distance. Light fixes that.
The first version is moving data with light: instead of pushing electricity through copper, send the data as pulses of light through glass fiber. Light is fast, loses very little energy over distance, and can carry a staggering amount of data. You can even send many streams down one fiber at once, each on a different color.
The second version, doing the math itself with light, is the bold one. Recall the main AI operation is multiply-and-add (Part 2). It turns out light naturally does that math when beams pass through optical parts and combine: at the speed of light, using very little energy, with almost no heat.
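The idea can be mimicked in a few lines. This is a toy numerical model, not a description of any real optical chip: it assumes each input number is sent as a beam's brightness, each weight is a filter that lets through a fraction of the light (between 0 and 1), and a detector adds up whatever arrives. The function name is hypothetical.

```python
def optical_multiply_add(brightness: list[float], pass_through: list[float]) -> float:
    """Dim each beam by its filter, then let a detector sum the light that arrives."""
    for fraction in pass_through:
        if not 0.0 <= fraction <= 1.0:
            # A plain filter can only dim light, never brighten or negate it.
            raise ValueError("pass-through fractions must be between 0 and 1")
    dimmed = [b * f for b, f in zip(brightness, pass_through)]  # the "multiply"
    return sum(dimmed)                                          # the "add"

print(optical_multiply_add([1.0, 2.0, 4.0], [0.5, 0.25, 0.0]))  # 1.0
```

In code this is the same sum a neuron computes. The appeal of the optical version is that the dimming and the summing would happen as the light travels, rather than as separate steps a processor has to carry out.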
The catch: it's still maturing. The tricky part is converting back and forth between electricity and light without burning up the energy you just saved. So it will likely first appear as a specialist helper chip sitting alongside normal chips, not a full replacement.
Why this matters most of all: AI's single biggest limit isn't ideas anymore; it's power and heat. Light attacks both. That's why "computing with light" is one of the most exciting directions in the whole field.
Sand → chips. Purified sand becomes wafers, printed with light into chips full of billions of tiny on/off switches.
Chips → thinking. Those chips do mountains of simple math, which is exactly what a neural network needs to learn patterns from examples.
Thinking → buildings. One chip isn't enough, so thousands are wired together in data centers, where the real fights are network speed, power, and heat.
Buildings → light. The next leap beats those limits by moving (and maybe doing) the math with light instead of electricity.