the.com

teaching a machine to guess the next word by making it fail millions of times first.

means: the process of feeding a model massive amounts of data and adjusting its internal numbers until its guesses stop being embarrassing.

from: borrowed from animal training, but here the treats are gradients, tiny corrections computed via calculus (backpropagation) that nudge billions of parameters closer to right answers.
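
a toy sketch of that nudging, assuming a model with a single parameter and a made-up rule to learn (y = 3x). the function name, data, learning rate, and step count are all illustrative assumptions. real training uses backpropagation to get the same kind of gradient for billions of parameters at once.

```python
# Minimal sketch: fit y = w * x by gradient descent on squared error.
# Everything here (names, data, learning rate) is a hypothetical example.

def train(xs, ys, lr=0.01, steps=500):
    w = 0.0  # the single "parameter", starting from a bad guess
    for _ in range(steps):
        # gradient of the mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        # nudge the parameter a small step against the gradient
        w -= lr * grad
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]  # produced by the rule y = 3x

w = train(xs, ys)
print(w)  # ends up very close to 3.0
```

each pass the guess is a little less wrong, which is the whole trick, repeated at enormous scale.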

cost: gpt-4 class runs reportedly exceed $100 million

data appetite: trained on trillions of words, still hungry

real bottleneck: electricity and gpus, not ideas

forgetting: models can overwrite old skills when learning new ones
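
the forgetting point can be seen in a deliberately tiny sketch, assuming a one-parameter model and two made-up tasks that disagree (task a wants y = 2x, task b wants y = -x). all names and numbers are hypothetical. real models have far more room to hold several skills, so this only shows the mechanism, not how severe it is in practice.

```python
# Toy illustration of forgetting: train on task A, then only on task B,
# and measure how the error on task A changes. All values are made up.

def mse(w, xs, ys):
    # mean squared error of the model y = w * x
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(w, xs, ys, lr=0.01, steps=500):
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad  # step against the gradient
    return w

xs = [1.0, 2.0, 3.0, 4.0]
task_a = [2.0 * x for x in xs]   # task A: y = 2x
task_b = [-1.0 * x for x in xs]  # task B: y = -x

w = train(0.0, xs, task_a)
err_a_before = mse(w, xs, task_a)  # near zero: task A is learned

w = train(w, xs, task_b)           # keep training, but only on task B
err_a_after = mse(w, xs, task_a)   # large: task A has been overwritten

print(err_a_before, err_a_after)
```

the second round of training never sees task a, so nothing stops the parameter from drifting away from it.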

for instance

gpt-4 — openai, 2023, trained on internet-scale text plus human feedback

alphago — deepmind, 2016, trained by playing itself millions of times

stable diffusion — 2022, trained on billions of image-text pairs scraped online

llama — meta, open weights, trained on curated public datasets
