This week we discuss the release of the 1.5-billion-parameter version of GPT-2. Released by OpenAI, GPT-2 is a language model trained to predict the next word given the previous words, using 40GB of Internet text. While it has been available for some time in scaled-back forms, releasing the full model was previously considered "too dangerous" for the Internet. Why the change? Listen in as we discuss the implications of "big AI", where the resources needed to train very large models are beyond what most individuals, or even universities, can afford. And what about the energy costs of all those GPUs/TPUs computing gradient descent for weeks or months on end? Should there be a "training shame" (utbildningsskam in Swedish) attached to big, energy-hungry models?