"Like GPT-2 and other Transformerbased. programs, GPT-3 is trained on the Common Crawl data set, a corpus of almost a trillion words of texts scraped from the Web. 'The dataset and model size are about two orders of magnitude larger than those used for GPT-2.'"