What is GPT-4 (and When?)

Mandar Karhade, MD, PhD
Published in Towards AI
Nov 14, 2022 · 4 min read


Comparing GPT-4 to GPT-3 and the human brain (source: Lex Fridman @ YouTube)

It has been some time since Robert Scoble wrote this about GPT-4, which suggested to me that OpenAI might be giving a certain closed group of individuals access to GPT-4. I am not sure whether it is a conventional alpha build or a beta, but given the timelines, enough time has passed since August 2022 to suggest a beta or even an early release candidate.

Source (Twitter)

What is GPT-4?

We all know that GPT-3 was a huge leap in itself: a refined model that could produce fluent paragraphs, a clear step up from GPT-2. Since the release of GPT-3, discussion of the “next big thing” has been fairly quiet and muted. Now we have more information about GPT-4.

The concrete specifications of GPT-4 are still in flux due to NDAs; however, GPT-4 is likely to use 100 trillion parameters (source). It would be the first large-scale model with sparsity at the core of its design. What does it mean to have sparsity? It means that even at the 100T-parameter scale, the compute cost is likely to be lower, because only a fraction of the model’s neurons are active for any given input. In layman’s terms, it is a model that can keep many more choices of “next word”, “next sentence”, or “next emotion” available based on the context. In essence, this makes it more similar to actual human thinking than its predecessor.
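To make the idea of sparsity concrete, here is a minimal, hypothetical sketch of a mixture-of-experts-style layer with top-k routing, where only a couple of “expert” sub-networks run for each input. This is only an illustration of the general technique; OpenAI has not disclosed GPT-4’s actual architecture.

```python
# Illustrative only: a tiny mixture-of-experts layer with top-k routing.
# Sparsity here means each input activates only k of the n experts,
# so compute per token grows much more slowly than total parameter count.
import torch
import torch.nn as nn


class TopKSparseLayer(nn.Module):
    def __init__(self, d_model: int = 64, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
        self.gate = nn.Linear(d_model, n_experts)  # the "router"
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, d_model)
        scores = self.gate(x)                            # (batch, n_experts)
        top_vals, top_idx = scores.topk(self.k, dim=-1)  # pick k experts per input
        weights = torch.softmax(top_vals, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            chosen = top_idx[:, slot]
            w = weights[:, slot].unsqueeze(-1)
            for e in chosen.unique():                    # run only the selected experts
                mask = chosen == e
                out[mask] += w[mask] * self.experts[int(e)](x[mask])
        return out


layer = TopKSparseLayer()
print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```

Even in this toy version, the total parameter count scales with the number of experts, while each forward pass only pays for k of them.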

Wait, but what is GPT?

A Generative Pre-trained Transformer (GPT) is a text-generation deep learning model trained on data available on the internet. It is used for question answering, text summarization, machine translation, classification, code generation, and conversational AI.
The applications of GPT models are endless. Furthermore, you can fine-tune them on specific data to get even better results (transfer learning). By using the “sauce” from GPT models, building NLP projects becomes a heck of a lot easier. Easier means you save time, money, and resources, and you get to start from the model’s generalization (built on a giant sample of text) without having to reinvent the wheel for the general aspects of language.
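As a small illustration of how little code it takes to build on a pre-trained GPT-style model, here is a sketch using the openly available GPT-2 checkpoint through the Hugging Face transformers library (GPT-3 itself is only accessible via OpenAI’s API, and GPT-4 is not public at all):

```python
# pip install transformers torch
from transformers import pipeline

# Load the open GPT-2 checkpoint as a stand-in for larger, API-only GPT models.
generator = pipeline("text-generation", model="gpt2")

prompt = "Transfer learning lets us reuse a pre-trained language model to"
outputs = generator(prompt, max_length=40, num_return_sequences=1)
print(outputs[0]["generated_text"])
```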

Source: Xataka

GPT-1 to GPT-3

Since GPT-1 was first published in 2018 (link), the GPT family has made giant progress. GPT-1 had “only” 117 million parameters. GPT-2 raised the bar to 1.5 billion parameters (publication), and GPT-3 raised it even further to 175 billion parameters (publication). For reference, DeepMind’s Gopher model had 280 billion parameters (publication) and the Megatron-Turing NLG model had 530 billion parameters (publication).

At the same time, Microsoft’s efforts with OpenAI led to the conclusion that optimal hyperparameter tuning has great utility in fine-tuning models at this scale; generally, the larger the model, the more costly it is to fine-tune. DeepMind’s Chinchilla experiment (publication) concluded that the size of the training corpus is as important as the number of parameters.
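As a rough, back-of-the-envelope illustration of the Chinchilla result, the paper’s compute-optimal analysis works out to roughly 20 training tokens per parameter (the exact ratio depends on the compute budget). A quick sketch of what that implies for a few model sizes:

```python
# Rough rule of thumb from the Chinchilla paper: compute-optimal training
# uses about 20 tokens per parameter (the exact ratio varies with budget).
TOKENS_PER_PARAM = 20

models = {"GPT-3": 175e9, "Gopher": 280e9, "Chinchilla": 70e9}
for name, params in models.items():
    tokens = params * TOKENS_PER_PARAM
    print(f"{name}: {params / 1e9:.0f}B params -> ~{tokens / 1e12:.1f}T training tokens")
```

For Chinchilla itself (70B parameters), this recovers the roughly 1.4 trillion tokens it was actually trained on.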

Final remarks

GPT-4 is expected to be a text-only model that takes NLP one giant, promising step ahead. GPT-4 is likely to be released early next year! Given the abilities of GPT-4 extrapolated from GPT-3, we might soon need a new Turing test standard. The role of AI in deepfakes is another stream of work, but with every leap in model capability, we get closer to it.

Since the release of earlier general-purpose models, a plethora of AI-generated text has already been produced, and the general populace’s understanding of this is unmistakably lower than it needs to be. I am optimistic, but I would love to see a similar concerted effort from the same for-profit entities toward models that can distinguish AI-generated text from human-generated text.

Undoubtedly, these are exciting times; fall is upon us, so let us enjoy the progress. I will be eagerly waiting for more information as and when it comes!

Credits: https://unsplash.com/@sajadnori
