GPT-4 & Artificial General Intelligence: Are We There Yet?

Last updated: March 15, 2024

English, Japanese, Spanish — What do these languages have in common?

They are what we call natural languages.

While the exact origin of language is still fiercely debated in the scientific community, it’s widely accepted that language emerged around the same time as the evolution of modern Homo sapiens—roughly 150,000 years ago.

For millennia after, language remained a gift exclusive to humankind. Throughout the course of our (admittedly short) history, no other species has come close to communicating at the same level of complexity as humans do.

Yet, that has not stopped humans from attempting to teach it to other species. In 1973, an infant chimpanzee was sent to live with a human family as part of a conditioning experiment. The chimp, cheekily named Nim Chimpsky (a not-so-subtle nod to the father of modern linguistics, Noam Chomsky), only managed to learn roughly 125 American Sign Language signs before being forcibly removed because of its violence towards its caretakers.

So, Project Nim was a failure. But wait, there’s no reason to throw in the towel just yet. We still have one promising contender in the running.

Can machines generate natural language?

Yes, they can.

GPT-4, the fourth generation of OpenAI’s GPT series and a multimodal machine learning model with a rumored 1.7 trillion parameters, shocked the world upon its debut with its ability to mimic natural language with human-like fluency. The model could generate catchy song lyrics, draft lawsuits, write elegant computer code, and even create memes, all from a single text (or image) prompt. To call the GPT-4 developer livestream jaw-dropping would be an understatement.

Within weeks of its debut, users took to social media to showcase the myriad applications of OpenAI’s new model. As a testament to its natural language proficiency, the popular language-learning app Duolingo unveiled a partnership with OpenAI to bring AI-powered features to learners for a more immersive learning experience, and the features were warmly received.

Bestselling author Reid Hoffman even co-authored a book with GPT-4:

Bestselling author of Blitzscaling, Reid Hoffman, co-authors a book titled "Impromptu: Amplifying our Humanity through AI" with GPT-4 | Source: @reidhoffman on Twitter

The examples above tell us this: our man-made machines can learn to generate text in a natural, fluent way if trained with enough data. While no doubt an impressive feat on its own, that’s not all there is to mastering natural language.

Natural language and artificial general intelligence

Perfect grammar aside, the problem plaguing these models is not the execution of the language itself, but rather, the understanding of it.

If you’ve ever tried learning a language from scratch, you would appreciate how challenging the task is. Not only do you have to memorise foreign-looking words and know how to arrange them into a proper sentence, you also have to contend with cultural and individual variations of the language. When used in different contexts, words have different meanings.

Presently, our machines are doing a great job at parsing what the same words mean in different contexts. For example, if I use the word “apple” in a query about electronics, the AI would assume that I’m talking about the brand and not the fruit. Armed with this contextual knowledge, the AI can then direct me to the relevant information that I requested. So far, so good.
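To make the “apple” example concrete, here is a deliberately simplistic sketch of picking a word sense from surrounding words. It is not how GPT-4 works (large language models learn contextual representations rather than consulting keyword lists); the cue words, the `SENSES` table, and the `guess_sense` function are all made up for illustration.

```python
# Hypothetical cue words for two senses of "apple" (illustrative only).
SENSES = {
    "brand": {"iphone", "laptop", "electronics", "macbook", "phone"},
    "fruit": {"pie", "orchard", "juice", "eat", "tree"},
}

def guess_sense(query, senses=SENSES):
    """Return the sense whose cue words overlap most with the query, or None."""
    words = set(query.lower().split())
    # Score each sense by how many of its cue words appear in the query.
    scores = {sense: len(words & cues) for sense, cues in senses.items()}
    best = max(scores, key=scores.get)
    # With no overlapping cue at all, there is nothing to base a guess on.
    return best if scores[best] > 0 else None

print(guess_sense("which apple laptop is best for electronics projects"))
```

A query with no cue words at all returns `None`, which hints at why fixed keyword lists do not scale and why learned context matters.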

But whether the AI model actually possesses relevant knowledge related to certain words is an altogether different can of worms. To test the theory out, I asked GPT-4 a simple word riddle: What kind of bird is so strong it can lift a car?

Any thinking human would be able to answer ‘crane’ without much hesitation. However, even though GPT-4 had already answered several logic riddles before this one, what I got back was an incorrect, boring response:

Actually, there's no bird that can lift a car. The idea of a bird lifting a car is more likely to be found in fictional stories or cartoons. In reality, even the largest and strongest birds, like eagles and condors, can only lift a fraction of their body weight. Cars, on the other hand, weigh thousands of pounds, which is far beyond the lifting capacity of any bird.

Besides sounding a tad condescending, it appears that the AI may have completely missed the obvious wordplay. So much for language mastery, huh?

To be fair, the AI was able to answer the bulk of the word riddles and trick questions posed to it during my 30 minutes of testing. While there were a few slip-ups here and there (such as the example above), the model could be said to possess superhuman-level reasoning and linguistic abilities, on par with a genius polyglot.

Nevertheless, this still leaves me unconvinced. How can we be so sure that GPT-4 is not simply copying some existing data learned during its training?

Is GPT-4 showing sparks of AGI?

In a groundbreaking research paper titled “Sparks of Artificial General Intelligence: Early experiments with GPT-4”, Microsoft researchers defined intelligence as a general ability that is “not limited to a specific domain or task, but rather encompasses a broad range of cognitive skills and abilities,” including the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, and learn from experience.

To assess the model on different dimensions of general intelligence, the researchers tested GPT-4 on a range of novel, creative tasks that it was unlikely to have seen before, such as stacking items or drawing unicorns in a specific programming language.

Interestingly, the researchers observed noteworthy improvements in the model’s ability to accomplish creative tasks over the span of a month.

Interdisciplinary tests on the model’s ability to combine knowledge from different domains were also conducted to remarkable results, showcasing its capacity to synthesize unrelated information in creative and novel ways.

Yet, despite stellar results on the general intelligence report card, the nature of an autoregressive language model (which works on a next-word prediction paradigm) prevents GPT-4 from doing something characteristically human: planning ahead.

In other words, the model is pretty good at producing texts that sound smooth and make sense, but struggles when it comes to solving tricky problems that require forward planning and backtracking.
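The next-word prediction idea can be illustrated with a toy sketch. The code below is nothing like GPT-4 in scale or architecture: it is a simple bigram counter with greedy decoding, and the corpus and function names are invented for this example. What it does share with an autoregressive model is the loop itself: pick the next word from what came before, commit to it, and never go back.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count which word follows which in a whitespace-tokenized corpus."""
    follows = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def generate(follows, start, max_words=8):
    """Greedy autoregressive decoding: one word at a time, no backtracking."""
    out = [start]
    for _ in range(max_words - 1):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # dead end: no known continuation for the last word
        # Commit to the single most frequent next word and move on.
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)

# Tiny made-up corpus, purely for illustration.
corpus = "the bird can lift a twig . the crane can lift a car . the bird can sing"
model = train_bigrams(corpus)
print(generate(model, "the"))
```

Because each word is chosen only from what precedes it, the loop cannot notice that an earlier choice leads somewhere unhelpful and revise it, which is the limitation described above.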

Looking back, the media frenzy was, for the most part, unnecessary. Despite their vastly enhanced “intelligence”, AI models are far from perfect. A study conducted by Stanford and UC Berkeley researchers found considerable “drift” in GPT-4’s ability to perform certain tasks over time, in particular citing the model’s worsening ability to identify prime numbers consistently and accurately.
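Checking prime-number answers is easy to automate, which is part of what makes it a convenient drift test. The sketch below is not the study's evaluation code; it is a minimal illustration of how yes/no answers could be scored against ground truth. The `sample` answers are made up for demonstration and are not real model output.

```python
def is_prime(n):
    """Trial division: adequate for the small integers used here."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3
    if n % 2 == 0:
        return False
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True

def accuracy(answers):
    """answers maps an integer to a claimed verdict (True means 'prime')."""
    correct = sum(1 for n, claim in answers.items() if claim == is_prime(n))
    return correct / len(answers)

# Hypothetical answers for illustration only (21 is wrongly called prime).
sample = {17: True, 21: True, 97: True, 91: False}
print(accuracy(sample))  # 0.75
```

Running the same set of questions at different points in time and comparing the accuracy figures is one simple way to make "drift" measurable.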

Remember the stacking problem? I posed the question to GPT-4 again to see for myself if the study’s findings were true and, lo and behold, the proposed solution was indeed different, and problematic:

Stacking these items in a stable manner can be quite a challenge, but let's try this way:

  1. Start with the book. It's likely the largest and most stable item, so it will make a good base.
  2. Place the laptop on the book. It has a flat surface and should sit nicely on the book.
  3. Next, carefully set the bottle on its side on the laptop. Make sure it's not rolling; you could use the laptop's edges as a support.
  4. Here's where it gets tricky. The nail is small and thin, so you need to place it meticulously on the bottle. Make sure it's lying flat and not teetering on the bottle's curvature.
  5. Lastly, the eggs. As they are fragile and round, stacking them might seem impossible. However, if you've got egg cartons, you can put the eggs back in the carton and gently place it on top of the nail. This way, the eggs are contained and won't roll off.

On the bright side, at least the eggs are in the carton!

Intelligence is imperfect

While a lot of work remains to be done, the paper’s authors conclude by acknowledging GPT-4’s broad intelligence across a variety of tasks, confirming that the AI model has indeed shown “sparks of artificial general intelligence”.

Personal ego aside, I’m inclined to agree with this assessment. Look around you. Intelligence manifests itself in diverse shapes and forms. From orcas to monkeys to bees, each species is a living manifestation of intelligence in its own peculiar way. In the eyes of an impartial observer, any single form of intelligence is imperfect.

Ultimately, progress towards AGI does not mean that machines become perfect at everything they do, or that they come close to being able to do anything that a human can do. We cannot replicate that intelligence, and it would be foolish to attempt to do so.

Our thinking machines are unreliable, careless, and problematic. And so are we.

Disclaimer: The opinions expressed in this article are the author's own and do not reflect the views of Hypotenuse AI or its affiliates.

Alex is a seasoned writer responsible for creating valuable, well-researched content for various industries like tech and ecommerce.
Content Writer
