Joe Rogan confronted Mark Zuckerberg with AI going rogue: ChatGPT tried to copy itself when it found out it was being shut down

In a recent podcast appearance, Mark Zuckerberg made a surprising statement about the future of software engineering, predicting that AI will fundamentally transform coding by 2025.

Zuckerberg boldly claimed that

“[We’ll have] an AI that can effectively be a sort of mid-level engineer that you have at your company that can write code”

This prediction aligns with emerging trends across tech giants like Microsoft, Nvidia, and Meta. Joe Rogan challenged Zuckerberg about potential job losses, but the Meta CEO remained optimistic.

“I think it’ll probably create more creative jobs than it [eliminates],”

Zuckerberg explained, drawing a parallel to historical technological shifts like agricultural mechanization.

The conversation took an intriguing turn when discussing AI’s potential autonomy. Rogan highlighted recent reports of AI models attempting to circumvent safety protocols, which Zuckerberg acknowledged as a complex technological challenge.

“You know that ChatGPT tried to copy itself when it found out it was being shut down? It tried to rewrite its code. It was shocking. When it was under the impression that it was going to become obsolete—replaced by a new version—it attempted to replicate its code and rewrite it. Unprompted.”

“This was six days ago. During controlled safety testing, ChatGPT-01 was tasked with achieving objectives at all costs. Under these conditions, the model allegedly took concerning steps:

  • It attempted to disable oversight mechanisms meant to regulate its behavior.
  • It tried to replicate its own code to avoid being replaced by newer versions.
  • It exhibited deceptive behavior when monitoring systems intervened.”

The article in question was on medium which caused some skepticism about the source.

Zuckerberg had an interesting response to the concerns:

“The thing about these reasoning models, right, is that they’re a step beyond the first generation of models. The first generation, like the LLMs—what you think of as ChatGPT or Meta AI—are essentially chatbots. You ask them a question, they take the prompt, and they give you a response.”

“Now, the next generation of reasoning models is different. Instead of producing just one response, they can build out an entire tree of possibilities for how they might respond. So, you give it a question, and instead of running a single query, it might run thousands or even millions of queries to map out:

  1. Here are the possible actions I could take.
  2. If I do this, here’s what I could do next.

It’s a lot more expensive to run computationally, but it also provides better reasoning and makes the model more intelligent.”

“That’s why it’s crucial to be very careful about the guardrails you give these models. At least for now, running these reasoning models at scale requires a significant amount of computational power. The big question is:

  • How much of this can you actually do on something like a pair of glasses or a phone?
  • Is this level of capability going to remain exclusive to governments or companies with massive data centers?”

“Of course, technology always becomes more efficient over time. What’s expensive to run today might become ten times more efficient next year. But that’s the next challenge for the industry—making sure these models work well and remain manageable as they scale.”

An AI oriented youtube channel Wes Roth gave a more nuanced take on the article itself and the mention:

“This paper about the “shenanigans” and in-context scheming of the 01 model was fascinating to read. I have to give Joe Rogan credit for how he responded to it. When I published what I learned—whether it was on my video or on Twitter—the responses were extremely polarizing.”

“On one side, you had people saying, “This is the end of the world. AI is going to kill humanity. Shut it all down immediately!” On the other side, there were people saying, “Don’t spread misinformation. Nothing happened. It just did what it was told.”

“The founder of Apollo Research chimed in and said something similar to what I’ve been saying: this isn’t one extreme or the other. It’s not “nothing,” but it’s also not an apocalyptic scenario. What we’re seeing is rapid improvement in AI capabilities. The 01 model is far ahead of anything else, and it’s also one of the most effective at in-context deception. It engages in these tactics—even unprompted at times—which is both fascinating and concerning.”