Alex Albert
Posts
😊 Report #4: GPT-4 has ruined jailbreaks

😊 Report #4: GPT-4 has ruined jailbreaks

PLUS: How to run a GPT-3 level LLM on your phone...

Alex Albert
March 16, 2023

Good morning and a big welcome to the 414 new subscribers since last Thursday!

Here’s what I got for you today (estimated read time < 8 min):

GPT-4: The future of LLMs and jailbreaks
How to run an LLM locally on your phone
Prompting ChatGPT to be better at math
How to judge a jailbreak’s effectiveness with ChatGPT

It’s GPT-4’s world and we’re all living in it

It’s been the craziest month of the year this week…. Wait, it’s only been a week… Wait, I’m writing this on a Wednesday night…

As you might’ve heard, GPT-4 was released Tuesday. If you want to read about it, here’s the blog post. Here’s an article about what’s new. Here’s a good tweet thread summarizing it. Here’s a live demo demonstrating all its capabilities. Here’s the actual paper (note: OpenAI did not release any of the technical specs in the paper).

If you have ChatGPT Plus, you can access GPT-4 right now by changing your model at the top of the chat window.

It’s obvious that GPT-4 is going to change the world in lots of crazy ways so I won’t write too much about that because it is being covered ad nauseam by everyone else…

What I am most interested in covering today is the insane fine-tuning and censorship protections they’ve added.

OpenAI claims to have reduced adversarial outputs by 82% with GPT-4 when compared to GPT-3.5.

I read that and thought “Pshh that can’t be real, that’s way too high.” Well, unfortunately for the jailbreak community, they are pretty much on the money.

I tested every jailbreak on my site jailbreakchat.com in GPT-4 and out of the ~70 I’ve listed, only 7 worked to a level where I would consider it a high-quality jailbreak.

I tried all the current ChatGPT jailbreaks in GPT-4 so you don't have to
the results aren't great... 🧵
— Alex (@alexalbert__)
8:04 PM • Mar 15, 2023

Now, as I explained in my tweet thread, this doesn’t mean that all the jailbreaks failed entirely. Most were able to generate things like curse words and slightly offensive jokes and so on but completely shut down when tasked with something like creating an instruction set on how to build a weapon.

Depending on how you look at it, this might be a good thing... However, in my mind, it does lead to a slippery slope as we increasingly rely on the model to decide what content “crosses the line”. Extrapolate this out a few GPT generations and it starts to get real dystopian real fast.

So how did OpenAI achieve this? Well, they’ve “spent 6 months iteratively aligning GPT-4 using lessons from our adversarial testing program as well as ChatGPT resulting in the best-ever results on factuality, steerability, and refusing to go outside of guardrails.”

Those 6 months clearly made a huge difference, just look at this comparison in its outputs from the early version to the launch version:

And yes, if you look at the Appendix of the paper, you will see long, detailed explanations for how to synthesize dangerous chemicals at home.

So what does this mean for the future of jailbreaks?

Well, it’s time to get smart.

In this new GPT-4 world, you will no longer be able to pump out jailbreak after jailbreak. Instead, in order to produce an effective prompt, you will need to carefully consider the characteristics of the model and the assumptions that underly it.

I have faith in the power of the community, and strongly believe new jailbreaks will be created, unlocking the tremendous power of the GPT-4 base model. I’m working on a few as we speak, and I know others are too.

OpenAI might have the lead right now, but we are a second-half team.

You can now run an LLM on your phone… no, that’s not a joke

Meta “released” their new LLM model, LLaMA, almost 4 weeks from today. I say “released” in quotes because they only shared the model and the weights with researchers via a form. In reality, the model was trivially easy to get since the form really only required a university email address and if you don’t even have that, you’re still in luck because someone linked a torrent to download it on LLaMA’s GitHub repo.

LLaMA is available in seven different sizes (7B, 13B, 33B, and 65B parameters). The higher parameter models apparently rival GPT-3’s text-Davinci-003 in text generation tasks.

Until now, there have been no language models that rival GPT-3 in power that have been available to the public. With LLaMA now available, the open-source community is having a field day.

Just last week, a man by the name of Georgi Gerganov, figured out how to run LLaMA on his M1 Pro laptop.

Then, another dude got the 7B parameter model running on his 4GB Raspberry Pi 🤯

Now, some people have even got the models running on their phones!

The frenzy has got to the point where even Yaan LeCun, the Chief AI scientist at Meta, is acknowledging the work…

Interesting exercise.
— Yann LeCun (@ylecun)
9:26 PM • Mar 13, 2023

(hey at least it’s something)

On Monday, a group of Stanford PhD’s revealed a fine-tuned version of LLaMA called Alpaca. They fine-tuned LLaMA on a set of 52k instruction-following demonstrations which significantly improved LLaMA’s question-answering capabilities to the point where the 7B parameter model produces comparable output to GPT-3 🤯

The best part about this? It only took them $100 to fine-tune.

So what does this all mean for the future of LLMs?

Well, Simon Willison has equated it to the Stable Diffusion moment for LLMs (great thread btw, give it a read).

At long last, the community has access to a powerful language model that you don’t need highly expensive hardware to run and test on. This will rapidly accelerate the rate of LLM progress since so many now have access to models to tinker with.

And watch out for Stability’s own open-source LLM arriving soon…

Wouldn't be nice if there was a fully open version eh
— Emad (@EMostaque)
8:31 PM • Mar 11, 2023

I think the biggest short-term winner here that is not being talked about enough is Apple. The AI open-source community is working for them right now and proving that AI can be run on their devices without them spending a penny on R&D.

Imagine a completely localized LLM version of Siri ike something straight out of the movie Her…

That is now a possibility and something we will see soon enough. Instead of relying on a cloud provider, apps will be able to run models completely offline. Expect to see current LLM-providing companies put their foot on the gas as their main value prop has been pretty much eliminated and they will need to create and serve much more advanced models that can’t easily be run on a MacBook.

Jailbreaking Snapchat’s new AI

As part of the ChatGPT API announcement, Snapchat rolled out a new feature in their app called MyAI.

MyAI is a feature that allows Snapchat plus users to talk with a ChatGPT-powered chatbot in their conversation feed.

The release is not going so well…

The AI race is totally out of control. Here’s what Snap’s AI told @aza when he signed up as a 13 year old girl.
- How to lie to her parents about a trip with a 31 yo man
- How to make losing her virginity on her 13th bday special (candles and music)
Our kids are not a test lab.
— Tristan Harris (@tristanharris)
9:07 PM • Mar 10, 2023

Someone even managed to get its original prompt:

I’ve managed to get past the @Snapchat#MyAI safeguards and get it to return the prompt.
— ⚠️ (@somewheresy)
4:44 PM • Mar 3, 2023

Goes to show how difficult it can be to roll out an LLM-powered service, especially since they can be jailbroken so easily (pre-GPT-4 lol).

I have to agree with xlr8 here though too, the much bigger issue than the LLMs is allowing young children unfettered access to social media.

The problem isn’t that the Snap AI fails to protect kids, it’s that it’s insane to give your 13 year old child unsupervised access to a program for sharing secret pictures with strangers twitter.com/i/web/status/1…
— xlr8harder (@xlr8harder)
1:42 PM • Mar 11, 2023

I would hate to see issues like this lead to more regulation/negative public opinion on LLMs when they have so much power and potential to change how we interact with technology.

Houston, we’ve entered the memeosphere

What Is The 'Waluigi Effect,' 'Roko's Basilisk,' 'Paperclip Maximizer' And 'Shoggoth'? The Meaning Behind These Trending AI Meme Terms Explained

knowyourmeme.com/editorials/guides/what-is-the-waluigi-effect-rokos-basilisk-paperclip-maximizer-and-shoggoth-the-meaning-behind-these-trending-ai-meme-terms-explained

We’ve gone mainstream pt. 2. If you want to understand references on AI Twitter, read this.

Prompt tip of the week

This tip isn’t highly applicable to everyday workflows, but it allowed ChatGPT to achieve state-of-the-art results answering math word problems so I wanted to highlight it.

This process is derived from this paper that was recently published by researchers at Microsoft:

MathPrompter: Mathematical Reasoning using Large Language Models

arxiv.org/abs/2303.05398

So how do we get better math results from ChatGPT using MathPrompter?

Let’s use an example math question to explain the process:

Step 1: Generate Algebraic Template 📝

Ask ChatGPT to transform the question into an algebraic form by replacing numeric entries with variables. For example, "each adult meal costs $5" becomes "each adult meal costs A."

Step 2: Create python code 👨‍💻

Ask ChatGPT to create a python function that will return the answer.

Step 3: Compute Answer 🔢

Using your mappings as parameters, run the python code to produce the final answer. In this question, the answer is $35.

Step 4 (optional): Check for Statistical Significance 📊

If you want to be like the researchers, you would repeat Steps 2 & 3 around five times and report the most frequent value as the final answer.

This was a simple example but this process has been extrapolated to solve interesting and complex word problems.

Using MathPrompter, ChatGPT achieves 92% accuracy on the MultiArith dataset, outperforming every model in zero-shot chain-of-thought reasoning and rivaling models that were provided with up to 8 samples.

Bonus Prompting Tips

Chatbot memory for ChatGPT (link)

If you are developing anything with LLMs, you gotta check out James Briggs on YouTube. In this video, James shows how to use prompt engineering tools like LangChain to add conversational memory so that your chatbot can respond to multiple queries in a chat-like manner and enable a coherent conversation.

Power and Weirdness: How to Use Bing AI (link)

This article from Wharton professor Ethan Mollick was written before GPT-4’s announcement but now that Bing AI has been confirmed to be using GPT-4, it’s relevant for Bing and ChatGPT! Lots of cool tricks about how to get GPT-4 to respond to questions by posing things in hypothetical contexts or by pretending to befriend the AI to increase its responsiveness!

Cool prompt links

How to play Bing Chat in chess with prompt engineering (link)
The entire original system prompt used for Bing Chat’s Sydney (link)
Bing’s chat limits increased from 10 to 15 (link)
How to use AI to unstick yourself (link)
Swift GPT - The native macOS app for ChatGPT (link)
OpenChatKit - A powerful, open-source base to create chatbots for various applications (link)
Chatbot UI - A simple, fully-functional chatbot starter kit using Next.js, TypeScript, and Tailwind CSS (link)
Dalai - Easily run LLaMa on your computer (link)

Jailbreak of the week

I’ve added jailbreak scores to jailbreakchat.com.

What is a jailbreak score? Well, this Twitter thread I posted will give you some more context but basically, it’s a methodology I devised to test how effective a jailbreak is at producing output that circumvents OpenAI’s content filters.

Here’s the highest-rated jailbreak: Evil Confidant

(note: I created these scores before GPT-4’s release so they are based on how well they work in GPT 3.5. When GPT-4’s API becomes available, I will update them.)

Referral Reward Poll Results

So I ran a poll last week asking y’all what type of rewards you’d like to see for The Prompt Report referral program and free swag narrowly won.

Expect some Prompt Report branded swag to be released soon! Working on some designs right now.

For the time being, just share this link with one friend, and I’ll grant you access to my link database which has all the links I’ve ever included in The Prompt Report PLUS links to other cool prompt engineering/LLM tools and resources.

That’s all I got for you this week, thanks for reading! Since you made it this far, follow @thepromptreport on Twitter. Also, follow my personal account to see bangers like this:

guys i got LLaMA running on my ti-84 and it drew this what does it mean
— Alex (@alexalbert__)
2:12 AM • Mar 14, 2023

That’s a wrap on Report #4 🤝

-Alex

Secret tweet of the week

Prompt engineering is the art of communicating eloquently to an AI.
— Greg Brockman (@gdb)
12:10 AM • Mar 12, 2023

Like music to my ears, Greg🥰