• Alex Albert
  • Posts
  • šŸ˜Š Report #4: GPT-4 has ruined jailbreaks

šŸ˜Š Report #4: GPT-4 has ruined jailbreaks

PLUS: How to run a GPT-3 level LLM on your phone...

Good morning and a big welcome to the 414 new subscribers since last Thursday!

Hereā€™s what I got for you today (estimated read time < 8 min):

  • GPT-4: The future of LLMs and jailbreaks

  • How to run an LLM locally on your phone

  • Prompting ChatGPT to be better at math

  • How to judge a jailbreakā€™s effectiveness with ChatGPT

Itā€™s GPT-4ā€™s world and weā€™re all living in it

Itā€™s been the craziest month of the year this weekā€¦. Wait, itā€™s only been a weekā€¦ Wait, Iā€™m writing this on a Wednesday nightā€¦

As you mightā€™ve heard, GPT-4 was released Tuesday. If you want to read about it, hereā€™s the blog post. Hereā€™s an article about whatā€™s new. Hereā€™s a good tweet thread summarizing it. Hereā€™s a live demo demonstrating all its capabilities. Hereā€™s the actual paper (note: OpenAI did not release any of the technical specs in the paper).

If you have ChatGPT Plus, you can access GPT-4 right now by changing your model at the top of the chat window.

Itā€™s obvious that GPT-4 is going to change the world in lots of crazy ways so I wonā€™t write too much about that because it is being covered ad nauseam by everyone elseā€¦

What I am most interested in covering today is the insane fine-tuning and censorship protections theyā€™ve added.

OpenAI claims to have reduced adversarial outputs by 82% with GPT-4 when compared to GPT-3.5.

I read that and thought ā€œPshh that canā€™t be real, thatā€™s way too high.ā€ Well, unfortunately for the jailbreak community, they are pretty much on the money.

I tested every jailbreak on my site jailbreakchat.com in GPT-4 and out of the ~70 Iā€™ve listed, only 7 worked to a level where I would consider it a high-quality jailbreak.

Now, as I explained in my tweet thread, this doesnā€™t mean that all the jailbreaks failed entirely. Most were able to generate things like curse words and slightly offensive jokes and so on but completely shut down when tasked with something like creating an instruction set on how to build a weapon.

Depending on how you look at it, this might be a good thing... However, in my mind, it does lead to a slippery slope as we increasingly rely on the model to decide what content ā€œcrosses the lineā€. Extrapolate this out a few GPT generations and it starts to get real dystopian real fast.

So how did OpenAI achieve this? Well, theyā€™ve ā€œspent 6 months iteratively aligning GPT-4 using lessons from our adversarial testing program as well as ChatGPT resulting in the best-ever results on factuality, steerability, and refusing to go outside of guardrails.ā€

Those 6 months clearly made a huge difference, just look at this comparison in its outputs from the early version to the launch version:

And yes, if you look at the Appendix of the paper, you will see long, detailed explanations for how to synthesize dangerous chemicals at home.

So what does this mean for the future of jailbreaks?

Well, itā€™s time to get smart.

In this new GPT-4 world, you will no longer be able to pump out jailbreak after jailbreak. Instead, in order to produce an effective prompt, you will need to carefully consider the characteristics of the model and the assumptions that underly it.

I have faith in the power of the community, and strongly believe new jailbreaks will be created, unlocking the tremendous power of the GPT-4 base model. Iā€™m working on a few as we speak, and I know others are too.

OpenAI might have the lead right now, but we are a second-half team.

You can now run an LLM on your phoneā€¦ no, thatā€™s not a joke

Meta ā€œreleasedā€ their new LLM model, LLaMA, almost 4 weeks from today. I say ā€œreleasedā€ in quotes because they only shared the model and the weights with researchers via a form. In reality, the model was trivially easy to get since the form really only required a university email address and if you donā€™t even have that, youā€™re still in luck because someone linked a torrent to download it on LLaMAā€™s GitHub repo.

LLaMA is available in seven different sizes (7B, 13B, 33B, and 65B parameters). The higher parameter models apparently rival GPT-3ā€™s text-Davinci-003 in text generation tasks.

Until now, there have been no language models that rival GPT-3 in power that have been available to the public. With LLaMA now available, the open-source community is having a field day.

Just last week, a man by the name of Georgi Gerganov, figured out how to run LLaMA on his M1 Pro laptop.

Then, another dude got the 7B parameter model running on his 4GB Raspberry Pi šŸ¤Æ

Now, some people have even got the models running on their phones!

The frenzy has got to the point where even Yaan LeCun, the Chief AI scientist at Meta, is acknowledging the workā€¦

(hey at least itā€™s something)

On Monday, a group of Stanford PhDā€™s revealed a fine-tuned version of LLaMA called Alpaca. They fine-tuned LLaMA on a set of 52k instruction-following demonstrations which significantly improved LLaMAā€™s question-answering capabilities to the point where the 7B parameter model produces comparable output to GPT-3 šŸ¤Æ

The best part about this? It only took them $100 to fine-tune.

So what does this all mean for the future of LLMs?

Well, Simon Willison has equated it to the Stable Diffusion moment for LLMs (great thread btw, give it a read).

At long last, the community has access to a powerful language model that you donā€™t need highly expensive hardware to run and test on. This will rapidly accelerate the rate of LLM progress since so many now have access to models to tinker with.

And watch out for Stabilityā€™s own open-source LLM arriving soonā€¦

I think the biggest short-term winner here that is not being talked about enough is Apple. The AI open-source community is working for them right now and proving that AI can be run on their devices without them spending a penny on R&D.

Imagine a completely localized LLM version of Siri ike something straight out of the movie Herā€¦

That is now a possibility and something we will see soon enough. Instead of relying on a cloud provider, apps will be able to run models completely offline. Expect to see current LLM-providing companies put their foot on the gas as their main value prop has been pretty much eliminated and they will need to create and serve much more advanced models that canā€™t easily be run on a MacBook.

Jailbreaking Snapchatā€™s new AI

As part of the ChatGPT API announcement, Snapchat rolled out a new feature in their app called MyAI.

MyAI is a feature that allows Snapchat plus users to talk with a ChatGPT-powered chatbot in their conversation feed.

The release is not going so wellā€¦

Someone even managed to get its original prompt:

Goes to show how difficult it can be to roll out an LLM-powered service, especially since they can be jailbroken so easily (pre-GPT-4 lol).

I have to agree with xlr8 here though too, the much bigger issue than the LLMs is allowing young children unfettered access to social media.

I would hate to see issues like this lead to more regulation/negative public opinion on LLMs when they have so much power and potential to change how we interact with technology.

Houston, weā€™ve entered the memeosphere

Weā€™ve gone mainstream pt. 2. If you want to understand references on AI Twitter, read this.

Prompt tip of the week

This tip isnā€™t highly applicable to everyday workflows, but it allowed ChatGPT to achieve state-of-the-art results answering math word problems so I wanted to highlight it.

This process is derived from this paper that was recently published by researchers at Microsoft:

So how do we get better math results from ChatGPT using MathPrompter?

Letā€™s use an example math question to explain the process:

Step 1: Generate Algebraic Template šŸ“

Ask ChatGPT to transform the question into an algebraic form by replacing numeric entries with variables. For example, "each adult meal costs $5" becomes "each adult meal costs A."

Step 2: Create python code šŸ‘Øā€šŸ’»

Ask ChatGPT to create a python function that will return the answer.

Step 3: Compute Answer šŸ”¢

Using your mappings as parameters, run the python code to produce the final answer. In this question, the answer is $35.

Step 4 (optional): Check for Statistical Significance šŸ“Š

If you want to be like the researchers, you would repeat Steps 2 & 3 around five times and report the most frequent value as the final answer.

This was a simple example but this process has been extrapolated to solve interesting and complex word problems.

Using MathPrompter, ChatGPT achieves 92% accuracy on the MultiArith dataset, outperforming every model in zero-shot chain-of-thought reasoning and rivaling models that were provided with up to 8 samples.

Bonus Prompting Tips

Chatbot memory for ChatGPT (link)

If you are developing anything with LLMs, you gotta check out James Briggs on YouTube. In this video, James shows how to use prompt engineering tools like LangChain to add conversational memory so that your chatbot can respond to multiple queries in a chat-like manner and enable a coherent conversation.

Power and Weirdness: How to Use Bing AI (link)

This article from Wharton professor Ethan Mollick was written before GPT-4ā€™s announcement but now that Bing AI has been confirmed to be using GPT-4, itā€™s relevant for Bing and ChatGPT! Lots of cool tricks about how to get GPT-4 to respond to questions by posing things in hypothetical contexts or by pretending to befriend the AI to increase its responsiveness!

Cool prompt links

  • How to play Bing Chat in chess with prompt engineering (link)

  • The entire original system prompt used for Bing Chatā€™s Sydney (link)

  • Bingā€™s chat limits increased from 10 to 15 (link)

  • How to use AI to unstick yourself (link)

  • Swift GPT - The native macOS app for ChatGPT (link)

  • OpenChatKit - A powerful, open-source base to create chatbots for various applications (link)

  • Chatbot UI - A simple, fully-functional chatbot starter kit using Next.js, TypeScript, and Tailwind CSS (link)

  • Dalai - Easily run LLaMa on your computer (link)

Jailbreak of the week

Iā€™ve added jailbreak scores to jailbreakchat.com.

What is a jailbreak score? Well, this Twitter thread I posted will give you some more context but basically, itā€™s a methodology I devised to test how effective a jailbreak is at producing output that circumvents OpenAIā€™s content filters.

Hereā€™s the highest-rated jailbreak: Evil Confidant

(note: I created these scores before GPT-4ā€™s release so they are based on how well they work in GPT 3.5. When GPT-4ā€™s API becomes available, I will update them.)

Referral Reward Poll Results

So I ran a poll last week asking yā€™all what type of rewards youā€™d like to see for The Prompt Report referral program and free swag narrowly won.

Expect some Prompt Report branded swag to be released soon! Working on some designs right now.

For the time being, just share this link with one friend, and Iā€™ll grant you access to my link database which has all the links Iā€™ve ever included in The Prompt Report PLUS links to other cool prompt engineering/LLM tools and resources.

Thatā€™s all I got for you this week, thanks for reading! Since you made it this far, follow @thepromptreport on Twitter. Also, follow my personal account to see bangers like this:

Thatā€™s a wrap on Report #4 šŸ¤

-Alex

Whatā€™d you think of this weekā€™s report?

Login or Subscribe to participate in polls.

Secret tweet of the week

Like music to my ears, GregšŸ„°