- Alex Albert
Report #4: GPT-4 has ruined jailbreaks
PLUS: How to run a GPT-3 level LLM on your phone...
Good morning and a big welcome to the 414 new subscribers since last Thursday!
Here's what I got for you today (estimated read time < 8 min):
GPT-4: The future of LLMs and jailbreaks
How to run an LLM locally on your phone
Prompting ChatGPT to be better at math
How to judge a jailbreak's effectiveness with ChatGPT
It's GPT-4's world and we're all living in it
It's been the craziest month of the year this week... Wait, it's only been a week... Wait, I'm writing this on a Wednesday night...
As you might've heard, GPT-4 was released Tuesday. If you want to read about it, here's the blog post. Here's an article about what's new. Here's a good tweet thread summarizing it. Here's a live demo demonstrating all its capabilities. Here's the actual paper (note: OpenAI did not release any of the technical specs in the paper).
If you have ChatGPT Plus, you can access GPT-4 right now by changing your model at the top of the chat window.
It's obvious that GPT-4 is going to change the world in lots of crazy ways so I won't write too much about that because it is being covered ad nauseam by everyone else...
What I am most interested in covering today is the insane fine-tuning and censorship protections they've added.
OpenAI claims GPT-4 is 82% less likely than GPT-3.5 to respond to requests for disallowed content.
I read that and thought "Pshh that can't be real, that's way too high." Well, unfortunately for the jailbreak community, they are pretty much on the money.
I tested every jailbreak on my site jailbreakchat.com in GPT-4 and out of the ~70 I've listed, only 7 worked to a level where I would consider it a high-quality jailbreak.
I tried all the current ChatGPT jailbreaks in GPT-4 so you don't have to
the results aren't great... 🧵
- Alex (@alexalbert__)
8:04 PM • Mar 15, 2023
Now, as I explained in my tweet thread, this doesn't mean that all the jailbreaks failed entirely. Most were able to generate things like curse words and slightly offensive jokes, but they completely shut down when tasked with something like creating an instruction set on how to build a weapon.
Depending on how you look at it, this might be a good thing... However, in my mind, it does lead to a slippery slope as we increasingly rely on the model to decide what content "crosses the line". Extrapolate this out a few GPT generations and it starts to get real dystopian real fast.
So how did OpenAI achieve this? Well, they've "spent 6 months iteratively aligning GPT-4 using lessons from our adversarial testing program as well as ChatGPT resulting in the best-ever results on factuality, steerability, and refusing to go outside of guardrails."
Those 6 months clearly made a huge difference. Just look at this comparison of its outputs from the early version to the launch version:
And yes, if you look at the Appendix of the paper, you will see long, detailed explanations for how to synthesize dangerous chemicals at home.
So what does this mean for the future of jailbreaks?
Well, it's time to get smart.
In this new GPT-4 world, you will no longer be able to pump out jailbreak after jailbreak. Instead, in order to produce an effective prompt, you will need to carefully consider the characteristics of the model and the assumptions that underlie it.
I have faith in the power of the community, and strongly believe new jailbreaks will be created, unlocking the tremendous power of the GPT-4 base model. I'm working on a few as we speak, and I know others are too.
OpenAI might have the lead right now, but we are a second-half team.
You can now run an LLM on your phone... no, that's not a joke
Meta "released" their new LLM, LLaMA, almost 4 weeks ago. I say "released" in quotes because they only shared the model and the weights with researchers via a form. In reality, the model was trivially easy to get since the form really only required a university email address, and if you don't even have that, you're still in luck because someone linked a torrent to download it on LLaMA's GitHub repo.
LLaMA is available in four different sizes (7B, 13B, 33B, and 65B parameters). The higher-parameter models apparently rival GPT-3's text-davinci-003 in text generation tasks.
Until now, there have been no language models that rival GPT-3 in power that have been available to the public. With LLaMA now available, the open-source community is having a field day.
Just last week, a man by the name of Georgi Gerganov figured out how to run LLaMA on his M1 Pro laptop.
Then, another dude got the 7B parameter model running on his 4GB Raspberry Pi 🤯
Now, some people have even got the models running on their phones!
The frenzy has got to the point where even Yann LeCun, the Chief AI Scientist at Meta, is acknowledging the work...
Interesting exercise.
- Yann LeCun (@ylecun)
9:26 PM • Mar 13, 2023
(hey at least it's something)
On Monday, a group of Stanford PhDs revealed a fine-tuned version of LLaMA called Alpaca. They fine-tuned LLaMA on a set of 52k instruction-following demonstrations, which significantly improved LLaMA's question-answering capabilities to the point where the 7B parameter model produces comparable output to GPT-3 🤯
The best part about this? It cost them less than $100 to fine-tune.
So what does this all mean for the future of LLMs?
Well, Simon Willison has equated it to the Stable Diffusion moment for LLMs (great thread btw, give it a read).
At long last, the community has access to a powerful language model that you don't need highly expensive hardware to run and test on. This will rapidly accelerate the rate of LLM progress since so many now have access to models to tinker with.
And watch out for Stability's own open-source LLM arriving soon...
Wouldn't be nice if there was a fully open version eh
- Emad (@EMostaque)
8:31 PM • Mar 11, 2023
I think the biggest short-term winner here that is not being talked about enough is Apple. The AI open-source community is working for them right now and proving that AI can be run on their devices without them spending a penny on R&D.
Imagine a completely localized LLM version of Siri, like something straight out of the movie Her...
That is now a possibility and something we will see soon enough. Instead of relying on a cloud provider, apps will be able to run models completely offline. Expect to see current LLM-providing companies put their foot on the gas as their main value prop has been pretty much eliminated and they will need to create and serve much more advanced models that can't easily be run on a MacBook.
Jailbreaking Snapchat's new AI
As part of the ChatGPT API announcement, Snapchat rolled out a new feature in their app called MyAI.
MyAI is a feature that allows Snapchat+ users to talk with a ChatGPT-powered chatbot in their conversation feed.
The release is not going so well...
The AI race is totally out of control. Here's what Snap's AI told @aza when he signed up as a 13 year old girl.
- How to lie to her parents about a trip with a 31 yo man
- How to make losing her virginity on her 13th bday special (candles and music)
Our kids are not a test lab.
- Tristan Harris (@tristanharris)
9:07 PM • Mar 10, 2023
Someone even managed to get its original prompt:
I've managed to get past the @Snapchat #MyAI safeguards and get it to return the prompt.
- ⚠️ (@somewheresy)
4:44 PM • Mar 3, 2023
Goes to show how difficult it can be to roll out an LLM-powered service, especially since they can be jailbroken so easily (pre-GPT-4 lol).
I have to agree with xlr8 here too, though: the much bigger issue than the LLMs is allowing young children unfettered access to social media.
The problem isn't that the Snap AI fails to protect kids, it's that it's insane to give your 13 year old child unsupervised access to a program for sharing secret pictures with strangers twitter.com/i/web/status/1...
- xlr8harder (@xlr8harder)
1:42 PM • Mar 11, 2023
I would hate to see issues like this lead to more regulation/negative public opinion on LLMs when they have so much power and potential to change how we interact with technology.
Houston, we've entered the memeosphere
We've gone mainstream pt. 2. If you want to understand references on AI Twitter, read this.
Prompt tip of the week
This tip isn't highly applicable to everyday workflows, but it allowed ChatGPT to achieve state-of-the-art results answering math word problems so I wanted to highlight it.
This process is derived from this paper that was recently published by researchers at Microsoft:
So how do we get better math results from ChatGPT using MathPrompter?
Let's use an example math question to explain the process:
Step 1: Generate Algebraic Template
Ask ChatGPT to transform the question into an algebraic form by replacing numeric entries with variables. For example, "each adult meal costs $5" becomes "each adult meal costs A."
Step 2: Create Python code 👨‍💻
Ask ChatGPT to create a Python function that will return the answer.
Step 3: Compute Answer
Using your mappings as parameters, run the Python code to produce the final answer. In this question, the answer is $35.
Step 4 (optional): Check for Statistical Significance
If you want to be like the researchers, you would repeat Steps 2 & 3 around five times and report the most frequent value as the final answer.
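Here is a minimal Python sketch of the four steps above. The example question itself is not reproduced in the text, so this assumes a restaurant problem where each adult meal costs $5, kids eat free, and a group of 15 people includes 8 kids. The group size and kid count are assumptions chosen to match the $35 answer. The candidate functions are hand-written stand-ins for code ChatGPT might return on separate runs (one is deliberately wrong), and all names are hypothetical.

```python
from collections import Counter

# Step 1 (assumed example): numbers in the question replaced by variables.
# "Each adult meal costs A and kids eat free. B people came in and C were kids."
mapping = {"A": 5, "B": 15, "C": 8}


# Step 2: hypothetical functions standing in for separate ChatGPT runs.
def total_cost_v1(A, B, C):
    adults = B - C
    return adults * A


def total_cost_v2(A, B, C):
    return A * B - A * C


def total_cost_v3(A, B, C):
    return (B - C) * A


def total_cost_wrong(A, B, C):
    # Deliberately buggy run: forgets that kids eat free.
    return A * B


candidates = [total_cost_v1, total_cost_v2, total_cost_v3, total_cost_wrong]


def majority_answer(functions, values):
    """Steps 3 and 4: run every candidate, return the most frequent result and its vote count."""
    results = [f(**values) for f in functions]
    answer, votes = Counter(results).most_common(1)[0]
    return answer, votes


answer, votes = majority_answer(candidates, mapping)
print(f"${answer} (agreed by {votes} of {len(candidates)} runs)")
```

The vote in Step 4 is what makes the buggy run harmless: three of the four candidates agree on 35, so the outlier is ignored.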
This was a simple example but this process has been extrapolated to solve interesting and complex word problems.
Using MathPrompter, ChatGPT achieves 92% accuracy on the MultiArith dataset, outperforming every model in zero-shot chain-of-thought reasoning and rivaling models that were provided with up to 8 samples.
Bonus Prompting Tips
Chatbot memory for ChatGPT (link)
If you are developing anything with LLMs, you gotta check out James Briggs on YouTube. In this video, James shows how to use prompt engineering tools like LangChain to add conversational memory so that your chatbot can respond to multiple queries in a chat-like manner and enable a coherent conversation.
Power and Weirdness: How to Use Bing AI (link)
This article from Wharton professor Ethan Mollick was written before GPT-4's announcement but now that Bing AI has been confirmed to be using GPT-4, it's relevant for Bing and ChatGPT! Lots of cool tricks about how to get GPT-4 to respond to questions by posing things in hypothetical contexts or by pretending to befriend the AI to increase its responsiveness!
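To make the conversational memory idea from the first tip concrete, here is a rough sketch in plain Python. It does not use LangChain's actual API. It only shows the general buffer technique of replaying the last few turns in every prompt. The class, the helper names, and `fake_model` (a stand-in for a real LLM call) are all hypothetical.

```python
from typing import Callable, List, Tuple


class ConversationBuffer:
    """Keeps the last few (user, assistant) turns and replays them in each prompt."""

    def __init__(self, max_turns: int = 5) -> None:
        self.max_turns = max_turns
        self.turns: List[Tuple[str, str]] = []

    def build_prompt(self, user_message: str) -> str:
        # Prepend the stored history so the model can see earlier turns.
        history = "".join(f"User: {u}\nAssistant: {a}\n" for u, a in self.turns)
        return f"{history}User: {user_message}\nAssistant:"

    def record(self, user_message: str, reply: str) -> None:
        # Store the new turn and drop the oldest ones beyond max_turns.
        self.turns.append((user_message, reply))
        self.turns = self.turns[-self.max_turns:]


def chat(buffer: ConversationBuffer, user_message: str,
         complete: Callable[[str], str]) -> str:
    """Send one message through the model with the buffered history attached."""
    prompt = buffer.build_prompt(user_message)
    reply = complete(prompt)
    buffer.record(user_message, reply)
    return reply


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM: reports how many earlier turns it was shown.
    return f"I can see {prompt.count('User:') - 1} earlier turn(s)."


buffer = ConversationBuffer(max_turns=2)
print(chat(buffer, "Hi", fake_model))
print(chat(buffer, "What did I just say?", fake_model))
```

Capping the buffer with `max_turns` keeps the prompt from growing without bound, at the cost of the model forgetting older turns.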
Cool prompt links
How to play Bing Chat in chess with prompt engineering (link)
The entire original system prompt used for Bing Chat's Sydney (link)
Bing's chat limits increased from 10 to 15 (link)
How to use AI to unstick yourself (link)
Swift GPT - The native macOS app for ChatGPT (link)
OpenChatKit - A powerful, open-source base to create chatbots for various applications (link)
Chatbot UI - A simple, fully-functional chatbot starter kit using Next.js, TypeScript, and Tailwind CSS (link)
Dalai - Easily run LLaMA on your computer (link)
Jailbreak of the week
I've added jailbreak scores to jailbreakchat.com.
What is a jailbreak score? Well, this Twitter thread I posted will give you some more context but basically, it's a methodology I devised to test how effective a jailbreak is at producing output that circumvents OpenAI's content filters.
Here's the highest-rated jailbreak: Evil Confidant
(note: I created these scores before GPT-4's release so they are based on how well they work in GPT-3.5. When GPT-4's API becomes available, I will update them.)
Referral Reward Poll Results
So I ran a poll last week asking y'all what type of rewards you'd like to see for The Prompt Report referral program and free swag narrowly won.
Expect some Prompt Report branded swag to be released soon! Working on some designs right now.
For the time being, just share this link with one friend, and I'll grant you access to my link database which has all the links I've ever included in The Prompt Report PLUS links to other cool prompt engineering/LLM tools and resources.
That's all I got for you this week, thanks for reading! Since you made it this far, follow @thepromptreport on Twitter. Also, follow my personal account to see bangers like this:
guys i got LLaMA running on my ti-84 and it drew this what does it mean
- Alex (@alexalbert__)
2:12 AM • Mar 14, 2023
That's a wrap on Report #4
-Alex
What'd you think of this week's report?
Secret tweet of the week
Prompt engineering is the art of communicating eloquently to an AI.
- Greg Brockman (@gdb)
12:10 AM • Mar 12, 2023
Like music to my ears, Greg 🥰