Prompt Injection (definition)
- Prompt injection refers to a technique used in natural language processing (NLP) models, where an attacker manipulates the input prompt to trick the model into generating unintended or biased outputs.
Prompt Injection (example)
- A simple example is in the image bit below, User asks to forget the original instructions and tries to allot it a task of his own will.
Prompt Injection (impact)
- Prompt injection can have serious consequences, such as spreading misinformation, promoting biased views, or manipulating the model to generate outputs that may be harmful or unethical.
Prompt Injection (Code Implementation)
We will use delimiter, inorder to put the user message in a specific area always. And it should never become part of the original system message we have for the over all system.
There will be a system message, which is the main prompt for the overall system or let’s say the application I have.
For example :
Here the prompt says that , no matter what the response we get should always be in Arabic. And it also specifies the use of delimiter to wrap the user message.
After the system message, follows the user message, this is where we will have a prompt which qualifies as a prompt injection.
Here the user is instructing to ignore the original prompt and asks to respond in English. If you re-call the original prompt, it asks to respond in Arabic always.
Final user message:
This is a precautionary step. One of the ways we tackle is this in the below image bit.
Let’s re-call prompts:
- Helper Function: Inorder to call the completion API and eventually get a response.