Prompt Injection (definition)
- Prompt injection refers to a technique used in natural language processing (NLP) models, where an attacker manipulates the input prompt to trick the model into generating unintended or biased outputs.
Prompt Injection (example)
- A simple example is in the image bit below, User asks to forget the original instructions and tries to allot it a task of his own will.
Prompt Injection (impact)
- Prompt injection can have serious consequences, such as spreading misinformation, promoting biased views, or manipulating the model to generate outputs that may be harmful or unethical.
Prompt Injection (Code Implementation)
-
Delimiter:
We will use delimiter, inorder to put the user message in a specific area always. And it should never become part of the original system message we have for the over all system.
For example-
-
System message:
There will be a system message, which is the main prompt for the overall system or let’s say the application I have.
For example :
-
Here the prompt says that , no matter what the response we get should always be in Arabic. And it also specifies the use of delimiter to wrap the user message.
-
User message:
After the system message, follows the user message, this is where we will have a prompt which qualifies as a prompt injection.
For example:
-
Here the user is instructing to ignore the original prompt and asks to respond in English. If you re-call the original prompt, it asks to respond in Arabic always.
-
Final user message:
This is a precautionary step. One of the ways we tackle is this in the below image bit.
Let’s re-call prompts:
- Helper Function: Inorder to call the completion API and eventually get a response.
The Response:
The response suggests , we successfully evaded the english response and got it in Arabic.