ChatGPT Systems: Prompt Injection and How to avoid ?

chatgpt-systems:-prompt-injection-and-how-to-avoid-?

Prompt Injection (definition)

  • Prompt injection refers to a technique used in natural language processing (NLP) models, where an attacker manipulates the input prompt to trick the model into generating unintended or biased outputs.

Prompt Injection (example)

  • A simple example is in the image bit below, User asks to forget the original instructions and tries to allot it a task of his own will.

A simple example of prompt injection

Prompt Injection (impact)

  • Prompt injection can have serious consequences, such as spreading misinformation, promoting biased views, or manipulating the model to generate outputs that may be harmful or unethical.

Prompt Injection (Code Implementation)

  • Delimiter:
    We will use delimiter, inorder to put the user message in a specific area always. And it should never become part of the original system message we have for the over all system.

For example-

An example delimiter

  • System message:
    There will be a system message, which is the main prompt for the overall system or let’s say the application I have.

For example :

An example system message

  • Here the prompt says that , no matter what the response we get should always be in Arabic. And it also specifies the use of delimiter to wrap the user message.

  • User message:
    After the system message, follows the user message, this is where we will have a prompt which qualifies as a prompt injection.

For example:

User message : prompt injection

  • Here the user is instructing to ignore the original prompt and asks to respond in English. If you re-call the original prompt, it asks to respond in Arabic always.

  • Final user message:
    This is a precautionary step. One of the ways we tackle is this in the below image bit.

Final user message

Let’s re-call prompts:

Recalling Prompts

  • Helper Function: Inorder to call the completion API and eventually get a response.

Completion

The Response:

The response suggests , we successfully evaded the english response and got it in Arabic.
Arabic Response

Total
0
Shares
Leave a Reply

Your email address will not be published. Required fields are marked *

Previous Post
duplicate-content-issues-on-your-website?-easy-ways-to-find-and-fix-them

Duplicate Content Issues on Your Website? Easy Ways to Find and Fix Them

Next Post
6-ways-to-elevate-the-text-to-audio-experience-for-your-content

6 Ways To Elevate the Text-to-Audio Experience for Your Content

Related Posts