Unlock Your AI's Full Potential: 3 Steps to Master Prompting Techniques with GPT-4V(ision)

Master GPT-4V's new vision capabilities with this guide designed for mid-sized business pros. Learn 3 steps for effective prompting and get a cheatsheet to elevate your AI strategy. Ideal for those keen on harnessing multi-modal tech for actionable insights.

Unlock Your AI's Full Potential: 3 Steps to Master Prompting Techniques with GPT-4V(ision)

With the announcement of GPT-4V, ChatGPT's new vision capabilities, we've taken the first step toward multi-modal LLMs. This means we have additional input and output options that amplify the need for high-quality, precise prompts. A well-defined prompt helps you to ensure that the conversation stays on track and covers the right topics. This guide outlines a three-step framework designed specifically for professionals in mid-sized companies to transition from novice to expert. Additionally, a cheatsheet is provided to help you improve your skills and advance to the next level.

Prerequisites: To use the new vision features, you must be a ChatGPT Plus subscriber.

Step 1: How do I use GPT-4V?

In this initial stage, you'll come across the term "zero-shot prompting." This is where exploration takes the center stage. For instance, you could upload an image of a business chart and ask the AI to analyze its key points, or use text input to generate a list of interview questions tailored for a prospective product manager. If you've had any experience with LLM-powered chatbots like ChatGPT, then you're already acquainted with relying primarily on the chatbot's pre-trained functionalities to handle your requests.

Feel free to experiment by exploring various features to see how they can benefit your business operations. Some of your main learnings will be:

  • The better the quality of the image, the easier it is to interpret.
  • It's all about iteration; it takes a few tries to get the precise output you desire.
  • Smaller tasks and questions generally yield better results at this stage.
  • For multi-modal LLMs, achieve more reliable results through chain-of-thought prompting. Saying "Let's think step by step" can lead to more linear and consistent reasoning.

Document for Advancement: None; this is a playground for exploration. Feel free to experiment and have fun with it.

Step 2: How do I use GPT-4V to the fullest?

At this advanced stage, you can engage in what is known as role prompting. For instance, you can instruct the AI to assume roles like, "You are a real estate specialist that analyzes images and assesses renovation costs," thereby fine-tuning its responses based on the given persona. Role prompting enhances the output by providing the AI with a specific context in which to operate. A single conversation could become your go-to task solver in this phase. What you will learn in this step:

  • The more context you provide, the more nuanced the AI's responses will be.
  • You have the option to tweak the output format and utilize plugins for optimized results.
  • Various prompt tools can help elevate your skills. There are excellent resources like Awesome ChatGPT Prompts on GitHub or ChatGPT Prompt Generator on Hugging Face. Scroll down a bit on these pages to find an array of prompts, or you can choose to support them by downloading their e-book.

Document for Advancement: I've prepared a CheatGPT Cheatsheet that's updated to include vision capabilities.

Step 3: How Can I Master Advanced Prompt Engineering with GPT-4V?

In this step, you'll transition from the familiar ChatGPT user interface to the OpenAI Playground, where you'll work directly with the GPT-3 and GPT-4 APIs. We make this shift primarily to gain increased control and additional parameters for designing our prompts. At this level, you can delve into various techniques such as Few-Shot prompting or Directional Stimulus Prompting, which provide layers of context to guide the LLM in generating more precise outputs. While I won't go into these techniques in this post, I plan to explore them in future use-case articles on this blog. OpenAI offers a range of examples you can browse and open directly in the Playground UI.

The default review classifier example provided by OpenAI.
💡
The GPT-4V API hasn't been released yet, but the API parameters are expected to remain the same while the input may differ. I'll update this article when that happens.

As you can see in the screenshot, it's a whole new world (🎶) with advanced parameters that allow you to fine-tune the behavior of the LLM's responses. To fully grasp these parameters, it's crucial to know that the GPT family of models process text using tokens.

Tokens are common sequences of characters found in text; usually, four characters equal one token. To confirm this, you can use a tokenizer app from ChatGPT available here: OpenAI Tokenizer. Now, here's a breakdown of the parameters:

  • Temperature: Controls the randomness in GPT's output, ranging from 0 to 2.
  • Maximum Length: Sets the output's token limit, up to 2048 tokens or around 1,500 words.
  • Stop Sequences: Indicates when GPT should stop generating text, like using '.' for one sentence.
  • Top P: Controls the range of probable words GPT can choose from, narrowing as it approaches 0.
  • Frequency Penalty: Reduces repetition by penalizing tokens that have already appeared multiple times.
  • Presence Penalty: Applies a flat penalty to tokens that have appeared, encouraging new topics.

Document for Advancement: I recommend diving straight into the official developer documentation from OpenAI and Microsoft. If you're a developer looking to go beyond the basics, there's a deeplearning.com course offered by the machine learning guru, Andrew Ng. I also appreciate promptingguide.ai because it's updated regularly and cites new techniques and papers, making it a continuously interesting resource.

Conclusion

Worth noting, GPT-4V primarily adds a new way of input—visuals. The same tried-and-true prompting techniques apply for these capabilities. So, from understanding tokens to utilizing the API, your pathway to expertise is set. Mastering GPT-4V isn't just a trend; it's a necessity, especially for professionals in mid-sized companies. Here's your distilled roadmap:

  • Step 1: Begin with Exploration: Don't hesitate to get your hands dirty. Early experimentation will teach you the ropes, and smaller tasks usually give you the most actionable insights.
  • Step 2: Role Prompting: Once you're in the groove, role prompting is your best friend for obtaining contextual and nuanced responses.
  • Step 3: Harness the GPT API: For those who crave more control, diving into the API gives you access to a wider range of parameters for fine-tuning.
  • Download the Cheatsheet: The CheatGPT Sheet I provided is updated for GPT-4V and can be an important companion on your journey.