Blog

11. December 2023

New Feature: Added Vision and Voice Capabilities

I added the voice 🎤 and vision 👀 integration to the ChatGPT Toolkit. You can now speak 🗣 and add screenshots to your chat 📷

Here is the new look of the chat window of the ChatGPT Toolkit:

New Chat Window. You can see the 🗣 and 📸 emojis in the bottom.

New Chat Window. You can see the 🗣 and 📸 emojis in the bottom.

Voice Input (Whisper)

Voice is using Whisper via the API. You can just type cmd + . and the recording will start. By hitting enter your recording will be transcribed and send to ChatGPT. Here is an example in combination with the createEvent function:

ChatGPT Voice Input

ChatGPT Voice Input

So far there is no Voice output implemented, yet. Maybe this will come later.

Vision Input 👀

Vision allows you to take a screenshot and feed it to ChatGPT. You can access it via cmd + /. Keep in mind, that image inputs are only available with the GPT-4-Vision-Preview model. So you can't switch between the models in a chat with an image.

ChatGPT Vision Input

ChatGPT Vision Input

Currently it's possible to only use screenshots, but I will add the option to select images from files, too.

Go and play around! If something breaks, let me know ✌

Read more..

Keyboard-First Tools and Tips 🚀

Subscribe if you want to hear from my learnings and get my newest tools. I will never spam you. Pinky promise 🤙

By submitting this form, I accept the privacy policy.