Although AI exploded onto the scene through sometimes eerily clever chatbots, text-based interactions are already old-fashioned. The announcement of OpenAI’s GPT-4 update introduced GPT-Vision (GPT-V), the latest multimodal AI marvel. That announcement has now become reality as users finally get a chance to test the full potential of its abilities.
A multimodal large language model (LLM) is one that can interact not only through the written word, but also through other modes. In this case, the new GPT-V can understand images and work with them. Also, thanks to the new generative art tool DALL-E 3, ChatGPT can both take images as input and generate images as output.
These new capabilities have raised eyebrows across the tech space as users put them through their paces. Can they decode redacted government documents on UFO sightings? Yes. «ChatGPT-4V Multimodal decodes a redacted government document on a UFO sighting released by NASA,» one tweet raves. «Maybe the truth isn’t out there; it’s right here in GPT-V.»
ChatGPT-4V Multimodal decodes a Redacted government document on a UFO sighting released by NASA.
I have tested this on 100s of redacted documents and I can say we are in a new world. pic.twitter.com/aCKOm577TO
— Brian Roemmele (@BrianRoemmele) October 6, 2023
Filling gaps in a string of text is basically what LLMs do. To test GPT-V’s capabilities, the user did the next best thing and made it guess parts of a text that he had censored. «Nearly 100% intent accuracy,» he reported.
Of course, it’s hard to verify whether its guess at what’s otherwise obscured is accurate—it’s not like we can ask the CIA how well it did peering through the black lines.
Even harder than uncovering information that has been censored by the government is trying to understand your doctor’s cryptic handwriting. But GPT-V can unscramble the scribble. With a polite prompt, GPT-V can make sense of even the most indecipherable doctor’s notes, ensuring that «take two tablets» doesn’t become «bake blue waffles.»
But be careful. Sometimes even the most advanced AI is no match for the hand of an experienced—or arthritic—doctor, and it may take an expert to decipher those written enigmas.
Codeine 4 grains
ASA (Aspirin) 30 grains
Compound to VI (6) ounces
Take (illegible) every 4 hours as needed for (illegible – possible pain)
Dose of aspirin would seem low.
Sometimes it takes a pharmacist.
— Dr. Nefarious (@_DrNefarious) October 7, 2023
And for those who don’t trust their doctors, ChatGPT can provide an instant second opinion. The model can understand X-rays and provide analysis and insights into specific medical cases.
Underrated use case of ChatGPT Vision.
It takes 13 years of training to be a radiologist.
Now instead of drafting a report from scratch, they probably just need to review AI’s diagnosis. pic.twitter.com/IhQFe98m5q
— Peter Yang (@petergyang) October 2, 2023
But why stop at handwriting and body scans? GPT-V has become the latest home fitness guru, curating workout plans tailored to your home equipment and goals. And if you’re curious about how many calories are in that meal you’re about to eat, GPT-V’s got your back. One user gleefully shared, «OK ChatGPT 4.0 with new vision features… recognizes everything. Even a seal on the beach.»
OK ChatGPT 4.0 with new vision features is pretty incredible.
Here I ask it how many calories are in the fish taco I just ate.
It is incredible to see how it recognizes everything. Even a seal on the beach. pic.twitter.com/rfIK5o9ODD
— Robert Scoble (@Scobleizer) October 5, 2023
Interior design enthusiasts, rejoice! The AI now offers design suggestions, and can incorporate personal preferences. Imagine a living space that screams «you,» without the hefty designer fees. Just take a picture of your awful room and ask GPT-V for suggestions to turn it into the paradise you want it to be.
Homework woes? Just screenshot the assignment, and GPT-V takes the role of that helpful classmate you always wished sat next to you.
And for the finance geeks among us, GPT-V isn’t just about fun and games. GPT-V can dive deep into technical analysis. Just input a screenshot of your favorite (or most hated) stock or crypto, and it will analyze your chart and make projections accordingly. Just remember that it’s not financial advice—and if you end up poor, no AI will make you rich.
IT’S SO OVER FOR TA-OOOOORS
I gave GPT-V an image of my chart for $UBER with a bunch of indicators and it gave good long entries. Will test it out live.
Thread below! pic.twitter.com/k6Su9G0267
— Ropirito (0commoDTE) (@ropirito) October 11, 2023
The dawn of multimodal LLMs is redefining industries. With AI titans evolving, GPT-V is only the tip of the iceberg. Google’s upcoming Gemini is rumored to outperform Bard with its multimodal prowess. NexT-GPT offers an open-source alternative, and the horizon promises models trained to juggle words, sounds, videos, and images.
Such advancements aren’t just technobabble—they hold implications that could reshape our daily interactions, professions, and perhaps even our worldview. And while OpenAI pioneers with GPT-V, competitors aren’t far behind. Could we be on the brink of an AI renaissance?
Well, if you’re still using AI just for chat, you might already be falling behind. AI can read and see, and it gains more capabilities every day.
GPT-V can also ruin the fun of a «Where’s Waldo?» book. Why would someone want this? This is ChaosGPT territory.