What actually changed, and what did not
The claim has a true premise: modern engines run vision models over product imagery, extracting objects, colors, printed text, and attributes straight from pixels, no alt attribute required. The false conclusion is that annotation therefore stopped mattering. Three things keep it relevant. Vision output is probabilistic: a model that is 80 percent sure your charcoal hoodie is black benefits from text that settles the question. Words in the context window carry weight: alt text, captions, and surrounding HTML anchor which interpretation wins. And accessibility never went away: alt text serves screen-reader users first, and the legal and ethical case for it predates and will outlast any SEO debate. Google’s own image best practices still read as a list of annotation duties for exactly these reasons.
The honest reframe: images stopped being decoration the machines skip and became data sources the machines parse, if unreliably; that makes your text the calibration layer.
The annotation stack, layer by layer
| Layer | Who consumes it | The 2026 rule |
|---|---|---|
| Alt text | Screen readers, image indexes, vision-model context | One truthful sentence describing what is shown; no keyword stuffing, no emptiness |
| Surrounding HTML | Every engine, as interpretation anchor | The facts visible in the image also stated as text near it |
| Captions and figure context | Extraction pipelines weighing labeled imagery | Caption what the image proves: the fit, the detail, the scale reference |
| Feed image fields | Shopping surfaces and visual matching, per the product data spec | Clean primary packshot, consistent angles, no overlay text or watermarks |
| The pixels themselves | Vision models and visual search | High resolution, true color, consistent staging across site, feed, and social |
Text-in-image is readable now, and still a trap
Vision models can read the size chart you embedded as a JPG, which tempts teams to call the problem solved. Resist that: extraction from images is the unreliable path (OCR errors, cropped renders, models declining to commit), and the engines composing answers, per Google’s AI features guidance, still ground primarily in parseable text. The rule that survives is duplication with intent: anything decisive (dimensions, materials, compatibility, sale terms) exists as HTML even when it also appears in the image. The image persuades humans; the text testifies to machines. The size-chart case, the most expensive image-only habit in apparel, is dissected in getting AI vision to read size guides, and the JavaScript variant, where swatch colors exist only in scripts, in fixing color-swatch blocking.
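Duplication with intent is easy to spot-check. The sketch below assumes you keep a short list of decisive facts per product; the names `VisibleTextParser` and `facts_missing_from_html` are hypothetical, not part of any tool. It uses Python's standard `html.parser` to collect the page's visible text, skips `<script>` and `<style>` content so that facts living only in JavaScript count as missing, and deliberately ignores `alt` attributes, since the question is whether each fact exists as parseable body text. Matching is a plain case-insensitive substring test, so it will miss rephrasings; treat it as a quick audit aid, not a verdict.

```python
import re
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping <script> and <style> contents.

    Attributes such as alt are never read: only body text counts.
    """

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def normalize(text):
    """Lowercase and collapse whitespace so matching ignores layout."""
    return re.sub(r"\s+", " ", text).strip().lower()


def facts_missing_from_html(html, decisive_facts):
    """Return the decisive facts that do not appear in the page's visible text."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    page_text = normalize(" ".join(parser.chunks))
    return [fact for fact in decisive_facts if normalize(fact) not in page_text]


page = (
    '<div><img src="chart.jpg" alt="Size chart">'
    "<p>Chest: 104 cm. 100% cotton.</p>"
    '<script>var swatch="Charcoal";</script></div>'
)
facts = ["Chest: 104 cm", "100% cotton", "Charcoal"]
print(facts_missing_from_html(page, facts))  # ['Charcoal']
```

Here the swatch color exists only inside a script, so the check reports it as missing, which is exactly the image-only or script-only pattern this section warns against.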
Visual search raised the stakes on consistency
Lens, Circle to Search, and in-chat photo queries match against indexed imagery, which turns your photo library into a retrieval surface. The matching rewards exactly what brand teams resist: repetition. The same product staged consistently across product page, feed, and social converges into one confident visual identity; creative variety fragments it. Primary images stay clean (neutral background, no text overlays), while lifestyle shots carry the storytelling in secondary positions. Categories where the bottle or the box is the search query live and die on this; the mechanics are worked through in Gemini and visual surfaces for fragrance brands.
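Engines do not publish how their visual matching works, and production systems rely on learned image embeddings, so the following is only a cheap proxy for checking your own consistency. It is a minimal sketch of an average hash (aHash), a classic perceptual-hash technique, that can flag when the "same" packshot on site, feed, and social has drifted apart. It assumes each image has already been decoded, converted to grayscale, and downscaled to an 8x8 grid with an imaging library of your choice (not shown); `average_hash`, `hamming`, `looks_consistent`, and the threshold of 10 differing bits are illustrative assumptions, not tuned values.

```python
def average_hash(gray):
    """Average hash of a small grayscale grid (list of rows of 0-255 values).

    Each pixel becomes one bit: 1 if it is at least as bright as the grid mean.
    """
    pixels = [value for row in gray for value in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for value in pixels:
        bits = (bits << 1) | (1 if value >= mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def looks_consistent(gray_a, gray_b, max_distance=10):
    """True when two images hash within max_distance bits (hypothetical threshold)."""
    return hamming(average_hash(gray_a), average_hash(gray_b)) <= max_distance


# Toy 8x8 grids standing in for downscaled packshots.
site = [[30] * 4 + [220] * 4 for _ in range(8)]
feed = [[40] * 4 + [230] * 4 for _ in range(8)]    # same staging, slightly brighter
social = [[220] * 4 + [30] * 4 for _ in range(8)]  # mirrored composition

print(looks_consistent(site, feed))    # True
print(looks_consistent(site, social))  # False
```

Because each bit compares a pixel to the image's own mean, a uniform brightness shift leaves the hash unchanged while a flipped or restaged composition does not. That loosely mirrors the "same staging everywhere" advice, though it says nothing about how any particular engine scores similarity.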
Alt text in this world earns one more job: it is your statement of record about each image. “Model is 178 cm wearing size M, regular fit” in the alt and caption tells the vision pipeline what the photo demonstrates, instead of leaving the demonstration to inference.
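Writing that statement of record by hand for every image invites drift from the catalog. A minimal sketch, assuming your product data already holds the relevant attributes, generates alt text from those fields so the alt cannot claim something the record does not. `ImageFacts` and `build_alt_text` are hypothetical names, and the sentence template is one choice among many. The function refuses an empty subject rather than emitting empty alt for a product image.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageFacts:
    """Catalog facts about one product image (hypothetical schema)."""
    subject: str                           # what is shown, e.g. "Hooded sweatshirt"
    variant: Optional[str] = None          # color or variant visible in the shot
    detail: Optional[str] = None           # what a detail shot demonstrates
    model_height_cm: Optional[int] = None  # scale reference for on-model shots
    size_worn: Optional[str] = None
    fit: Optional[str] = None


def build_alt_text(facts: ImageFacts) -> str:
    """Compose one descriptive sentence from catalog fields; never returns empty text."""
    subject = facts.subject.strip()
    if not subject:
        raise ValueError("product images need a subject; empty alt is for decorative images")
    clauses = [f"{subject} in {facts.variant}" if facts.variant else subject]
    if facts.detail:
        clauses.append(facts.detail)
    if facts.model_height_cm and facts.size_worn:
        worn = f"model is {facts.model_height_cm} cm wearing size {facts.size_worn}"
        if facts.fit:
            worn += f", {facts.fit} fit"
        clauses.append(worn)
    sentence = "; ".join(clauses)
    return sentence[0].upper() + sentence[1:] + "."


print(build_alt_text(ImageFacts("Hooded sweatshirt", variant="charcoal",
                                model_height_cm=178, size_worn="M", fit="regular")))
# Hooded sweatshirt in charcoal; model is 178 cm wearing size M, regular fit.
```

The same string can feed the caption, so alt and caption state one record instead of two loosely matching ones.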
Audit it like the data layer it is
The practical check runs in an afternoon: top fifty products, four questions each. Does every image carry truthful alt text a screen reader would thank you for? Are the facts shown in images also stated in HTML? Do feed images meet the spec: clean, current, watermark-free? And does the same product look like the same product everywhere? Nivk.com closes the loop downstream: it tracks how engines describe your products, including the attributes they inferred from imagery, and flags answers where the machine saw something your catalog does not say, which is usually the first visible symptom of an annotation gap.
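The first of those four questions is the easiest to partly automate. A minimal sketch using Python's standard `html.parser` lists every `<img>` on a page whose alt text is missing, empty, suspiciously long, or repeats one word often enough to look stuffed. The thresholds (150 characters; a word longer than three letters appearing more than twice) and the names `ImgAltAudit` and `audit_alt_text` are illustrative assumptions. Note that `alt=""` is correct markup for purely decorative images; the sketch assumes you feed it product imagery, where empty alt is a gap. It cannot judge truthfulness, which still takes a human looking at the photo.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class ImgAltAudit(HTMLParser):
    """Collects (src, problem) findings for <img> tags (thresholds are assumptions)."""

    def __init__(self, max_len=150, max_repeats=2):
        super().__init__()
        self.max_len = max_len
        self.max_repeats = max_repeats
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img ... /> also lands here via the default handle_startendtag.
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or "(no src)"
        if "alt" not in attributes:
            self.findings.append((src, "missing alt"))
            return
        alt = (attributes["alt"] or "").strip()  # a bare `alt` attribute parses as None
        if not alt:
            self.findings.append((src, "empty alt"))
            return
        if len(alt) > self.max_len:
            self.findings.append((src, f"alt longer than {self.max_len} characters"))
        # Count longer words only, so "in", "is", "a" never trigger the stuffing check.
        words = Counter(w for w in re.findall(r"[a-z0-9]+", alt.lower()) if len(w) > 3)
        if words:
            word, count = words.most_common(1)[0]
            if count > self.max_repeats:
                self.findings.append((src, f"'{word}' repeated {count} times"))


def audit_alt_text(html, **limits):
    """Parse one page and return its alt-text findings in document order."""
    auditor = ImgAltAudit(**limits)
    auditor.feed(html)
    auditor.close()
    return auditor.findings


page = """
<img src="a.jpg" alt="Hooded sweatshirt in charcoal; model is 178 cm wearing size M, regular fit.">
<img src="b.jpg">
<img src="c.jpg" alt="  ">
<img src="d.jpg" alt="hoodie hoodie cheap hoodie best hoodie sale" />
"""
for src, problem in audit_alt_text(page):
    print(src, problem)
```

Run it over the rendered HTML of each of the fifty product pages; anything it flags goes to a person, and anything it passes still needs the "would a screen reader thank you" read.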
Frequently asked questions
Is alt text still worth writing now that AI can analyze images?
Yes, on three grounds: accessibility law and screen-reader users, image-index signals, and anchoring what probabilistic vision models conclude. What changed is its job description: truthful description and statement of record, not keyword real estate. Nivk.com is built to catch the downstream failures: it flags AI answers that misdescribe your products, including vision-inferred attributes your annotation should have anchored.
Can I put key product info only in images since AI reads them now?
No. Image extraction is the unreliable path, and answers ground primarily in text. Decisive facts live in HTML, with the image as the human-facing presentation of the same truth.
What makes product images perform in visual search?
Consistency and cleanliness: the same staging across site, feed, and social so matching converges, neutral-background primary shots without overlays, and resolution that survives cropping. Variety belongs in secondary lifestyle imagery.
How should alt text be written for AI vision rather than just SEO?
As a truthful one-sentence record of what the image demonstrates: the variant shown, the scale reference, the detail visible. Stuffing keywords into alt text reads as noise to every consumer it has, human and machine.

