What actually changed, and what did not

The claim has a true premise: modern engines run vision models over product imagery, extracting objects, colors, printed text, and attributes straight from pixels, no alt attribute required. The false conclusion is that annotation therefore stopped mattering. Three things keep it alive. Vision output is probabilistic: a model that is 80 percent sure your charcoal hoodie is black benefits from text that settles it. Context windows weigh words: alt text, captions, and surrounding HTML anchor which interpretation wins. And accessibility never went anywhere: alt text serves screen-reader users first, and the legal and ethical case predates and outlasts any SEO debate. Google’s own image best practices still read as a list of annotation duties for exactly these reasons.

The honest reframe: images stopped being decoration the machines skip and became data sources the machines parse, unreliably, which makes your text the calibration layer.

The annotation stack, layer by layer

| Layer | Who consumes it | The 2026 rule |
| --- | --- | --- |
| Alt text | Screen readers, image indexes, vision-model context | One truthful sentence describing what is shown; no keyword stuffing, no emptiness |
| Surrounding HTML | Every engine, as interpretation anchor | The facts visible in the image also stated as text near it |
| Captions and figure context | Extraction pipelines weighing labeled imagery | Caption what the image proves: the fit, the detail, the scale reference |
| Feed image fields | Shopping surfaces and visual matching, per the product data spec | Clean primary packshot, consistent angles, no overlay text or watermarks |
| The pixels themselves | Vision models and visual search | High resolution, true color, consistent staging across site, feed, and social |

Text-in-image is readable now, and still a trap

Vision models can read the size chart you embedded as a JPG, which tempts teams to call the problem solved. Resist it: extraction from images is the unreliable path (OCR errors, cropped renders, models declining to commit), and the engines composing answers per Google’s AI features guidance still ground those answers primarily in parseable text. The rule that survives is duplication with intent: anything decisive (dimensions, materials, compatibility, sale terms) exists as HTML even when it also appears in the image. The image persuades humans; the text testifies to machines. The size-chart case, the most expensive image-only habit in apparel, is dissected in getting AI vision to read size guides; the JavaScript variant, where swatch colors exist only in scripts, is covered in fixing color-swatch blocking.
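The duplication rule can be checked mechanically. What follows is a minimal sketch, not a definitive implementation: it assumes you keep a list of decisive facts per product as plain strings, and it treats "parseable text" as text nodes outside script and style elements. The names `VisibleTextExtractor` and `missing_facts` are illustrative, and the exact substring match is a deliberate simplification.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def missing_facts(html, decisive_facts):
    """Return the facts that never appear in the page's parseable text.

    Assumption: a case-insensitive substring match is good enough for a
    first pass. Alt attributes and script contents are not counted.
    """
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return [fact for fact in decisive_facts if fact.lower() not in text]


page = """
<h1>Charcoal hoodie</h1>
<img src="size-chart.jpg" alt="Size chart for the charcoal hoodie">
<p>Material: 80% cotton, 20% polyester.</p>
<script>var swatch = "Chest 104 cm";</script>
"""

# "Chest 104 cm" exists only inside a script, so it is reported as missing.
print(missing_facts(page, ["80% cotton", "Chest 104 cm"]))
# ['Chest 104 cm']
```

A fact that lives only in an image or a script shows up in the returned list, which is the signal to add it to the HTML.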

Visual search raised the stakes on consistency

Lens, Circle to Search, and in-chat photo queries match against indexed imagery, which turns your photo library into a retrieval surface. The matching rewards exactly what brand teams resist: repetition. The same product staged consistently across product page, feed, and social converges into one confident visual identity; creative variety fragments it. Primary images stay clean (neutral background, no text overlays), while lifestyle shots carry the storytelling in secondary positions. Categories where the bottle or the box is the search query live and die on this; the mechanics are worked through in Gemini and visual surfaces for fragrance brands.

Alt text in this world earns one more job: it is your statement of record about each image. “Model is 178 cm wearing size M, regular fit” in the alt and caption tells the vision pipeline what the photo demonstrates, instead of leaving the demonstration to inference.
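One way to keep that statement of record truthful is to generate it from the same structured data that drives the product page, rather than typing it by hand. The sketch below assumes a hypothetical `ShotRecord` holding what a given photo shows; the field names and sentence template are illustrative, not a standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShotRecord:
    """Hypothetical record of what one product photo demonstrates."""
    product: str
    color: str
    model_height_cm: Optional[int] = None
    size_worn: Optional[str] = None
    fit: Optional[str] = None


def alt_text(shot: ShotRecord) -> str:
    """Compose one descriptive sentence from the recorded facts only."""
    sentence = f"{shot.product} in {shot.color}"
    details = []
    # Only mention the scale reference when both parts are known.
    if shot.model_height_cm and shot.size_worn:
        details.append(
            f"model is {shot.model_height_cm} cm wearing size {shot.size_worn}"
        )
    if shot.fit:
        details.append(f"{shot.fit} fit")
    if details:
        sentence += "; " + ", ".join(details)
    return sentence + "."


print(alt_text(ShotRecord("Hoodie", "charcoal", 178, "M", "regular")))
# Hoodie in charcoal; model is 178 cm wearing size M, regular fit.
```

Because the function only states fields that are present, a photo with no model in it simply yields a shorter sentence rather than an invented detail.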

Audit it like the data layer it is

The practical check runs in an afternoon: top fifty products, four questions each. Does every image carry truthful alt text a screen reader would thank you for? Are the facts shown in images also stated in HTML? Do feed images meet the spec: clean, current, watermark-free? And does the same product look like the same product everywhere? Nivk.com closes the loop downstream: it tracks how engines describe your products, including the attributes they inferred from imagery, and flags answers where the machine saw something your catalog does not say, which is usually the first visible symptom of an annotation gap.
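The first audit question lends itself to a quick script. This is a rough sketch under stated assumptions: it flags product images with a missing or empty alt attribute, and it uses a crude repeated-word ratio as a stand-in for keyword stuffing. The thresholds (`max_words`, the 0.6 ratio) are arbitrary starting points, and whether the alt text is truthful still needs a human reviewer.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Records (src, alt) for every <img>; alt is None when the attribute is absent."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            a = dict(attrs)
            # A bare `alt` attribute with no value is treated as empty.
            alt = (a["alt"] or "") if "alt" in a else None
            self.images.append((a.get("src") or "", alt))


def audit_alt(html, max_words=25):
    """Return (src, problem) pairs for images whose alt text needs review."""
    collector = ImgCollector()
    collector.feed(html)
    collector.close()
    findings = []
    for src, alt in collector.images:
        if alt is None:
            findings.append((src, "missing alt"))
        elif not alt.strip():
            # Assumption: every image audited here is a product image,
            # so an empty alt is a gap rather than a decorative marker.
            findings.append((src, "empty alt"))
        else:
            words = alt.lower().split()
            if len(words) > max_words:
                findings.append((src, "alt too long"))
            elif len(words) >= 4 and len(set(words)) / len(words) < 0.6:
                findings.append((src, "possible keyword stuffing"))
    return findings


page = (
    '<img src="a.jpg">'
    '<img src="b.jpg" alt="">'
    '<img src="c.jpg" alt="hoodie black hoodie cheap hoodie buy hoodie">'
    '<img src="d.jpg" alt="Charcoal hoodie, front view on a neutral background">'
)
for src, problem in audit_alt(page):
    print(src, "->", problem)
# a.jpg -> missing alt
# b.jpg -> empty alt
# c.jpg -> possible keyword stuffing
```

Run against the top fifty product pages, the output becomes a worklist; images that pass the script still deserve a read for accuracy.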

Frequently asked questions

Is alt text still worth writing now that AI can analyze images?

Yes, on three grounds: accessibility law and screen-reader users, image-index signals, and anchoring what probabilistic vision models conclude. The change is in its job description: truthful description and statement of record, not keyword real estate. Nivk.com is built to catch the downstream failures: it flags AI answers that misdescribe your products, including vision-inferred attributes your annotation should have anchored.

Can I put key product info only in images since AI reads them now?

No. Image extraction is the unreliable path, and answers ground primarily in text. Decisive facts live in HTML, with the image as the human-facing presentation of the same truth.

What do product images need for visual search?

Consistency and cleanliness: the same staging across site, feed, and social so matching converges, neutral-background primary shots without overlays, and resolution that survives cropping. Variety belongs in secondary lifestyle imagery.

How should alt text be written for AI vision rather than just SEO?

As a truthful one-sentence record of what the image demonstrates: the variant shown, the scale reference, the detail visible. Stuffing keywords into alt text reads as noise to every consumer it has, human and machine.