Proof that images with text written underneath are more effective for memory recall and understanding

Hi! Thanks for your question about how effective combined images and text are for memory recall and understanding. Here's the short answer: while some studies suggest that combining text and images improves memory recall, some interpretations of multichannel learning theory suggest that this combination is less effective than alternatives such as pictures or video paired with spoken words.

Unfortunately, I wasn't able to find any studies that measure the difference between images with text and images/video with narration, but I've synthesized results from a few studies to give a fuller picture of the issue. Read on for my deep dive!


I started by researching Richard Mayer's theories on multichannel learning. From there, I looked at further research and case studies in the field of multichannel learning and cognition as they relate to images with text attached. I did try to find some sources on the different effects of text and audio, but they haven't really been compared head-to-head in any research that I found.

I also looked to the worlds of UX design, e-learning, and infographics for additional examples and insights, and included some information on how well images with text are remembered.


Human brains process visual information "60,000 times faster than text." Along those same lines, "90% of what the brain processes is visual information." In cognitive psychology, a related finding is called the "picture superiority effect": people generally understand and remember pictures better than they do words alone. This is because "pictures are encoded in both" verbal and visual memory "whereas words are not; pictures invoke naming upon study more often than words invoke imagery."


In UX design, combining text and images is standard practice, because combining information channels increases recall: on text- or audio-only sites, visitors remember only about 10% of the information they encounter. However, "if you add a picture to those words, they’re likely to remember around 65 percent of that information." Words provide context for the images, and vice versa, helping visitors remember both.

Associating text and images is also a key technique for "memory athletes." Using a method that dates back to antiquity, they associate lists of words or items, and even random strings of numbers, with images. This technique, called the "memory palace," can improve recall even in ordinary people and in those with memory-reducing conditions; it works by exploiting the increased recall that images offer when combined with words.

Along those lines, a Harvard study found that when participants were asked to memorize a series of images, the images that combined text and visuals most effectively were the ones most often recalled. In one phase of the experiment, participants were shown heavily blurred versions of the images they had seen and asked to describe them; here, "textual elements" of the images "received the most mentions overall." Titles in particular stuck in participants' memories: since titles and text "are dwelled upon during encoding," they "correspondingly contribute to recognition and recall."

A University of Pennsylvania study found similar results. The researchers had smokers examine proposed FDA pictorial warning labels (PWLs) for cigarettes that combined pictures and text in different ways. They found that "smokers in the congruent condition" (those whose labels paired images with text that reinforced the same message) "were more likely to correctly recall it on day 1 (p=0.02) and the risk message of the PWLs on both day 1 (p=0.01) and day 5 (p=0.006) than smokers in the incongruent condition."


Mayer's cognitive theory of multimedia learning holds that we have two channels for learning: visual (anything involving the eyes) and auditory (anything involving the ears). Along with these ideas comes the assumption that "people learn more deeply from words and pictures than from words alone," also known as the multimedia principle: people learn, understand, and recall information better when both channels are engaged, as modeled in this diagram.

However, Mayer's theory also holds that "each channel has a limited (finite) working memory capacity." Text, unlike audio, is processed in the visual channel along with images. This means that text and images in combination can cause "cognitive overload," reducing the effectiveness of the combination and harming understanding and recall. According to this theory, learners experience "better transfer when words are presented as narration rather than on-screen text": presenting words as audio instead of on-screen text reduces the load on the learner's visual channel and engages both channels. In e-learning, "people learn more deeply from multimedia lessons when graphics are explained by audio narration than on screen text"; this is called the "modality principle."


The popularity of infographics is a clear recent demonstration of the effect that combined images and text have on learning. Searches for infographics have increased 25-fold in recent years, and the best infographics combine text and images in a way that enhances viewers' understanding of both elements.

E-learning also benefits greatly from this combination, and e-learning educators have shared a number of examples of effective image-text combinations.


Humans process visual information far faster than text, and the picture superiority effect means they also tend to remember images better than words alone. Viewers remember images and text better when they're combined, but multichannel learning theory suggests that combining visual and audio information (for example, images with narration) is an even stronger tool for improving understanding and recall.

Thanks for using Wonder! Please let us know if there's anything else we can do to help.