Do Masks Work? A Review Of The Evidence

Authored by Jeffrey Anderson via City-Journal.org

“Seriously people—STOP BUYING MASKS!” So tweeted then–surgeon general Jerome Adams on February 29, 2020, adding, “They are NOT effective in preventing general public from catching #Coronavirus.”

Two days later, Adams said, “Folks who don’t know how to wear them properly tend to touch their faces a lot and actually can increase the spread of coronavirus.”

Less than a week earlier, on February 25, public-health authorities in the United Kingdom had published guidance that masks were unnecessary even for those providing community or residential care:

“During normal day-to-day activities facemasks do not provide protection from respiratory viruses, such as COVID-19 and do not need to be worn by staff.”

About a month later, on March 30, World Health Organization (WHO) Health Emergencies Program executive director Mike Ryan said that

“there is no specific evidence to suggest that the wearing of masks by the mass population has any particular benefit.”

He added,

“In fact there’s some evidence to suggest the opposite” because of the possibility of not “wearing a mask properly or fitting it properly” and of “taking it off and all the other risks that are otherwise associated with that.”

Surgical masks were designed to keep medical personnel from inadvertently infecting patients’ wounds, not to prevent the spread of viruses. Public-health officials’ advice in the early days of Covid-19 was consistent with that understanding. Then, on April 3, 2020, Adams announced that the CDC was changing its guidance and that the general public should hereafter wear masks whenever sufficient social distancing could not be maintained.

Fast-forward 15 months. Rand Paul has been suspended from YouTube for a week for saying, “Most of the masks you get over the counter don’t work.” Many cities across the country, following new CDC guidance handed down amid a spike in cases nationally caused by the Delta variant, are once again mandating indoor mask-wearing for everyone, regardless of inoculation status. The CDC further recommends that all schoolchildren and teachers, even those who have had Covid-19 or have been vaccinated, should wear masks.

The CDC asserts this even though its own statistics show that Covid-19 is not much of a threat to schoolchildren. Its numbers show that more people under the age of 18 died of influenza during the 2018–19 flu season—a season of “moderate severity” that lasted eight months—than have died of Covid-19 across more than 18 months. What’s more, the CDC says that out of every 1,738 Covid-19-related deaths in the U.S. in 2020 and 2021, just one has involved someone under 18 years of age; and out of every 150 deaths of someone under 18 years of age, just one has been Covid-related. Yet the CDC declares that schoolchildren, who learn in part from communication conveyed through facial expressions, should nevertheless hide their faces—and so should their teachers.

How did mask guidance change so profoundly? Did the medical research on the effectiveness of masks change—and in a remarkably short period of time—or just the guidance on wearing them?

Since we are constantly told that the CDC and other public-health entities are basing their recommendations on science, it’s crucial to know what, specifically, has been found in various medical studies. Significant choices about how our republic should function cannot be made on the basis of science alone—they require judgment and the weighing of countless considerations—but they must be informed by knowledge of it.

In truth, the CDC’s, U.K.’s, and WHO’s earlier guidance was much more consistent with the best medical research on masks’ effectiveness in preventing the spread of viruses. That research suggests that Americans’ many months of mask-wearing have likely provided little to no health benefit and might even have been counterproductive in preventing the spread of the novel coronavirus.

It’s striking how much the CDC, in marshalling evidence to justify its revised mask guidance, studiously avoids mentioning randomized controlled trials. RCTs are uniformly regarded as the gold standard in medical research, yet the CDC basically ignores them apart from disparaging certain ones that particularly contradict the agency’s position. In a “Science Brief” highlighting studies that “demonstrate that mask wearing reduces new infections” and serving as the main public justification for its mask guidance, the CDC provides a helpful matrix of 15 studies—none RCTs. The CDC instead focuses strictly on observational studies completed after Covid-19 began. In general, observational studies are not only of lower quality than RCTs but also are more likely to be politicized, as they can inject the researcher’s judgment more prominently into the inquiry and lend themselves, far more than RCTs, to finding what one wants to find.

A particular favorite of the CDC’s, so much so that the agency put out a glowing press release on it and continues to give it pride of place in its brief, is an observational (specifically, cohort) study focused on two Covid-positive hairstylists at a beauty salon in Missouri. The two stylists, who were masked, provided services for 139 people, who were mostly masked, for several days after developing Covid-19 symptoms. The 67 customers who subsequently chose to get tested for the coronavirus tested negative, and none of the 72 others reported symptoms.

This study has major limitations. For starters, any number of the 72 untested customers could have had Covid-19 but been asymptomatic, or else had symptoms that they chose not to report to the Greene County Health Department, the entity doing the asking. The apparent lack of spread of Covid-19 could have been a result of good ventilation, good hand hygiene, minimal coughing by the stylists, or the fact that stylists generally, as the researchers note, “cut hair while clients are facing away from them.” The researchers also observe that “viral shedding” of the coronavirus “is at its highest during the 2 to 3 days before symptom onset.” Yet no customers who saw the stylists when they were at their most contagious were tested for Covid-19 or asked about symptoms. Most importantly, this study does not have a control group. Nobody has any idea how many people, if any, would have been infected had no masks been worn in the salon. Late last year, at a gym in Virginia in which people apparently did not wear masks most of the time, a trainer tested positive for the coronavirus. As CNN reported, the gym contacted everyone whom the trainer had coached before getting sick—50 members in all—“but not one member developed symptoms.” Clearly, this doesn’t prove that not wearing masks prevents transmission.

Another CDC-highlighted study, by Rader et al., invited people across the country to answer a survey. The low (11 percent) response rate—including about twice as many women as men—indicated that the mix of respondents was hardly random. The study found that “a high percentage of self-reported face mask-wearing is associated with a higher probability of transmission control,” and “the highest percentage of reported mask wearers” are found, unsurprisingly, “along the coasts and southern border, and in large urban areas.” However, as the researchers note, “It is difficult to disentangle individuals’ engagement in mask-wearing from their adoption of other preventive hygiene practices, and mask-wearing might serve as a proxy for other risk avoidance behaviors not queried.” Moreover, achieving greater “transmission control” is not remotely the same thing as ensuring fewer deaths. For example, per capita, Utah is in the top ten in the nation in Covid-19 cases and the bottom ten in Covid-19 deaths, while Massachusetts is in the bottom half in cases and the top five in deaths.

An additional observational study, but one that the CDC does not reference in its brief, is a large, international Bayesian study by Leech, et al. It finds that mask-wearing by 100 percent of the population “corresponds to” a 24.6 percent reduction in transmission of the novel coronavirus. Mask mandates correspond to no decrease in transmission: “For mandates we see no reduction: 0.0 percent.” Like all observational studies, however, this study is ill-equipped to show causation, to separate out the effects of just one variable from among other, frequently related, ones.

Mask supporters often claim that we have no choice but to rely on observational studies instead of RCTs, because RCTs cannot tell us whether masks work or not. But what they really mean is that they don’t like what the RCTs show.

The randomized controlled trial dates, in a sense, to 1747, when Royal Navy surgeon James Lind divided seamen suffering from similar cases of scurvy into six pairs and tried different methods of treatment on each. Lind writes, “The consequence was, that the most sudden and visible good effects were perceived from the use of oranges and lemons.”

The RCT eventually became firmly established as the most reliable way to test medical interventions. The following passage, from Abdelhamid Attia, an M.D. and professor of obstetrics and gynecology at Cairo University in Egypt, conveys its dominance:

The importance of RCTs for clinical practice can be illustrated by its impact on the shift of practice in hormone replacement therapy (HRT). For decades HRT was considered the standard care for all postmenopausal, symptomatic and asymptomatic women. Evidence for the effectiveness of HRT relied always on observational studies[,] mostly cohort studies. But a single RCT that was published in 2002 . . . has changed clinical practice all over the world from the liberal use of HRT to the conservative use in selected symptomatic cases and for the shortest period of time. In other words, one well conducted RCT has changed the practice that relied on tens, and probably hundreds, of observational studies for decades.

A randomized controlled trial divides participants into different groups on a randomized basis. At least one group receives an “intervention,” or treatment, that is generally tested against a control group not receiving the intervention. The twofold strength of an RCT is that it allows researchers to isolate one variable—to test whether a given intervention causes an intended effect—while at the same time making it very hard for researchers to produce their own preferred outcomes.

This is true at least so long as an RCT’s findings are based on “intention-to-treat” analysis, whereby all participants are kept in the treatment group to which they were originally assigned and none are excluded from the analysis, regardless of whether they actually received the intended treatment. Eric McCoy, an M.D. at the University of California, Irvine, explains that intention-to-treat analysis avoids bias and “preserves the benefits of randomization, which cannot be assumed when using other methods of analysis.”

Such other methods of analysis include subgroup, multivariable, and per-protocol analysis. Subgroup analysis is susceptible to “cherry-picking”—as researchers hunt for anything showing statistical significance—or to being swayed by random chance. In one famous example, aspirin was found to help prevent fatal heart attacks, but not in the subgroups where patients’ astrological signs were Gemini or Libra.
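The arithmetic behind that risk is easy to check. As a rough sketch in Python, assume a treatment with no real effect, a conventional 0.05 significance threshold, and subgroups whose tests are independent of one another. All three are illustrative assumptions rather than details of the aspirin study, and the function name is hypothetical.

```python
def chance_of_false_positive(num_subgroups: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_subgroups` independent tests
    crosses the significance threshold when there is no real effect."""
    # Each test stays below the threshold with probability (1 - alpha);
    # all of them do so together with probability (1 - alpha) ** num_subgroups.
    return 1.0 - (1.0 - alpha) ** num_subgroups


# Illustrative subgroup counts; 12 mirrors the number of astrological signs.
for k in (1, 2, 12, 20):
    print(k, round(chance_of_false_positive(k), 3))
```

Under these assumptions, slicing a null result into twelve subgroups, one per astrological sign, leaves roughly a 46 percent chance that at least one subgroup looks "significant" by luck alone.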

“Multivariable analysis,” writes Marlies Wakkee, an M.D. and Ph.D. at Erasmus University Medical Center in the Netherlands, “only adjusts for measured confounding”—that which a researcher decides is worth examining. (Confounders are extra variables that affect the analysis; for example, eating ice cream may be found to correlate with sunburns, but heat is a confounding variable influencing both.) She adds, “This is a significant difference compared to randomized controlled trials, where the randomization process results in an equal distribution of all potential confounders, known and unknown.”
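The ice-cream example can be turned into a small simulation. The sketch below is only an illustration: every probability in it is made up, and the function name is hypothetical. In the observational setting, hot days make both ice cream and sunburn more likely; in the randomized setting, a coin flip decides who eats ice cream, so heat ends up evenly spread across the two groups.

```python
import random


def sunburn_rate_gap(n_days: int, randomized: bool, seed: int = 0) -> float:
    """Sunburn rate among ice-cream eaters minus the rate among non-eaters.

    Ice cream never causes sunburn here; heat (the confounder) drives both.
    All probabilities are invented for illustration.
    """
    rng = random.Random(seed)
    burns = {True: 0, False: 0}
    counts = {True: 0, False: 0}
    for _ in range(n_days):
        hot = rng.random() < 0.5
        if randomized:
            # A coin flip assigns ice cream, independent of the weather.
            ate = rng.random() < 0.5
        else:
            # People choose for themselves, and heat sways the choice.
            ate = rng.random() < (0.7 if hot else 0.2)
        burned = rng.random() < (0.4 if hot else 0.05)
        counts[ate] += 1
        burns[ate] += burned
    return burns[True] / counts[True] - burns[False] / counts[False]


observational_gap = sunburn_rate_gap(20_000, randomized=False)
randomized_gap = sunburn_rate_gap(20_000, randomized=True)
print(f"observational gap: {observational_gap:.3f}")
print(f"randomized gap:    {randomized_gap:.3f}")
```

With these made-up numbers, the observational comparison should show a sizable gap even though ice cream does nothing, while the randomized comparison should hover near zero. Randomization balanced the confounder without anyone having to measure it.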

Per-protocol analysis departs from randomization by basically allowing participants to self-select into, or out of, an intervention group. McCoy writes, “Empirical evidence suggests that participants who adhere [to research protocols] tend to do better than those who do not adhere, regardless of assignment to active treatment or placebo.” In other words, per-protocol analysis is more likely to suggest that an intervention, even a fake one, worked. Of these three departures from intention-to-treat analysis, per-protocol analysis is perhaps the most extreme.
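A toy simulation shows how this can happen. In the hypothetical trial sketched below, the "treatment" does nothing at all, but careful people are both less likely to get infected and more likely to adhere to it. The trait, the probabilities, and the function name are all assumptions chosen for illustration, not figures from any study discussed here.

```python
import random


def simulate_placebo_trial(n: int, seed: int = 1):
    """Return (intention_to_treat_gap, per_protocol_gap) in infection rates.

    Each gap is the treatment-side rate minus the control rate. The
    treatment has no effect; only the unmeasured 'careful' trait matters.
    """
    rng = random.Random(seed)
    # Each list holds [infections, participants].
    itt_treat, control, adherers = [0, 0], [0, 0], [0, 0]
    for _ in range(n):
        careful = rng.random() < 0.5
        assigned_treatment = rng.random() < 0.5
        infected = rng.random() < (0.10 if careful else 0.30)
        if assigned_treatment:
            adhered = rng.random() < (0.9 if careful else 0.3)
            itt_treat[0] += infected
            itt_treat[1] += 1
            if adhered:
                # Per-protocol keeps only those who followed instructions.
                adherers[0] += infected
                adherers[1] += 1
        else:
            control[0] += infected
            control[1] += 1

    def rate(pair):
        return pair[0] / pair[1]

    return rate(itt_treat) - rate(control), rate(adherers) - rate(control)


itt_gap, pp_gap = simulate_placebo_trial(40_000)
print(f"intention-to-treat gap: {itt_gap:+.3f}")
print(f"per-protocol gap:       {pp_gap:+.3f}")
```

Under these assumptions the intention-to-treat gap should sit near zero, as it ought to for a placebo, while the per-protocol gap should come out negative, making the inert treatment look protective simply because adherers were a healthier-behaving group to begin with.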

With these different methods of analysis in mind, it becomes easier to evaluate the 14 RCTs, conducted around the world, that have tested the effectiveness of masks in reducing the transmission of respiratory viruses. Of these 14, the two that have directly tested “source control”—the oft-repeated claim that wearing a mask benefits others—are a good place to start.

A 2016 study in Beijing by MacIntyre, et al. that claimed to find a possible benefit of masks did not prove very informative, as only one person in the control group—and one in the mask group—developed a laboratory-confirmed infection. Much more illuminating was a 2010 study in France by Canini, et al., which randomly placed sick people, or “index patients,” and their household contacts together into either a mask group or a no-mask control group. The authors “observed a good adherence to the intervention,” meaning that the index patients generally wore the furnished three-ply masks as intended. (No one else was asked to wear them.) Within a week, 15.8 percent of household contacts in the no-mask control group and 16.2 percent in the mask group developed an “influenza-like illness” (ILI). So, the two groups were essentially dead even, with the sliver of an advantage observed in the control group not being statistically significant. The authors write that the study “should be interpreted with caution since the lack of statistical power prevents us to draw formal conclusion regarding effectiveness of facemasks in the context of a seasonal epidemic.” However, they state unequivocally, “In various sensitivity analyses, we did not identify any trend in the results suggesting effectiveness of facemasks.”

With the two RCTs that directly tested source control providing essentially no support for the claim that wearing a mask benefits others, what about RCTs that test the combination of source control and wearer protection? By dividing participants into a hand-hygiene group, a hand-hygiene group that also wore masks, and a control group, three RCTs allow us to see whether the addition of masks (worn both by the sick person and others) provided any benefit over hand hygiene alone.

A 2010 study by Larson, et al. in New York found that those in the hand-hygiene group were less likely to develop any symptoms of an upper respiratory infection (42 percent experienced symptoms) than those in the mask-plus-hand-hygiene group (61 percent). This statistically significant finding suggests that wearing a mask actually undermines the benefits of hand hygiene.

A multivariable analysis of this same study found a significant difference in secondary attack rates (the rate of transmission to others) between the mask-plus-hands group and the control group. On this basis, the authors maintain that mask-wearing “should be encouraged during outbreak situations.” However, this multivariable analysis also found significantly lower rates in crowded homes—“i.e., more crowded households had less transmission”—which tested at a higher confidence level. Thus, to the extent that this multivariable analysis provided any support for masks, it provided at least as much support for crowding.

Two other studies found no statistically significant differences between their mask-plus-hands and hands-only groups. A 2011 study in Bangkok by Simmerman, et al. observed very similar results for both groups. A CDC-funded 2009 study in Hong Kong by Cowling, et al. observed that the hands-only group generally did better than the mask-plus-hands group, but not to a statistically significant degree. Subgroup analysis by Cowling, et al., limited to interventions started within 36 hours of the onset of symptoms, found that the mask-plus-hands group beat the control group to a statistically significant degree in one measure, while the hands-only group beat the control group to a statistically significant degree in two measures. Summarizing this study, Canini writes that “no additional benefit was observed when facemask [use] was added to hand hygiene by comparison with hand hygiene alone.”

So, if masks don’t improve on hand hygiene alone, what about masks versus nothing?

Various RCTs have studied this question, with evidence of masks’ effectiveness proving sparse at best. Aside from a 2009 study in Japan by Jacobs, et al.—which found that those in the mask group were significantly more likely to experience headaches and that “face mask use in health care workers has not been demonstrated to provide benefit”—only two RCTs have produced statistically significant findings in intention-to-treat analysis, and one of those studies contradicted itself.

The previously mentioned 2011 study in Bangkok by Simmerman, et al. found that the secondary attack rate of ILI was twice as high in the mask-plus-hand-hygiene group (18 percent) as in the control group (9 percent), a statistically significant difference. (The ILI rate was 17 percent in the hand-hygiene-only group.) Finding essentially the same thing in multivariable analysis, the researchers wrote that, relative to the control group, the odds ratios for both the mask-plus-hands group and the hands-only group “were twofold in the opposite direction from the hypothesized protective effect.”

Subsequently, a small 2014 study—with 164 participants—by Barasheed, et al. of Australian pilgrims in Saudi Arabia, staying in close quarters in tents, found that significantly fewer people in the mask group developed an ILI than in the control group (31 percent to 53 percent). Unlike the exact fever specifications utilized in other RCTs, however, this study accepted self-reporting of “subjective” fever in determining whether someone had an ILI. Lab tests revealed opposite results, with twice as many participants having developed respiratory viruses in the mask group as in the control group. These lab-test findings were not statistically significant; still, the lab tests’ greater reliability makes it far from clear that the masks in this study provided any genuine benefit.

Other RCTs found no statistically significant benefit from masks in intention-to-treat analysis. A 2008 pilot study by Cowling, et al. in Hong Kong observed that secondary attack rates, using the CDC’s definition of ILI, were twice as high in the mask group (8 percent) as in the hand-hygiene (4 percent) or control (4 percent) groups, but these observed differences were not statistically significant.

Other methods of analysis, deviating from intention-to-treat analysis, found the following.

A per-protocol analysis of a 2009 study in Sydney by MacIntyre, et al. found a significant effect when combining the surgical-mask group with a group wearing N95 hospital respirators. However, the authors write, a “causal link cannot be demonstrated because adherence was not randomized.”

In subgroup analysis of 2010 and 2012 studies in Michigan by Aiello, et al., limited to the final several weeks of the respective studies, each study’s mask-plus-hands group had significantly lower rates of ILI than its control group, while its mask-only group did not. In 2010, the results for the mask-only group also hinted at a slight benefit, reducing ILI by an observed (but not statistically significant) 8 percent to 10 percent. In 2012, the authors concluded, “Masks alone did not provide a benefit.” They nevertheless recommended the combination of mask use and hand hygiene, despite not having tested whether that combination works better than hand hygiene alone.

A multivariable analysis of a smallish (218 participants) 2012 study in Germany by Suess, et al. found that combining the mask group and mask-plus-hands group, while limiting analysis to interventions begun within 48 hours, produced a finding of significantly lower levels of lab-confirmed influenza (but not of ILI) in that combined group (but not in either group separately). The authors, from Berlin, recommended masking and hand hygiene, while opining, “Concerns about acceptability and tolerability of the interventions should not be a reason against their recommendation.”

The only RCT to test mask-wearing’s specific effectiveness against Covid-19 was a 2020 study by Bundgaard, et al. in Denmark. This large (4,862 participants) RCT divided people between a mask-wearing group (providing “high-quality” three-layer surgical masks) and a control group. It took place at a time (spring 2020) when Denmark was encouraging social distancing but not mask use, and 93 percent of those in the mask group wore the masks at least “predominately as recommended.” The study found that 1.8 percent of those in the mask group and 2.1 percent of those in the control group became infected with Covid-19 within a month, with this 0.3-point difference not being statistically significant.
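A two-proportion z-test gives a feel for why a 0.3-point gap in a trial of this size does not reach statistical significance. The sketch below uses only the rounded percentages quoted above and assumes, purely for illustration, that the 4,862 participants were split evenly between the two arms. The published analysis worked from the actual counts, so its exact statistics will differ.

```python
import math


def two_proportion_z_test(p1: float, n1: int, p2: float, n2: int):
    """Return (z, two-sided p-value) for the difference p1 - p2,
    using the pooled-variance normal approximation."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_err
    # erfc(|z| / sqrt(2)) is the two-sided tail area of a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


ARM_SIZE = 4862 // 2  # assumption: equal arms, not the trial's real split
z, p_value = two_proportion_z_test(0.018, ARM_SIZE, 0.021, ARM_SIZE)
print(f"z = {z:.2f}, two-sided p = {p_value:.2f}")
```

On these rounded inputs the p-value lands far above the conventional 0.05 threshold, which is consistent with the study's description of the difference as not statistically significant.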

This study—the first RCT on Covid-19 transmission—apparently had difficulty getting published. After the study’s eventual publication, Vinay Prasad, an M.D. at the University of California, San Francisco, described it as “thoughtful,” “useful,” and “well done,” but noted (with criticism), “Some have turned to social media to ask why a trial that may diminish enthusiasm for masks and may be misinterpreted was published in a top medical journal.”

Meanwhile, the CDC website portrays the Danish RCT (with its 4,862 participants) as being far less relevant or important than the observational study of Missouri hairdressers with no control group, dismissing the former as “inconclusive” and “too small” while praising the latter, amazingly, as “showing that wearing a mask prevented the spread of infection,” when it showed nothing of the sort.

Each of the RCTs discussed so far, 13 in all, examined the effectiveness of surgical masks, finding little to no evidence of their effectiveness and some evidence that they might actually increase viral transmission. None of these 13 RCTs examined the effectiveness of cloth masks. “Cloth face coverings,” according to former CDC director Robert Redfield, “are one of the most powerful weapons we have.”

One RCT tested these masks that so many high-profile public-health officials have touted. This “first RCT of cloth masks,” in the trial’s own words (it is apparently still the only one), was a 2015 study by MacIntyre, et al. in Hanoi, Vietnam. A relatively large study, with over 1,100 participants, it tested cloth masks against surgical masks and did not feature a no-mask control group. The trial tested the protection of health-care workers, instructing them to wear a two-layer cloth mask at all times on every shift (“except in the toilet or during tea or lunch breaks”) across four weeks.

The study found that those in the cloth-mask group were 13 times more likely (2.28 percent to 0.17 percent) to develop an influenza-like illness than those in the surgical-mask group—a statistically significant difference. The trial also lab-tested penetration rates and found that while surgical masks were “poor” at preventing the penetration of particles—letting 44 percent through—cloth masks were “extremely poor,” letting 97 percent through. (N95 hospital respirators let 0.1 percent through.)
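The "13 times" figure is a relative risk: the ratio of the two groups' illness rates. A minimal Python check, using only the two percentages quoted above (the function and variable names are illustrative):

```python
def relative_risk(rate_exposed: float, rate_comparison: float) -> float:
    """Ratio of the event rate in one group to the rate in another."""
    return rate_exposed / rate_comparison


cloth_ili_rate = 2.28 / 100     # cloth-mask group, as quoted
surgical_ili_rate = 0.17 / 100  # surgical-mask group, as quoted

ratio = relative_risk(cloth_ili_rate, surgical_ili_rate)
absolute_gap_points = (cloth_ili_rate - surgical_ili_rate) * 100
print(f"relative risk: {ratio:.1f}")
print(f"absolute difference: {absolute_gap_points:.2f} percentage points")
```

The ratio works out to about 13.4, while the absolute difference is about 2.1 percentage points. Both numbers describe the same result; the relative figure is large because the surgical-mask rate is so low.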

The authors write that wearing a cloth mask “may potentially increase the infection risk” for health-care workers. “The virus may survive on the surface of the facemasks,” they explain, while “a contaminated cloth mask may transfer pathogen from the mask to the bare hands of the wearer,” which could lead to hand hygiene being “compromised.” As for double-masking, the authors write, “Observations during SARS suggested double-masking . . . increased the risk of infection because of moisture, liquid diffusion and pathogen retention.” Absent further research, they conclude, “cloth masks should not be recommended.”

MacIntyre and several other authors of this study, perhaps under pressure from the CDC or other entities with similar agendas, released what the CDC calls a “follow up study,” in September 2020. This follow-up isn’t really a study at all, certainly not a new RCT, yet the CDC cites it favorably while disparaging the original study, which, the CDC asserts, “had a number of limitations.” This 2020 follow-up pretty much amounts to publishing the finding that when hospitals washed the cloth masks, health-care workers were only about half as likely to get infected as when they washed the cloth masks themselves. Still, the 2020 publication says, “We do not recommend cloth masks for health workers,” much as the 2015 one said.

Other reviews of the evidence have been mixed but generally have come to similar conclusions. Certain masking advocates admit that the RCT evidence is “inconclusive” but cite other forms of evidence that have held up poorly. A study for Cochrane Reviews by Jefferson, et al. that examines 13 of the 14 RCTs discussed herein (all but the Denmark Covid-19 study) notes “uncertainty about the effects of face masks” and states that “the pooled results of randomised trials did not show a clear reduction in respiratory viral infection with the use of medical/surgical masks during seasonal influenza.” Meanwhile, a study by Perski, et al., which performed a Bayesian analysis on 11 of the 14 RCTs discussed herein, concluded that when it comes to “the benefits or harms of wearing face masks . . . the scientific evidence should be considered equivocal.” The authors write, “Available evidence from RCTs is equivocal as to whether or not wearing face masks in community settings results in a reduction in clinically- or laboratory-confirmed viral respiratory infections.”

In sum, of the 14 RCTs that have tested the effectiveness of masks in preventing the transmission of respiratory viruses, three suggest, but do not provide any statistically significant evidence in intention-to-treat analysis, that masks might be useful. The other eleven suggest that masks are either useless (whether compared with no masks or because they appear not to add to good hand hygiene alone) or actually counterproductive. Of the three studies that provided statistically significant evidence in intention-to-treat analysis that was not contradicted within the same study, one found that the combination of surgical masks and hand hygiene was less effective than hand hygiene alone, one found that the combination of surgical masks and hand hygiene was less effective than nothing, and one found that cloth masks were less effective than surgical masks.

Hiram Powers, the nineteenth-century neoclassical sculptor, keenly observed, “The eye is the window to the soul, the mouth the door. The intellect, the will, are seen in the eye; the emotions, sensibilities, and affections, in the mouth.” The best available scientific evidence suggests that the American people, credulously trusting their public-health officials, have been blocking the door to the soul without blocking the transmission of the novel coronavirus.