June 22, 2020 by Mental Health Data Science with Melanie Wall

Refocusing what we mean by race as a variable used in statistical analysis

In a meeting today, a colleague said, “It’s interesting that there was a time in the not too distant past when data analysis of outcomes by race was discouraged and if it were to be done needed to have a strong justification for why that was necessary, and now it seems like we have done a 180 and want to emphasize it.”

With the growing discourse/reckoning that is happening thanks to the Black Lives Matter movement and the urgent need for white people in the U.S. to embrace anti-racist actions, I, as a statistician, have some areas close to home I can start focusing on. I have analyzed LOTS of data about human public and mental health in my career. Race/ethnicity is almost always included as a covariate and sometimes as an effect modifier (interaction by race).

The important issue with race as a variable in statistical analysis is what we mean by racial categories. Those who have in the past tried to caution against looking at race are (probably) worried about us slipping back into our eugenics past, when race was thought of as a biological characteristic of a person. But what we need to do is clarify that race is not merely an individual-level characteristic of a person (and certainly not a biological one) but instead a marker of that person’s level of exposure to a racist and white-supremacist society and environment. Race, as we use it in analysis, should not be thought of as a variable that describes the individual themselves; instead, it describes their collective experience of being identified and treated by society in a certain way.

When we use race without careful explanation in our statistical analysis, readers are left to think of it in any way they choose, which may allow for the perpetuation of racist interpretations (for example, black-white differences being attributed to individual- or cultural-level inadequacies). I think it is our job as communicators of data and interpreters of findings to help lead the reader to interpret race through its institutionalized lens, as a measure of the racist context a person lives within. One simple suggestion is to start adding a footnote to the race variable whenever it is included in a statistical table. Perhaps the footnote would say: “Race/ethnicity categories delineate differences of experiences in the person’s lifelong exposure to a racist society/institution.” Something like that; suggestions/improvements welcome.

I am a biostatistician who has worked on applied research questions in psychosocial and behavioral public health and psychiatry my whole career. Here is a place to post ideas, helpful hints, and maybe start discussions about this exciting work, especially as we enter a new era of Data Science.

