Prevention & Treatment, Volume 1, Article 0003c, posted June 26, 1998
Copyright 1998 by the American Psychological Association

Comment on Listening to Prozac but Hearing Placebo: A Meta-Analysis of Antidepressant Medication

Prozac and Placebo: There's a Pony in There Somewhere

Larry E. Beutler
University of California

Kirsch and Sapirstein (1998) have provided a provocative analysis of placebo contributions to antidepressant effects. They distinguish among response to treatment, treatment effect, placebo response, and placebo effect. In each case, response defines the total amount of change associated with the implementation of a treatment or placebo, whereas effect defines that portion of the response that can be attributed to the medication or placebo. They suggest, and I have been persuaded to concur, that the field has inappropriately ignored the overshadowing role of the relative size of placebo and treatment effects in its rush to acclaim the effectiveness of antidepressants. They suggest that the inclusion of a proportional measure that describes the relative amount of change that is attributable to active treatment effects and placebo response will help balance presentations on the effects of treatments. The results have even broader and more important implications than those acknowledged by the authors, extending to prescription practices, to how depression is conceptualized within a diagnostic perspective, and to the concept of treatment-induced deterioration effects. Collectively, however, the poor showing of antidepressants, both in this and other meta-analytic studies of these drugs, raises an interesting question about why and how public enthusiasm and faith are maintained in these treatments. This is a research question whose importance may even exceed that of the specific effects of the drugs themselves.

Correspondence concerning this article should be addressed via email to

Kirsch and Sapirstein (1998) have provided a compelling meta-analysis of the effectiveness of antidepressant medications. This presentation includes, and then goes far beyond, restating and confirming the usual conclusion that antidepressant effects are significantly stronger than those associated with placebo. Their effort to creatively extract the unique effects of placebo from general placebo response and to then apply this same logic to differentiating treatment effects from the broader response to treatment invokes some uncertain assumptions, but it also clarifies some critical relationships that are usually lost in meta-analyses and stimulates a number of very important questions related to research design, prescription practices, and the relations between science and practice. This article will expand on the implications of the startling results and will attempt to extend the discussion into areas and topics that, although untouched by Kirsch and Sapirstein, beg for consideration. As an offshoot of these implications, science and scientists must seek to discover why substantial disparities exist between the findings presented and the conventional wisdom that exists in the field regarding drug effects.

Treatment Effects Versus Total Treatment Response

In order to understand the importance of Kirsch and Sapirstein's (1998) work, the reader must appreciate the distinction between a treatment response (the magnitude of change following the introduction of a treatment) and a treatment effect (response to treatment, less the improvement associated with nonspecific effects and spontaneous changes). These figures require a direct comparison of pretreatment to posttreatment change in both conditions. In conventional analyses, the size of these changes is usually overlooked because the effect size (d) comparisons do not express this information and instead focus on the much smaller values that merely describe the end-of-treatment differences that exist between treatment and placebo or control groups. These methods obscure the fact that the amount of actual change in both types of groups is always greater than is reflected in this single comparison score and that the preponderance of these changes is similar in the two groups. They are the effects of what are conventionally thought to be placebo variables (expectation, hope, faith) and sundry incidental events.

To Kirsch and Sapirstein (1998), the difference between the magnitudes of response to an experimental treatment and a control condition allows one to estimate the magnitude of the effect of the experimental treatment. A treatment effect, in other words, is the difference between the treatment response associated with the targeted treatment and that associated with a no-treatment or placebo control group. Kirsch and Sapirstein express the relative drug effects as a ratio of treatment effect to total response. In their analysis, the total treatment effect of psychoactive medication was 25% of the total treatment response, for example.
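The proportional measure described above can be sketched numerically. What follows is only an illustration, not Kirsch and Sapirstein's own procedure: it uses the placebo response (1.16) and drug effect (.39) figures cited elsewhere in this commentary, assumes that the total drug response is simply their sum, and the function name effect_share is hypothetical.

```python
def effect_share(total_response: float, control_response: float) -> float:
    """Fraction of the total response left after subtracting the control response."""
    if total_response <= 0:
        raise ValueError("total_response must be positive")
    return (total_response - control_response) / total_response


# Figures cited in this commentary (standardized pre-post change, in SD units).
placebo_response = 1.16
drug_effect = 0.39

# Assumption: drug response = placebo response + drug effect (additive model).
drug_response = placebo_response + drug_effect

drug_share = effect_share(drug_response, placebo_response)
placebo_share = 1 - drug_share

print(f"share of response attributable to the drug: {drug_share:.0%}")        # 25%
print(f"share of response duplicated by placebo:    {placebo_share:.0%}")     # 75%
```

Under the additive assumption, the ratio reproduces the 25% and 75% split reported in the text.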

Although this analysis captures the proportional distinction between treatment response and treatment effect, it does not adequately illustrate the similar distinctions between placebo response and placebo effect. The total reaction one has to a placebo is partly under the influence of factors that are incidental to the process and is partly under the influence of factors that are inherent in the process of taking a pill that is expected to produce change. Thus, inert placebos can be expected to have less effect than active placebos. Kirsch and Sapirstein (1998) illustrate that only the total response to treatment estimate, based on pre–post differences for each treatment, allows one to compare and appreciate how much change occurs in depression because of placebo and incidental influences. Moreover, only when studies include either a no-treatment control or an active placebo control can an estimate be made of how much of the actual placebo response is associated with the effects of these incidental versus drug-specific factors.

The Clinical Meaningfulness of Placebo Response and Treatment Effects

The significance of the overall response magnitude is seen by comparing the usual effect size obtained in comparisons of antidepressant medications with an inert placebo. For example, in the analysis undertaken by Kirsch and Sapirstein (1998), the mean effect size (treatment effect) associated with antidepressant use among the 19 studies in their analysis was .39. This is an unremarkable but moderate estimate of treatment effects. As long as treatment and placebo groups are initially equivalent on the dependent variable, this value is the same whether it is based on a separate computation of each treatment's total, associated response or on the shorter and more conventional method that is based only on a direct comparison of the placebo and treatment posttest scores. In the first instance, d is derived as a difference between differences (standardized pre–post scores). In the latter case, it is derived only from comparisons of posttest scores. In each instance, the difference is standardized by either a pooled or an initial standard deviation (SD). In either case, effect size is only a reflection of the relative response obtained to the two treatments and occludes the fact that incidental (nontreatment) variables lead to most of the change. Only the treatment response estimates, which are usually ignored, include these incidental factors.
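The equivalence of the two ways of computing d can be checked with a small sketch. All raw scores below are hypothetical, chosen only so that the standardized changes match the 1.16 and .39 figures cited in this commentary. Both function names are illustrative, and lower scores are assumed to mean less depression.

```python
def d_from_change_scores(pre_t, post_t, pre_c, post_c, sd):
    """d as a difference between standardized pre-post changes."""
    change_treatment = (pre_t - post_t) / sd   # total treatment response
    change_control = (pre_c - post_c) / sd     # total placebo response
    return change_treatment - change_control


def d_from_posttest(post_t, post_c, sd):
    """d from posttest scores only (the conventional shortcut)."""
    return (post_c - post_t) / sd


# Hypothetical scores: both groups start at 25 with an SD of 8.
pre, sd = 25.0, 8.0
post_drug, post_placebo = 12.6, 15.72

d_changes = d_from_change_scores(pre, post_drug, pre, post_placebo, sd)
d_post = d_from_posttest(post_drug, post_placebo, sd)

print(f"drug response:     {(pre - post_drug) / sd:.2f}")
print(f"placebo response:  {(pre - post_placebo) / sd:.2f}")
print(f"d (change scores): {d_changes:.2f}")
print(f"d (posttest only): {d_post:.2f}")
```

The two values of d agree because the groups share a baseline, yet only the first computation exposes how large the two underlying responses are.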

To put these figures in perspective, one can refer to a table of normative values. If one does so, it becomes evident that the moderate effect size of .39 found by Kirsch and Sapirstein (1998) could seriously lead to an exaggeration of the power of the treatment. With a d of this magnitude, one can determine that in Kirsch and Sapirstein's analysis, the average (median) patient among those who received an antidepressant was better off than 65% of those who received a placebo, only a 15% gain in number of patients benefitted by antidepressants over placebo alone. More telling, translating the mean placebo response effect size of 1.16 in a similar way reveals that 88% of patients who received only placebos experienced improvement (12% stayed the same or got worse). This is a remarkably high percentage and is the basis for Kirsch and Sapirstein's conclusion that placebo accounts for 75% of the total response to the antidepressant medications. To some, it might appear obvious that the front line treatment of choice is placebo, not antidepressants.
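The table lookup described above amounts to evaluating the standard normal cumulative distribution at d. The sketch below does this directly. It assumes normally distributed outcomes with equal variances in the groups being compared, and the helper name normal_cdf is illustrative.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


drug_vs_placebo_d = 0.39   # treatment effect cited in the text
placebo_response_d = 1.16  # pre-post placebo response cited in the text

# Share of the placebo group that the median drug patient outperforms.
better_than = normal_cdf(drug_vs_placebo_d)
# Share of placebo patients whose scores improved at all.
placebo_improved = normal_cdf(placebo_response_d)

print(f"median drug patient exceeds {better_than:.0%} of placebo patients")  # 65%
print(f"gain over the 50% chance level: {better_than - 0.5:.0%}")            # 15%
print(f"placebo patients improved: {placebo_improved:.0%}")                  # 88%
```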

Kirsch and Sapirstein's (1998) introduction of the proportional estimate statistic is a valuable addition to the usual report of effect size (d) because, as the foregoing illustrates, it forces the reader to acknowledge the very strong effects that even inert placebos have on depression. But a report of the actual, pre–post effect size of each treatment would also be helpful to include in research reports. The significance of these estimates, reported in juxtaposition with one another, is made more apparent by Kirsch and Sapirstein's effort to partial out the proportion of the total treatment response that can be attributed to specific antidepressant effects, as well as to the incidental expectations associated with inert placebo, and the factors that are invoked by reactions to medication that do not include a specific antidepressant effect. Their effort to resolve these issues involved the completely novel comparison of the mean placebo response in drug studies with the response observed in no-treatment control groups in psychotherapy studies.

To my knowledge, Kirsch and Sapirstein's (1998) effort to compare the specific effects of administering an inert placebo with the effect of providing no treatment at all is a first. Their inclusion of a no-treatment comparison drawn from separate studies was a hazardous but very creative solution to the dilemma of understanding placebo responses. The comparison would have been more persuasive if they had assessed the hypothesis of "no difference" with an equivalence test rather than a difference test (Rogers, Howard, & Vessey, 1993). An equivalence test is a statistical comparison based on power analyses and yields a specific probability that the compared groups are comparable within the limits of measurement error (e.g., the standard error of measurement for the particular test used). It is a strong procedure that allows one to actually assess the probability of the null hypothesis being true when comparing groups on discrete variables.
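The logic of an equivalence test can be sketched as two one-sided z tests. This is one common formulation, offered here as an assumption rather than as the exact procedure of Rogers et al. (1993). The group means, standard error, and equivalence margin below are hypothetical, not values from Kirsch and Sapirstein.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def equivalence_p_value(mean_a, mean_b, se_diff, margin):
    """Two one-sided z tests of H0: |mean_a - mean_b| >= margin.

    A small p-value supports the claim that the two means differ by less
    than `margin`. Assumes a normal sampling distribution for the difference.
    """
    if se_diff <= 0 or margin <= 0:
        raise ValueError("se_diff and margin must be positive")
    diff = mean_a - mean_b
    p_lower = 1.0 - normal_cdf((diff + margin) / se_diff)  # H0: diff <= -margin
    p_upper = normal_cdf((diff - margin) / se_diff)        # H0: diff >= +margin
    return max(p_lower, p_upper)


# Hypothetical standardized responses for two groups, with a margin of 0.30 SD.
p_close = equivalence_p_value(1.16, 1.11, se_diff=0.10, margin=0.30)
p_far = equivalence_p_value(1.16, 0.91, se_diff=0.10, margin=0.30)

print(f"small observed difference:  p = {p_close:.4f}")
print(f"larger observed difference: p = {p_far:.4f}")
```

The design choice that matters is the margin, which must be fixed in advance. A nonsignificant difference test alone never establishes equivalence, whereas here the burden of proof is reversed.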

Nonetheless, even without such an analysis, the results are intriguing and are sufficiently strong to warrant serious consideration of their conclusions. They demonstrate that (a) antidepressant effects are about equivalent to the effects of credible but non-antidepressant drugs, another form of the Dodo bird verdict; (b) the change arising from either an active or an inactive placebo is several times more powerful than the change that can be attributed to specific antidepressant medications; and (c) in the classic understatement, the relative size of placebo effects is considerably larger than conventionally believed when treating depression medically.

Antidepressant Effects Versus Nonspecific Drug Response

Kirsch and Sapirstein's (1998) analysis provides first-time information on the specificity of the effects that are attributable to antidepressant medications. As they observe, the usual method of comparing an active drug to an inert placebo almost certainly overestimates the strength of the effect attributed to antidepressant medications. To get a more realistic estimate of the active effects of treatment, one must find a way to extract the portion of the total medication response that accrues from the inflated hopes and expectations that arise when a patient notes chemically induced changes and then uses these to infer that the medication he or she is receiving is active. The results of comparisons among placebo conditions, response to non-antidepressant but active medications, and antidepressant medications add an exponential level of interest to Kirsch and Sapirstein's analysis.

In a just world, the results reported by Kirsch and Sapirstein (1998) would challenge the belief that depression is a set of discrete, diagnosable illnesses linked to specific biochemical functions and "chemical imbalances." These results should cry out for a reconsideration of the validity of categorical diagnostic taxonomies for the depressive disorders. Kirsch and Sapirstein's data do not provide support for depression as a specific and discrete disorder. The possibility that the incidental effects on depression, consequent to administering drugs whose chemistry is not compatible with known theories of depression, are at least as large as the direct effects of antidepressants themselves begs a reconsideration of the specificity of depressive diagnoses. The equivalent responses to a variety of antidepressants, anxiolytics, and minor tranquilizers argue for the view that depression is a general marker of distress that cuts across other conditions, rather than a disorder or dysfunction in its own right (e.g., Gotlib, Lewinsohn, & Seeley, 1995).

Even more telling is the remarkable observation of Kirsch and Sapirstein (1998) that antidepressants actually have an inhibiting effect on symptomatic change, at least among some patients. If such a finding is true, it indicates that antidepressants may actually slow the rate of symptomatic improvement, relative to the incidental and indirect effects of minor tranquilizers. Thus, some observers may now modify their original conclusions and assert that the first-line treatment of choice is either placebo or a general anxiolytic and minor tranquilizer rather than antidepressants. Indeed, the observed net response of patients to antidepressants (d = 1.14) may be attributed largely (75%) to the combined effect of active and inactive factors in placebo treatment, according to the figures provided by Kirsch and Sapirstein.

The possibility of an inhibiting effect of antidepressants recalls similar conclusions by Antonuccio, Danton, and DeNelsky (1995). They also reported a symptom-retarding effect of combining antidepressant medications with cognitive therapy, when contrasted with either treatment alone. The implications demand more research attention. In the most positive scenario, if the incidental antidepressant effects of non-antidepressants are, in fact, as powerful as or more powerful than those of selective serotonin reuptake inhibitors (SSRIs), tricyclics, and other antidepressants, it may render unimportant the observation that 90% of primary care providers and 70% of psychiatrists mismanage antidepressant medications (Wells & Sturm, 1996). Somewhat facetiously, one may conclude that it may be just as well that these providers misdiagnose depression and prefer to prescribe minor tranquilizers rather than antidepressants, since the latter may inhibit positive response rates if administered to inappropriate patients.

Negative Effects and Deterioration

The fact that the mean effect size associated with the use of antidepressants, reported by Kirsch and Sapirstein (1998), was negative (d = -.17) compared to the effect size associated with non-antidepressant drugs has another implication. Though a negative effect size does not directly indicate that patients actually became worse, the downward shift of the normal response curve would place a higher percentage of individuals in the range of deterioration than would be true of the comparison condition. Thus, these results demand inspection of the possibility that antidepressants are more likely to induce actual deterioration than are active placebos, and perhaps even more than non-antidepressant medications. One may wonder whether the 15% increase in the number of patients improved is worth the cost.
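The effect of a downward shift on the deterioration tail can be illustrated under strong assumptions. The sketch treats standardized change scores as normally distributed with unit SD and counts any change below zero as deterioration. The comparison-group mean of 1.0 SD is hypothetical, and only the shift of -.17 comes from the text.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def share_deteriorating(mean_change: float) -> float:
    """P(change < 0) when change ~ Normal(mean_change, 1)."""
    return normal_cdf(-mean_change)


comparison_mean = 1.0   # hypothetical mean improvement, in SD units
shift = -0.17           # relative effect size cited in the text

comparison_tail = share_deteriorating(comparison_mean)
shifted_tail = share_deteriorating(comparison_mean + shift)

print(f"comparison condition: {comparison_tail:.1%} below zero")
print(f"after a {shift} shift:  {shifted_tail:.1%} below zero")
```

The particular percentages depend entirely on the assumed mean. The point is only that any downward shift enlarges the tail that falls in the range of deterioration.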

Data on actual deterioration, as contrasted to negative correlations and effect sizes, are not directly extracted from the information provided by Kirsch and Sapirstein (1998). Unfortunately, Kirsch and Sapirstein failed to identify the studies used in this aspect of their analysis. However, through analysis of a study that is readily available to me, I could compare the antidepressant effects of a pill placebo and a non-antidepressant drug. This study (Beutler et al., 1987) can be used to illustrate the significance of the problem of treatment-induced deterioration.

Beutler et al. (1987) reported a randomized clinical trial of alprazolam, a high-potency benzodiazepine that is widely used to treat moderate to severe anxiety and panic. It was conducted at a time when alprazolam was incorrectly thought to have direct, antidepressant effects, in addition to its anxiolytic ones. We randomly assigned depressed patients to one of four treatments: (a) an alprazolam regimen, (b) a placebo regimen, (c) an alprazolam regimen along with group cognitive therapy (CT), or (d) a pill placebo along with group CT. The direct effect of placebo was underestimated in this study because placebo responders were excluded from analysis during a pretreatment placebo washout period. However, the results revealed that there was little change in depressive symptoms in either the drug or placebo conditions. Indeed, on self-report measures, the average patient receiving alprazolam deteriorated through 20 weeks of treatment and a 3-month follow-up period. In absolute terms, mean scores on the Beck Depression Inventory (Beck, Ward, Mendelson, Mock, & Erbaugh, 1961) were higher at the end of follow-up than at the beginning of treatment. Although the difference was not significant when compared to placebo, the distribution indicated that this active drug group had a higher rate of deterioration than any of the other three groups, significantly so when compared to either of the conditions that included CT. Over half of the patients in the alprazolam-only condition experienced actual increases in depressive symptomatology. The placebo-only condition produced no discernible change in depression; only the two CT conditions earned mean changes that were significantly and clinically positive.

Although the design did not allow for a comparison with an established and known antidepressant, it was interesting that with only one exception, the end-of-treatment and follow-up scores on both self-report and clinician measures were higher in the CT-plus-alprazolam condition than in the CT-plus-placebo condition. Though not statistically reliable, the pattern is again consistent with the suggestion (Antonuccio et al., 1995) that medication may inhibit the effects of treatments for depression. Moreover, taken with the other analyses cited here, these results challenge certain widely held beliefs about the effectiveness of medication and have direct relevance for questions about the adequacy of contemporary methodologies to control for the effects of expectation, hope, and nonspecific treatments.

Why Does Everyone Believe Drugs Are So Good?

If the treatment effects associated with antidepressants are so poor, to what can we attribute the widely held faith in the efficacy and effectiveness of these medications? In contemplating this important question, several nonexclusive possibilities reveal themselves. First and most likely, the failure to identify the actual magnitude of treatment response before the effects of placebo are extracted may have led many to misinterpret the amount of support provided by research studies for the efficacy of antidepressants. When one is presented only with the net effect of antidepressant treatment, in the form of an effect size or significance level, the fact that placebo response both is three times as large and accounts for over three times as many improved patients as the active medication is likely to be ignored.

Second, the analyses of Kirsch and Sapirstein (1998) do not reveal differences that may be present in the speed of the effects of antidepressants and placebos. The National Institute of Mental Health Treatment of Depression Collaborative Research Program (TDCRP), for example, revealed that imipramine produced faster effects than the comparison treatments, including placebo, even though the overall mean effects were indistinguishable (Elkin, 1994; Imber et al., 1990). To explore the implications of this study relative to the proposals of Kirsch and Sapirstein, I undertook a cursory analysis of data presented by Elkin et al. (1995). These data did not include the entire TDCRP set but were selected because they were restricted to the most severe patients in the sample. These are the patients for whom imipramine was found to have the largest effects (Elkin et al., 1995), and their selection was designed to ensure that the results would present the medication in its most favorable light. Even in this best-case example, however, application of the proportional score advocated by Kirsch and Sapirstein reveals that placebo response accounted for approximately 60% of the total medication response. If medication has an advantage over placebo, it should show up in this sample because placebo responders were excluded, and indeed it did. However, the overall effect size is overshadowed, once again, by the sheer magnitude of the placebo contributors to the response.

A third explanation for the disparity between the modest results presented by Kirsch and Sapirstein (1998) and the degree of faith with which both the professional and nonprofessional public imbues antidepressants may reside in the dramatic nature of some drug-induced changes. Extreme events are more clearly remembered and more broadly generalized than "average" results. That is, when medication works, it may work very well, leaving the recipient with a missionary zeal for the value of the medication and a distorted estimate of how frequently such successes occur. Thus, those who have been helped may be relatively vocal in advocating for their newfound benefactor, whereas those who are not helped may choose to avoid the issue. And those who observe do so through rose-colored glasses.

These three possibilities may work together to leave the public with a false expectation and hope for the amount of benefit and the degree of promise offered by chemical solutions to depression. But it is difficult to believe that the strength of the lobby and enthusiasm generated by psychopharmaceutical solutions can be accounted for by these three simple possibilities alone. At the very least, the provocative results reported by Kirsch and Sapirstein (1998) call on scientists to include in their search for specific biological mechanisms and sites of action a broader view of the psychosocial context in which medication responses occur. Research, not just on placebo response and not just on clinical efficacy, but on placebo effects and levels of satisfaction, is mandated by these results. Science must come to understand the interactive roles played by initial expectations, chemical effects, feedback that confirms or disconfirms patient expectations, and levels of satisfaction in ultimate symptom change. New conceptualizations, such as those provided by Kirsch and Sapirstein, should help move research in this direction.

References

Antonuccio, D. O., Danton, W. G., & DeNelsky, G. Y. (1995). Psychotherapy versus medication for depression: Challenging the conventional wisdom with data. Professional Psychology: Research and Practice, 26, 574-585.

Beck, A. T., Ward, C. H., Mendelson, M., Mock, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561-571.

Beutler, L. E., Scogin, F., Kirkish, P., Schretlen, D., Corbishley, A., Hamblin, D., Meredith, K., Potter, R., Bamford, C. R., & Levenson, A. I. (1987). Group cognitive therapy and alprazolam in the treatment of depression in older adults. Journal of Consulting and Clinical Psychology, 55, 550-556.

Elkin, I. (1994). The NIMH Treatment of Depression Collaborative Research Program: Where we began and where we are. In A. E. Bergin & S. L. Garfield (Eds.), Handbook of psychotherapy and behavior change (4th ed., pp. 114-139). New York: Wiley.

Elkin, I., Gibbons, R. D., Shea, M. T., Sotsky, S. M., Watkins, J. T., Pilkonis, P. A., & Hedeker, D. (1995). Initial severity and differential treatment outcome in the National Institute of Mental Health Treatment of Depression Collaborative Research Program. Journal of Consulting and Clinical Psychology, 63, 841-847.

Gotlib, I. H., Lewinsohn, P. M., & Seeley, J. R. (1995). Symptoms versus a diagnosis of depression: Differences in psychosocial functioning. Journal of Consulting and Clinical Psychology, 63, 90-100.

Imber, S. D., Pilkonis, P. A., Sotsky, S. M., Elkin, I., Watkins, J. T., Collins, J. F., Shea, M. T., Leber, W. R., & Glass, D. R. (1990). Mode-specific effects among three treatments for depression. Journal of Consulting and Clinical Psychology, 58, 352-359.

Kirsch, I., & Sapirstein, G. (1998). Listening to Prozac but hearing placebo: A meta-analysis of antidepressant medication. Prevention & Treatment, 1, Article 0002a. Available on the World Wide Web:

Rogers, J. L., Howard, K. I., & Vessey, J. T. (1993). Using significance tests to evaluate equivalence between two experimental groups. Psychological Bulletin, 113, 553-565.

Wells, K. B., & Sturm, R. (1996). Informing the policy process: From efficacy to effectiveness data on pharmacotherapy. Journal of Consulting and Clinical Psychology, 64, 638-645.