Copyright © 1998 by Lawrence Erlbaum Associates, Inc.
Personality and Social Psychology Review, 1998, Vol. 2, No. 3, 184-195

Why's My Boss Always Holding Me Down? A Meta-Analysis of Power Effects on Performance Evaluations

John C. Georgesen and Monica J. Harris
Department of Psychology, University of Kentucky

One factor with potential links to performance evaluation is evaluator power. In a meta-analytic review of the available literature, the relation between power and performance evaluation was examined. Results indicate that as power levels increase, evaluations of others become increasingly negative and evaluations of the self become increasingly positive. We examined moderators of these relations, and methodological variables caused the most differences in effect sizes across studies. The article addresses implications of these findings for businesses and social psychological theories of power.

Requests for reprints should be sent to John C. Georgesen, Department of Psychology, 125 Kastle Hall, University of Kentucky, Lexington, KY 40506-0044. E-mail: jcgeorl@pop.uky.edu.

Downloaded from psr.sagepub.com at PENNSYLVANIA STATE UNIV on May 16, 2016

From being threatened with additions to one's "permanent record" in grade school to the review process for a tenured position, individuals encounter evaluative situations throughout their lives. Performance evaluations are omnipresent in many careers and show no signs of lessening in either frequency or popularity (Budman & Rice, 1994). Periodic evaluations occur throughout the employment of most individuals in professional jobs, and the topic has been of interest to psychologists for more than 50 years (Landy & Farr, 1980). As one might expect of such a well-established practice, performance evaluations offer tangible benefits: They may yield concrete measures of employee performance, often provide useful information for improving problem areas, and may facilitate careers (Queen, 1995; White, 1979). Although the previous factors illustrate only a few of the total benefits, why organizations rely on the wealth of information stemming from evaluations is obvious. Some members of organizations, however, perceive evaluative ratings as biased and unreflective of their true abilities (Gioia & Longenecker, 1994; Landy & Farr, 1980). A variety of factors may bias the performance evaluation process.

Executives often complain about politics pervading others' evaluations of them (Gioia & Longenecker, 1994). As their position in an organization rises, executives increasingly feel evaluations of their performance are driven by factors such as issues of executive control and ensuring the continuance of the current structure in the organization (Gioia & Longenecker, 1994). This dissatisfaction with the evaluation process is not limited to executives, however. Individuals lower in the organizational hierarchy also may view performance evaluations as inaccurate and biased (Kleiman, Biderman, & Faley, 1987).

Given that many employees feel their evaluations are overly negative and inaccurate, it is interesting to note that substantial differences in evaluation results exist between differing types of evaluators such as the self, supervisors, and peers (M. M. Harris & Schaubroeck, 1988). Performance evaluations traditionally have been conducted by one's supervisor, although increasing numbers of organizations use multiple raters in conducting evaluations (Budman & Rice, 1994). Self-evaluations and peer-employee evaluations are becoming increasingly common. Frequently, however, evaluations of the same individual differ significantly between raters (M. M. Harris & Schaubroeck, 1988; Landy & Farr, 1980). Furthermore, only moderate agreement exists between self-evaluations and supervisor evaluations, although the relation between peer and supervisor evaluations is somewhat stronger (M. M. Harris & Schaubroeck, 1988).

What causes these discrepancies in evaluation? Egocentric bias has been suggested as one explanation of the discrepancies between supervisor, self-, and peer evaluations (M. M. Harris & Schaubroeck, 1988). This theory suggests that individuals inflate self-ratings to improve their evaluation and secure their employment. Within this theory, discrepancies are also believed to result from attributional differences; peers and supervisors, serving as observers, attribute poor performance to internal personality factors, whereas self-evaluators, or actors, attribute poor performance to their environment (M. M. Harris & Schaubroeck, 1988). Furthermore, peers and supervisors attribute good performance to the environment, whereas self-evaluators attribute good performance to internal factors. Other variables, however, also affect performance evaluation, cause discrepancies between raters, or both (M. M. Harris & Schaubroeck, 1988; Landy & Farr, 1980). For example, the general job performance of the rater affects the quality of that rater's evaluations (Landy & Farr, 1980), as does the rater's opportunity to observe the target's performance (M. M. Harris & Schaubroeck, 1988; Landy & Farr, 1980).

Another factor potentially biasing performance evaluation is the power level of the evaluator. Although addressed in one research program (Kipnis, 1972; Kipnis, Castell, Gergen, & Mauch, 1976; Kipnis, Schmidt, Price, & Stitt, 1981; Wilkinson & Kipnis, 1978) and other isolated studies, the effects of power on evaluation have not been systematically explored. In addition, except for a recent resurgence of interest, power phenomena have been irregular targets of social psychological inquiry and theorizing (Fiske & Morling, 1996; Ng, 1980). Therefore, a general definition of power may be useful before addressing its potential effects on performance evaluation. In the past, power has been defined as the amount of influence that one person can exercise over another person (Dahl, 1957; Huston, 1983; Kipnis et al., 1976).
More recent definitions of power (Fiske, 1993; Fiske & Morling, 1996), however, have focused on actual control rather than influence as the key component of power. The reason for this change in focus has been the realization that one may possess influence without possessing actual control over the outcomes of others. For example, one might be a moral leader in the community and influence others' decisions through persuasive rhetoric but possess no actual control over others' outcomes. Thus, in current conceptualizations, power is the difference in the amount of control possessed by members of a dyad (Fiske, 1993; Fiske & Morling, 1996). The member of a dyad with the most control over the other's outcomes has more power. Defining power in this way is also congruent with other traditional conceptions of power. For example, in the classic French and Raven (1959; Raven, 1992) typology, a distinction is made between power that involves control of others' outcomes (e.g., reward and coercive power) and bases of power that do not (e.g., expert, informational, referent). The latter forms of power are more indirect and involve the willing cooperation and deference of the target. These power bases involve influence over others, but whether they address the actual ability of the power holder to control the outcomes of the other is not clear. For example, an individual high in referent power may not be able to actually control another by reward, coercion, or both. Therefore, we prefer to adopt a definition of power that encompasses the idea of unilateral control because we feel that making this distinction allows the creation of theoretically unambiguous operational definitions of power. Defining power in other ways requires the assumption that the target actively participates in the influence attempt. With its relation to the control of others, power is an obvious candidate for consideration of its potential effects on evaluation. 
Two models address the potential relation between power and performance evaluation. The first model of how power may affect evaluations stems from the research of Kipnis and his colleagues (Kipnis, 1972; Kipnis et al., 1976, 1981; Wilkinson & Kipnis, 1978). They argued that as an evaluator's power increases, that evaluator will make more attempts to influence others. As more attempts to influence others are made, the evaluator comes to believe that he or she controls the behavior(s) of other people. This portion of Kipnis' model is similar to Kelley and Thibaut's (1978) conception of fate control. This belief of responsibility for others' behaviors consequently causes the individual to devalue the performance of others; that is, the higher power person comes to believe that he or she is the causal agent in producing relevant outcomes. Furthermore, this bias conceivably may cause an evaluator to take responsibility for any successes associated with the work of others. Applied to a performance evaluation situation, the model suggests that as the power level of an evaluator increases, the positivity of their evaluations of subordinates will decrease. Another model less directly addressing the power-evaluation relation but theoretically promising in its implications concerns the association between power and stereotyping (Fiske, 1993; Fiske & Morling, 1996). The general premise of this model is that persons in positions of power are especially vulnerable to stereotyping subordinate others. Fiske and Morling (1996) argued that individuals in powerful positions attended less to subordinates for three reasons: they lacked cognitive capacity, their outcomes were not controlled by their subordinates, and they may not have wanted to attend because of their dominance level and associated beliefs. Individuals in powerful positions may suffer a lack of cognitive capacity due to increased attentional demands (Fiske, 1993). 
For example, a powerful individual's position may involve the supervision of 20 subordinates, whereas an individual lower in the organizational hierarchy may supervise only 1 or 2 employees. Also, because their outcomes often do not depend on their subordinates, individuals with power may focus the brunt of their attention elsewhere. Finally, dominance orientation may lead to a lack of attention in that individuals with a dominant personality may attempt to control their interactions with others and consequently ignore the actions and motivations of others during the interaction (Fiske & Morling, 1996). Regardless of its cause, decreases in attention make powerful individuals more likely to depend on stereotypes in interacting with subordinates (Fiske & Morling, 1996). This model has not been directly tested on a range of performance evaluation situations but is intuitively appealing in its implications. If a powerful individual in an evaluator role does not attend to a subordinate and holds negative stereotypes about subordinates generally, then an evaluation of that subordinate might be negatively affected by stereotyping. This reasoning assumes that individuals in positions of power hold negative stereotypes about subordinates, an argument that may be supported in part by Western beliefs that power is given to people on the basis of their talents and skills (Goodwin, Fiske, & Yzerbyt, 1997). Thus, on the basis of their belief that power is earned, powerful individuals may be more likely to stereotype negatively those individuals with lower power than themselves. Both of the previous models hold promise for explaining potential relations between power and performance evaluation. However, the basic question of what exactly the effect of power is on performance evaluation remains as yet unanswered.
Some researchers have found negative effects of power on evaluation (Kipnis, 1972; Kipnis et al., 1976, 1981; Wilkinson & Kipnis, 1978), whereas other researchers have found power effects to be negligible (Lightner, Burke, & Harris, 1997; Pandey & Singh, 1987; Wexley & Snell, 1987). The range of findings in this area suggests that a synthesis is necessary, both to ascertain if power actually affects performance evaluation and to lay a foundation for later testing of the mechanism(s) by which it affects evaluation. Our purpose here was to conduct the first quantitative review of research on the effects of power on performance evaluation. We addressed three areas of primary interest in this meta-analysis:

1. Does the power position of the evaluator affect performance evaluation? We asked this question from two perspectives: (a) What is the effect of power on evaluations of a lower ranked other? and (b) What is the effect of power on self-evaluations? What is the size and direction of the effect? Based on both of the power theories mentioned earlier, one might expect that as power increases, evaluations of lower power others become increasingly negative. Available literature, however, provides no clear predictions concerning the effects of power on self-evaluation.

2. What theoretical variables moderate the effect of power on performance evaluation? For the purposes of this meta-analysis, a variable was considered a theoretical moderator if that variable would add to a more refined theoretical understanding of the nature of power. Theoretical moderators examined in this study include location of study, participant gender, and participant age. For example, location of study is considered a theoretical variable in that various cultural factors associated with different geographic regions may affect patterns of power usage.

3. Can differences in effect sizes between studies be explained by methodological variables?
Methodological moderators studied were the laboratory responsible for research, quality of study, strength of power manipulation, type of research, type of study, and year of study.

Method

Sample

The literature search began with a computer-based strategy. Searches were conducted on the PsycLIT (American Psychological Association, 1974-present) database for articles published since 1974. Also, computer searches were conducted on ABI Inform (University Microfilms International, 1975-present), a business database, for articles published since 1975. After these searches, the first author manually inspected the Journal of Applied Psychology (1960-1996) to ensure both that articles were not missed on the computer searches due to improper keyword usage and that earlier articles in the area had been uncovered. Following the collection of articles elicited in the previous searches, an ancestry approach of exploring their references was used to collect any previously undiscovered articles. The ancestry approach involves using the references of relevant articles to retrieve more articles. Then, the references of the relevant articles are used to retrieve yet more articles in a continuing process until no further useful references are discovered. As a final step, the Social Sciences Citation Index (Institute for Scientific Information, 1981-1998) was used in a descendancy search involving several often-cited articles (Kipnis et al., 1976; Wexley & Snell, 1987; Wilkinson & Kipnis, 1978) in the already obtained literature. The descendancy approach involves perusing the Social Sciences Citation Indexes to retrieve a list of all the articles that cite a particular well-known study. The retrieval of articles discovered in the ancestry and descendancy searches was not limited by year of publication.

Criteria for inclusion and exclusion. Articles collected in the previous searches were chosen for inclusion or exclusion on the basis of both publication and content criteria.
Studies included in this meta-analysis were published journal articles and available unpublished experiments. The content of a study also affected its inclusion or exclusion. To be included in this analysis, a study must have conceptualized power in a manner similar to the following definition: Power is the ability of an individual to exert more control or influence over another's outcomes than the other can exert over the controlling individual's outcomes (Fiske, 1993; Fiske & Morling, 1996). The study also had to address the effects of power on evaluations explicitly and either manipulate positional power differences or use preexisting power differences. Studies were included whether they measured real or perceived power differences. Investigations exploring status without mention of power were excluded from this review because individuals may possess differing status without the ability to exercise unequal amounts of influence over one another (Fiske, 1993). Finally, the study had to contain an actual evaluation of a subordinate, the self, or both by an individual(s) possessing a power advantage over the outcomes of others.

Coding

The first author (Georgesen) coded studies selected for inclusion along the following dimensions to investigate the moderating effects of both theoretical and methodological variables on the relation between evaluator power and performance evaluation. In addition, a second rater also coded the subjective variables included in this study: overall quality, strength of manipulation, and type of study. Initial interrater reliabilities, as measured by interrater correlations, were varied: .40 for the strength of manipulation variable, .68 for overall quality, and 1.00 for the type of study variable.
Due to the somewhat low reliabilities associated with the strength of manipulation and quality variables, these rating discrepancies were explored and resolved by conference. The coding process began with each study being coded as to whether it measured self- or other-evaluation of performance. Studies were also coded by location, percentage of male participants, and participants' age. Location, defined as whether the study was conducted in the United States or elsewhere, may moderate the effects of power on evaluation via unique cultural factors. For example, power may have larger effects on evaluation in individualistic rather than collectivistic cultures due to the emphasis placed on individual achievement in individualistic cultures. Coding for the percentage of male participants allows for examining potential gender effects on power roles.¹ The consideration of age is important in that one may become increasingly likely to hold more powerful positions with increasing age, which may strengthen any effects of power on other-evaluation. When possible, median age of study participants was coded. Age was dropped as a potential moderator after initial coding, however, due to its complete redundancy with the type of study (in all cases, experimental investigations of power took place on college campuses). Also redundant with type of study, the type of power measured may moderate the effects of power on other-evaluation.

¹Studies were not coded for gender more specifically (i.e., gender of supervisor) because the vast majority of the retrieved studies did not contain the necessary information. Available evidence from several studies in our own laboratory, which did code and analyze for supervisor gender, suggests that the supervisor's gender does not affect self- and other-evaluation.
Participants who actually possess power in a workplace setting may be more likely to derogate others in their evaluations than participants experiencing a contrived power difference stemming from an experimental manipulation. Studies were also coded for several methodological variables that may moderate the relation between evaluator power and evaluation. Each study was coded by the raters on 4-point scales for both overall study quality, ranging from 1 (very poor overall) to 4 (very good overall), and strength of experimental manipulation (if applicable), ranging from 1 (weakest manipulation) to 4 (strongest manipulation). The overall quality of the study was rated by considering issues such as statistical power, sample size, adequate controls, randomization, and other methodological factors that can affect study outcomes (Rosenthal, 1991). For example, a study that had a large enough sample to investigate the question of interest with at least a moderate degree of power, randomly assigned participants to groups, conceptualized measures clearly, and performed the appropriate analyses would receive a quality rating of 4. A study that met all of the previous requirements except for one criterion would receive a quality rating of 3, and so forth down the rating scale until a study that met only one of the previous criteria would receive a quality rating of 1. The strength of an experimental manipulation was rated by considering the realism, believability, and participant involvement associated with the power manipulation. For example, a manipulation that led the participants to believe they were part of a supervisor-subordinate dyad in which the supervisor could negatively affect the subordinate's outcomes, directly involved both participants in their respective roles, and made it seem as if the power difference was affecting the results of the interactions would receive a manipulation strength rating of 4.
A study that contained two of these elements but not three would be downgraded to a strength rating of 3, a trend that could continue downward until a manipulation was rated as having a strength of 1 if it contained none of the previous elements. The studies were also coded for author (laboratory responsible for research), overall number of participants, nature of research (psychology or business publication), type of study (correlational, quasi-experimental, or experimental), and the year in which the article was published. These variables were chosen for inclusion because of their demonstrated effects on study outcomes in other meta-analyses (Rosenthal, 1991). In addition to being coded for theoretical and methodological variables, each study was coded to yield one index of effect size and significance level. This meta-analysis used the Pearson correlation coefficient r as the estimate of effect size. In studies with multiple outcome measures of other-evaluation, self-evaluation, or both, the effect size for each measure was transformed to its associated Fisher Zr and combined with the other Fisher Zr values to yield the mean Fisher Zr. This was then transformed back to r to yield the mean effect size for the study. In the case of one of the other-evaluation studies, not enough statistical information was included to calculate an effect size, so in line with Rosenthal's (1995) recommendations, that study was assigned an r of .00 and a p of .50.

Results

The literature search yielded a total of 25 codable studies; 7 studies examined the effects of power on self-evaluation, and 18 studies examined the effects of power on other-evaluation. Tables 1 and 2 list the studies included in each category, their scores on the coded methodological and theoretical variables, and the effect size and significance level extracted from each study.
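The within-study aggregation described in the Method section (transform each outcome's r to Fisher Zr, average the Zr values, and back-transform the mean) can be sketched as follows; the example rs are hypothetical, not values from any included study:

```python
import math

def combine_rs(rs):
    """Average several effect sizes r via Fisher's Zr transformation.

    Each r is transformed to Zr = atanh(r), the Zr values are averaged,
    and the mean Zr is transformed back to r with tanh.
    """
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))

# A study reporting two outcome measures of other-evaluation (hypothetical values):
combined = combine_rs([0.30, 0.50])
print(round(combined, 2))
```

Because atanh is nonlinear, the combined value differs slightly from the simple mean of the rs; the difference grows as the rs approach 1.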
As mentioned, we analyzed these two groups of studies separately due to the differing predictions associated with their relation to power. For both groups of studies, all effects were in the predicted direction with the exception of the other-evaluation study coded as having an effect size of r = .00 and p = .50 due to insufficient statistical information. Following the calculation and coding of individual effect sizes and significance levels (see Table 1 and Table 2), these results were combined to yield a mean effect size and mean significance level for each of the two groups of studies.

General Analyses

The overall effect size for each set of studies was calculated by obtaining the weighted mean Fisher Zr and transforming it back to the weighted mean r (Rosenthal, 1991). Due to the disparate sample sizes of articles included in this meta-analysis, each study was weighted by its total degrees of freedom as a means of taking study size and the reliability associated with larger samples into account. Where possible, this weighting occurred throughout the analyses. The weighted mean effect size associated with the combined self-evaluation studies was r = .45, indicating a medium to large effect of power on self-evaluation. As power levels increase, self-evaluations become increasingly positive in tone. When unweighted, the mean effect size for this set of studies was somewhat smaller, r = .38 (see Table 3). The median effect size, minimum and maximum effect size, quartile scores, and 95% confidence intervals for this set of studies were also computed and are reported in Table 3. Throughout the analyses, confidence intervals were calculated according to the procedures recommended by Rosenthal (1991), using weighted mean effect sizes. A stem and leaf plot (see Table 3) was constructed, using the effect sizes of the self-evaluation studies, to check for possible outliers affecting the results, but none were found.
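Using the (r, N) pairs from Table 1, the reported self-evaluation summary statistics can be reproduced. This is a sketch under two assumptions not spelled out in the text: degrees of freedom are taken as N - 3 for the Fisher Zr weighting, and the confidence interval is computed as a study-level t interval around the unweighted mean r (with the t critical value hardcoded to avoid a SciPy dependency); these choices happen to recover the reported values:

```python
import math
from statistics import mean, stdev

# (r, N) pairs for the seven self-evaluation studies in Table 1.
studies = [(.50, 580), (.24, 61), (.19, 108), (.31, 117),
           (.53, 200), (.52, 75), (.36, 63)]

# Weighted mean effect size: weight each Fisher Zr by its df (assumed N - 3),
# average, and back-transform to r.
zs = [math.atanh(r) for r, n in studies]
ws = [n - 3 for r, n in studies]
weighted_r = math.tanh(sum(w * z for w, z in zip(ws, zs)) / sum(ws))

# Unweighted mean effect size: a simple mean of the study rs.
rs = [r for r, n in studies]
unweighted_r = mean(rs)

# Study-level 95% confidence interval around the unweighted mean r,
# using t(6, .975) = 2.447 for k = 7 studies.
half_width = 2.447 * stdev(rs) / math.sqrt(len(rs))
ci = (unweighted_r - half_width, unweighted_r + half_width)

print(round(weighted_r, 2), round(unweighted_r, 2),
      round(ci[0], 2), round(ci[1], 2))
```

Rounded to two decimals, this reproduces the reported weighted mean of .45, unweighted mean of .38, and confidence interval of .25 to .51; the same computation on the Table 2 entries yields the reported weighted mean of .29 for the other-evaluation studies.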
The weighted mean effect size associated with the combined other-evaluation studies was r = .29, indicating a roughly medium effect of power on other-evaluation.² As power levels increase, one's evaluations of others become more derogatory. When unweighted, the mean effect size did not vary, r = .29 (see Table 4). The median effect size, minimum and maximum effect size, quartile scores, and 95% confidence intervals were also calculated for this set of studies (see Table 4). Confidence intervals were calculated using the weighted mean effect size. A stem and leaf plot did not reveal any outlying effect sizes. The combined significance levels for each of the two sets of studies were calculated with the Stouffer method (Rosenthal, 1991), again weighted by degrees of freedom. The combined significance level for the self-evaluation studies was significant, Z = 12.88, p < .001. The combined significance level for the other-evaluation studies was significant as well, Z = 11.07, p < .001. Thus, it is extremely unlikely that no relation exists between power and evaluation for either set of studies. Following the previous computations, a chi-square testing heterogeneity of variance was computed for each set of studies to examine the consistency of results across studies (Rosenthal, 1991). The chi-square test for self-evaluation studies indicated significant differences in effect sizes across studies, χ²(6, N = 7) = 21.26, p < .05. Significant differences in effect sizes were also found for the other-evaluation studies, χ²(17, N = 18) = 62.62, p < .01. These results suggested that various factors may moderate the effects of power on evaluation and prompted closer examination of the data.

Effects of Moderating Variables

The moderating effects of coded methodological and theoretical variables were assessed using both the contrast Z and correlational approaches (Rosenthal, 1991).
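The combined significance tests and heterogeneity chi-squares reported in the preceding section can be reproduced from the per-study statistics in Tables 1 and 2. A sketch under two assumptions: the Stouffer combination is applied in its simple unweighted form (sum of the study Zs over the square root of the number of studies, which closely matches the reported values), and heterogeneity is computed as the usual df-weighted statistic, the sum of (N - 3) times the squared deviation of each Fisher Zr from the weighted mean Zr:

```python
import math

def stouffer(z_scores):
    """Unweighted Stouffer combination: sum of the study Zs over sqrt(k)."""
    return sum(z_scores) / math.sqrt(len(z_scores))

def heterogeneity_chi2(studies):
    """Chi-square test of effect-size heterogeneity (df = k - 1):
    sum over studies of (N - 3) * (Zr - weighted mean Zr) ** 2."""
    zs = [math.atanh(r) for r, n in studies]
    ws = [n - 3 for r, n in studies]
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    return sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))

# Per-study significance levels (Z) from Tables 1 and 2.
self_zs = [12.04, 1.87, 1.95, 3.35, 7.49, 4.50, 2.86]
other_zs = [2.54, 0.42, 3.51, 2.49, 2.25, 5.30, 0.55, 3.39, 2.27,
            3.66, 3.00, 3.30, 1.69, 0.00, 7.35, 1.92, 0.57, 2.78]

# (r, N) pairs from Tables 1 and 2.
self_studies = [(.50, 580), (.24, 61), (.19, 108), (.31, 117),
                (.53, 200), (.52, 75), (.36, 63)]
other_studies = [(.26, 95), (.03, 199), (.45, 61), (.24, 108), (.24, 88),
                 (.53, 104), (.05, 120), (.31, 120), (.43, 28), (.42, 76),
                 (.60, 25), (.31, 113), (.23, 54), (.00, 72), (.52, 200),
                 (.24, 63), (.08, 51), (.34, 67)]

print(round(stouffer(self_zs), 2))                  # reported as Z = 12.88
print(round(stouffer(other_zs), 2))                 # reported as Z = 11.07
print(round(heterogeneity_chi2(self_studies), 2))   # reported as 21.26
print(round(heterogeneity_chi2(other_studies), 2))  # reported as 62.62
```

The small discrepancies relative to the published figures reflect rounding in the tabled rs and Zs.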
²Recall that one of the other-evaluation studies was assigned an effect size of zero due to insufficient statistical information. A set of analyses was also conducted excluding this study, thus treating it as missing data. The weighted mean effect size r for other-evaluation studies was nearly identical, r = .30, unweighted mean effect size r = .30, and median r = .31. Therefore, leaving out the effect size r of .00 did not alter our pattern of results, and we decided to continue its inclusion in subsequent analyses, a more conservative approach recommended by Rosenthal (1991).

Table 1. Coded Variables for Power and Self-Evaluation Studies

Authors | N | Age | Location | Type of Study | Manipulation Rating | Quality Rating | Effect Size (r) | Significance Level (Z)
Evans & Fischer (1992) | 580 | 20-55 | U.S. workplaces | Correlational | NA | 3 | .50 | 12.04***
Georgesen, Harris, & Lightner (1997) | 61 | College | Kentucky | Experiment | 3 | 3 | .24 | 1.87*
Georgesen et al. (1997) | 108 | College | Kentucky | Experiment | 3 | 4 | .19 | 1.95*
Ilardi, Leone, Kasser, & Ryan (1993) | 117 | Mdn = 35 | New York factory | Correlational | NA | 4 | .31 | 3.35***
Sachdev & Bourhis (1985) | 200 | College | Canada | Experiment | 4 | 4 | .53 | 7.49***
Singh (1994) | 75 | 29-49 | India | Quasi-experiment | 3 | 3 | .52 | 4.50***
Stolte (1978) | 63 | College | Illinois | Experiment | 4 | 4 | .36 | 2.86**

Note: The manipulation rating is a measure of the strength of an experimental manipulation on a scale ranging from 1 (weakest manipulation) to 4 (strongest manipulation). The quality rating is a measure of the study's overall quality on a scale ranging from 1 (very poor overall) to 4 (very good overall). The letters NA in a category denote that the rating was inapplicable to a given study.
*p < .05. **p < .01. ***p < .001.

Table 2. Coded Variables for Power and Other-Evaluation Studies

Authors | N | Age | Location | Type of Study | Manipulation Rating | Quality Rating | Effect Size (r) | Significance Level (Z)
Assor (1989) | 95 | College | Israel | Quasi-experiment | NA | 3 | .26 | 2.54**
Brief, Aldag, & Russell (1979) | 199 | Mid-age | Iowa | Correlational | NA | 3 | .03 | 0.42
Georgesen, Harris, & Lightner (1997) | 61 | College | Kentucky | Experiment | 3 | 3 | .45 | 3.51***
Georgesen et al. (1997) | 108 | College | Kentucky | Experiment | 3 | 4 | .24 | 2.49**
Gundlach & Cadotte (1994) | 88 | College | Indiana | Quasi-experiment | 2 | 4 | .24 | 2.25*
Harari, Bujarski, Houlne, & Wullner (1975) | 104 | College | California | Experiment | 4 | 2 | .53 | 5.30***
M. J. Harris, Lightner, & Manolis (in press) | 120 | College | Kentucky | Experiment | 3 | 4 | .05 | 0.55
Ilgen, Peterson, Martin, & Boeschen (1981) | 120 | Mdn = 47 | U.S. factory | Correlational | NA | 4 | .31 | 3.39***
Kipnis (1972) | 28 | College | Pennsylvania | Experiment | 4 | 2 | .43 | 2.27*
Kipnis, Castell, Gergen, & Mauch (1976) | 76 | Unknown | Pennsylvania | Correlational | NA | 3 | .42 | 3.66***
Kipnis et al. (1976) | 25 | Adults | Pennsylvania | Correlational | NA | 1 | .60 | 3.00***
Kipnis, Schmidt, Price, & Stitt (1981) | 113 | College | Pennsylvania | Correlational | NA | 4 | .31 | 3.30***
Lightner, Burke, & Harris (1997) | 54 | College | Kentucky | Experiment | 2 | 3 | .23 | 1.69*
Pandey & Singh (1987) | 72 | College | India | Experiment | 2 | 3 | .00 | 0.00
Sachdev & Bourhis (1985) | 200 | College | Canada | Experiment | 4 | 4 | .52 | 7.35***
Stolte (1978) | 63 | College | Illinois | Experiment | 4 | 4 | .24 | 1.92*
Wexley & Snell (1987) | 51 | Mdn = 30 | U.S. store | Correlational | NA | 2 | .08 | 0.57
Wilkinson & Kipnis (1978) | 67 | Mdn = 27 | Pennsylvania | Correlational | NA | 2 | .34 | 2.78**

Note: The manipulation rating is a measure of the strength of an experimental manipulation on a scale ranging from 1 (weakest manipulation) to 4 (strongest manipulation). The quality rating is a measure of the study's overall quality on a scale ranging from 1 (very poor overall) to 4 (very good overall). The letters NA in a category denote that the rating was inapplicable to a given study.
*p < .05. **p < .01. ***p < .001.
The correlational approach uses study as the unit of analysis and correlates the effect sizes of studies with the coded moderating variable of interest to yield a correlation coefficient. The contrast Z approach weights studies by moderating variables and tests for differences between groups, yielding a Z used to determine the significance of the difference. Because the contrast Z approach takes into account the individual sample size of each study, it is a much more powerful analysis than the correlational approach, but contrasts were computed with both methods where possible. Due to the number of moderator analyses, a Bonferroni correction 189 Downloaded from psr.sagepub.com at PENNSYLVANIA STATE UNIV on May 16, 2016 GEORGESEN & HARRIS was applied to the p values associated with each moderating analysis for the self-and other-evaluation studies. Table 5 shows the effect of moderator variables on the power and self-evaluation relation. One coded variTable 3. Stem and Leaf Plot for Power and Self-Evaluation Studies Stem .5* .5 4* .4 .3* .3 .2* 2 .1* .1 Leaf 023 68 1 9 Note: Mean = .38; Median = .36; Minimum = .19; Maximum = .53. Confidence interval,.,, = .25; confidence intervalupw = .51. For the 25 percentile, the value = .24. For the 75 percentile, the value = .52. * =5. Table 4. Stem and LeafPlotfor Powerand Other-Evaluation Studies Stem .6 .5 .4 .3 .2 .1 .0 Leaf 0 23 235 114 34446 Table 6. Correlations Between Study Variables and Effect Sizefor Other-Evaluation Studies Study Variables 0358 Note: Mean = .29; Median = .29; Minimum = .00; Maximum = .60. Confidence intervalOW, = .21; confidence interval , = .38. For 25 percentile, the value = .19. For the 75 percentile, the value = .44. Table 5. Correlations Between Study Variables and Effect Size for Self-Evaluation Studies Study Variable able, a theoretical one, moderated the relation between power and self-evaluation. 
Studies occurring outside the United States showed a stronger effect of power on the positivity of self-evaluation, Z = 2.97, Bonferroni p < .01. When this relation was assessed with the correlational approach, the resulting correlation was large in magnitude, r = .73, but nonsignificant due to the small number of studies.

Interestingly, none of the coded variables theoretically linked to power significantly moderated the magnitude of the power and other-evaluation relation (see Table 6). Effect sizes did not vary systematically with the percentage of male participants or the location of the study; all rs were less than .17, and contrast Zs were less than .58. Four methodological variables, however, did moderate the effects of power on other-evaluation (see Table 6). Articles stemming from Kipnis's research program (Kipnis, 1972; Kipnis et al., 1976; Wilkinson & Kipnis, 1978) showed stronger negative effects of power on other-evaluation than articles written by other authors, Z = 2.57, Bonferroni p < .05 (see Table 6). An analysis of the moderating influence of year of study on effect size was marginally significant following the Bonferroni correction, Z = -2.40, Bonferroni p < .07. Studies conducted in earlier years reported a stronger effect of power on other-evaluation than more recent studies. The correlational analysis for this moderator was also

Table 5. Correlations Between Study Variables and Effect Size for Self-Evaluation Studies

Study Variable | Effect Size | Contrast Z Score
Number of Participants | — | —
Percentage of Males in Study | .47 | 0.41
Strength of Manipulation | .44 | 1.46
Quality of Study | -.22 | -0.94
Study Location | .73 | -2.97*
Type of Study | -.30 | -1.41
Year of Study | -.35 | -1.86

Note: Strength of manipulation was rated on a scale ranging from 1 (weakest manipulation) to 4 (strongest manipulation). Quality of study was rated on a scale ranging from 1 (very poor overall) to 4 (very good overall). Study location refers to whether the study was conducted within or outside the United States.
Nature of research refers to whether the experiment was conducted as a business or psychology investigation. Type of study refers to whether the study was correlational, quasi-experimental, or experimental in its methodology.
*p < .01.

Table 6. Correlations Between Study Variables and Effect Size for Other-Evaluation Studies

Study Variable | Effect Size | Contrast Z Score | Individual β | Group β
Laboratory (Author) | -.46 | -2.57* | -.46 | -.05
Number of Participants | -.17 | -.02 | |
Percentage of Males in Study | -.07 | 0.58 | |
Strength of Manipulation | .66* | 3.20** | .66* | .81
Quality of Study | -.35 | -2.63* | -.35 | -.23
Study Location | .02 | 0.08 | |
Nature of Research (Journal) | .22 | 2.05 | |
Type of Study | — | — | |
Year of Study | -.40 | -2.40 | -.40 | -.29

Note: Laboratory refers to whether the study was conducted by Kipnis's laboratory or other researchers. Strength of manipulation was rated on a scale ranging from 1 (weakest manipulation) to 4 (strongest manipulation). Quality of study was rated on a scale ranging from 1 (very poor overall) to 4 (very good overall). Study location refers to whether the study was conducted within or outside the United States. Nature of research refers to whether the experiment was conducted as a business or psychology investigation. Type of study refers to whether the study was correlational, quasi-experimental, or experimental in its methodology. Individual β refers to the standardized regression weight when the individual moderator is used to predict the effect size; Group β refers to the moderator's standardized regression weight when all significant moderators are included as predictors of effect size.
*p < .05. **p < .01.
Table 7. Correlations Between Moderator Variables

Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
1. Laboratory (Author) | 1.00
2. Number of Participants | .24 | 1.00
3. Percentage of Males in Study | .06 | -.07 | 1.00
4. Strength of Manipulation | -.29 | .31 | .81** | 1.00
5. Quality of Study | .53** | .23 | .06 | -.15 | 1.00
6. Study Location | .25 | .07 | .23 | .04 | .13 | 1.00
7. Nature of Research (Journal) | — | -.09 | -.02 | .43 | .23 | -.18 | 1.00
8. Type of Study | .24 | -.24 | .06 | .37 | .28 | .20 | -.37 | 1.00
9. Year of Study | .49* | .15 | .03 | -.70** | .48* | .11 | .05 | .21 | 1.00

Note: Strength of manipulation was rated on a scale ranging from 1 (weakest manipulation) to 4 (strongest manipulation). Quality of study was rated on a scale ranging from 1 (very poor overall) to 4 (very good overall). Study location refers to whether the study was conducted within or outside the United States. Nature of research refers to whether the experiment was conducted as a business or psychology investigation. Type of study refers to whether the study was correlational, quasi-experimental, or experimental in its methodology.
*p < .05. **p < .01.

nonsignificant, but given the obtained r of -.40, this failure to reach significance is almost certainly due to the overall small number of studies. The strength of experimental manipulations of power position affected the relation between power and other-evaluation, Z = 3.20, Bonferroni p < .01. Experiments that manipulated power position more strongly reported larger negative effects of increased power on other-evaluation. Finally, studies of lower overall quality demonstrated larger effects of power on other-evaluation, Z = -2.63, Bonferroni p < .01.

Because the relations among the moderator variables were unknown, the intercorrelations among them were computed (see Table 7). The resulting intercorrelations suggest that the significant moderating analyses are not redundant with one another and that the various coded moderators do indeed represent different constructs.
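The redundancy check used in the regression analyses described next, comparing each moderator's standardized regression weight when entered singly against its weight when entered with the other significant moderators, can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code; under standardization, the single-predictor beta equals the zero-order correlation, so a multicollinear moderator shows a sizable individual beta that collapses in the group equation.

```python
import numpy as np

def standardized_betas(moderators, effect_sizes):
    """Return (individual, group) standardized betas.

    moderators: (k_studies, n_moderators) array of coded moderators.
    individual[j] is the beta when moderator j predicts effect size alone
    (equal to the zero-order r); group[j] is its beta when all moderators
    are entered as simultaneous predictors.
    """
    X = np.asarray(moderators, dtype=float)
    y = np.asarray(effect_sizes, dtype=float)
    # z-score predictors and criterion so regression weights are betas
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    individual = Xz.T @ yz / len(y)                   # single-predictor betas
    group, *_ = np.linalg.lstsq(Xz, yz, rcond=None)   # simultaneous betas
    return individual, group
```

A moderator whose individual beta is sizable but whose group beta collapses toward zero (the pattern the text reports for the laboratory variable, -.45 entered singly vs. -.05 entered with the other significant moderators) is largely redundant with the remaining predictors.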
Although many of the moderator variables are correlated with one another, only two of the correlations exceeded .60, and none are high enough to suggest that our results are attributable to multicollinearity among the moderators.3 Although the correlational analyses did not indicate multicollinearity among the sets of methodological and theoretical moderator variables, we conducted regression analyses to investigate the possibility further. The set of significant theoretical and methodological moderating variables was included in the regression analyses for the other-evaluation studies. Because only one self-evaluation moderator was significant, we did not conduct a regression analysis for the self-evaluation studies. Due to low power (the df for the other-evaluation model was 5), only one moderator effect, strength of manipulation, was statistically significant when analyzed with a regression approach. However, only one of the variables appeared to be multicollinear (see Table 6). The laboratory variable has a beta of -.45 when entered singly into a regression equation and a beta of -.05 when entered in a regression equation containing the other significant moderators. None of the other variables, however, cease to contribute to the prediction of effect size when analyzed in a regression including the other significant moderators. This provides further evidence that multicollinearity does not greatly affect our pattern of results.

File Drawer Analysis

Critics of meta-analysis often claim that its findings are inflated because it summarizes mainly published studies, which are more likely to be statistically significant, and that a large number of nonsignificant unpublished studies languishing in researchers' file drawers may exist that are not included in the meta-analysis. Taking into consideration the potential problem of sampling bias, file drawer analyses were conducted (Rosenthal, 1991).
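The file drawer computation reported next can be sketched as follows. This is a minimal illustration assuming Stouffer's method of combining Zs, which underlies Rosenthal's (1991) fail-safe N; the function names are ours, not the authors'.

```python
import math

def stouffer_z(zs):
    """Combined significance of k studies (Stouffer's method)."""
    return sum(zs) / math.sqrt(len(zs))

def fail_safe_n(zs, z_crit=1.645):
    """Rosenthal's fail-safe N: how many unretrieved studies averaging
    Z = 0 must exist to pull the combined one-tailed p up to .05.
    Solves sum(zs) / sqrt(k + X) = z_crit for X."""
    return (sum(zs) ** 2) / (z_crit ** 2) - len(zs)
```

For instance, ten studies each with Z = 2.00 combine to a Stouffer Z of about 6.32, and roughly 138 unpublished null studies would be needed to drag the combined significance level up to p = .05.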
A fail-safe N, the number of unpublished studies with a Z of .00 that would have to exist before the overall level of significance of this review would be brought to p = .05, was calculated. For the analyses of the effect of power on self-evaluation, 375 additional unpublished studies would have to exist before the combined significance level of the studies was raised to p = .05. For the analyses of the effects of power on other-evaluation, 715 additional unpublished studies with a Z of .00 would have to exist before the combined significance level of the studies was raised to p = .05. Based on these analyses, the reliability of these meta-analytic findings appears unlikely to be affected by sampling bias.

Discussion

The results of this meta-analysis clearly demonstrate that power affects evaluation. As evaluator power increases, performance ratings of others become increasingly negative and self-evaluations become increasingly positive. Therefore, regardless of target, evaluator power position is related to the outcomes of performance evaluations. Power has a roughly medium-sized effect on other-evaluations and a somewhat larger effect on self-evaluations. These effects are robust; the results of this meta-analysis indicate that power effects occur across different types of studies, exist in workplace and laboratory settings, occur in investigations conducted by members of various disciplines, and are not dependent on sample size. In addition, because power effects occur across different types of studies, they occur both when power differences are real (as in correlational studies) and when they are manipulated (as in experiments).

Another question of interest addressed in this study was whether methodological variables explained existing differences in results between studies.
For self-evaluation studies, no effect size differences across studies were attributable to the coded methodological variables. As with all self-evaluation analyses in this review, however, one should note that the small number of available studies severely limited our ability to detect significant effects of moderator variables on the power and self-evaluation relation.

The effects of methodological variables on the results of power and other-evaluation studies are more wide ranging. Investigations of lower overall quality reported larger effects of power on the performance evaluations of others. One potential reason for this effect is that, because both sets contained few studies, a few studies of relatively poor quality but large effect sizes may have disproportionately affected the study quality moderating analysis. As the data show, several studies of relatively low quality were associated with substantial effect sizes. Studies conducted in earlier years showed larger effects of power on other-evaluation. This finding may be partially explained by the relation between year of study and strength of manipulation (see Table 7). Earlier studies are associated with stronger power manipulations, and as seen earlier in this article, stronger manipulations of power are associated with larger effects of power on performance evaluations of others. The supplementary regression analyses we conducted support this interpretation: although the effect of year of study is clearly not due solely to its covariance with strength of manipulation, its beta does decrease when included in a regression equation with strength of manipulation (see Table 6). Effect sizes associated with the power and other-evaluation relation also differ between research programs.
Articles associated with Kipnis's research program (Kipnis, 1972; Kipnis et al., 1976, 1981; Wilkinson & Kipnis, 1978) show larger effects of power than does research published by other authors. One potential explanation of this difference is that the strength of Kipnis and colleagues' manipulations in experimental settings, the time of their studies, and their population choices for correlational investigations may have been especially well suited to demonstrating power effects on other-evaluation. An example is one study (Kipnis et al., 1976) that examined housewives' evaluations and perceptions of their household maids. In this population at the time of this investigation, very strong and societally prescribed power differences may have existed, which could have contributed to the size of the demonstrated effect. This explanation is supported by the finding that the moderating effect of research program is mostly due to covariance between research program and the other significant moderating variables (see Table 6). This result suggests that although power effects occur in a variety of settings, they are especially strong in situations with extremely salient power differentials.

Variables coded for theoretical reasons were weak moderators of the power-evaluation relation. Only one variable, study location, moderated the effects of power on self-evaluation. Studies outside the United States may have shown a larger positive effect of power on self-evaluation due to cultural differences in self-evaluative processes. This effect should be interpreted with caution, however, due to the low number of studies involved in the power and self-evaluation moderating analyses. As mentioned, no variables with potential theoretical links to power moderated the effects of power on other-evaluation.
Given the dearth of studies testing specific theories in this area, previous studies may not have measured the constructs necessary for discovering theoretical moderators of the power-evaluation relation. For example, no studies measured propensity for rater stereotyping, a variable that moderates the power-evaluation relation in current theory (Fiske & Morling, 1996).

One issue raised in considering the results of this meta-analysis is whether the relation between power and performance evaluation can be considered causal. By nature, meta-analysis is a correlational enterprise, so attributing causality to phenomena on the basis of meta-analytic results is difficult. Drawing a firm causal inference would be appropriate only if all the studies in the meta-analysis were randomized experiments (Rosenthal, 1991). Given the mixture of experimental and correlational studies in this meta-analysis, we can only cautiously state that the relation between power and performance evaluations is a causal one. Several factors, however, support a causal inference. First, we found a reliable effect of power across all studies. In addition, this effect occurred across other-evaluation studies regardless of whether the study was correlational or experimental. Furthermore, in the studies that were experimental, we found no obvious confounds or fatal methodological flaws. Thus, we can tentatively state that the demonstrated relation between power and performance evaluation is causal.4

4 We acknowledge the editor's helpful suggestions concerning this line of reasoning.

A second limitation the available data force on this meta-analytic discussion of power is that whether power has a biasing effect on performance evaluation remains unclear. Individuals in positions of higher
power may give lower ratings because of negative stereotyping, power-devaluation phenomena, or both, or these lower ratings may occur because higher power individuals are more accurate in their performance ratings. Unfortunately, the available literature contains no data concerning the accuracy of performance evaluations in situations involving raters of differing power levels. Knowing that self-serving biases often cause us to portray ourselves in an overly positive fashion (Fiske & Taylor, 1991), it is plausible that power has a biasing effect on self-evaluation. Additionally, given the range of biases affecting our judgments of others (Fiske & Taylor, 1991), it is also possible that power does indeed negatively bias our evaluations of others. Until accuracy data are collected in power-related investigations, however, it will remain unclear whether and to what extent power effects on evaluations truly involve an overly positive (self-evaluation) or negative (other-evaluation) bias.

The relatively small number of studies in this meta-analysis demonstrates that power effects remain a largely unexplored area in need of further social psychological investigation. The data available on power and evaluation suggest some appropriate next steps. Before addressing broader topics, research should test the current competing power theories to demonstrate how useful they are in explaining the relation between power and evaluation. At this point, only speculation is possible in fitting the observed relation to the framework of the two previously discussed power theories. Even at this preliminary stage, however, several observations can be offered.
Neither the power-stereotyping theory (Fiske, 1993; Fiske & Morling, 1996) nor the power-devaluation theory (Kipnis, 1972; Kipnis et al., 1976, 1981; Wilkinson & Kipnis, 1978) predicts that as power levels increase, self-evaluations grow increasingly positive. This finding may best be explained by a third theory, the egocentric or self-serving bias theory (M. M. Harris & Schaubroeck, 1988), which states that individuals inflate their ratings to improve their evaluations and protect their employment. This theory could explain the results of this review with only slight modification. Individuals with higher power levels may have more invested in their jobs and derive greater benefits from maintaining their employment than someone of lower power in the same organization. Consequently, individuals in more powerful jobs may be more motivated to defend themselves and their employment by rating themselves even more positively than individuals in lower power roles rate themselves. In addition, to protect their position further, individuals in more powerful positions may attribute their failures to external factors.

The results of this meta-analysis do, however, provide support for both of the existing power theories when the relation between power and other-evaluation is considered. As both theories suggest, increases in evaluator power level have negative effects on the evaluations of individuals possessing less power. It is not immediately obvious whether this effect occurs because of negative stereotyping by the supervisor or because of devaluation; the majority of the available research did not measure the mechanisms each theory deems important in moderating the power-evaluation relation. Further research is also needed to rule out alternative explanations for the observed relation between increasing power and decreasing positivity of subordinate evaluations.
For example, this effect could result from similarity-dissimilarity effects (Byrne, 1971), assuming that a power differential provides a relevant basis for perceiving dissimilarity. The increasing negativity of evaluation associated with increasing power could occur because powerful raters perceive subordinate others as more dissimilar to themselves and therefore evaluate subordinates less positively. Conversely, raters with less power might view the subordinate other as more like themselves, increasing the positivity of their evaluations. Available data, however, suggest that this is not the case. Two of the studies in this meta-analysis had individuals with low power rate individuals with higher levels of power (Gundlach & Cadotte, 1994; Stolte, 1978). Similarity-dissimilarity theory (Byrne, 1971) predicts that these subordinates would view higher power individuals as more dissimilar than lower power individuals and consequently evaluate them more negatively. This was not the case in either study: low-power participants rated individuals with high power more favorably than they rated other individuals with low power, suggesting that similarity-dissimilarity theory does not have much explanatory value for the power-evaluation relation. Even considering other potential explanations, the best existing theories for predicting this phenomenon remain the two addressed earlier, and future research in this area should focus on testing those theories within the framework of the reviewed power-evaluation effects.

This review also has several useful practical implications. It demonstrates that across situations, evaluator power affects performance evaluation and is associated with either strongly positive or strongly negative ratings depending on the target of the evaluation. Power differences may explain some of the variance in discrepancies between various raters in an evaluation process (M. M.
Harris & Schaubroeck, 1988), such as the low correlations observed between the self-ratings and supervisor ratings of the same individual. These results also suggest that the trend of using multiple raters in performance evaluation is soundly based and that the effect of power roles should be taken into consideration when the results of performance evaluations are used. Instead of relying only on traditional supervisor ratings, performance evaluations should contain peer ratings as well as self-ratings. Given the probability that each type of rating is biased to a degree, soliciting multiple evaluations allows for a more accurate evaluation, much as having multiple measures of a construct in psychology is beneficial. Furthermore, performance evaluations by supervisors should be considered in light of power effects on other-evaluation and perhaps adjusted upward accordingly.

Almost all of us are evaluated throughout the course of our careers. These evaluations may be perfunctory, or they may decide whether we remain employed. We would all like to believe that evaluations of our performance primarily reflect our abilities and accomplishments. This meta-analysis has shown, however, that the relative power of the person evaluating us also matters. Given the potentially enormous stakes involved, this finding merits our concern and further attention.

References

References marked with an asterisk indicate studies included in the meta-analysis.

American Psychological Association. (1974-present). PsycLIT [CD-ROM]. Wellesley Hills, MA: SilverPlatter Information Services [Producer and Distributor].

*Assor, A. (1989). The power motive as an influence on the evaluation of high and low status persons. Journal of Research in Personality, 23, 55-69.

*Brief, A. P., Aldag, R. J., & Russell, C. J. (1979). An analysis of power in a work setting.
The Journal of Social Psychology, 109, 289-295.

Budman, M., & Rice, B. (1994). The rating game. Across the Board, 31, 34-38.

Byrne, D. (1971). The attraction paradigm. New York: Academic.

Dahl, R. A. (1957). The concept of power. Behavioral Science, 2, 201-215.

*Evans, B. K., & Fischer, D. G. (1992). A hierarchical model of participatory decision-making, job autonomy, and perceived control. Human Relations, 45, 1169-1189.

Fiske, S. T. (1993). Controlling other people: The impact of power on stereotyping. American Psychologist, 48, 621-628.

Fiske, S. T., & Morling, B. (1996). Stereotyping as a function of personal control motives and capacity constraints: The odd couple of power and anxiety. In R. M. Sorrentino & E. T. Higgins (Eds.), Handbook of motivation and cognition: Vol. 3. The interpersonal context (pp. 322-346). New York: Guilford.

Fiske, S. T., & Taylor, S. E. (1991). Social cognition (2nd ed.). New York: McGraw-Hill.

French, J. R. P., Jr., & Raven, B. H. (1959). The bases of social power. In D. Cartwright (Ed.), Studies in social power (pp. 150-167). Ann Arbor, MI: Institute for Social Research.

*Georgesen, J. C., Harris, M. J., & Lightner, R. (1997). The balance of power: Interpersonal consequences of differential power and expectancies. Manuscript submitted for publication.

Gioia, D. A., & Longenecker, C. O. (1994). Delving into the dark side: The politics of executive appraisal. Organizational Dynamics, 22, 47-58.

Goodwin, S. A., Fiske, S. T., & Yzerbyt, V. (1997). Power implicitly biases impression formation: Stereotyping subordinates by default and design. Manuscript submitted for publication.

*Gundlach, G. T., & Cadotte, E. R. (1994). Exchange interdependence and interfirm interaction: Research in a simulated channel setting. Journal of Marketing Research, 31, 516-532.

*Harari, H., Bujarski, R., Houlne, S., & Wullner, K. (1975). Student power and faculty evaluation. Journal of College Student Personnel, 8, 75-79.

*Harris, M. J., Lightner, R.
M., & Manolis, C. (in press). Awareness of power as a moderator of expectancy effects: Who's the boss around here? Basic and Applied Social Psychology.

Harris, M. M., & Schaubroeck, J. (1988). A meta-analysis of self-supervisor, self-peer, and peer-supervisor ratings. Personnel Psychology, 41, 43-62.

Huston, T. L. (1983). Power. In H. H. Kelley, E. Berscheid, A. Christensen, J. H. Harvey, T. L. Huston, G. Levinger, E. McClintock, L. A. Peplau, & D. R. Peterson (Eds.), Close relationships (pp. 169-219). New York: Freeman.

*Ilardi, B. C., Leone, D., Kasser, T., & Ryan, R. M. (1993). Employee and supervisor ratings of motivation: Main effects and discrepancies associated with job satisfaction and adjustment in a factory setting. Journal of Applied Social Psychology, 23, 1789-1805.

*Ilgen, D. R., Peterson, R. B., Martin, B. A., & Boeschen, D. A. (1981). Supervisor and subordinate reactions to performance appraisal sessions. Organizational Behavior and Human Performance, 28, 311-330.

Institute for Scientific Information. (1981-1998). Social Sciences Citation Index. Philadelphia, PA: Author.

Kelley, H. H., & Thibaut, J. W. (1978). Interpersonal relations: A theory of interdependence. New York: Wiley.

*Kipnis, D. (1972). Does power corrupt? Journal of Personality and Social Psychology, 24, 33-41.

*Kipnis, D., Castell, P. J., Gergen, M., & Mauch, D. (1976). Metamorphic effects of power. Journal of Applied Psychology, 61, 127-135.

*Kipnis, D., Schmidt, S., Price, K., & Stitt, C. (1981). Why do I like thee: Is it your performance or my orders? Journal of Applied Psychology, 66, 324-328.

Kleiman, L. S., Biderman, M. D., & Faley, R. H. (1987). An examination of employee perceptions of a subjective appraisal system. Journal of Business and Psychology, 2, 112-121.

Landy, F. J., & Farr, J. L. (1980). Performance rating. Psychological Bulletin, 87, 72-107.

*Lightner, R. M., Burke, A., & Harris, M. J. (1997, May).
The impact of distraction and need for cognition on interpersonal expectancy effects. Paper presented at the meeting of the Midwestern Psychological Association, Chicago, IL.

Ng, S. H. (1980). The social psychology of power. New York: Academic.

*Pandey, J., & Singh, P. (1987). Effects of Machiavellianism, other-enhancement, and power-position on affect, power feeling, and evaluation of the ingratiator. The Journal of Psychology, 121, 287-300.

Queen, V. A. (1995). Performance evaluation: Building blocks for credentialing and career advancement. Nursing Management, 26, 52-55.

Raven, B. H. (1992). A power/interaction model of interpersonal influence: French and Raven thirty years later. Journal of Social Behavior and Personality, 7, 217-244.

Rosenthal, R. (1991). Meta-analytic procedures for social research (Rev. ed.). Thousand Oaks, CA: Sage.

Rosenthal, R. (1995). Writing meta-analytic reviews. Psychological Bulletin, 118, 183-192.

*Sachdev, I., & Bourhis, R. Y. (1985). Social categorization and power differentials in group relations. European Journal of Social Psychology, 15, 415-434.

*Singh, P. (1994). Perception and reactions to inequity as a function of social comparison referents and hierarchical levels. Journal of Applied Social Psychology, 24, 557-565.

*Stolte, J. F. (1978). Positional power and interpersonal evaluation in bargaining networks. Social Behavior and Personality, 6, 73-80.

University Microfilms International. (1975-present). ABI Inform [CD-ROM]. Ann Arbor, MI: Author.

*Wexley, K. N., & Snell, S. A. (1987). Managerial power: A neglected aspect of the performance appraisal interview. Journal of Business Research, 15, 45-54.

White, R. (1979). Performance enhancement by reciprocal accountability. Public Personnel Management, 8, 262-276.

*Wilkinson, I., & Kipnis, D. (1978). Interfirm use of power.
Journal of Applied Psychology, 63, 315-320.