The Long-Run Stock Price Performance of Firms with Effective TQM Programs

This paper documents the long-run stock price performance of firms with effective Total Quality Management (TQM) programs. The winning of quality awards is used as a proxy for effective TQM implementation. The stock price performance of award winners is compared against various matched control groups for a five-year implementation period and a five-year post-implementation period. During the implementation period there is no difference in the stock price performance of award winners and firms in the various control groups. During the post-implementation period award winners significantly outperform firms in the various control groups. Depending on the control group used, the mean outperformance ranges from 38% to 46%.

1. Introduction

This paper provides evidence on the value of Total Quality Management (TQM) by documenting the long-run stock price performance of firms with effective TQM implementation. The winning of quality awards is used as the proxy that the firm has effectively implemented TQM. The results are based on a sample of nearly 600 firms who won their first award between 1983 and 1994. We measure stock price performance of award winners relative to benchmarks consisting of sets of matched control firms. Stock price performance is measured separately for a five-year implementation period and a five-year post-implementation period.

Two main issues motivate this paper. The first is the controversy between the popularity of TQM and the criticisms about TQM’s ability to improve firm performance. The past decade has witnessed a remarkable awareness of and growth in TQM practices. Surveys indicate that TQM initiatives and adoption rates are increasing (Haim (1993)). TQM has been and continues to be a key strategic initiative of many firms. Many organizations are becoming proactive in supporting TQM principles and philosophies, not only by using quality awards to recognize firms that have done an outstanding job in implementing TQM, but also by promoting awareness of TQM practices. Recent statistics compiled by the National Institute of Standards and Technology of the U.S. Department of Commerce indicate that 44 out of 50 states in the United States have their own quality award system, and about 40 new quality awards have been initiated outside of the United States, many at the national level. Furthermore, many firms have initiated quality award systems for their suppliers.

Despite the widespread popularity of TQM, there is considerable skepticism about the value creation potential of TQM. For example, in a recent commentary on management paradigms in Business Week, Byrne (1997) proclaimed that “TQM is as dead as a pet rock”. Other negative articles on TQM have appeared with headlines such as “Is TQM Dead?” (USA Today (1995)), “The Straining of Quality” (The Economist (1995)), and “Total Quality is Termed Only Partial Success” (Fuchsberg (1992)). The strategy literature has also questioned the ability of TQM to improve and sustain performance (Hayes and Pisano (1994) and Porter (1996)). Although these articles report management perceptions about whether TQM is beneficial or not, they rarely provide objective data and statistical evidence to support their claims. The negative publicity about TQM may have caused firms to question the relation between TQM and financial performance (Ittner and Larcker (1996)). Proponents of TQM have reacted to the negative publicity by reiterating the few well-publicized success stories of TQM. Others have argued that the link between quality and financial performance is strong but hard to establish (Stratton (1993)).

This paper is also motivated by the need to set realistic expectations about the magnitude of value that TQM can deliver. Managers often pass judgment on the success and failure of their TQM implementations by comparing actual performance against prior expectations. Given that many have credited TQM as the primary reason for the economic success of Japan (Deming (1986), Imai (1986)), and also for restoring America’s economic competitiveness (Juran (1993)), it would not be surprising if expectations on what TQM can deliver are very high. When the high expectations are not met, managers can get disillusioned and view TQM as a failure. Unrealistic expectations may have also contributed to the negative publicity about TQM.

Evidence on the stock price performance of effective TQM implementers can shed light on the value of TQM, and help resolve the controversy and set realistic expectations. Existing empirical evidence linking TQM to stock price performance is limited. Early attempts have used event study methodology on samples of quality award winners. Hendricks and Singhal (1996) report that for a sample of 91 announcements of winning quality awards, the mean abnormal return on the announcement day is a statistically significant 0.64%. Adams et al. (1996) find marginally positive stock price response on the announcement day for the 12 publicly traded firms that won the Baldrige award during 1988 to 1994. Anderson et al. (1995) find that the mean abnormal return is insignificantly different from zero for a sample of 221 ISO 9000 certified firms. If these results are used as an estimate of the value of effectively implementing TQM, the conclusion would be that the value is fairly low. Such a low value seems hardly likely given the investments and efforts firms are making in implementing TQM. It is plausible that the market views the winning of quality awards somewhat cautiously, causing stock prices to adjust slowly over time as firms establish the benefits of their TQM programs, and/or that the market has already reflected the value of TQM before the announcement of winning quality awards.

A few studies attempt to link TQM to long-run stock price performance. In a sample of 91 quality award winners, Hendricks and Singhal (1996) find no evidence of long-term abnormal performance over a period starting from three years before and ending one year after the winning of awards. Easton and Jarrell (1998) examine the performance of 108 firms that seem to have made a serious and successful effort to implement TQM in the majority of their businesses. They find that the median cumulative abnormal return is a statistically significant 16.05% at the end of year five. Finally, the National Institute of Standards and Technology (1999) reports that a buy-and-hold strategy of investing in publicly traded Baldrige award winners outperformed the S&P 500 by 2.6 to 1 (a 460 percent return for winners against a 175 percent return for the S&P 500). The small sample size (about 20) and the lack of any statistical test and sensitivity analysis are limitations of this study.

Our study differs from the aforementioned long-term stock price studies on a number of important dimensions. First, our results are based on a sample of nearly 600 firms, whereas the results of Easton and Jarrell (1998) and Hendricks and Singhal (1996) are based on samples of about 100 firms. Second, Easton and Jarrell measure performance starting at the approximate time when their sample firms began a serious effort to implement TQM. Given that it can easily take about three to five years to implement an effective TQM program, their results are indicative of what firms can expect to see during TQM implementation. In contrast, our study examines the performance both before and after effective implementation. Third, Easton and Jarrell and our study both use some form of assessment to identify firms that have effectively implemented TQM. Easton and Jarrell identify firms based on information gathered from interviews conducted by George Easton, a former senior examiner for the Baldrige award. Our sample of effective TQM implementers (award winners) is chosen based on assessments conducted by teams of examiners that are different across the different award giving organizations.

Finally, since examining the long-run stock price performance is not the primary focus of Easton and Jarrell (1998) and Hendricks and Singhal (1996), they do not investigate this issue in depth. More importantly, the results of both studies are based on cumulating daily abnormal returns. Conrad and Kaul (1993) show that cumulating short-run abnormal returns over long periods can have a positive bias due to the bid-ask spread. Barber and Lyon (1997) and Kothari and Warner (1997) document potential biases and misspecifications associated with using cumulative abnormal returns. They identify approaches that overcome these problems. The results of our study are based on these new approaches.

The evidence in this paper is presented from the perspective of firms that have made significant investments in TQM and seek to validate the value of those investments. To provide a balanced view about the value of TQM, we estimate the buy-and-hold abnormal returns (BHARs) separately for a five-year implementation period and a five-year post-implementation period, using the matched control firm methodology advocated by Barber and Lyon (1997). The evidence indicates that during the implementation period the abnormal returns are insignificantly different from zero. During the post-implementation period award winners significantly outperform firms in the various control groups. Depending on the control group used, the mean outperformance ranges from 38% to 46%. These results are robust when alternative benchmarks and methods are used.

Section 2 describes the sample, data, and methods. Section 3 presents the main results on BHARs. Section 4 explores the sensitivity of abnormal performance using alternative benchmarks and methods. Section 5 discusses the implications of our findings for the efficient market hypothesis. Section 6 concludes.

2. Sample, data, and methods

2.1 Sample selection

Our sample of award winners is drawn from three sources. First, we identify announcements of quality awards in the Wall Street Journal, PR Newswires, Business Wires, and Dow Jones News Service. Second, we use lists of quality award winners published in monthly publications such as Automotive Engineering, Business Electronics, Distribution, and Ward’s Auto World. Third, we contacted a number of quality award givers for lists of their award winners. These award givers typically indicated the month or the quarter of the year when the award was given.

About 140 different award givers are represented in our sample, some of which are listed in Table 1. Many of the award givers are customers who have developed quality award systems for their suppliers. Award givers also include independent organizations such as the National Institute of Standards and Technology, National Association of Manufacturers, and about 30 different state award givers. A firm does not have to be a supplier to compete for awards given by independent organizations.

Table 1: Names of some quality award givers whose award recipients are included in the sample

| Customers that give awards to their suppliers | Independent Award Givers |
|---|---|
| Auto Alliance International Inc. (Part of Mazda Motor Manufacturing) | Alabama Senate Productivity & Quality Award |
| Chrysler Corp. | Arizona’s Pioneer and Governor’s Award for Quality |
| Consolidated Rail | California Governor’s Golden State Quality Awards |
| Eastman Kodak Co. | Connecticut Quality Improvement Award |
| Ford Motor Co. | Delaware Quality Award |
| General Motors Corp. | Florida Governor’s Sterling Award |
| General Electric | Massachusetts Quality Award |
| Goodyear Tires | Maryland Senate Productivity Award |
| GTE Corp. | Maine State Quality Award |
| Honda of America Manufacturing Inc. | Michigan Quality Award |
| International Business Machines | Minnesota Quality Award |
| J. C. Penny & CO | Missouri Quality Award |
| Lockheed Corp. | National Association of Manufacturers (The Shingo Prize) |
| Minnesota Mining and Manufacturing | National Institute of Standards and Technology (Baldrige Award) |
| National Aeronautical and Space Authority | North Carolina Quality Leadership Award |
| New United Motor Manufacturing Inc. (NUMMI) | New Mexico Quality Award |
| Toyota Motor Manufacturing U.S.A Inc. | New York Governor’s Excelsior Award |
| Nissan Motor Manufacturing Corp. U.S.A | Nebraska Edgerton Quality Award |
| Pacific Bell | Oklahoma Quality Award |
| Sears Roebuck & Co. | Oregon Quality Award |
| Texas Instrument Co. | Pennsylvania Quality Award |
| TRW Inc. | Rhode Island Award for Competitiveness and Excellence |
| Xerox Corp. | Texas Quality Award |
| Union Carbide | Tennessee Quality Award |
| Westinghouse | Virginia Senate Productivity & Quality Award |
| Whirlpool | Washington State Quality Award |

Overall, our database consists of 3,000 different firms that have won quality awards. We limit our study to firms that have stock price information available on the University of Chicago Center for Research in Security Prices (CRSP) daily tape. Of the 3,000 firms, 665 are listed on CRSP. We dropped 57 of these 665 firms from our analyses because they did not have any return data during the 10 years surrounding the date of their first quality award and/or did not have data on CRSP and COMPUSTAT on measures such as market value of equity and book-to-market ratio that we need to identify control firms. Panel A of Table 2 presents the distribution of the year in which each of the 608 firms in our sample won its first quality award. Panel B of Table 2 presents statistics for sample firms based on the closest fiscal year completed before or after winning the first quality award. The median observation represents a firm with market value of equity of $385.4 million. The sample has firms from nearly 200 distinct four-digit SIC codes (51 distinct two-digit SIC codes). Table 3 presents the distribution of the 608 sample firms across their primary two-digit SIC codes. While a large number of industries are represented in the sample, 77% of the sample firms (470 out of 608) are in the manufacturing sector.

2.2 Establishing the time period of analysis

By determining the date when the sample firm won its first award we can at least establish a date when the sample firm had a reasonably effective TQM program in place. It can typically take the award giving organizations and their experts about 6 to 8 months or longer to evaluate and certify the effectiveness of the program. Therefore, it is reasonable to assume that firms had an effective TQM program in place before they won their first quality award. We assume that the program was effective about 12 months before the date when the firm won its first quality award. Examining the performance after this date provides an estimate of the value of these programs once they are effective. We examine performance for a five-year period after this date. This time period is referred to as the post-implementation period.

Unfortunately, the TQM literature does not provide much theoretical guidance on the appropriate post-event time period. This is true not only for TQM but for many other long-term stock price studies that have appeared in the financial economics literature. We have based our choice of the length of the post-implementation period on time periods typically used in recent long-term stock price studies and earlier studies that examine long-term performance of TQM firms (Hendricks and Singhal (1997), and Easton and Jarrell (1998)). Most studies tend to use a four- to five-year post-event time period (Kothari and Warner (1997) and Fama (1998)). Our post-implementation period includes a four-year post-event period measured from the event of winning the first quality award. Furthermore, in the case of TQM, many have indicated that a long time period is needed to establish the link between TQM and financial performance because of the evolutionary rather than revolutionary nature of the changes associated with TQM (Stratton (1993) and Garvin (1991)).

We also examine performance for a five-year period before the post-implementation period. We refer to this time period as the implementation period. There are two reasons for examining the performance during the implementation period. First, during this period firms are likely to vigorously implement TQM and incur various implementation costs. Hence, to provide a balanced perspective on the value of TQM, it is important to examine the stock price performance during the implementation period. A five-year implementation period is chosen because it can easily take 3 to 5 years to implement TQM effectively (King (1992), Hockman (1992), and the United States General Accounting Office (1991)). Secondly, in a recent paper Adams et al. (1996) show that abnormal stock price performance on the announcement day of Baldrige award winners has diminished over time. They conjecture that this could indicate that forward-looking stock analyses have already reflected the rewards of TQM before the formal announcement of winning awards. Examining the performance of award winners during the implementation period provides a direct test of this conjecture. The results for the implementation period could have implications on how efficient the market is in reflecting the value of TQM.

2.3 Defining event months

For each sample firm we identify the time the firm won its first quality award. If the time was a day (for example, January 4, 1990), then this day is the event day (or day 0 as it is commonly referred). If the time the firm won its first quality award was a particular month or quarter of the year, then we randomly chose a trading day in that month or quarter as the event date. The random event date should not be a serious concern given that we are measuring the long-run performance and the fact that the mean abnormal return on the announcement day was previously found to be quite small at around 0.60% (Hendricks and Singhal (1996)).

Event month 0 is defined as a 21-trading-day interval consisting of the event day as the first trading day and the 20 trading days following the event day. The succeeding (preceding) event months are defined as successive 21-trading-day periods after (before) event month 0. We examine the stock price performance over a 121-month period (just over 10 years) beginning 72 months before month 0 and ending 48 months after month 0. Following the usual conventions, these 121 months are numbered as event months -72, -71, ..., 0, +1, ..., +48. Since we assume that the TQM implementation is effective 12 months before the time of winning the first quality award, the 60-month period from event months -72 to -13 is the implementation period and the 61-month period from event months -12 to +48 is the post-implementation period.
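The event-month convention can be illustrated with a minimal Python sketch. The function names and the use of a trading-day offset relative to the event day are illustrative assumptions, not part of the original procedure.

```python
TRADING_DAYS_PER_MONTH = 21  # length of one event month, as defined in the text


def event_month(day_offset: int) -> int:
    """Map a trading-day offset from the event day (offset 0) to an event month.

    Offsets 0..20 form event month 0; floor division also places
    offsets -21..-1 in event month -1, and so on.
    """
    return day_offset // TRADING_DAYS_PER_MONTH


def period(month: int) -> str:
    """Classify an event month into the two analysis periods."""
    if -72 <= month <= -13:
        return "implementation"
    if -12 <= month <= 48:
        return "post-implementation"
    return "outside window"


# Count the months in each period over the full event window.
months = range(-72, 49)
n_impl = sum(period(m) == "implementation" for m in months)
n_post = sum(period(m) == "post-implementation" for m in months)
print(len(months), n_impl, n_post)
```

Under these definitions the window contains 121 event months, of which 60 fall in the implementation period and 61 in the post-implementation period.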

2.4 Methods for computing abnormal returns

The basic idea in long-term stock price studies is to estimate abnormal returns for a sample of firms that have experienced the same kind of event, and then test the null hypothesis that the mean abnormal return over the period of interest is equal to zero. An abnormal return is the difference between the return on a stock and the return on an appropriate benchmark. Three methods are commonly used to compute abnormal returns: (1) buy-and-hold abnormal returns (BHARs), (2) cumulative abnormal returns (CARs), and (3) mean monthly abnormal returns (MMARs). In the BHAR approach, the raw returns of the sample firm and its benchmark are first compounded across the periods. The abnormal return is then the difference between the compounded returns of the sample firm and its benchmark. In the CAR approach, abnormal returns are computed for each period (typically a month) and then summed across the periods. In the MMAR approach, the monthly abnormal returns are averaged over the period of interest. Various benchmarks can be used to compute the abnormal returns, including samples of matched control firms, stock indices such as the S&P 500 and the CRSP market index, and asset pricing models such as the Capital Asset Pricing Model and the Fama and French (1993) three-factor model.
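The three measures can be compared on a small example. The sketch below is illustrative only: the function names and the three-month return series are hypothetical, and returns are expressed as decimals.

```python
from math import prod


def bhar(firm, bench):
    """Buy-and-hold abnormal return: difference of compounded returns."""
    firm_bh = prod(1 + r for r in firm) - 1
    bench_bh = prod(1 + r for r in bench) - 1
    return firm_bh - bench_bh


def car(firm, bench):
    """Cumulative abnormal return: sum of per-period abnormal returns."""
    return sum(f - b for f, b in zip(firm, bench))


def mmar(firm, bench):
    """Mean monthly abnormal return: average of per-period abnormal returns."""
    return car(firm, bench) / len(firm)


# Hypothetical monthly returns for a sample firm and its benchmark.
firm_returns = [0.10, -0.05, 0.20]
bench_returns = [0.02, 0.01, 0.03]

print(bhar(firm_returns, bench_returns))
print(car(firm_returns, bench_returns))
print(mmar(firm_returns, bench_returns))
```

Because BHARs compound returns while CARs add them, the two measures generally differ, and the gap tends to widen as the horizon lengthens.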

There is considerable debate in the literature about the right approach and right benchmark for examining long-term returns. Barber and Lyon (1997), Kothari and Warner (1997), and Fama (1998) discuss the pros and cons of various approaches and benchmarks. BHARs capture the investor experience more accurately, while the other methods may give more reliable test statistics. As a starting point we focus on estimating BHARs using Barber and Lyon’s (1997) suggested approach of matching sample firms to control firms based on specific characteristics. Since our choice could be open to potential criticisms, we also examine the robustness of the conclusions from BHARs using other approaches and benchmarks.

2.5 Creating samples of control firms

We identify control firms based on factors such as size (the market value of equity), book-to-market (BM) ratio, and industry (SIC codes). The importance of controlling for size and BM ratios in long-run abnormal stock returns is well accepted in the literature (see, for example, Banz (1981) and Fama and French (1993)). Matching on industry controls for the effect of industry-wide factors on stock price performance.

We require that the control firm has at least the same amount of stock return data that the sample firm has during the 121-month period of interest. Thus, any survivorship bias present benefits the control firms. To reduce cross-sectional dependence, a control firm is used only once in a control group. Finally, to further control for any potential bias in the selection of control firms, three different control groups are considered: (1) an industry-matched group, (2) an industry-size-matched group, and (3) an industry-size-BM-matched group.

To generate the industry-matched control group, we first attempt to pair each sample firm with a control firm that has at least the same three-digit SIC code and is the closest in terms of the market value of equity at the year-end before the year of winning the first quality award. If a three-digit SIC code match is not found, we attempt to find a control firm with the same two-digit SIC code. Although the control firm closest in size is chosen, this may not guarantee a good size match.
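The two-stage industry match can be sketched as follows. This is a minimal illustration under stated assumptions: the `Firm` record, the function name, and the example firms are hypothetical, SIC codes are stored as four-character strings, and the `used` set reflects the rule that a control firm is used only once in a control group.

```python
from typing import List, NamedTuple, Optional, Set


class Firm(NamedTuple):
    name: str
    sic: str    # four-digit SIC code stored as a string
    mve: float  # market value of equity


def industry_match(sample: Firm, candidates: List[Firm],
                   used: Set[str]) -> Optional[Firm]:
    """Pick the unused candidate closest in size, trying three-digit SIC first."""
    for digits in (3, 2):
        pool = [c for c in candidates
                if c.name not in used and c.sic[:digits] == sample.sic[:digits]]
        if pool:
            best = min(pool, key=lambda c: abs(c.mve - sample.mve))
            used.add(best.name)  # each control firm is used only once
            return best
    return None  # no control firm available at either SIC level


# Hypothetical candidate pool.
pool = [Firm("A", "3711", 500.0), Firm("B", "3714", 900.0), Firm("C", "3799", 410.0)]
used: Set[str] = set()
first = industry_match(Firm("S1", "3713", 400.0), pool, used)
second = industry_match(Firm("S2", "3720", 880.0), pool, used)
print(first.name, second.name)
```

In this example the first sample firm is matched at the three-digit level, while the second falls back to a two-digit match because no candidate shares its three-digit code.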

To prevent any possible bias from extreme size mismatches, we generate the industry-size-matched control group. We first attempt to pair each sample firm with a control firm that has at least the same three-digit SIC code and is the closest in terms of the market value of equity, with the constraint that the difference in size between the larger firm and the smaller firm in a matched-pair is not less than 70% of the size of the larger firm in the matched-pair. If some sample firms are not matched after this stage, a second attempt is made to find a control firm that has at least the same two-digit SIC code within the same size factor. The 70% size factor is chosen to be “reasonable” and to generate the largest number of matches without being able to statistically distinguish the difference in size at conventional levels of significance.

To generate the industry-size-BM-matched control group, we first identify all firms with at least the same two-digit SIC code as that of the sample firm. From this set of firms we then choose as the control firm the one for which the sum of the absolute percent differences in size (the market value of equity) and in book-to-market ratio is the minimum.
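A minimal sketch of this selection rule follows. The record type, function names, and example firms are hypothetical, and the sketch assumes that percent differences are measured relative to the sample firm's values, which the text does not specify.

```python
from typing import List, NamedTuple, Optional


class Firm(NamedTuple):
    name: str
    sic: str    # four-digit SIC code stored as a string
    mve: float  # market value of equity (assumed positive)
    bm: float   # book-to-market ratio (assumed positive)


def size_bm_distance(sample: Firm, cand: Firm) -> float:
    """Sum of absolute percent differences in size and book-to-market.

    Assumption: differences are scaled by the sample firm's values.
    """
    return (abs(cand.mve - sample.mve) / sample.mve
            + abs(cand.bm - sample.bm) / sample.bm)


def size_bm_match(sample: Firm, candidates: List[Firm]) -> Optional[Firm]:
    """Choose the same-two-digit-SIC candidate with the smallest distance."""
    pool = [c for c in candidates if c.sic[:2] == sample.sic[:2]]
    if not pool:
        return None
    return min(pool, key=lambda c: size_bm_distance(sample, c))


# Hypothetical firms: Z is identical in size and BM but in another industry.
sample = Firm("S", "3713", 400.0, 0.50)
candidates = [
    Firm("X", "3799", 440.0, 0.60),
    Firm("Y", "3711", 600.0, 0.50),
    Firm("Z", "2011", 400.0, 0.50),
]
match = size_bm_match(sample, candidates)
print(match.name)
```

Because the industry screen is applied only at the two-digit level, this group trades some industry precision for closer size and book-to-market matches.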

Table 4 gives statistics to compare the characteristics of the 608 sample firms and the three control groups. We are able to match 507 (84%) firms in the industry-size-matched control group and nearly 98% of the firms in the other two control groups. The matching on book-to-market ratio is good in all control groups. The matching on size is good for the industry-size matched control group, but not good for the other control groups. The industry-matched and the industry-size-matched control groups give better industry matches than the industry-size-BM-matched control group. Although it is hard to get control samples that are perfect on all matching criteria, the three control samples together with other methods and benchmarks used should control for any potential biases.

3. Empirical results

3.1 Buy-and-hold returns

Table 5 compares the five-year buy-and-hold returns of the sample firms and the three control groups. The implementation (post-implementation) period includes all trading days that span event months -72 to -13 (-12 to +48). If a sample firm does not have return data over the entire implementation (post-implementation) period, then its buy-and-hold return is computed using all available data during this period.

Implementation period results: Panel A of Table 5 presents the results for the implementation period. When the industry-matched control group is used as the benchmark, the mean buy-and-hold return for the sample is 99.33%. The corresponding mean buy-and-hold return for the control group is 92.58%. The mean and median of the paired differences between sample and control firms are 6.75% and 2.01%, respectively. Both these values are insignificantly different from zero. Of the sample firms, 50.71% beat their respective controls, which is insignificantly different from 50%. Similar conclusions are reached when the industry-size-matched and industry-size-BM-matched control groups are used as benchmarks. None of the parametric or non-parametric test statistics is significant at conventional levels. In short, there is no difference in the stock price performance of the sample firms and the control firms during the implementation period. Furthermore, these results do not support the conjecture by Adams et al. (1996) that the value of TQM is reflected before the formal announcements of awards.

Post-implementation period results: Panel B of Table 5 indicates that the stock price performance results are very different in the post-implementation period. When the industry-matched control group is used as the benchmark, the mean buy-and-hold return for the sample is 117.66%. The corresponding mean buy-and-hold return for the control group is 79.82%. The mean of the paired differences between sample and control firms is 37.84% with a t-statistic of 2.56 (significantly different from zero at the 2.5% level in a two-tailed test). The median of the paired differences is 17.27%. The Wilcoxon signed-rank test’s Z-statistic of 3.91 indicates that the median is significantly different from zero at the 1% level. Of the sample firms, 59.25% beat their respective control firms. The sign test’s Z-statistic indicates that this is significantly different from 50% at the 1% level.
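For readers unfamiliar with these tests, the sketch below shows textbook forms of the paired t-statistic and the sign test's normal-approximation Z-statistic. It is an illustration only: the function names and the paired differences are hypothetical, and the exact variants used to produce Table 5 (for example, any continuity correction) are not specified in the text.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_stat(diffs):
    """t-statistic for the null hypothesis that the mean paired difference is zero."""
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def sign_test_z(diffs):
    """Normal-approximation Z for the null that half of the differences are positive.

    Zero differences are dropped; no continuity correction is applied.
    """
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    k = sum(d > 0 for d in nonzero)
    return (k - 0.5 * n) / sqrt(0.25 * n)


# Hypothetical paired differences (sample BHR minus control BHR).
differences = [0.1, 0.3, -0.1, 0.5, 0.2]
print(paired_t_stat(differences))
print(sign_test_z(differences))
```

The t-statistic tests the mean of the paired differences, whereas the sign test uses only their signs and is therefore insensitive to extreme values.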

The results using the other two control groups are similar. The mean (median) of the paired differences between the returns of the sample and the industry-size-matched control group is 46.32% (9.95%), and is 44.70% (20.21%) with the industry-size-BM-matched control group. All these differences are highly significant. The fraction of sample firms that beat their respective control firms in both these control samples is significantly higher than 50%. The basic conclusion is that during the post-implementation period, the sample firms significantly outperform firms in each of the three control groups.

The economic significance of the extent of outperformance during the post-implementation period can be judged in a number of different ways. The first is to estimate how much more money would have to be invested in the control firms than in the sample firms to have the same wealth at the end of the post-implementation period. For example, assume that $100 is equally invested among all the sample firms at the beginning of the post-implementation period. When the industry-matched control group is used as the benchmark, the mean buy-and-hold return of the sample firms for the post-implementation period is 117.66%. Thus, $100 invested in the sample firms grows to $217.66 by the end of the post-implementation period. Because the mean buy-and-hold return of the control firms for the post-implementation period is 79.82%, an investment of about $121 is required to reach the same $217.66 (217.66 / 1.7982 ≈ 121). Thus, one would have to invest 21% more money in the control firms than in the sample firms to achieve the same wealth. One would have to invest about 26% more in the other two control groups than in the sample firms to achieve the same wealth.
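The wealth-equivalence arithmetic for the industry-matched control group can be reproduced in a few lines. The function name is illustrative; the two returns are the mean buy-and-hold returns reported above.

```python
def required_control_investment(sample_return: float,
                                control_return: float,
                                stake: float = 100.0) -> float:
    """Amount to invest in control firms to match the sample firms' terminal wealth."""
    terminal_wealth = stake * (1 + sample_return)  # $100 grows to $217.66
    return terminal_wealth / (1 + control_return)


# Mean buy-and-hold returns for the post-implementation period (industry-matched group).
needed = required_control_investment(sample_return=1.1766, control_return=0.7982)
extra_pct = needed - 100.0
print(round(needed, 2), round(extra_pct, 1))  # about 121.04 and 21.0
```

The same function applies to the other two control groups once their mean buy-and-hold returns are substituted.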

Another way to judge the economic significance is to take the ratio of the mean returns of the sample and the control group. Award winners outperform the industry-matched control group by about 1.5 to 1, and the other two control groups by about 1.6 to 1, during the post-implementation period. Overall, the extent of outperformance is economically significant.

3.2 Analysis of potential biases during the post-implementation period

To check the robustness of the statistically significant BHARs during the post-implementation period, we next explore to what extent the abnormal performance could be caused by potential sample selection and survivorship biases.

3.2.1 Sample selection bias

Although our review of the criteria used by various quality award givers clearly indicates that financial performance, and in particular stock price performance, is not an explicit consideration in choosing award winners, it is plausible that financial performance might be implicitly considered in choosing winners. To set an example and identify role models, award givers may have incentives to recognize firms that not only have a good TQM implementation but also demonstrate better financial performance. Thus, we may be biasing the post-implementation period’s results by including the time period before the award announcement.

To test this, we repeat our analyses by considering the year prior to winning the quality award as part of the implementation period instead of the post-implementation period. Table 6 presents these results. Panel A assumes a six-year (event months -72 to -1) implementation period. The results indicate that the mean BHARs are insignificantly different from zero, consistent with the results of Panel A of Table 5 assuming a five-year (event months -72 to -13) implementation period.

Panel B of Table 6 presents the results for a four-year (event months 0 to +48) post-implementation period. The mean BHARs range from 23% to 32% depending on the control sample used, and are significantly different from zero at the 5% level or better. The median abnormal returns are also positive and significantly different from zero at the 1% level or better. This suggests that including the year prior to winning the quality award (event months -12 to -1) in the post-implementation period is not driving this period’s results. Note that the results over event months 0 to +48 provide a more conservative estimate of the value of TQM. Furthermore, these results are indicative of the returns from an investment strategy of buying and holding the stocks of award winners from the time the awards are announced. A referee suggests that a more appropriate estimate of the returns available from such an investment strategy should exclude event month 0, as this month includes the return on the announcement day, which may not be available to an investor who invests after the announcement. Furthermore, since we could not establish an announcement day for some of our sample firms, we chose a random day over the period when the awards may have been given. Panel C of Table 6 gives the results from month +1 to +48. The mean and median BHARs are positive and statistically significant.

A second source of selection bias could result from the relationship between award givers and winners. Our sample consists of firms that have won awards from independent organizations, and from customers that give awards to their suppliers. In the case of independent awards there is no business relationship between the giver and winner. However, the business relationships between customers and their award winners could potentially bias the results. For example, if a customer has a strategy to reduce the number of suppliers, it may have an incentive to shift more business than normal from the suppliers that have been cut to its award winners. While some may consider this to be a reward for implementing TQM, others may consider this to be an outcome of the supplier reduction strategy of the customer that may have nothing to do with TQM. Some insights on this issue can be obtained by segmenting our sample into firms that have only won awards from their customers and firms that have won independent awards.

Table 7 presents these results for independent and customer award winners separately for the three matched control samples. The results indicate that the mean and median BHARs for both independent and customer award winners are positive and generally significant at the 10% level or better in two-tailed tests. Thus, both groups of award winners benefit from effective implementation of TQM. Although the results indicate that the average abnormal performance of the independent award winners is higher than that of the customer award winners, statistical tests indicate that the difference in performance between independent and customer award winners is not statistically significant.

3.2.2 Survivorship bias

In computing the five-year BHARs, no constraints are imposed on the amount of data available during the estimation period. If the sample firm has any data over the estimation period, its return is computed using all available data. Thus, the BHAR of a sample firm computed using 12 months of data gets the same weighting as that of a firm whose BHAR is computed using 60 months of data. This raises the possibility that a small set of firms and/or the firms that did not survive the post-implementation period could be driving the statistically significant results. To explore this issue further, we examine the post-implementation period’s BHARs for firms that were delisted during this period, and for firms that have data for a certain minimum number of months during the post-implementation period.

Delisting bias: Delisting codes from the CRSP tape indicate that during the post-implementation period 67 firms were delisted for various reasons (see Panel A of Table 8). Most of the delisted firms fall in category 200 (mergers) and category 500 (delisted by exchange). Although a 500 delisting category does not imply bankruptcy and/or liquidation, firms are often delisted by exchanges because the firm’s capital, surplus, equity, or assets are insufficient to meet the standards set by the exchange, or because the price falls below an acceptable level. Such delistings could be a potential indication of financial distress, and some of these firms could declare bankruptcy or liquidate in the future.

Panel B of Table 8 gives the mean BHARs for all the delisted firms. Two of the control groups give negative mean BHARs (-2.02% and -10.06%), and the third gives a slightly positive mean BHAR (0.40%). The 67 delisted firms did not have much impact on the overall mean for our full sample.

We next examine the mean BHARs by delisting code category. Firms that delisted because they merged (category code 200) show positive BHARs of 29%, 23%, and 49% depending on the control group used. This is consistent with the literature that has documented that merged/acquired firms generally experience a run up in price when a merger/acquisition offer is made or closed. Given that the BHARs of the full sample are around 40%, mergers and acquisitions do not drive the results of the full sample.

On the other hand, firms that had a 500 delisting code (potential candidates for financial distress, bankruptcy, liquidation, etc.) experienced large negative abnormal returns. Depending on the control group, the mean BHARs are -80.36%, -96.71%, and -138.36%. Given that BHARs of the full sample are around 40%, the impact of including these firms in our sample is to reduce the overall mean. Thus, BHARs for the post-implementation period are biased downwards because these firms are included in the sample. A priori, we have no reasons for excluding these firms. Finally, Panel C of Table 8 gives the mean BHARs excluding those firms that were delisted for any reason (no survivorship bias). The BHARs are highly significant, and higher than those of the full sample (see Panel B of Table 5).

We also examined the post-implementation period’s BHARs for subsamples of firms that have data for at least 60, 48, and 36 months. Using the industry-size-BM-matched control group, the BHARs for firms with at least 60, 48, and 36 months of data are 52.11% (t-statistic of 3.54), 50.65% (t-statistic of 3.40), and 56.08% (t-statistic of 3.88), respectively. The results from the other two control groups are similar.

To summarize, the above sensitivity analysis indicates that the post-implementation period’s BHARs are not driven by sample selection or survivorship biases.

4. Additional results for the post-implementation period from other benchmarks/methods

The results we have presented so far are BHARs using three different sets of matched control firms as the benchmarks. As mentioned earlier, there is considerable debate about the best method for measuring long-term abnormal returns. To overcome any potential criticism of using the matched control firm method, we examine the sensitivity of our post-implementation period’s results to alternate benchmarks and methods including cumulative abnormal returns (CARs), monthly abnormal returns using the Fama-French three-factor model and calendar-time portfolio methods, and annual BHARs.

Alternate benchmarks: The use of control firms as a benchmark is always open to criticism with respect to how well the sample and control firms are matched. One way to deal with this criticism is to use various portfolios as benchmarks. We estimated post-implementation period’s BHARs using the S&P 500, and the CRSP value-weighted market index of firms traded on the NYSE, AMEX, and NASDAQ. Award winners outperformed the S&P 500 by a mean of 34.02% (t-statistic of 2.72) and the CRSP market index by 38.11% (t-statistic of 3.05).

Cumulative abnormal returns (CARs): We compute CARs using the matched control firms as benchmarks. Abnormal returns are calculated for each 21-day period for each sample firm, and averaged across all firms to give the mean monthly abnormal returns. CARs are computed by summing the mean monthly abnormal returns over the appropriate time period. The time series of CARs during the post-implementation period (event months -12 to +48) show a positive trend. The CARs at the end of month +48 using the industry-matched, industry-size-matched, and industry-size-BM-matched control samples are 16.64%, 18.92%, and 14.52%, respectively. Based on test-statistics that take into account the autocorrelation in the time series of CARs (Ritter (1991)), the CARs at the end of month +48 are significantly different from zero at the 1% level or better in two-tailed tests.
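The CAR calculation described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code used in the study: the function name and the input layout (one list of period returns per sample firm, aligned with its matched control firm, with None marking missing data) are hypothetical.

```python
from statistics import mean

def cumulative_abnormal_returns(sample_returns, control_returns):
    """Return the CAR series from per-firm period returns.

    sample_returns[i][t] and control_returns[i][t] are the period-t returns
    (as decimals) of sample firm i and its matched control firm.
    Use None for periods with no data (hypothetical convention).
    """
    n_periods = max(len(r) for r in sample_returns)
    cars, running = [], 0.0
    for t in range(n_periods):
        # Abnormal return = sample firm return minus matched-control return.
        ars = [
            s[t] - c[t]
            for s, c in zip(sample_returns, control_returns)
            if t < len(s) and t < len(c) and s[t] is not None and c[t] is not None
        ]
        if ars:
            # CAR is the running sum of the cross-sectional mean abnormal returns.
            running += mean(ars)
        cars.append(running)
    return cars
```

Because the mean abnormal returns are summed rather than compounded, a CAR is not inflated by compounding in the way a BHAR can be.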

Fama-French three-factor model: Another commonly used approach is to estimate abnormal performance relative to an explicit asset-pricing model. We use the following three-factor model developed by Fama and French (1993):

$$R_{jt} - R_{ft} = a_j + b_j (R_{mt} - R_{ft}) + s_j \mathrm{SMB}_t + h_j \mathrm{HML}_t + e_{jt} \qquad (1)$$

where $R_{jt}$ is the return on firm j in month t, $R_{ft}$ is the return on one-month Treasury bills, $R_{mt}$ is the return on the value-weighted market index, $\mathrm{SMB}_t$ is the return on a value-weighted portfolio of small stocks less the return on a value-weighted portfolio of large stocks, $\mathrm{HML}_t$ is the return on a value-weighted portfolio of high book-to-market stocks less the return on a value-weighted portfolio of low book-to-market stocks, and $e_{jt}$ is the error term. The intercept, $a_j$, from the regression serves as an estimate of abnormal returns. A positive intercept indicates that, after controlling for market, size, and book-to-market factors, the sample firm has performed better than expected by $a_j$% per month.

We estimate the intercept for each firm in our sample from the three-factor regression equation using data over the post-implementation period (event months -12 to +48). We require a minimum of 18 months of data to estimate the intercept. Based on 592 individual regressions, the mean intercept is 0.25% per month, with a t-statistic of 3.16 (significantly different from zero at the 1% level in a two-tailed test). The median intercept is 0.23% per month and is significantly different from zero at the 1% level. Nearly 60% of the intercepts are positive.
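As a rough sketch of this procedure, the snippet below estimates the intercept of equation (1) for one firm by ordinary least squares and then tests the cross-sectional mean of the intercepts against zero. The function names are hypothetical, the factor series are assumed to be supplied by the caller, and the simple cross-sectional t-statistic is an assumption about how the reported test was computed.

```python
from math import sqrt
from statistics import mean, stdev

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(k):
            if row != col:
                f = A[row][col] / A[col][col]
                A[row] = [a - f * b for a, b in zip(A[row], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def three_factor_alpha(excess_ret, mkt_excess, smb, hml, min_months=18):
    """Intercept a_j of equation (1); None if fewer than min_months of data."""
    if len(excess_ret) < min_months:
        return None
    # Columns: constant, market excess return, SMB, HML.
    X = [[1.0, m, s, h] for m, s, h in zip(mkt_excess, smb, hml)]
    return ols(X, excess_ret)[0]

def mean_and_t(values):
    """Cross-sectional mean and t-statistic against zero (assumed test)."""
    avg = mean(values)
    return avg, avg / (stdev(values) / sqrt(len(values)))
```

In use, one would call three_factor_alpha once per sample firm, drop the None results, and pass the remaining intercepts to mean_and_t.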

Calendar-time portfolio methods: CARs and BHARs suffer from the problem of cross-sectional dependence among sample firms because the returns of sample firms could overlap in calendar time. For example, with a five-year post-implementation period there is significant overlap in calendar time of returns among firms that won their first quality award in 1989. This overlap could introduce cross-correlations between returns, and thus affect the statistical inference. To deal with this, Fama (1998) and Lyon, Barber, and Tsai (1999) recommend using calendar-time portfolio methods. We consider two approaches: one based on the use of mean monthly calendar-time abnormal returns and the other based on the Fama-French three-factor model.

For each calendar month t, we calculate the abnormal return for each firm that has month t as part of its post-implementation period. We then average the abnormal returns for month t across all firms that have this month in their post-implementation period to get the portfolio abnormal return for month t. The process is repeated every month. The time-series variation of the monthly portfolio abnormal returns accounts for the cross-correlations due to calendar-time overlap of returns among sample firms. The mean and variance of this time series can be used to test if the mean monthly abnormal return is zero. In applying this method, we compute abnormal returns based on the three matched control groups used for estimating BHARs. We require a minimum of 10 abnormal returns each month for that month to be included in the time series.
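The steps above can be sketched as follows. This is a minimal illustration rather than the study's implementation: the function name and the input layout (one record per firm and calendar month) are hypothetical, and equal weighting across firms is assumed because the text describes a simple average.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

def calendar_time_test(abnormal_returns, min_firms=10):
    """abnormal_returns: iterable of (calendar_month, firm_id, abnormal_return).

    Returns (mean monthly abnormal return, t-statistic, monthly series).
    Needs at least two qualifying months, otherwise stdev() raises an error.
    """
    by_month = defaultdict(list)
    for month, _firm, ar in abnormal_returns:
        by_month[month].append(ar)
    # Portfolio abnormal return for month t = average across firms in month t,
    # keeping only months with at least `min_firms` abnormal returns.
    series = [mean(v) for _m, v in sorted(by_month.items()) if len(v) >= min_firms]
    avg = mean(series)
    # The time-series standard deviation reflects cross-correlation among firms.
    t_stat = avg / (stdev(series) / sqrt(len(series)))
    return avg, t_stat, series
```

The key design point is that the test statistic is built from the time series of portfolio returns, so overlapping firm returns within a month are collapsed into a single observation.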

The results indicate that the mean monthly abnormal returns using the calendar-time portfolio method are positive and significantly different from zero. When the industry-matched control group is used as the benchmark, the mean monthly abnormal return is 0.50% per month, with a t-statistic of 4.03. The median abnormal return is 0.37% per month. The results using the other two control groups are similar. The mean (median) abnormal return per month with the industry-size-matched control group is 0.40% (0.44%), and is 0.34% (0.35%) with the industry-size-BM-matched control group. The means and medians are significantly different from zero at the 2% level or better in two-tailed tests.

The process for computing portfolio returns for the Fama-French three-factor model is similar, except that we now use the monthly raw returns of the sample firms instead of abnormal returns to compute calendar-time portfolio returns. This time series of portfolio returns replaces the firm return on the left-hand side of equation 1. The intercept from the regression serves as an estimate of the monthly abnormal returns. Using the Fama-French model, the intercept equals 0.30% per month, with a t-statistic of 2.248. Based on the analysis of calendar-time portfolio returns, award-winning firms experience significant positive abnormal performance during the post-implementation period.

Compounding bias of buy-and-hold abnormal returns: Mitchell and Stafford (1998) point out that BHARs can give a false impression of the extent of abnormal performance. The reason is that compounding can cause BHARs to grow with the time horizon even when there is no abnormal performance after the early periods. To test for this, Table 9 presents the mean annual BHARs for the post-implementation period. In computing these returns we rebalance the portfolio at the end of each year. The returns are generally positive across all years, and statistically significant for two years. These time periods span event months +25 to +36, and +37 to +48. Depending on the control sample, the mean abnormal returns range from 7.5% to 9.5% during event months +25 to +36, and from 10% to 14% during event months +37 to +48.
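A small numerical sketch may help show why compounding matters. The returns below are purely hypothetical and are not taken from the sample: the firm beats its benchmark by 10 percentage points in the first year only and matches it afterwards.

```python
def bhar(sample_returns, benchmark_returns):
    """Buy-and-hold abnormal return: compounded sample return
    minus compounded benchmark return over the same periods."""
    s = b = 1.0
    for rs, rb in zip(sample_returns, benchmark_returns):
        s *= 1 + rs
        b *= 1 + rb
    return s - b

# Hypothetical annual returns: abnormal performance occurs in year 1 only.
sample = [0.20, 0.10, 0.10, 0.10]
benchmark = [0.10, 0.10, 0.10, 0.10]

# Multi-year BHARs measured from the start of year 1.
cumulative = [bhar(sample[:t], benchmark[:t]) for t in range(1, 5)]
# One-year BHARs, restarting the holding period each year.
annual = [bhar(sample[t:t + 1], benchmark[t:t + 1]) for t in range(4)]
```

Here the multi-year BHAR grows from about 10% to about 13.31% over four years, although the one-year BHARs are zero after the first year. Annual BHARs of the kind reported in Table 9 are therefore a natural check on whether later-period performance is genuinely abnormal.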

Although BHARs are not driven by abnormal returns during the early part of the post-implementation period, the results do indicate that significant abnormal performance seems to occur in the later part of the post-implementation period. Thus, there seems to be a delayed market reaction to the event of winning quality awards. The extent of delay is less pronounced when indices such as the CRSP market index and the S&P 500 are used as benchmarks. For example, with these benchmarks annual BHARs are positive and statistically significant during event months +13 to +24, +25 to +36, and +37 to +48.

To further explore the issue of delayed market reaction, we use the Fama-French three-factor model (equation 1) and the calendar-time portfolio methods to estimate abnormal returns during two post-implementation sub-periods: an early period (event months -12 to +18), and a later period (event months +19 to +48).

Based on 581 individual regressions using equation 1, the mean intercept is 0.30% per month (t-statistic of 3.11) for the early part of the post-implementation period, and based on 560 individual regressions it is 0.24% per month (t-statistic of 2.57) for the later part of the post-implementation period. When the Fama-French model is used with calendar-time portfolio returns, the intercept is 0.31% per month (t-statistic of 2.15) for the early part of the post-implementation period, and is 0.40% per month (t-statistic of 2.65) for the later part of the post-implementation period.

In the case of the calendar-time abnormal portfolio returns method, the mean monthly abnormal returns for the early part of the post-implementation period are 0.47% (t-statistic of 3.16) using the industry-matched control group; 0.37% (t-statistic of 1.98) using the industry-size-matched control group; and 0.26% (t-statistic of 1.67) using the industry-size-BM-matched control group. The corresponding results for the later part of the post-implementation period are 0.49% (t-statistic of 2.76), 0.49% (t-statistic of 2.21), and 0.55% (t-statistic of 3.28). The mean monthly abnormal returns are positive and statistically significant in both the early and later periods of the post-implementation period.

Overall, the results indicate that although the magnitude of abnormal returns in the later period is somewhat higher than in the early period, the Fama-French model and calendar-time portfolio methods indicate that both periods show evidence of statistically significant positive abnormal performance. Thus, the support for delayed market reaction is not consistent across different methods for computing abnormal performance.

5. Discussion of the conflict of the evidence with market efficiency

The significant positive abnormal returns during the post-implementation period are in conflict with market efficiency. The results indicate that the market underestimates the efficiency gains from TQM, and underreacts to the information conveyed by winning quality awards. We discuss a number of possible explanations to analyze this conflict.

One way to resolve the conflict with market efficiency is to estimate abnormal performance using different methodologies on different samples across different time periods, and look for consistencies. Our results are consistent with the results from the few studies that examine performance of TQM firms using objective financial data. Based on a sample of 108 firms, Easton and Jarrell (1998) report that the median cumulative abnormal return is a statistically significant 16.05% at the end of five years. Although there are some major differences between their study and this study, the fact that both studies report positive abnormal performance reduces the probability that the results are purely due to chance. More importantly, the long-term positive abnormal returns documented in Easton and Jarrell’s study and this study are consistent with the improvement in operating performance documented in two recent studies. Easton and Jarrell (1998) report positive and statistically significant abnormal performance in income-based measures in their sample of TQM firms. Hendricks and Singhal (1997) find that in their sample of quality award winners the mean and median control-adjusted changes in operating income during the post-award period are positive and statistically significant. Thus, positive abnormal returns are associated with improved operating performance.

From a market efficiency perspective, the question that remains is why the market does not capitalize the improved operating performance once it knows about effective TQM implementation. Easton and Jarrell (1998) argue that since TQM is a recent management phenomenon and rigorous evidence about the impact of TQM is just starting to emerge, markets may have a limited basis and experience for assessing TQM’s impact on performance. This may have been further compounded by the controversy surrounding TQM. In the 1980s TQM was introduced in the United States with much fanfare and promise of what it could deliver and how it could improve performance. Since then there has been considerable controversy and debate about TQM, with claims and counterclaims about the failure rate of these programs and the value created by them. This debate could have created uncertainty in the market on how to evaluate TQM.

The TQM event is different from other events such as mergers and acquisitions, corporate restructuring, etc. that have been the focus of long-term performance evaluation in many studies. Unlike traditional events, which are deployed at a discrete point in time, the deployment and evolution of TQM occurs over several years. The specific approaches and how systematically these approaches are used to implement TQM are important in judging its effectiveness. For example, the Baldrige criteria evaluate the approach and deployment of applicants in areas dealing with leadership, customer focus, process management, and employee involvement and development. Information on such aspects of implementing TQM is not generally available publicly, making it difficult for the market to assess the value of TQM. Although we use the winning of quality awards as a proxy for effective TQM implementation, it is plausible that the market may not have considered this as a credible signal about effectiveness, as there is controversy surrounding the awards themselves (see, for example, Garvin (1991)). The market is likely to wait for some signal that TQM has positively impacted earnings. Thus, it is plausible to expect that the market reaction to quality awards would be better measured over longer time horizons.

The theory of TQM suggests that focus on customer satisfaction, employee empowerment, and continuous improvement would lead to improvement in non-financial measures of performance (for example cycle time, defect rates, flexibility, and the ability to be innovative and responsive), which in turn should affect accounting and stock price performance. However, at the time of the TQM event, it is hard to predict how improvement in non-financial measures of performance would evolve in the future, and what impact this would have on cash flows. Under the TQM philosophy of continuous improvement and incremental change, it may take some time before the cumulative impact of these activities can become significant. Thus, it is not unreasonable to expect that the impact of TQM would be incorporated in stock prices over a longer time period.

If limited experience in evaluating TQM, controversy, and skepticism about the value of TQM are possible reasons for observing long-term abnormal performance, one would expect that the influence of these factors would diminish over time as the market learns more about TQM. Thus, one way to analyze the conflict between our results and market efficiency would be to examine the abnormal performance for early and later award winners. Table 10 presents the results obtained by segmenting the sample into firms that won their first quality award in the 1980s (1989 or earlier) and those that won in the 1990s (1990 and later). Table 10 reports the five-year BHARs for the post-implementation period for all three matched control groups. The mean BHARs for the subsample representing 1980s award winners and the subsample representing 1990s award winners are positive and statistically significant. In two of the three control samples the firms that won their first award in the 1980s do better than the firms that won their first award in the 1990s, and in one control sample it is the reverse. Although the results indicate that the average abnormal performance of the early award winners is generally higher and has stronger statistical significance than that of the later award winners, tests indicate that the difference in performance between the early and later award winners is not statistically significant. Our interpretation is that the market is still slow to respond to TQM benefits.

The persistence of abnormal returns is puzzling. Future research and extensions of our sample and time period of analysis will shed more light on whether positive abnormal returns are due to gradual learning by the market, an anomaly, or inappropriate methodology. We would like to note that the robustness of the positive abnormal performance during the post-implementation period has been evaluated using a number of different benchmarks and methods. Nonetheless, it is still plausible that the conflict of our results with market efficiency could simply be due to test misspecification or a bad-model problem.

6. Summary

This study has examined the long-run stock price performance of firms with effective TQM programs. During the implementation period we do not find any significant difference in the stock price performance of effective TQM implementers and the various groups of matched control firms. During the post-implementation period we find that the sample of effective TQM implementers significantly outperforms the various matched control groups. Depending on the control group used, the mean BHARs range from 38% to 46%. This level of outperformance is economically significant, resulting in substantial wealth creation for the shareholders.

The results of this paper add to the emerging literature on the positive impact of TQM. Effective implementation of TQM principles and philosophies does lead to improvement in long-term financial performance. Our results should alleviate some of the concerns regarding the value of quality award systems. Overall, these systems are valuable in terms of recognizing TQM firms and promoting awareness of TQM.

Acknowledgements

We thank four reviewers for their constructive comments and suggestions.

References

ADAMS, G., G. MCQUEEN and K. SEAWRIGHT. 1996. “Quality Awards and Stock Prices: A Microanalysis”. Working Paper, Marriott School of Management, Brigham Young University, Provo, UT.

ANDERSON, S. A., J. D. DALY and M. F. JOHNSON. 1995. “The Value of Management Control Systems: Evidence on the Market Reaction to ISO 9000 Quality Assurance Certification”. Working Paper, School of Business, The University of Michigan, Ann Arbor, MI.

BANZ, R. W. 1981. “The Relationship Between Return and the Market Value of Common Stocks”. Journal of Financial Economics, 9, 3-18.

BARBER, B. M. and J. D. LYON. 1997. “Detecting Long-Run Abnormal Stock Returns: The Empirical Power and Specification of Test-Statistics”. Journal of Financial Economics, 43, 341-372.

BENSON, P. G., J. V. SARAPH and R. G. SCHROEDER. 1991. “The Effects of Organizational Context on Quality Management: An Empirical Investigation”. Management Science, 37:9, 1107-1124.

BYRNE, J. A. June 23, 1997. “Management Theory-or Fad of the Month?”. Business Week, 47.

CONRAD, J. and G. KAUL. 1993. “Long-term Market Overreaction or Biases in Computed Returns”. Journal of Finance, 48, 39-63.

DEMING, W. E. 1986. Out of the Crisis. MIT Center for Advanced Engineering, Cambridge, MA.

EASTON, G. S. 1993. “The 1993 State of US Total Quality Management: A Baldrige Examiners Perspective”. California Management Review, 35:3, 32-54.

EASTON, G. S. and S. L. JARRELL. 1998. “The Effects of Total Quality Management on Corporate Performance: An Empirical Investigation”. Journal of Business, 71:2, 253-307.

FAMA, E. F. and K. FRENCH. 1993. “Common Risk Factors in Returns on Stocks and Bonds”. Journal of Financial Economics, 33, 3-56.

FAMA, E. F. 1998. “Market Efficiency, Long-Term Returns, and Behavioral Finance”. Journal of Financial Economics, 49, 283-306.

FLYNN, B. B., R. G. SCHROEDER, and S. SAKAKIBARA. 1995. “The Impact of Quality Management Practices on Performance and Competitive Advantage”. Decision Sciences, 26:5, 659-691.

FUCHSBERG, G. October 1, 1992. “Total Quality is Termed Only Partial Success”. Wall Street Journal, B1.

GARVIN, D. A. 1991. “How the Baldrige Award Really Works”. Harvard Business Review, 69:6, 80-94.

GHOSH, S., R. B. HANDFIELD, and R. CALANTONE. 1999. “A Structural Model Analysis of the Malcolm Baldrige National Quality Award Framework”. Working paper, Georgia Institute of Technology.

HAIM, A. 1993. “Does Quality Work? A Review of Relevant Studies”. Conference Board, Report Number 1043, New York.

HAYES, R. H., and G. P. PISANO. 1994. “Beyond World Class: The New Manufacturing Strategy”. Harvard Business Review, January-February, 77-86.

HENDRICKS, K. B. and V. R. SINGHAL. 1996. “Quality Awards and the Market Value of the Firm: An Empirical Investigation”. Management Science, 42:3, 415-436.

HENDRICKS, K. B. and V. R. SINGHAL. 1997. “Does Implementing an Effective TQM Program Actually Improve Operating Performance? Empirical Evidence From Firms that Have Won Quality Awards”. Management Science, 43:9, 1258-1274.

HENDRICKS, K. B. and V. R. SINGHAL. 1998. “Firm Characteristics, Effective TQM Programs, and Operating Performance: An Empirical Investigation”. Working paper, College of William and Mary and Georgia Institute of Technology.

HERTZ, H. S. 1997. “The Criteria: A looking Glass to Americans’ Understanding of Quality”. Quality Progress, 30:6, 46-48.

HOCKMAN, K. K. 1992. “Does the Baldrige Award Really Work”. Harvard Business Review, 70:1, 137.

IMAI, M. 1986. Kaizen: The Key to Japan’s Competitive Success. McGraw-Hill, New York, NY.

ITTNER, C. D. and D. F. LARCKER. 1996. “Measuring the Impact of Quality Initiatives on Firm Financial Performance”. In Advances in Management of Organization Quality, Vol. 1, edited by D. F. Fedor and S. Ghosh, JAI Press, 1-37.

JURAN, J. 1993. “Made In U.S.A.: A Renaissance in Quality”. Harvard Business Review, 71:4, 42-50.

KING, R. October 26, 1992. “Using Total Quality Management to Improve Bottom-Line Results”. GOAL/QPC 9th Annual Conference Boston, MA.

KOTHARI, S. P. and J. B. WARNER. 1996. “Measuring Long-Horizon Security Price Performance”. Journal of Financial Economics, 43, 301-339.

LYON, J. D., B. M. BARBER, and C. TSAI. 1999. “Improved Methods for Tests of Long-Run Abnormal Stock Returns”. Journal of Finance, 54, 165-201.

MITCHELL, M. L., and E. STAFFORD. 1998. “Managerial Decisions and Long-term Stock Performance”. Working paper, Graduate School of Business, University of Chicago.

NATIONAL INSTITUTE OF STANDARDS AND TECHNOLOGY. 1999. “Baldrige Index Outperforms S&P 500 for Fifth Year”. Press release, NIST 99-02, February 4, 1999, Washington DC.

PORTER, M. E. 1996. “What is Strategy?”. Harvard Business Review, November-December, 61-78.

POWELL, T. C. 1995. “Total Quality Management as Competitive Advantage: A Review and Empirical Study”. Strategic Management Journal, 16, 15-37.

RITTER, J. R. 1991. “The Long-Run Performance of Initial Public Offerings”. Journal of Finance, 46, 3-27.

STRATTON, B. February 1993. “Why You Can’t Link Quality Improvement to Financial Performance”. Quality Progress, 5.

THE ECONOMIST. January 14, 1995. “The Straining of Quality”. 55-56.

UNITED STATES GENERAL ACCOUNTING OFFICE. 1991. “Management Practices, U.S. Companies Improve Performance Through Quality Efforts”. GAO/NSIAD-91-190, Washington DC.

USA TODAY. October 17, 1995. “Is TQM Dead”. B1-B2.


