CHINA JOURNAL OF ACCOUNTING STUDIES
2019, VOL. 7, NO. 3, 407–437
https://doi.org/10.1080/21697213.2019.1701259

ARTICLE

Wei Xu (a), Zhenye Yao (b) and Donghua Chen (c)

(a) School of Accounting, Nanjing University of Finance and Economics, Nanjing, China; (b) School of Management, Shandong University, Jinan, China; (c) School of Business, Nanjing University, Nanjing, China

ABSTRACT
Readability, as a text feature of annual reports, is an important direction in existing research, but almost all relevant studies are based on English. Readability in the Chinese context may also be worth studying from the perspective of motive and consequence. Accordingly, based on an analysis of the readability indexes of Chinese and English annual reports, this paper constructs three readability indexes and tests them empirically. The results show that the average number of words in each clause, the proportion of adverbs and conjunctions in each sentence, and the arithmetic mean of the two are probably a reasonable set of measures: they better meet theoretical expectations and are consistent with relevant research results. Further examination of other text features finds that text length and the counts of numbers and tables may not be good measures. This paper provides a basic reference for subsequent research.

KEYWORDS
Chinese annual reports; readability; measure; test

1. Introduction

The annual report is a critical part of the current information disclosure mechanism. Since Ball and Brown (1968), researchers have noticed its great influence on the capital market. Subsequent research generally follows two paths. One focuses on the disclosed contents, such as management discussion (Xue, Xiao, & Pan, 2010), social responsibility reports (Wang, Yu, & An, 2014), internal control reports (Hammersley, Myers, & Shakespeare, 2008), etc. These studies aim to explore the impact of the disclosure on the capital market and the determinants of the content.
The other attends more to linguistic features, e.g. the influence of text readability (Li, 2008) and text tone (Xie & Lin, 2015) on information reception, and explores the origins of these linguistic features.

Annual report readability has long been discussed. Jones and Shoemaker (1994) noted the large influence of accounting report readability. Li (2008) introduced a readability index from English linguistics into empirical research on annual reports, which laid a practical technical foundation for subsequent related research and promoted the empirical study of text readability. Although some studies still put forward new English readability measures (such as Loughran & McDonald, 2014), the Fog Index has become the mainstream measure of English readability in existing empirical studies. Using the Fog Index, subsequent research indicates that readability can affect investors' understanding of the disclosed information, which can significantly influence the capital market (Franco, Hope, Vyas, & Zhou, 2015; Lee, 2012; Lehavy, Li, & Merkley, 2011). Therefore, text readability has become a substantial topic in research on information disclosure.

CONTACT Zhenye Yao firstname.lastname@example.org Department of Accounting, School of Management, Shandong University, Jinan 250100, China
Paper accepted by Kangtao Ye.
© 2019 Accounting Society of China

Correspondingly, annual report readability in the Chinese context may also be a crucial issue. Firstly, Chinese annual reports are difficult to read because of their sheer size. Statistics suggest that the average annual report of A-share listed companies in 2014 runs to 160 pages, including 78,000 Chinese characters, 6,000 figures and 242 tables. Sun (2004a) finds that these reports are also quite difficult to read: only 39% of non-professionals and 55% of semi-professionals can understand the contents.
Secondly, theoretically speaking, only when information is efficiently transmitted do investors have the motivation and ability to fully understand it (Hu, Rao, Chen, & Li, 2003; Yan & Sun, 2002). Text readability affects the extent to which investors receive information and, in turn, their judgements (Tan, Wang, & Zhou, 2015); hence it has an impact on the information transmission of the annual report. Thirdly, studies have shown that lower readability may be the result of management manipulation (Li, 2008; Li & Zhang, 2015; Lo, Ramos, & Rogo, 2017). Compared with mature Western economies, earnings manipulation by the management of listed companies is a more serious problem in China (Wang, Wu, & Bai, 2005). Listed companies also hinder investors from learning the real situation by managing the impression their text creates (Cheng, Liu, & Cheng, 2015; Sun, 2004b). Finally, compared with mature Western economies, China's capital market has a larger proportion of retail investors (Xu & Hou, 2012), who are less mature (Xu, Hong, Wu, & Xu, 2011). Retail investors are far less professional in finance than institutional and sophisticated investors, so they may be more vulnerable to poor readability and unable to understand annual report information timely and accurately.

All of this shows that annual report readability is an issue worth studying in China's capital market, whether from the objective state of the annual report, from management's motivation to manipulate information through readability, or from the possibility that investors are affected by readability. However, little attention has been paid to the readability of Chinese annual reports in empirical studies, which may be because there is no generally accepted measure of Chinese readability.
Therefore, this paper first analyses the existing readability indexes, then refers to relevant research in Chinese linguistics, and puts forward a more concise and practical readability index, which is further tested in terms of its causes and consequences.

Specifically, we employ a Chinese readability index that consists of two parts. The first part is the average number of words in each clause (readability1), and the second part is the proportion of adverbs and conjunctions in each sentence (readability2). The two indexes are then combined (readability3) by referring to the Fog Index. At the same time, considering that some studies use text length to measure English readability, we also test the text length of the annual report as an alternative readability index. Existing research suggests that poor annual report readability may be the result of intentional management manipulation, which signals poor earnings quality (Li, 2008; Li & Zhang, 2015; Lo et al., 2017). On the other hand, poor readability makes it difficult for investors to understand the information, leading to an insufficient market response to the annual report (Lee, 2012; Lehavy et al., 2011). Therefore, the indexes above are tested from these two perspectives.

The empirical results indicate that the larger the average number of words in each clause and the higher the proportion of adverbs and conjunctions in each sentence, the worse the earnings persistence, the smaller the earnings response coefficient (ERC) of annual report disclosure, the lower the accuracy of analyst forecasts after disclosure, and the longer the time taken to update forecasts. The same results appear when the combination of the two indexes is taken as the measure of readability.
These findings are in line with the theoretical expectations for annual report readability and with the conclusions of existing research, which suggests that the index we construct is a feasible measure of Chinese annual report readability. However, the test of text length does not meet the expectation and differs from the empirical results for English annual reports. Other possible measures of readability are further tested, including the number of figures and tables in the annual report, and these do not satisfy the expectation either. Thus, we construct a measure of readability in the Chinese context, which not only lays a foundation for further research on Chinese annual report readability, but also provides preliminary confirmation of the information function of readability.

The rest of this paper is arranged as follows: the second part reviews the literature on readability; the third part analyses the existing readability indexes and proposes Chinese readability indexes; the fourth part presents the research design; the fifth part reports the tests of the readability indexes; and the last part is the summary.

2. Literature review

Text readability has long been discussed. Soper and Dolphin (1964) noticed that annual report readability is a critical factor in attracting readers, because it makes the report more understandable. Li (2008) first introduces the concept of readability into empirical research. He adopts the Fog Index and text length as indexes to measure readability and studies the relationship between English annual report readability and accounting earnings, from the perspective that management has the motivation to obscure bad accounting earnings information. He speculates that a company provides an annual report with poor readability when it has poor earnings. Therefore, low readability possibly means poor earnings quality, which is confirmed by the empirical results, i.e.
there is a negative correlation between accounting earnings and readability. From the perspective of motivation and performance, this explains why readability varies among different companies. Li and Zhang (2015) extend the perspective from management manipulation of readability to short selling. They expect that a company under short-selling pressure will lower readability when disclosing bad news, so that investors cannot accurately identify the bad news and the company can avoid its stock being sold short. Using the Fog Index and text length as the readability measures, and the relaxation of short-selling restrictions in the U.S. stock market as a natural experiment, they obtain empirical results consistent with this expectation. Lo et al. (2017) continue this line of work and focus on the management discussion and analysis section of the annual report to test the correlation between earnings management and readability. They find that poor readability appears when the company has zero or small positive earnings, when it has small positive earnings together with positive accruals or real earnings management, or when it has a financial report restatement or misstatement. Their study extends Li's research and further verifies the motivation behind readability and its relationship with accounting earnings. These studies reach similar findings from different perspectives, that is, a firm may confuse investors by manipulating readability when it has poor earnings.

This phenomenon not only reveals the reasons for differences in readability but also implies a common logical assumption: investors are affected by readability and respond insufficiently to text information with poor readability. Whether this hypothesis holds is also a focus of attention, which leads to another research direction on readability. Tan et al. (2015) suggest that readability affects readers' judgement of information.
Similarly, You and Zhang (2009) test the information effect of readability and find that stock price drift in the 12 months after annual report disclosure is more serious when readability is poor. This result implies that investors cannot understand information with poor readability in a timely manner, so the stock price reflects the annual report information slowly over a long period. Taking the Fog Index and text length as proxy variables for readability, Lee (2012) continues this work and directly tests the impact of quarterly report readability on market efficiency. She suggests that a more readable quarterly report facilitates investors' understanding, so that earnings information enters the stock price faster and the stock price drift after the earnings announcement is reduced. Loughran and McDonald (2014) adopt the computer storage size of financial reports to measure readability. They find that financial reports with high readability allow the company to be valued more accurately and reduce information asymmetry, which is manifested in lower stock price volatility and smaller analyst forecast errors after the disclosure of financial reports.

As far as the capital market is concerned, these studies suggest that the text readability of the annual report (or the quarterly report) can affect investors' understanding of information. Poor readability hinders investors from understanding information accurately and in a timely manner, resulting in an insufficient response in the short term and a persistent response in the long run. Can sophisticated investors overcome the obstacle of readability and understand information accurately? Lehavy et al. (2011) answer this question by testing the relationship between readability and analysts. The results indicate that it is difficult for ordinary investors to understand annual reports with poor readability, which increases the demand for analysts.
But even analysts may be hampered by text readability, so that they may make inaccurate forecasts for companies with poor readability and take longer to update forecasts after the disclosure. Meanwhile, poor readability also causes greater forecast errors and uncertainty. Qiu, Zheng, and Deng (2016) study similar issues from the perspective of Chinese annual reports. They first build a measure of the complexity of Chinese annual report text and find that the more complex the text, the more analysts follow the company, which shows the market's demand for interpretation of complex text. However, they do not find an obvious correlation between text complexity, forecast quality and information content. Text complexity and readability are closely related concepts, and there is no doubt that this research is an important attempt to study Chinese text readability. But in terms of the results, they do not establish a close relationship between Chinese annual report complexity and analyst forecasts, which seems to be inconsistent with the English research and is not further explained. In addition to financial reports, the influence of text readability also exists in other information documents. Franco et al. (2015) test the consequences of the readability of analyst reports and find that more readable reports receive more attention and attract more investors to trade after their release.

In short, existing studies show that readability affects investors' reception and understanding of information, leading to various market phenomena. Because of this mechanism, management has the motivation to manipulate readability in order to obscure information, and poor readability indicates low-quality earnings information. This logic has been well verified in existing research based on the readability of English annual reports.
In this paper, we follow this logic as the baseline to test whether our Chinese annual report readability indexes are appropriate.

3. Readability indexes

We first briefly analyse and review the existing readability indexes, and then put forward a more concise Chinese annual report readability index.

Since Li (2008), the Fog Index has been widely applied as a measure of English text readability. The Fog Index originates from Gunning's (1952) research on English linguistics and consists of two parts. The first part is the number of words in each sentence, and the second is the proportion of complex words in a sentence.

Fog Index = 0.4 * (word number in each sentence + proportion of complex words in each sentence)

The number of words in each sentence is a direct index and is less disputed. Complex words are words containing more than a certain number of letters. This index covers both sentences and words and is, to some extent, reasonable in the English context. But it has been challenged on whether it is appropriate to set 'the proportion of complex words in each sentence' as a part of the readability index. As Loughran and McDonald (2014) point out, words with many letters are not necessarily complex words: some long words have been widely used for a long time and are no longer complex. Their results also suggest that the regression between the proportion of complex words in each sentence and information efficiency is not ideal, so it may not be a desirable readability index. Moreover, we cannot directly use this measure in the Chinese context. English is an alphabetic language, so the number of letters in a word can be used to measure word complexity, whereas this is not applicable to Chinese, which is a pictographic language.

Text length is often used to measure readability as well. But the challenge is that text length is often associated with information richness.
A longer text means that the company discloses more information in its annual report, which may lead to logical inconsistencies in relevant studies. Existing studies suggest that more voluntary disclosure often means better information transparency and reflects better corporate governance (Eng & Mak, 2003). Francis, Nanda, and Olsson (2010) argue that enterprises with better earnings quality make more voluntary disclosures, which naturally increases the length and word count of the report. But if text length is employed to measure readability, an increase in the number of words is a sign of low earnings quality (Lo et al., 2017). This suggests that text length can be driven by multiple factors: not only readability manipulation but also greater disclosure in the annual report. From the perspective of market consequences, disclosing more information often increases the information content, so that investors can understand the company more accurately. For example, Dhaliwal, Radhakrishnan, Tsang, and Yang (2012) find that the disclosure of social responsibility information transmits more information to the market, which improves the accuracy of analyst forecasts of the company's future earnings. However, ceteris paribus, incorporating social responsibility information into the annual report naturally leads to a longer text. If text length is regarded as readability, this result is obviously inconsistent with Lehavy et al. (2011). Therefore, from the perspective of motivation and consequence, we find that a longer text may be the result of multiple motivations, whose effects may run in different directions. From this point of view, there may be some defects in using text length to measure readability. Besides, it tends to intensify the endogeneity of relevant research. For example, Li (2008) associates readability with management's reporting of bad news.
However, bad news often needs more words to explain, so a longer text cannot simply be regarded as the result of management's attempt to obscure information (Bloomfield, 2008).

Loughran and McDonald (2014) develop a readability index based on the storage size (in bytes) of the report on a computer in the English context. They believe that the byte count is a better measure because it can capture both the text length and the features of complex words in the text microstructure. This index is still not suited to the Chinese context. Each letter or number in English is recorded as one byte. To a certain extent, the storage size represents the number and complexity of words, as well as the figures contained in the annual report. However, a Chinese character is recorded as two bytes and a number as one byte. Therefore, the number of bytes and the text length can almost substitute for each other, and the byte count faces the same contradiction between richness and readability.

The main indexes of English readability provided by previous studies have their own advantages and disadvantages, but they all face the problem of how to fit the Chinese context. Therefore, research on Chinese readability is rare. Qiu et al. (2016) make a helpful attempt by building a measure of Chinese annual report readability as follows:

\[
\begin{aligned}
\mathit{readability1} &= 13.90963 + 1.54461\,\mathit{Fullsen} + 39.01497\,\mathit{Wordlist} - 2.52206\,\mathit{strokes} \\
&\quad + 0.29809\,\mathit{count5} + 0.36192\,\mathit{count12} + 0.99363\,\mathit{count22} - 1.6467\,\mathit{count25} \\
\mathit{readability2} &= 14.95961 + 1.11506\,\mathit{Fullsen} + 39.07746\,\mathit{Wordlist} - 2.48491\,\mathit{strokes}
\end{aligned}
\]

Fullsen is the proportion of complete sentences in all sentences. Complete sentences refer to sentences containing subject, predicate, and object.
Wordlist is the proportion of basic vocabulary, defined as the vocabulary of levels 1 to 3 of the Chinese Proficiency Test; strokes is the average stroke number of Chinese characters; countN is the proportion of characters with N strokes, where N is 5, 12, 22 and 25, respectively. This index is mainly related to the proportion of complete sentences, basic vocabulary and stroke number, which captures features of the Chinese language. However, it still has some defects. Firstly, the weights are relatively complex, and some components weigh far more than others. For instance, the weight of basic vocabulary is 39, while the weights of the other components are mostly between 1 and 2, which seems to lack sufficient justification. Secondly, it is difficult to determine the subject, predicate and object because of the flexible syntax of Chinese (Shi, 2000). Moreover, the stroke number does not indicate the complexity of a single word. For example, investors would hardly find ‘衡量’ (measure, hengliang) more difficult than ‘计量’ (measure, jiliang), or ‘赢得’ (win, yingde) more difficult than ‘获得’ (obtain, huode), although the latter word in each pair clearly has more strokes. Thirdly, in practice, this index requires accurately segmenting every phrase of the full text to determine the subject, predicate and object and the basic vocabulary. It also requires counting the strokes of each Chinese character, leading to a large workload. Finally, in terms of empirical results, as mentioned in the literature review, the results on analyst behaviour obtained with this index seem not to be consistent with existing research.

Additionally, Luo, Li, and Chen (2018) use the number of pages, words and chapters of the annual report as indexes to measure readability and test the relationship between the company's agency problem and readability.
In essence, the numbers of pages, words and chapters all represent text capacity, and they are highly correlated. The setting of these indexes follows the methodology of the English readability indexes, i.e. using text capacity to indicate readability, which is quite practical. As mentioned above, text capacity makes sense as a measure of readability because a longer text is more costly to read. Its disadvantage is that, on the one hand, as Bloomfield (2008) points out, it is difficult to determine whether certain annual reports really need more space to express their content. On the other hand, in terms of its impact, larger text capacity may represent more information disclosure, which reduces information asymmetry. Text capacity is therefore a different factor from textual difficulty as a means of obscuring information. In addition, intuitively, a long but clearly expressed text is not necessarily more difficult to read and understand than a confusing short text. Therefore, compared with the macro perspective, the problems above can be better avoided by studying readability from the microstructure of the text.

Inspired by previous research, we construct a group of relatively concise indexes to measure readability based on the text microstructure. Our reasoning is that the first part of the Fog Index, the number of words in each sentence, is less controversial, and existing Chinese research shows that average sentence length may be an effective measure for classifying text difficulty (Zhang, 2000). However, it should be noted that in the Chinese context the difficulty of a sentence is clearly affected by its structure. For a given number of words, a sentence that is not divided into clauses is often difficult to parse, while one that is divided into clauses is easier to read. Therefore, we take the number of words contained in a clause as one of the basic readability indexes (readability1). The other perspective is the logical structure of a single sentence.
Chinese research indicates that the more function words a sentence contains, the more complex its logical relationships are and the less understandable the text is (Zhang & Peng, 2013; Zuo & Zhu, 2014). Function words include five categories: adverbs, prepositions, conjunctions, auxiliary words and interjections. Adverbs are words ‘used before verbs or adjectives to express actions, behaviors, nature, degree of state, scope, negation, etc.’ such as ‘非常’ (very, feichang), ‘更加’ (more, gengjia) and ‘未必’ (not necessarily, weibi). These words often have the functions of intensifying, mitigating, negating, and double negating. Conjunctions are words that ‘connect two words or units that are larger than a word’. Typical conjunctions are ‘或者’ (or, huozhe), ‘而且’ (and, erqie), ‘但是’ (but, danshi) and ‘虽然’ (though, suiran). More conjunctions often mean more complicated progressions, transitions, etc. in context. Therefore, we take the proportion of adverbs and conjunctions in a sentence as the other index of readability (readability2). Prepositions (e.g. ‘向’ (xiang), ‘在’ (zai), etc.), which indicate the ‘action behavior object’, and auxiliary words and interjections (e.g. ‘啊’ (a), ‘吧’ (ba), ‘的’ (de), ‘呢’ (ne), etc.), which ‘act as auxiliary’, are not included.

Therefore, we construct three Chinese readability indexes. The first is the average number of words in each clause (readability1). The larger the index, the worse the readability. We collect the data by using a program to count the words and clauses in each annual report and then compute the average number of words in each clause. The second is the proportion of adverbs and conjunctions in each sentence (readability2). Similarly, the larger the index, the worse the readability. The thesaurus of adverbs and conjunctions comes from the dictionary of Modern Chinese Function Words edited by Wang (1998) and 800 Modern Chinese Words written by Lv (1999), both of which are commonly cited in lexicology.
The third combines the first two indexes by referring to the Fog Index. Since the setting of readability indexes in the Chinese context is exploratory, we do not give a special weight to either index in the combination but adopt equal weights. The index is calculated as: readability3 = (readability1 + readability2) * 0.5. The larger the index, the worse the readability and the more complex the text. Considering the industry-specific features of annual report readability, we further adjust these three indexes by the industry mean, following Lee (2012). Finally, in the empirical tests, we take text length as an alternative readability index and test it alongside the three indexes.

4. Research design

4.1. Test logic

As summarised in the literature review, differences in readability may be the result of management's manipulation of information disclosure. Poor readability represents lower-quality earnings information. At the same time, readability affects investors' understanding of information, which leads to an insufficient market response to the information disclosure. The theoretical predictions in these two directions have been supported by many empirical results. From the perspective of measurement, an appropriate readability index should be consistent with these two theoretical predictions. Therefore, it is feasible to verify the rationality of the indexes by testing whether they are consistent with the predictions as well as with the existing empirical results. This is also the logical approach adopted by Loughran and McDonald (2014) when they test English annual report readability. Specifically, this paper tests the rationality of the indexes in the following ways.

The first test starts from the causes of the differences in annual report readability. Admittedly, many factors can lead to such differences.
However, for the annual report, a special kind of text, a significant motivation lies in management's attempt to cover up lower information quality by manipulating readability. This makes it possible to test the indexes from the perspective of the motivation behind readability differences. First, earnings persistence is an important manifestation of information quality. Earnings information with higher persistence is also more useful for decision-making (Dechow, Ge, & Schrand, 2010; Frankel & Litov, 2009). Relevant empirical results also show that earnings information with higher persistence has higher information quality and can better forecast the stock price (Richardson, Sloan, Soliman, & Tuna, 2005). When a company discloses an unsustainable positive profit, it has the motivation to reduce readability so that investors are less able to anticipate the future decline in profit. This prediction is also verified by Li (2008) in the English context. Therefore, we first test the Chinese readability indexes constructed in this paper from the perspective of earnings persistence. Additionally, management may manipulate readability when managing earnings, which makes it difficult for investors to detect the tools and means of earnings management. For example, Lo et al. (2017) argue that firms whose current earnings slightly exceed expected earnings are more likely to have reached that result through earnings management. Their empirical results imply that these companies are indeed more motivated to control readability, so that investors cannot easily detect their earnings management. This gives us a second way to verify Chinese annual report readability, from the perspective of earnings management motivation.

The second test concerns the influence of readability on information users, which is divided into two directions: the earnings response coefficient and analyst behaviour.
The former is the overall impact on the market, and the latter is the impact on sophisticated investors. As repeatedly stated above, investors may be affected by poor readability, resulting in an incomplete understanding of the disclosed information. Therefore, when the annual report is disclosed, the stock price of a firm with poor readability may not reflect the disclosed content in a timely manner, resulting in a low earnings response coefficient, which has also been verified by Lee (2012) based on a readability test of English annual reports. This is therefore the third way to test the readability of Chinese annual reports. Besides, as professional information users, analysts may also be affected by readability. Faced with poor readability, they take longer to update their forecasts after the annual report is disclosed and produce less accurate forecasts, which is also supported by the empirical results of Lehavy et al. (2011). Therefore, we take the relationship between analyst behaviour after the disclosure and readability as the fourth way to test Chinese annual report readability.

4.2. Sample and model

In order to eliminate the external impact of the change from the old to the new accounting standards on readability, we start the sample after the implementation of the new standards. We collect the annual reports of all A-share listed companies from 2007 to 2014 and analyse the text features with computer programs. After removing financial-industry firms and observations lacking annual reports, transaction data or control variables, we obtain 7,959 firm-year observations. It should be noted that the readability index is missing for some observations because their annual reports are saved as image-based PDFs in which the words cannot be recognised.
The lack of CAR is mainly due to our requirement that there should be at least 250 trading days before the disclosure date and that the trading delay after the disclosure should not exceed 1 week (mainly for suspended companies). Missing earnings forecasts have a large impact on the sample, because unexpected earnings cannot be obtained without valid analyst earnings forecasts. For business complexity, we use the number of company segments. As this is a voluntary disclosure item, some companies do not disclose it, which leads to a significant loss of observations. Earnings volatility requires at least 3 years of listing, which also excludes some companies. The specific sample screening process is shown in Table 1.

Table 1. Sample screening process.

| Step | Observations |
|---|---|
| Total Sample | 17,025 |
| Exclude Readability | 1552 |
| Exclude CAR | 1631 |
| Exclude Earnings Forecast | 2323 |
| Exclude Business Complexity | 2312 |
| Exclude Earnings Volatility | 1046 |
| Exclude Institutional Shareholding | 202 |

In order to eliminate the influence of extreme values, all continuous variables are winsorised at the 1% level in both tails. The A-share transaction data and corporate financial data are sourced from CSMAR. The text readability data are the indexes we built by referring to existing text readability research, as described in detail above.

In Test I, we test the correlation between readability and earnings persistence to verify whether the Chinese readability indexes we construct are reasonable. Following Li (2008), the test model is:

$$earning_{t+1} = \partial_0 + \partial_1\, earning_t + \partial_2\, readability + \partial_3\, earning_t \times readability + control \tag{1}$$

$earning_t$ is the company's earnings in period t and $earning_{t+1}$ is the company's earnings in period t + 1. Earnings are calculated as in Li (2008), i.e. operating profit divided by total assets.
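The winsorising step mentioned in the sample description can be sketched as follows. This is a minimal illustration only: the text states that continuous variables are winsorised at 1% in both tails but does not say how the percentiles are computed, so the linear-interpolation rule and the function name `winsorise` are assumptions.

```python
def winsorise(values, lower=0.01, upper=0.99):
    """Clamp values below the lower quantile and above the upper quantile."""
    if not values:
        return []
    ordered = sorted(values)
    n = len(ordered)

    def quantile(q):
        # Assumed convention: linear interpolation between order statistics.
        pos = q * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

    low_cut, high_cut = quantile(lower), quantile(upper)
    # Values inside the cut-offs are left unchanged; the sample size is preserved.
    return [min(max(v, low_cut), high_cut) for v in values]


sample = [float(i) for i in range(101)]
clipped = winsorise(sample)
```

Unlike trimming, this keeps every observation and only replaces the extreme values with the cut-off values.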
$readability$ is the Chinese readability index constructed in this paper, including the average word number in each clause (readability1), the proportion of function words (conjunctions and adverbs) in each sentence (readability2) and the average of the two indexes (readability3). Moreover, we test the annual report text size (txt_size), the natural logarithm of the number of words in the text, as an alternative readability index. We predict that, when the disclosed earnings are positive, the coefficient $\partial_3$ of the interaction term between $earning_t$ and $readability$ is significantly negative. We mostly follow Li (2008) in the choice of control variables, including company size (size), growth (growth), financial leverage (lev), business complexity (b_cplx), financial complexity (f_cplx), earnings volatility (sd_e), stock return volatility (sd_ret), listing age (listage), and company information environment (analyst). Besides, in order to control the impact of trading rule differences on the stock price, we control whether the company is ST; in order to control systematic differences in annual report characteristics across listing boards, we control whether the company is listed on the main board (MBOARD); finally, we control for the common influence of additional allotment of shares (SEO) and mergers and acquisitions (MA) on readability and earnings persistence.

In Test II, we follow Lo et al. (2017) and verify the rationality of the indexes by testing whether the annual report of a company whose current earnings are slightly higher than expected earnings is less readable. Accordingly, the test model is:

$$readability_t = \partial_0 + \partial_1\, MBE_t + control \tag{2}$$

The definition of MBE follows that of Lo et al. (2017). Specifically, MBE takes 1 when the current reported EPS is slightly higher than the previous one, and 0 otherwise.
In this paper, "slightly higher" means that the current EPS exceeds the previous one by up to 0.01 yuan, 0.02 yuan, or 0.03 yuan per share, respectively. Among the control variables, we first control whether the company is an ST company. This is because, under China's system, an ST company may be delisted if it continues to lose money (Jiang & Wang, 2005), so it also has a strong motivation for earnings management. Other control variables include company size (size), current earnings (Earnings), book to market ratio (MB), financial complexity (f_cplx), stock return volatility (sd_ret), whether there is additional allotment of shares (SEO) and mergers and acquisitions (MA).

In Test III, we verify the rationality of the constructed Chinese readability indexes through the relationship between annual report readability and the earnings response coefficient (ERC). Referring to Collins and Kothari (1989) and Lee (2012), we set the regression model as follows:

$$CAR = a_0 + a_1\, UE + a_2\, readability + a_3\, UE \times readability + a_4\, X + a_5\, UE \times X \tag{3}$$

CAR is the cumulative abnormal return in the disclosure window of the annual report, i.e. the difference between the actual return of the individual stock and its estimated normal return, where the normal return is estimated over the window from 210 days to 30 days before the event. The estimation model of the normal return of individual stocks is $R_{it} = \partial + \beta_1 R_{mt}$, in which $R_{it}$ is the actual return of firm i on day t and $R_{mt}$ is the overall market return on day t. UE is the unexpected earnings, calculated as the difference between the actual announced earnings and the average analyst forecast, divided by the stock price before the event. Readability is measured as in Model 1. We expect the coefficient $a_3$ to be significantly negative, i.e. poor readability greatly reduces the market response to earnings information.
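The construction of CAR and UE described above can be sketched as follows. This is an illustrative reading of the definitions, not the authors' code: the function names are hypothetical, the market model is fitted by ordinary least squares on the estimation-window returns, and the toy numbers are invented purely to exercise the functions.

```python
def fit_market_model(stock_returns, market_returns):
    """OLS intercept and slope of stock returns on market returns."""
    n = len(stock_returns)
    mean_r = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((m - mean_m) * (r - mean_r)
              for m, r in zip(market_returns, stock_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    beta = cov / var
    alpha = mean_r - beta * mean_m
    return alpha, beta


def cumulative_abnormal_return(alpha, beta, event_stock, event_market):
    """Sum of actual minus model-implied returns over the event window."""
    return sum(r - (alpha + beta * m)
               for r, m in zip(event_stock, event_market))


def unexpected_earnings(actual, analyst_forecasts, price_before_event):
    """Actual earnings minus the mean analyst forecast, scaled by price."""
    mean_forecast = sum(analyst_forecasts) / len(analyst_forecasts)
    return (actual - mean_forecast) / price_before_event


# Toy estimation window (in the paper: days -210 to -30 before the event).
market = [0.01, -0.02, 0.03, 0.00]
stock = [0.001 + 1.5 * m for m in market]
alpha, beta = fit_market_model(stock, market)

# Toy event window (in the paper: [-2, 2] in the main test).
car = cumulative_abnormal_return(alpha, beta, [0.02, 0.01], [0.01, 0.0])
ue = unexpected_earnings(0.50, [0.55, 0.65], 10.0)
```

In Model 3, `car` would be the dependent variable and `ue` would enter both on its own and interacted with the readability index.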
X denotes the other control variables. Like previous studies, we interact all control variables with unexpected earnings (UE*X). Referring to Li (2008) and Lee (2012), we keep basically the same control variables in Model 3 as in Model 1. Considering the possible influence of earnings management on market reaction and readability, we add the control variable DA to alleviate this problem. DA is calculated from the Modified Jones Model (Dechow, Sloan, & Sweeney, 1995).

Note on MBE (Test II): In the main test, we report the results based on the current EPS exceeding the previous EPS by 0.02 yuan, while in the robustness test, we report the results based on the current EPS exceeding the previous one by 0.01 yuan and 0.03 yuan; the relevant results are consistent.

In Test IV, we examine whether the Chinese readability indexes set in this paper are reasonable through the relationship between readability and analyst behaviour after the disclosure. Referring to Lehavy et al. (2011), we set the models as follows:

$$ftime = \alpha_0 + \alpha_1\, readability + control \tag{4}$$

$$ferror = \alpha_0 + \alpha_1\, readability + control \tag{5}$$

In Model 4, ftime is the natural logarithm of the number of days analysts need to make their first forecast after the annual report disclosure. The definition of readability remains unchanged. Model 5 differs from Lehavy et al. (2011) and Qiu et al. (2016) in the definition of ferror: we follow Wang, Li, and Xiao (2017) and set it as the error of the first analyst forecast after the disclosure. If there are multiple forecasts on the same day, the difference between their average and the actual earnings is used. The advantage is that using the first forecast reduces the interference of other information to a certain extent and alleviates the influence of earlier analysts' forecasts on subsequent ones. The specific calculation is consistent with the practice of Lehavy et al.
(2011), which is the square of the difference between the forecast EPS and the actual EPS, divided by the closing price of the stock in the previous year. The control variables are basically consistent with those of Lehavy et al. (2011), including company size (size), growth (growth), business complexity (b_cplx), financial complexity (f_cplx), earnings volatility (sd_e), stock return volatility (sd_ret), institutional investors' shareholding ratio (inhold), R&D investment (RD), and annual report information content (news). In addition, for similar reasons as before, we control whether the company is ST (ST), whether it is listed on the main board (MBOARD), whether there is additional allotment of shares (SEO) or a merger and acquisition (MA), and earnings management (DA). We also test other indexes that may represent annual report readability, mainly the figure number (txt_nmb) and the table number (table). The detailed variable definitions are shown in Table 2.

5. Empirical test

5.1. Descriptive statistics

Table 3 presents the descriptive statistics of the main variables. Among the readability indexes constructed in this paper, the mean of the average word number in each clause adjusted by the industry mean (Readability1) is −0.07, with a 25th percentile of −4.77 and a 75th percentile of 3.84; the mean of the proportion of conjunctions and adverbs in each sentence adjusted by the industry mean (Readability2) is 0, with a 25th percentile of −0.04 and a 75th percentile of 0.03. The mean of text size (txt_size) is 0.01. The mean values of these four indexes are relatively small after being adjusted by the industry mean. The mean of the table number of annual reports (table), which is not industry-adjusted, is 5.18; the mean of the figure number (txt_nmb) is 8.64. The mean of the cumulative abnormal return around the earnings announcement (CAR[−2,2]), one of the dependent variables, is 0, and the median is −0.01, which is within a reasonable range. The mean of unexpected earnings is −0.02.
Most of the other variables lie within three standard deviations and remain reasonable when compared with the values reported in other relevant research.

Table 2. Definition of main variables.

| Variable Symbol | Variable Name | Variable Definition |
|---|---|---|
| Readability Metrics | | |
| Readability1 | Readability1 | Average word number of each clause in annual report minus industry mean |
| Readability2 | Readability2 | Proportion of conjunctions and adverbs in each sentence of annual report minus industry mean |
| Readability3 | Readability3 | Arithmetic mean of indexes 1 and 2 |
| Txt_size | Text length | Natural logarithm of the word number in the annual report minus the industry mean |
| Other Complexity Characteristics of Annual Report | | |
| Txt_nmb | Figure number | Natural logarithm of the number of figures (numerals) in the annual report |
| Table | Table number | Natural logarithm of the number of tables |
| Dependent Variables Involved in the Tests | | |
| Earnings | Accounting earnings | Operating profit divided by total assets |
| CAR | Excess cumulative income | The excess cumulative income of individual stocks during the disclosure window, with normal returns estimated over [−210, −30]; we report [−2,2] as the event window in the main test and other event windows in the robustness test |
| Ftime | Analyst forecast time | Natural logarithm of one plus the number of days between the annual report disclosure and the analyst's first forecast |
| Ferror | Analyst forecast error | The square of the difference between the EPS in the first forecast of the analyst and the actual EPS, divided by the stock price at the end of the previous year; if there are multiple reports, their average is used |
| Control Variables | | |
| UE | Unexpected earnings | Announced earnings minus the average of analysts' forecasts, divided by the stock price at the end of the previous period |
| Size | Size | Natural logarithm of stock market value at the end of last period |
| Lev | Leverage ratio | Total liabilities of the company divided by total assets at the end of the previous period |
| Growth | Growth | Growth rate of sales revenue |
| B_cplx | Business complexity | Natural logarithm of the number of company segments |
| F_cplx | Financial complexity | Natural logarithm of the number of accounting items involved in the company's balance sheet, profit statement and cash flow statement |
| Sd_E | Earnings volatility | Standard deviation of earnings in the past three years |
| SD_RET | Stock price volatility | Standard deviation of monthly stock returns in the past year |
| Listage | Listing age | Years of listing |
| Analyst | The number of following analysts | Natural logarithm of one plus the number of analysts tracking the company last year |
| ST | ST company or not | 1 if the company is ST, otherwise 0 |
| MBOARD | Main board listing company or not | 1 if the company is listed on the main board, otherwise 0 |
| SEO | Additional allotment of shares | 1 if the company conducts an SEO during the accounting period, otherwise 0 |
| MA | Mergers and acquisitions | 1 if the company has a merger or acquisition in the current accounting period, otherwise 0 |
| RD | RD | R&D investment divided by sales revenue |
| DA | Earnings management | The company's earnings management level for the current accounting period, calculated according to the Modified Jones Model |
| News | Information content | The excess cumulative return of the company's stock during the period [−1,1] around the annual report announcement |
| Inhold | Institutional shareholding | The proportion of institutional shareholding in this accounting period |

Apart from the descriptive statistics of the main variables, we are also concerned about the correlation between the Chinese readability indexes and other annual report text feature indexes. Their correlation coefficients are shown in Table 4. As shown in Table 4, the three readability indexes of Chinese annual reports are significantly positively correlated with each other. In particular, the correlation coefficient between Index 1, the word number in each clause (Readability1), and Index 2, the proportion of conjunctions and adverbs in each sentence (Readability2), is 0.5359, which is significant at the 1% level. This implies that the indexes constructed from two different dimensions are consistent and that, to some extent, our Chinese readability indexes may be reasonable. Meanwhile, there is a significant negative correlation between the text length and the readability indexes, which supports the previous analysis.
The text length may also represent the management's strong willingness to disclose and higher information transparency of the enterprise, so it is negatively correlated with the readability indexes, which represent the management's motivation to obscure information. The figure number in the annual report (Txt_nmb) shows similar results and may represent better information disclosure quality. Of course, this is only a preliminary test of the annual report text features, which is further verified by the subsequent regression results.

Table 3. Descriptive statistics of main variables.

| Variable | Sample | Mean | Standard Deviation | P25 | Median | P75 |
|---|---|---|---|---|---|---|
| Readability1 | 7959 | −0.07 | 6.65 | −4.77 | −0.85 | 3.84 |
| Readability2 | 7959 | 0 | 0.06 | −0.04 | −0.01 | 0.03 |
| Readability3 | 7959 | −0.03 | 3.34 | −2.4 | −0.43 | 1.93 |
| Txt_size | 7959 | 0.01 | 0.2 | −0.13 | 0 | 0.14 |
| Txt_nmb | 7959 | 8.64 | 0.27 | 8.46 | 8.62 | 8.79 |
| table | 7959 | 5.18 | 0.4 | 4.96 | 5.27 | 5.46 |
| earning | 7959 | 0.05 | 0.05 | 0.02 | 0.04 | 0.07 |
| CAR[−2,2] | 7959 | 0 | 0.06 | −0.04 | −0.01 | 0.03 |
| UE | 7959 | −0.02 | 0.03 | −0.02 | −0.01 | 0 |
| Size | 7959 | 22.39 | 0.97 | 21.69 | 22.25 | 22.93 |
| MB | 7959 | 1.93 | 1.56 | 0.85 | 1.53 | 2.48 |
| Lev | 7959 | 0.44 | 0.21 | 0.28 | 0.45 | 0.61 |
| Listage | 7959 | 7.8 | 5.4 | 3 | 7 | 12 |
| Analyst | 7959 | 1.65 | 1.17 | 0.69 | 1.61 | 2.56 |
| F_cplx | 7959 | 5.36 | 0.2 | 5.27 | 5.29 | 5.55 |
| SD_RET | 7959 | 0.13 | 0.05 | 0.1 | 0.12 | 0.16 |
| B_CPLX | 7959 | 1.24 | 0.48 | 0.69 | 1.1 | 1.61 |
| SD_E | 7959 | 0.23 | 0.19 | 0.11 | 0.18 | 0.29 |
| DA | 7959 | 0.07 | 0.08 | 0.02 | 0.04 | 0.08 |
| SEO | 7959 | 0.13 | 0.34 | 0 | 0 | 0 |
| MA | 7959 | 0.11 | 0.32 | 0 | 0 | 0 |
| Ferror | 6703 | 0.01 | 0.02 | 0 | 0 | 0.01 |
| Ftime | 6703 | 1.03 | 1.75 | 0 | 0 | 1.1 |
| News | 6703 | 0.04 | 0.04 | 0.01 | 0.03 | 0.05 |
| RD | 6703 | 0.02 | 0.05 | 0 | 0.01 | 0.04 |
| In_hold | 6703 | 8.25 | 10.38 | 2.09 | 5.25 | 10.29 |

Table 4. Correlation coefficients of annual report text features.

| | Readability1 | Readability2 | Readability3 | Txt_size | Txt_nmb | table |
|---|---|---|---|---|---|---|
| Readability1 | 1 | | | | | |
| Readability2 | 0.5359*** (0.000) | 1 | | | | |
| Readability3 | 0.9000*** (0.000) | 0.5424*** (0.000) | 1 | | | |
| txt_size | −0.1966*** (0.000) | −0.1675*** (0.0000) | −0.1971*** (0.0000) | 1 | | |
| Txt_nmb | −0.0291*** (0.003) | −0.1465*** (0.000) | −0.0303*** (0.002) | 0.7726*** (0.000) | 1 | |
| Table | 0.1719*** (0.000) | 0.0222** (0.025) | 0.1713*** (0.000) | 0.2372*** (0.000) | 0.2694*** (0.000) | 1 |

P values are in the brackets. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.

5.2. Regression results

First, we test whether the Chinese readability indexes constructed in this paper are reasonable through the correlation between readability and earnings persistence. According to previous studies, an annual report with poor readability should indicate low earnings quality, and readability is poorer when companies disclose less sustainable positive earnings (see Li, 2008). Therefore, we expect our Chinese readability indexes to reflect this relationship. We use the observations with positive earnings during the sample period to regress earnings persistence on Chinese readability. In order to control the residual autocorrelation problem, we adopt standard errors clustered in two dimensions, company and year (Thompson, 2011). The results are shown in Table 5.

Table 5. Test on readability indexes based on earning persistence.

Dependent var = $earning_{t+1}$

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| earning | 0.4447*** (12.37) | 0.4460*** (12.30) | 0.4447*** (12.37) | 0.4421*** (12.14) |
| earning × readability1 | −0.0027** (−2.57) | | | |
| Readability1 | 0.0003*** (5.27) | | | |
| earning × readability2 | | −0.4184** (−2.33) | | |
| Readability2 | | 0.0372*** (4.23) | | |
| earning × readability3 | | | −0.0054*** (−2.60) | |
| Readability3 | | | 0.0007*** (5.31) | |
| earning × txt_size | | | | 0.0620 (1.18) |
| txt_size | | | | −0.0100*** (−3.25) |
| size | 0.0015** (2.20) | 0.0016** (2.39) | 0.0015** (2.20) | 0.0023*** (3.26) |
| growth | 0.0085*** (9.14) | 0.0086*** (9.33) | 0.0085*** (9.14) | 0.0087*** (9.42) |
| lev | −0.0292*** (−9.03) | −0.0284*** (−8.76) | −0.0292*** (−9.03) | −0.0288*** (−8.75) |
| listage | 0.0002** (2.41) | 0.0003*** (2.65) | 0.0002** (2.41) | 0.0003** (2.23) |
| analyst | 0.0066*** (10.57) | 0.0066*** (10.36) | 0.0066*** (10.56) | 0.0065*** (10.61) |
| SD_E | 0.0015 (0.76) | 0.0014 (0.73) | 0.0015 (0.76) | 0.0020 (1.00) |
| SD_RET | −0.0473*** (−2.59) | −0.0467** (−2.57) | −0.0473*** (−2.59) | −0.0464*** (−2.58) |
| B_coplx | 0.0003 (0.29) | 0.0005 (0.42) | 0.0003 (0.29) | 0.0005 (0.46) |
| F_cplx | −0.0013 (−0.13) | 0.0014 (0.15) | −0.0013 (−0.13) | 0.0005 (0.05) |
| mboard | 0.0062*** (4.73) | 0.0063*** (4.83) | 0.0062*** (4.73) | 0.0059*** (4.43) |
| SEO | −0.0052*** (−3.49) | −0.0052*** (−3.42) | −0.0052*** (−3.49) | −0.0051*** (−3.38) |
| MA | −0.0011 (−1.42) | −0.0009 (−1.21) | −0.0011 (−1.41) | −0.0011 (−1.43) |
| ST | 0.0228*** (3.22) | 0.0221*** (3.30) | 0.0227*** (3.22) | 0.0232*** (3.20) |
| Year & industry | Control | Control | Control | Control |
| Intercept | −0.0010 (−0.02) | −0.0186 (−0.38) | −0.0011 (−0.02) | −0.0357 (−0.72) |
| N | 5471 | 5471 | 5471 | 5471 |
| adj. R-sq | 0.644 | 0.644 | 0.644 | 0.644 |

T values are in the brackets; standard errors are clustered in two dimensions, company and year. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.

In Table 5, there is a significant positive correlation between current earnings ($earning_t$) and future earnings ($earning_{t+1}$), which is consistent with the previous study (Richardson et al., 2005) and indicates that earnings are persistent. In Column 1, the coefficient of earning × readability1 is significantly negative at the 5% level (coefficient −0.0027, t value −2.57), which supports the theoretical expectation that companies with poor readability have poor earnings persistence. The corresponding coefficient in Column 2 is −0.4184, which is significant at the 5% level and also in line with the expectation. The regression coefficient of the interaction term with the comprehensive index readability3 in Column 3 is −0.0054, which is significant at the 1% level and supports the expectation. These regression results are consistent with the findings obtained by Li (2008) using the Fog Index as the readability index of English annual reports. From the perspective of the causes of readability, these results support our readability indexes for Chinese annual reports.
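To make the size of the interaction effect in Column 1 of Table 5 concrete, the following sketch evaluates the implied persistence coefficient, i.e. the coefficient on current earnings plus the interaction coefficient times Readability1, at the 25th and 75th percentiles of Readability1 reported in Table 3. The function name is hypothetical, and the calculation uses point estimates only, ignoring the control variables and estimation uncertainty.

```python
def implied_persistence(base_coef, interaction_coef, readability):
    """Marginal effect of current earnings on next-period earnings in Model 1."""
    return base_coef + interaction_coef * readability


BASE = 0.4447          # coefficient on earning, Table 5, Column 1
INTERACTION = -0.0027  # coefficient on earning x readability1, Table 5, Column 1
P25, P75 = -4.77, 3.84  # Readability1 quartiles, Table 3

at_p25 = implied_persistence(BASE, INTERACTION, P25)
at_p75 = implied_persistence(BASE, INTERACTION, P75)
print(f"persistence at P25: {at_p25:.3f}, at P75: {at_p75:.3f}")
```

Under these point estimates, the implied persistence falls from about 0.458 at the 25th percentile to about 0.434 at the 75th percentile of Readability1, i.e. less readable reports go with less persistent earnings.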
Column 4 of Table 5 reports the relationship between text size (txt_size) and earnings persistence; the coefficient of earning × txt_size is 0.062 with a t value of 1.18, which is not significant and is inconsistent with Li's (2008) findings. One possible explanation is that, as analysed earlier, the text size of annual reports may reflect not only higher reading difficulty but also more information disclosure. This suggests that, at least from the perspective of earnings persistence, which reflects the management's motivation to manipulate readability, the text size of Chinese annual reports does not seem to be a reliable index.

Secondly, we use Model 2 to test the relationship between earnings management and annual report readability. Previous studies have shown that firms which are more likely to have achieved a certain performance goal through earnings management are also more likely to cover up their earnings management behaviour by manipulating readability (Lo et al., 2017). Therefore, we adopt a similar approach to test the readability indexes in this paper, as presented in Table 6.

Table 6. Test on readability indexes based on earnings management.

MBE = 1 if ΔEPS ∈ [¥0, ¥0.02]

| | Readability1 | Readability2 | Readability3 | txt_size |
|---|---|---|---|---|
| MBE | 0.8033*** (2.91) | 0.0041* (1.66) | 0.4037*** (2.91) | −0.0119 (−1.58) |
| ST | 2.4266*** (4.21) | 0.0158*** (2.91) | 1.2212*** (4.21) | −0.0229 (−1.56) |
| size | 1.0856*** (8.25) | −0.0000 (−0.02) | 0.5428*** (8.21) | 0.0717*** (17.81) |
| roa | −2.6752 (−0.83) | −0.0179 (−0.56) | −1.3466 (−0.83) | −0.2435*** (−2.60) |
| mb | −0.2670*** (−2.78) | −0.0012 (−1.31) | −0.1341*** (−2.78) | 0.0125*** (4.18) |
| SD_RET | 6.3623*** (3.36) | −0.0040 (−0.21) | 3.1791*** (3.33) | 0.1640*** (3.00) |
| F_cplx | 14.4803*** (12.69) | −0.0103 (−0.81) | 7.2350*** (12.60) | −0.0895** (−2.32) |
| SEO | −0.2749 (−1.38) | −0.0103*** (−5.16) | −0.1426 (−1.42) | 0.0434*** (7.44) |
| MA | 0.5164** (2.32) | −0.0018 (−0.82) | 0.2573** (2.30) | 0.0086 (1.34) |
| Industry & Year | Control | Control | Control | Control |
| Intercept | −98.8081*** (−14.33) | 0.0582 (0.84) | −49.3749*** (−14.23) | −1.1203*** (−5.15) |
| N | 7959 | 7959 | 7959 | 7959 |
| adj. R-sq | 0.077 | 0.003 | 0.076 | 0.163 |

T values are in the brackets; standard errors are clustered in two dimensions, company and year. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.

As shown in Columns 1 to 3 of Table 6, the earnings management test variable MBE is significantly positively correlated with readability1, readability2 and readability3, the readability indexes constructed in this paper (at the 10% level for readability2 and the 1% level for the other two), i.e. the more likely the enterprise is to carry out earnings management, the worse the annual report readability. Another variable, ST, is also positively correlated with the readability indexes, and is significant at the 1% level. In the Chinese setting, ST companies have a stronger motivation to turn losses around in order to avoid being delisted, and these companies are also more likely to manage earnings; hence they are more likely to provide annual reports with poor readability to confuse investors.
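The MBE indicator used in Model 2 and Table 6 can be sketched as a simple threshold rule on the change in EPS. This is an illustration under stated assumptions: the function name is hypothetical, both interval endpoints are treated as inclusive (following the bracket notation in the header of Table 6), and the rounding to six decimals is only there to avoid floating-point noise at the boundary.

```python
def meets_or_beats(current_eps, previous_eps, threshold=0.02):
    """Return 1 if current EPS exceeds previous EPS by at most `threshold` yuan."""
    # Round to suppress binary floating-point noise such as 0.52 - 0.50.
    change = round(current_eps - previous_eps, 6)
    return 1 if 0 <= change <= threshold else 0


main_test = meets_or_beats(0.52, 0.50)                     # 0.02 threshold (main test)
too_large = meets_or_beats(0.60, 0.50)                     # increase well above threshold
decline = meets_or_beats(0.49, 0.50)                       # EPS fell
robustness = meets_or_beats(0.51, 0.50, threshold=0.01)    # 0.01 threshold (robustness)
```

The `threshold` argument corresponds to the 0.01, 0.02 and 0.03 yuan cut-offs used in the main and robustness tests.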
These results are consistent with the previous studies, and once again support the rationality of the indexes constructed in this paper. Besides, Column 4 of Table 6 reports the test results using text length (word number) as the readability index. txt_size is not significantly related to either indicator of earnings management, whether performance slightly beyond expectations (MBE) or ST, and both regression coefficients are negative. To some extent, this indicates that text length may not measure readability well.

Thirdly, we test the Chinese readability indexes through the market response when the annual report is disclosed. An essential channel of the influence of readability is that reading difficulty affects investors' understanding of the annual report, and poor readability delays their response to the information, resulting in an insufficient response to the disclosure, which has also been verified by Lee (2012) based on empirical evidence on English readability. Therefore, whether the readability indexes in this paper can reflect this influence is tested through the market response during the annual report disclosure period, and the results are shown in Table 7.

Table 7. Test on readability indexes based on earnings response coefficient.

Dependent var = car[−2,2]

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| UE | 0.2538 (0.19) | 0.4750 (0.37) | 0.2548 (0.19) | 0.8364 (0.63) |
| UE*readability1 | −0.0068*** (−2.59) | | | |
| readability1 | −0.0002 (−0.97) | | | |
| UE*readability2 | | −0.5689* (−1.93) | | |
| readability2 | | −0.0359* (−1.96) | | |
| UE*readability3 | | | −0.0136*** (−2.59) | |
| readability3 | | | −0.0004 (−0.98) | |
| UE*txt_size | | | | 0.2260 (1.62) |
| txt_size | | | | 0.0111 (1.59) |
| Size | −0.0028* (−1.96) | −0.0028** (−2.01) | −0.0028* (−1.96) | −0.0038*** (−3.44) |
| UE*size | −0.0001 (−0.00) | 0.0012 (0.03) | −0.0001 (−0.00) | −0.0168 (−0.34) |
| growth | −0.0018 (−1.56) | −0.0017 (−1.56) | −0.0018 (−1.56) | −0.0015 (−1.42) |
| UE*growth | 0.0904** (2.10) | 0.0918** (2.18) | 0.0904** (2.10) | 0.0954** (2.25) |
| lev | −0.0011 (−0.13) | −0.0018 (−0.20) | −0.0011 (−0.13) | −0.0027 (−0.31) |
| UE*lev | 0.4588* (1.95) | 0.4487** (1.98) | 0.4588* (1.95) | 0.4152* (1.82) |
| SD_E | 0.0043 (0.71) | 0.0042 (0.69) | 0.0043 (0.71) | 0.0041 (0.69) |
| UE*SD_E | 0.1176** (2.09) | 0.1241** (2.31) | 0.1175** (2.09) | 0.1323** (2.41) |
| F_cplx | −0.0056 (−0.38) | −0.0075 (−0.55) | −0.0055 (−0.38) | −0.0068 (−0.50) |
| UE*F_cplx | −0.1092 (−0.87) | −0.1486 (−1.23) | −0.1094 (−0.88) | −0.1452 (−1.20) |
| listage | 0.0001 (0.50) | 0.0001 (0.52) | 0.0001 (0.50) | 0.0002 (0.51) |
| UE*listage | 0.0072** (2.57) | 0.0060** (2.14) | 0.0072** (2.57) | 0.0068** (2.01) |
| SD_RET | −0.0342 (−1.34) | −0.0357 (−1.46) | −0.0342 (−1.34) | −0.0367 (−1.44) |
| UE*SD_RET | 0.0127 (0.03) | −0.1109 (−0.26) | 0.0119 (0.03) | −0.0473 (−0.11) |
| B_coplx | −0.0026* (−1.83) | −0.0029** (−2.05) | −0.0026* (−1.84) | −0.0029** (−1.99) |
| UE*B_coplx | 0.0432 (1.30) | 0.0411 (1.18) | 0.0431 (1.30) | 0.0463 (1.26) |
| ST | −0.0059 (−0.82) | −0.0057 (−0.80) | −0.0059 (−0.82) | −0.0058 (−0.82) |
| UE*ST | −0.1362 (−1.30) | −0.1397 (−1.32) | −0.1362 (−1.30) | −0.1395 (−1.31) |
| Analyst | 0.0065*** (4.07) | 0.0064*** (4.12) | 0.0065*** (4.07) | 0.0065*** (3.98) |
| UE*Analyst | 0.0389 (1.20) | 0.0388 (1.21) | 0.0389 (1.20) | 0.0371 (1.19) |
| Mboard | −0.0058 (−0.86) | −0.0056 (−0.83) | −0.0058 (−0.86) | −0.0053 (−0.87) |
| UE*Mboard | −0.1972 (−1.40) | −0.2061 (−1.49) | −0.1971 (−1.40) | −0.1956 (−1.39) |
| DA | 0.0027 (0.38) | 0.0028 (0.39) | 0.0027 (0.38) | 0.0020 (0.27) |
| UE*DA | 0.0703 (0.46) | 0.0674 (0.42) | 0.0703 (0.46) | 0.0372 (0.23) |
| MA | 0.0014 (1.04) | 0.0012 (0.98) | 0.0014 (1.04) | 0.0012 (0.92) |
| UE*MA | 0.0497 (0.83) | 0.0446 (0.76) | 0.0497 (0.83) | 0.0447 (0.74) |
| SEO | −0.0000 (−0.00) | −0.0004 (−0.12) | −0.0000 (−0.00) | −0.0005 (−0.14) |
| UE*SEO | 0.0680 | 0.0590 | 0.0679 | 0.0511 |
| Industry & Year | Control | Control | Control | Control |
| Intercept | 0.1029 (1.10) | 0.1152 (1.32) | 0.1027 (1.10) | 0.1322 (1.62) |
| N | 7959 | 7959 | 7959 | 7959 |
| adj. R-sq | 0.016 | 0.016 | 0.016 | 0.016 |

T values are in the brackets; standard errors are clustered in two dimensions, company and year. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.

The dependent variable in Table 7 is the cumulative abnormal return over the [−2,2] window around the earnings announcement. We also report the regression results for [−1,1], [−3,3] and [−5,5] in the robustness test. Column 1 of Table 7 shows that the regression coefficient of the interaction term UE*readability1 between unexpected earnings (UE) and the word number in each clause (readability1) is −0.0068, which is significant at the 1% level; the regression coefficient of the interaction term UE*readability2 in Column 2 is −0.5689, which is significant at the 10% level; Column 3 shows a similar result of −0.0136, which is significant at the 1% level. These results support the theoretical prediction that annual reports with poor readability lead to investors' failure to respond to the information in time, resulting in a smaller market response when the annual report is disclosed, which is also consistent with the empirical results of Lee (2012) on English annual report readability. This also supports the rationality of the constructed readability indexes from the perspective of investors' understanding of annual reports.

Table 8. Test on readability indexes based on analyst behaviour after annual report disclosure (updating time).

Dependent var = Ftime

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Readability1 | 0.0160*** (4.31) | | | |
| Readability2 | | 1.2376** (2.46) | | |
| Readability3 | | | 0.0320*** (4.29) | |
| txt_size | | | | −0.2303* (−1.76) |
| size | −0.4051*** (−11.72) | −0.3904*** (−10.85) | −0.4051*** (−11.72) | −0.3732*** (−10.19) |
| b_coplx | 0.1260** (1.98) | 0.1403** (2.22) | 0.1261** (1.99) | 0.1407** (2.22) |
| News | −1.0942** (−2.37) | −1.0706** (−2.37) | −1.0936** (−2.37) | −1.0889** (−2.37) |
| SD_RET | 1.4630* (1.92) | 1.6054** (2.12) | 1.4634* (1.92) | 1.6660** (2.20) |
| F_cplx | −0.7078** (−2.16) | −0.4594 (−1.41) | −0.7069** (−2.16) | −0.4896 (−1.47) |
| growth | −0.1623** (−2.56) | −0.1618** (−2.58) | −0.1622** (−2.56) | −0.1583** (−2.57) |
| SD_E | −0.3510** (−2.31) | −0.3492** (−2.36) | −0.3508** (−2.42) | −0.3477** (−2.37) |
| win_hold | −0.0169*** (−5.33) | −0.0167*** (−5.33) | −0.0169*** (−5.33) | −0.0164*** (−5.19) |
| RD | −4.4426*** (−5.87) | −4.5513*** (−5.77) | −4.4424*** (−5.87) | −4.4214*** (−5.77) |
| ST | 0.7964*** (4.13) | 0.8123*** (4.29) | 0.7963*** (4.13) | 0.8248*** (4.41) |
| Mboard | 0.2967*** (3.38) | 0.3157*** (3.67) | 0.2965*** (3.38) | 0.3225*** (3.95) |
| DA | 0.4783** (2.47) | 0.4503** (2.38) | 0.4783** (2.47) | 0.4508** (2.10) |
| MA | −0.0838 (−1.18) | −0.0769 (−1.05) | −0.0838 (−1.18) | −0.0783 (−1.07) |
| SEO | 0.0665 (0.84) | 0.0709 (0.91) | 0.0666 (0.84) | 0.0619 (0.79) |
| Industry & Year | Control | Control | Control | Control |
| Intercept | 13.8262*** (6.38) | 12.1440*** (5.54) | 13.8202*** (6.37) | 11.7229*** (5.31) |
| N | 6703 | 6703 | 6703 | 6703 |
| adj. R-sq | 0.108 | 0.106 | 0.108 | 0.105 |

T values are in the brackets; standard errors are clustered in two dimensions, company and year. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.

Table 9. Test on readability indexes based on analyst behaviour after annual report disclosure (forecast error).

Dependent var = Ferror

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Readability1 | 0.0001* (1.66) | | | |
| Readability2 | | 0.0068 (1.35) | | |
| Readability3 | | | 0.0002* (1.66) | |
| txt_size | | | | −0.0041** (−2.39) |
| size | 0.0029*** (9.61) | 0.0030*** (9.95) | 0.0029*** (9.61) | 0.0033*** (10.06) |
| b_coplx | −0.0011 (−1.52) | −0.0010 (−1.41) | −0.0011 (−1.52) | −0.0009 (−1.31) |
| News | 0.0239*** (2.79) | 0.0242*** (2.83) | 0.0239*** (2.79) | 0.0244*** (2.85) |
| SD_RET | 0.0198** (2.53) | 0.0205*** (2.63) | 0.0198** (2.53) | 0.0212*** (2.72) |
| F_cplx | 0.0059* (1.70) | 0.0072** (2.12) | 0.0059* (1.71) | 0.0068** (2.00) |
| growth | −0.0014* (−1.82) | −0.0014* (−1.81) | −0.0014* (−1.82) | −0.0013* (−1.67) |
| SD_E | 0.0034* (1.95) | 0.0034* (1.96) | 0.0034* (1.95) | 0.0035** (2.03) |
| win_hold | 0.0000 (0.43) | 0.0000 (0.45) | 0.0000 (0.43) | 0.0000 (0.51) |
| RD | −0.0208* (−1.77) | −0.0213* (−1.81) | −0.0207* (−1.77) | −0.0184 (−1.56) |
| ST | 0.0048** (2.31) | 0.0049** (2.35) | 0.0048** (2.31) | 0.0049** (2.35) |
| Mboard | −0.0013 (−0.91) | −0.0012 (−0.97) | −0.0013 (−1.07) | −0.0014 (−0.97) |
| DA | 0.0233*** (5.10) | 0.0231*** (5.07) | 0.0233*** (5.10) | 0.0231*** (5.07) |
| MA | −0.0001 (−0.12) | −0.0001 (−0.08) | −0.0001 (−0.12) | −0.0001 (−0.05) |
| SEO | −0.0011 (−1.23) | −0.0011 (−1.21) | −0.0011 (−1.23) | −0.0010 (−1.12) |
| Industry & Year | Control | Control | Control | Control |
| Intercept | −0.0867*** (−4.28) | −0.0951*** (−4.84) | −0.0867*** (−4.28) | −0.1016*** (−5.12) |
| N | 6703 | 6703 | 6703 | 6703 |
| adj. R-sq | 0.043 | 0.043 | 0.043 | 0.044 |

T values are in the brackets; standard errors are clustered in two dimensions, company and year. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.

In Column 4 of Table 7, the regression coefficient of the interaction between unexpected earnings (UE) and text size (txt_size) is 0.226 with a t value of 1.62, which is close to significance at the 10% level. It seems that text size does not show the characteristics that readability should have; rather, it indicates that enterprises with larger annual report text size can get a greater market response to their information disclosure.
This phenomenon seems more consistent with the prediction that text size reflects the information content of the annual report.

Fourth, we test the relationship between annual report readability and analysts' forecasts to examine whether the readability indexes are reasonable. As sophisticated information users, analysts may also be affected by readability and absorb the annual report information poorly, resulting in a longer time to update their forecasts and larger forecast errors (Lehavy et al., 2011). Therefore, we test the relationship between the readability indexes constructed in this paper and both the time analysts take to update their forecasts and the forecast error after the disclosure, in order to verify the rationality of the indexes. The regression results based on the updating time are shown in Table 8. Table 8 presents the regression of the time required for analysts to make their first forecast after disclosure on readability. In Column 1, the average word number in each clause (readability1) is significantly positively correlated with the time analysts take to update their forecasts (coefficient 0.016, t value 4.31); in Column 2, the proportion of conjunctions and adverbs in each sentence (readability2) of the annual report is also positively correlated with the update time, and is significant at the 5% level. In Column 3, the regression coefficient of the arithmetic mean of the two indexes (readability3) is also significantly positive at the 1% level. These results suggest that our readability indexes are consistent with the theoretical prediction and with the empirical results on English readability provided by previous research (Lehavy et al., 2011), and thus may be reasonable. Column 4 of Table 8 suggests a negative correlation between the annual report text size and the forecast updating time after the disclosure, which is significant at the 10% level.
This result also indicates that text size is more likely to represent information quality than the readability of the annual report.

In the meantime, we also test forecast accuracy (forecast error) after the disclosure, and the regression results are presented in Table 9. Column 1 of Table 9 reports that readability1 is positively correlated with the forecast error after disclosure, significant at the 10% level; in Column 2, readability2 is positively correlated with analysts' forecast error, but the t value is 1.35, which is not significant; in Column 3, the coefficient on the comprehensive index readability3 is 0.0002 with a t value of 1.66, which is significant at the 10% level. This result provides weak support for the rationality of the Chinese readability indexes. Interestingly, in Column 4 of Table 9, there is a significant negative correlation between text length (txt_size) and forecast error (coefficient −0.0041; t value −2.39). This result is consistent with the previous one, i.e. the longer the text, the more information may be disclosed, which helps analysts improve forecast accuracy.

Table 10. Other text features and earnings persistence.
Dependent var = earning(t+1)
earning                 0.4101*** (11.23)     0.4087*** (11.06)
earning*txt_nmb         0.0005** (2.46)
Txt_nmb                 −0.0034*** (−2.62)
earning*table                                 0.0009** (2.56)
table                                         0.0003 (0.37)
Other Control Variable  Yes                   Yes
Year & industry         Control               Control
Intercept               0.0110 (0.22)         −0.0032 (−0.07)
N                       5376                  5376
adj. R-sq               0.644                 0.644
t values are in parentheses; standard errors are clustered by both company and year. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.

Table 11. Other features and earnings response coefficient (figure number).
                        CAR[−1,1]             CAR[−2,2]             CAR[−3,3]             CAR[−5,5]
                        (1)                   (2)                   (3)                   (4)
UE                      −1.1072* (−1.78)      −0.9530 (−0.89)       −0.3614 (−0.30)       −0.3103 (−0.18)
UE*txt_nmb              0.1722 (1.35)         0.2309 (1.35)         0.2202 (1.63)         0.2132 (1.52)
Txt_nmb                 0.0032 (0.47)         0.0061 (0.80)         0.0070 (0.85)         0.0101 (1.29)
Other Control Variable  Yes                   Yes                   Yes                   Yes
Industry & Year         Control               Control               Control               Control
Intercept               0.0143 (0.16)         0.0777 (0.70)         0.1364 (1.08)         0.2148 (1.62)
N                       7952                  7952                  7952                  7952
adj. R-sq               0.013                 0.016                 0.017                 0.021
t values are in parentheses; standard errors are clustered by both company and year. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.

Table 12. Other features and earnings response coefficient (table number).
                        CAR[−1,1]             CAR[−2,2]             CAR[−3,3]             CAR[−5,5]
                        (1)                   (2)                   (3)                   (4)
UE                      −0.0805 (−0.10)       0.6172 (0.48)         1.0260 (0.68)         0.9629 (0.46)
UE*table                −0.0197 (−0.36)       0.0752 (1.40)         0.0211 (0.35)         −0.0033 (−0.07)
Txt_table               0.0026 (0.99)         0.0025 (0.89)         −0.0000 (−0.00)       0.0022 (0.75)
Other Control Variable  Yes                   Yes                   Yes                   Yes
Industry & Year         Control               Control               Control               Control
Intercept               0.0231 (0.33)         0.1038 (1.16)         0.1642* (1.73)        0.2478*** (2.66)
N                       7952                  7952                  7952                  7952
adj. R-sq               0.013                 0.015                 0.016                 0.021
t values are in parentheses; standard errors are clustered by both company and year. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.

To sum up, the five tests examine the Chinese annual report readability indexes constructed in this paper from the perspectives of the causes of readability, investors' understanding of disclosed information, and sophisticated investors, respectively. The results indicate that these indexes are in line with the theoretical predictions about management's motivation to manipulate readability and about its interference with investors' understanding of information, and are also consistent with the previous empirical evidence on English annual report readability (Lee, 2012; Lehavy et al., 2011; Li, 2008; Lo et al., 2017). Therefore, they may be a more concise and reasonable measurement of Chinese annual report readability. On the other hand, the tests of text size suggest that it captures information content rather than readability, so it may not be a desirable proxy variable for readability here.
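The significance statements in the tests above can be checked approximately from the reported t values. The following is a minimal sketch, assuming a standard normal reference distribution for the t statistics; that assumption is ours, since the paper does not state which distribution underlies its significance stars, and the function names `p_value_two_sided` and `stars` are hypothetical. The t values fed in are the ones quoted in the text.

```python
import math


def p_value_two_sided(t_stat: float) -> float:
    """Two-sided p-value under a standard normal approximation (an assumption)."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


def stars(t_stat: float) -> str:
    """Map a t value to the ***, **, * convention used in the table notes."""
    p = p_value_two_sided(t_stat)
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""


# t values quoted in the text for the forecast-error and market-response tests.
reported = {
    "readability1 (Ferror)": 1.66,
    "readability2 (Ferror)": 1.35,
    "readability3 (Ferror)": 1.66,
    "txt_size (Ferror)": -2.39,
    "UE x txt_size (Table 7)": 1.62,
}

for label, t in reported.items():
    print(f"{label}: t = {t:+.2f}, p = {p_value_two_sided(t):.3f}, stars = '{stars(t)}'")
```

Under this approximation, t = 1.62 falls just short of the 10% threshold, which matches the description of that coefficient as "close to" significant.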
Table 13. Test on readability indexes based on other text features and analyst behaviour after annual report disclosure.
                        Dependent var = Ferror                      Dependent var = Ftime
                        (1)                   (2)                   (3)                   (4)
Txt_nmb                 −0.0031** (−2.13)                           −0.0372 (−0.40)
table                                         −0.0008 (−0.83)                             0.0970 (1.00)
Other Control Variable  Yes                   Yes                   Yes                   Yes
Industry & Year         Control               Control               Control               Control
Intercept               −0.0715*** (−3.22)    −0.0897*** (−4.48)    12.4573*** (5.63)     12.6746*** (5.23)
N                       6703                  6703                  6703                  6703
adj. R-sq               0.042                 0.042                 0.105                 0.105
t values are in parentheses; standard errors are clustered by both company and year. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.

Table 14. Test on readability indexes based on earnings persistence (EPS).
Dependent var = EPS(t+1)
EPS                     0.5230*** (53.11)     0.4588*** (51.32)     0.5228*** (53.11)
EPS*readability1        −0.0152*** (−17.74)
Readability1            0.0092*** (11.82)
EPS*readability2                              −1.1015*** (−10.07)
Readability2                                  0.7385*** (8.26)
EPS*readability3                                                    −0.0302*** (−17.72)
Readability3                                                        0.0183*** (11.82)
Other Control Variable  Yes                   Yes                   Yes
Year & industry         Control               Control               Control
Intercept               −0.4472 (−1.50)       −0.6515** (−2.19)     −0.4471 (−1.50)
N                       5471                  5471                  5471
adj. R-sq               0.541                 0.524                 0.541
t values are in parentheses; standard errors are clustered by both company and year. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.

Table 15. Robustness test on earnings management: when EPS exceeds expectation by 0.01 yuan.
                        (1)                   (2)                   (3)                   (4)
Dependent var           Readability1          Readability2          Readability3          txt_size
MBE = 1 if Δeps ∈ [0, 0.01]
MBE                     0.6994** (1.98)       0.0046 (1.49)         0.3520** (1.98)       −0.0135 (−1.46)
Other Control Variable  Yes                   Yes                   Yes                   Yes
Industry & Year         Control               Control               Control               Control
Intercept               −98.8279*** (−14.32)  0.0580 (0.84)         −49.3850*** (−14.22)  −1.1200*** (−5.15)
N                       7959                  7959                  7959                  7959
adj. R-sq               0.076                 0.003                 0.076                 0.163
t values are in parentheses; standard errors are clustered by both company and year. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.

Table 16. Robustness test on earnings management: when EPS exceeds expectation by 0.03 yuan.
                        (1)                   (2)                   (3)                   (4)
Dependent var           Readability1          Readability2          Readability3          txt_size
MBE = 1 if Δeps ∈ [0, 0.03]
MBE                     0.6776*** (2.80)      0.0045** (2.00)       0.3411*** (2.80)      −0.0149** (−2.22)
Other Control Variable  Yes                   Yes                   Yes                   Yes
Industry & Year         Control               Control               Control               Control
Intercept               −98.9328*** (−14.35)  0.0573 (0.83)         −49.4378*** (−14.25)  −1.1169*** (−5.14)
N                       7959                  7959                  7959                  7959
adj. R-sq               0.077                 0.004                 0.076                 0.164
t values are in parentheses; standard errors are clustered by both company and year. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.

Table 17. Earnings response coefficient of other time windows (readability1).
                        CAR[−1,1]             CAR[−3,3]             CAR[−5,5]
                        (1)                   (2)                   (3)
UE                      −0.1858 (−0.25)       0.8007 (0.52)         0.6058 (0.27)
UE*readability1         −0.0046 (−1.59)       −0.0059*** (−7.04)    −0.0118*** (−3.36)
Readability1            −0.0002 (−1.35)       −0.0002 (−0.93)       −0.0002 (−0.80)
Other Control Variable  Yes                   Yes                   Yes
Industry & Year         Control               Control               Control
Intercept               0.0203 (0.28)         0.1704* (1.74)        0.2681*** (2.78)
N                       7959                  7959                  7959
adj. R-sq               0.013                 0.017                 0.021
t values are in parentheses; standard errors are clustered by both company and year. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.

5.3. Further test

In addition to sentence vocabulary and structure and text size, we also test whether other text features of the annual report can serve as a proper measure of readability. Theoretically, following the logic of measuring text size, more numbers and tables in the text may make the information harder to understand, indicating lower readability. On the other hand, in terms of motivation, more numbers and tables may also mean more information disclosure, representing higher quality of information disclosure.
For instance, Chen, Miao and Shevlin (2015) find that disclosing more accounting items in annual reports means higher quality of information disclosure, which naturally entails more numbers or tables.

Table 18. Earnings response coefficient of other time windows (readability2).
                        CAR[−1,1]             CAR[−3,3]             CAR[−5,5]
                        (1)                   (2)                   (3)
UE                      −0.0442 (−0.06)       0.9866 (0.67)         0.9792 (0.46)
UE*readability2         −0.2234 (−0.74)       −0.4576* (−1.78)      −0.7988** (−1.96)
Readability2            −0.0293*** (−2.79)    −0.0379** (−2.21)     −0.0592** (−2.27)
Other Control Variable  Yes                   Yes                   Yes
Industry & Year         Control               Control               Control
Intercept               0.0336 (0.48)         0.1812* (1.95)        0.2809*** (3.09)
N                       7959                  7959                  7959
adj. R-sq               0.014                 0.017                 0.022
t values are in parentheses; standard errors are clustered by both company and year. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.

Table 19. Earnings response coefficient of other time windows (readability3).
                        CAR[−1,1]             CAR[−3,3]             CAR[−5,5]
                        (1)                   (2)                   (3)
UE                      −0.1847 (−0.25)       0.8016 (0.52)         0.6080 (0.27)
UE*readability3         −0.0092 (−1.58)       −0.0117*** (−7.09)    −0.0235*** (−3.35)
Readability3            −0.0004 (−1.36)       −0.0003 (−0.94)       −0.0005 (−0.81)
Other Control Variable  Yes                   Yes                   Yes
Industry & Year         Control               Control               Control
Intercept               0.0202 (0.28)         0.1702* (1.74)        0.2677*** (2.78)
N                       7959                  7959                  7959
adj. R-sq               0.013                 0.017                 0.021
t values are in parentheses; standard errors are clustered by both company and year. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.

Therefore, we run the same tests for the figure number (txt_nmb) and the table number (table) in the text to see whether they can represent readability. The results are shown in Tables 10–13, which report only the main variables owing to limited space. Table 10 suggests that the numbers of both figures (txt_nmb) and tables (table) in the annual report are positively correlated with earnings persistence, and the former is significant at the level of 5%.
It seems that disclosing more figures and tables in the annual report represents better information quality. This is also consistent with the conclusions of Chen, Miao and Shevlin (2015), i.e. with respect to motivation, figures (txt_nmb) and tables (table) may not move in the direction expected of a readability measure.

Table 20. Test on the standardised-and-averaged readability indexes (based on earnings persistence).
Dependent var = earning(t+1)
earning                 0.4445*** (12.41)
earning*readability4    −0.0135*** (−2.71)
Readability4            0.0018*** (2.88)
Other Control Variable  Yes
Year & industry         Control
Intercept               −0.0100 (−0.20)
N                       5471
adj. R-sq               0.644
t values are in parentheses; standard errors are clustered by both company and year. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.

Table 21. Test on the standardised-and-averaged readability indexes (based on earnings management probability).
MBE = 1 if              Δeps ∈ [0, 0.01]      Δeps ∈ [0, 0.02]      Δeps ∈ [0, 0.03]
Dependent var           Readability4          Readability4          Readability4
MBE                     0.0899* (1.93)        0.0935** (2.56)       0.0871*** (2.69)
Other Control Variable  Yes                   Yes                   Yes
Industry & Year         Control               Control               Control
Intercept               −6.8830*** (−7.10)    −6.8800*** (−7.10)    −6.8964*** (−7.12)
N                       7959                  7959                  7959
adj. R-sq               0.027                 0.027                 0.027
t values are in parentheses; standard errors are clustered by both company and year. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.

Then, we test the two indexes from the perspective of investors' understanding of information; the results are presented in Tables 11 and 12. In Table 11, the coefficients on the interaction between the figure number (txt_nmb) and unexpected earnings (UE) are greater than 0 in all time windows but not significant, with t values of 1.35, 1.35, 1.63 and 1.52, respectively. In Table 12, however, the corresponding coefficient for the table number is neither stable in sign nor significant across the time windows.
The evidence therefore does not firmly support the view that the figure number and the table number in the annual report represent readability; the former may instead represent information content to some degree. We also test the relationship between the figure number (txt_nmb) and the table number (table) and analysts' forecasts, which is reported in Table 13. Table 13 suggests that the figure number (txt_nmb) is negatively correlated with analysts' forecast error, which is similar to the result for text size, while the other results are not significant. The tests above indicate that the figure number (txt_nmb) and the table number (table) are not reliable indexes for measuring Chinese annual report readability.

Table 22. Test on the standardised-and-averaged readability indexes (based on ERC).
                        CAR[−1,1]             CAR[−2,2]             CAR[−3,3]             CAR[−5,5]
                        (1)                   (2)                   (3)                   (4)
UE                      −0.1058 (−0.14)       0.3562 (0.27)         0.8894 (0.58)         0.7975 (0.36)
UE*readability4         −0.0284 (−1.23)       −0.0508** (−2.37)     −0.0427*** (−4.17)    −0.0808*** (−2.83)
Readability4            −0.0020** (−2.19)     −0.0023 (−1.50)       −0.0023* (−1.65)      −0.0035 (−1.59)
Other Control Variable  Yes                   Yes                   Yes                   Yes
Industry & Year         Control               Control               Control               Control
Intercept               0.0209 (0.30)         0.1019 (1.13)         0.1673* (1.76)        0.2605*** (2.86)
N                       7959                  7959                  7959                  7959
adj. R-sq               0.014                 0.016                 0.017                 0.022
t values are in parentheses; standard errors are clustered by both company and year. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.

Table 23. Test on the standardised-and-averaged readability indexes (based on analysts' forecast).
                        Ferror                Ftime
Readability4            0.0006* (1.71)        0.1200*** (3.39)
Other Control Variable  Yes                   Yes
Industry & Year         Control               Control
Intercept               −0.0904*** (−4.56)    13.0547*** (6.04)
N                       6703                  6703
adj. R-sq               0.043                 0.108
t values are in parentheses; standard errors are clustered by both company and year. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.

5.4. Robustness test

Similar to Li's (2008) method, we measure earnings in Test 1 as operating profit divided by total assets. In the robustness test, we replace this measure with the more commonly used earnings per share (EPS), as shown in Table 14. Table 14 shows that our results remain robust when EPS is used to measure earnings.

In Test 2, we treat performance as slightly beyond expectation when the current EPS exceeds the previous EPS by at most 0.02 yuan. To make the result more robust, we use 0.01 yuan and 0.03 yuan, respectively, as the threshold and re-estimate Model 2, as shown in Tables 15 and 16. The results in Tables 15 and 16 are generally consistent with those in Table 5, and the coefficients relating the three readability indexes in this paper to MBE remain positive.

In Test 3, referring to Li and Zhang (2015), we employ the annual report disclosure time window of [−2,2]. To exclude the influence of the event window on the result, we follow Yu, Tian, and Zhang (2012) and adopt alternative windows of [−1,1], [−3,3], and [−5,5] for the robustness test, as reported in Tables 17–19. Tables 17–19 indicate that, in each window, the coefficients on the interaction between the constructed Chinese annual report readability indexes and unexpected earnings are negative; they fall short of conventional significance in [−1,1] (t values between −0.74 and −1.59), while those in [−2,2], [−3,3], and [−5,5] are all significantly negative, which further supports the rationality of the constructed indexes.

Finally, in constructing readability3, we simply take the arithmetic average of readability1 and readability2. This exploratory approach may cause a problem: the two components may differ by an order of magnitude, so that one sub-index in effect receives too much weight. This problem may also exist in the Fog Index.
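The weighting concern can be made concrete with a small sketch. The tables above are consistent with it: the coefficients on readability3 are almost exactly twice those on readability1, with nearly identical t values (for example, 0.0002 versus 0.0001, both with t = 1.66, in the forecast-error regressions), which is what one would expect if the simple average is driven almost entirely by readability1. The code below is illustrative only: the input values are hypothetical and not taken from the paper's sample, the helper names are ours, and standardising to z-scores before averaging is shown as one common remedy rather than as the authors' exact procedure.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardise values to mean 0 and population standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def simple_average(a, b):
    """Element-wise arithmetic mean of two equally long index series."""
    return [(x + y) / 2 for x, y in zip(a, b)]


def standardised_average(a, b):
    """Average the two series after putting each on a common z-score scale."""
    return simple_average(zscores(a), zscores(b))


# Hypothetical values for four reports (assumption, for illustration only).
words_per_clause = [18.0, 22.0, 26.0, 30.0]   # readability1-like, order of tens
connective_share = [0.09, 0.03, 0.07, 0.05]   # readability2-like, fractions

raw = simple_average(words_per_clause, connective_share)
std = standardised_average(words_per_clause, connective_share)

print("simple average:      ", [round(v, 3) for v in raw])
print("standardised average:", [round(v, 3) for v in std])
```

In this toy example the simple average ranks the four reports exactly as the words-per-clause series does, whereas the standardised average lets the second component change the ranking.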
In the first part of the Fog Index, the number of words per sentence is on a different scale from the second part, the proportion of complex words, which gives the former a higher weight. To solve this problem, we standardise readability1 and readability2 and then average them to form the variable readability4. In this part, we use readability4 as part of the robustness test, which is reported in Tables 20–23.

Tables 20–23 show that the results of the tests on the standardised and averaged readability index remain robust. This suggests that the two indexes constructed in this paper, i.e. the word number in each clause and the proportion of conjunctions and adverbials in each sentence, are reliable to some extent.

In addition, we conduct the following robustness tests. First, to control for the individual differences of cross-listed companies (B-share, H-share), we add a dummy variable BH that equals 1 for firms with B or H shares and 0 otherwise. After controlling for BH, the results are slightly weaker, but the main conclusion remains robust. Second, the management's discussion and analysis (MD&A) is another critical section for readability manipulation. We therefore compute the readability indexes of this paper for the MD&A section and repeat the main tests; the results remain robust. Third, we include firm fixed effects in all main regressions, and most results remain robust. Finally, other robustness tests, including replacing the measures of some control variables, also leave the results robust.

6. Conclusions and limitations

Since the pioneering research of Li (2008), annual report readability has become a significant topic in empirical accounting research.
Existing research suggests that management manipulates readability in order to obfuscate information; meanwhile, investors are unable to respond accurately and in time because of the poor readability. From these two points of view, a large number of empirical studies based on English annual report readability have been carried out. In contrast, there are few studies of the readability of Chinese annual reports. A possible reason is the lack of readability measurement indexes.

Note: The regression results for the additional robustness tests are not included in the main text owing to space limitations and are available from the authors by email on request.

Based on the analysis of the existing readability indexes, this paper constructs three readability indexes for Chinese annual reports by referring to the Fog Index commonly applied in English readability measurement and to the findings of Chinese linguistics. The first index is the average word number in each clause in the annual report, the second is the proportion of adverbs and conjunctions in each sentence, and the third is the simple arithmetic average of the first two. Referring to previous research findings, we empirically test each of the Chinese annual report readability indexes with regard to motivation and consequence to verify whether the constructed indexes are reasonable. The tests suggest that the three indexes accord well with theoretical expectations and that the empirical results are consistent with previous findings on English readability, so they may be practical tools for measurement. At the same time, we also test the text length and the numbers of figures and tables in the annual report as alternative readability indexes. The results indicate that these three indexes are not ideal measures of text readability.
Through the measurement and tests, this paper can provide a basic reference for further research on the motivation and consequences of Chinese annual report readability.

This paper also has some limitations. First, the linguistic basis of the indexes needs further exploration. Because Chinese linguistics has paid limited attention to readability, the index construction here may lack a complete theoretical foundation, which makes it exploratory to some degree. Second, as a measurement study, although we test the indexes as thoroughly as we can, we cannot exhaust all possibilities, which limits the applicability of the indexes. Finally, although this paper focuses on index construction, as in other readability studies there may be endogeneity problems that are difficult to solve. All these limitations also point to directions for future studies.

Disclosure statement

No potential conflict of interest was reported by the authors.

Funding

This work was supported by the Fundamental Research Funds of Shandong University and the National Natural Science Foundation of China.

References

Ball, R., & Brown, P. (1968). An empirical evaluation of accounting income numbers. Journal of Accounting Research, 6(2), 159–178.
Bloomfield, R. (2008). Discussion of "annual report readability, current earnings, and earnings persistence". Journal of Accounting and Economics, 45(2–3), 248–252.
Chen, S., Miao, B., & Shevlin, T.J. (2015). A new measure of disclosure quality: The level of disaggregation of accounting data in annual reports. Journal of Accounting Research, 53(5), 1017–1054.
Cheng, X., Liu, J., & Cheng, Y. (2015). A supplement or another lie: Earnings manipulation and non-financial information disclosure in MD&A. Accounting Research, 8, 11–18.
Collins, D.W., & Kothari, S.P. (1989). An analysis of intertemporal and cross-sectional determinants of earnings response coefficients. Journal of Accounting and Economics, 11(2–3), 143–181.
Dechow, P.M., Ge, W., & Schrand, C.M. (2010). Understanding earnings quality: A review of the proxies, their determinants and their consequences. Journal of Accounting and Economics, 50(2–3), 344–401.
Dechow, P.M., Sloan, R.G., & Sweeney, A.P. (1995). Detecting earnings management. Accounting Review, 70(2), 193–225.
Dhaliwal, D.S., Radhakrishnan, S., Tsang, A., & Yang, Y.G. (2012). Nonfinancial disclosure and analyst forecast accuracy: International evidence on corporate social responsibility disclosure. The Accounting Review, 87(3), 723–759.
Eng, L.L., & Mak, Y.T. (2003). Corporate governance and voluntary disclosure. Journal of Accounting & Public Policy, 22(4), 325–345.
Francis, J., Nanda, D., & Olsson, P. (2010). Voluntary disclosure, earnings quality, and cost of capital. Journal of Accounting Research, 46(1), 53–99.
Franco, G., Hope, O.K., Vyas, D., & Zhou, Y. (2015). Analyst report readability. Contemporary Accounting Research, 32(1), 76–104.
Frankel, R., & Litov, L. (2009). Earnings persistence. Journal of Accounting & Economics, 47(1–2), 182–190.
Gunning, R. (1952). The technique of clear writing. New York: McGraw-Hill.
Hammersley, J.S., Myers, L.A., & Shakespeare, C. (2008). Market reactions to the disclosure of internal control weaknesses and to the characteristics of those weaknesses under Section 302 of the Sarbanes Oxley Act of 2002. Review of Accounting Studies, 13(1), 141–165.
Hu, Y., Rao, Y., Chen, Y., & Li, P. (2003). Survey on the understandability of information by securities analysts. Accounting Research, 11, 14–20.
Jiang, G., & Wang, H. (2005). Should firms with two consecutive annual losses be specially treated (ST)? Economic Research Journal, 3, 100–107.
Jones, M.J., & Shoemaker, P.A. (1994). Accounting narratives: A review of empirical studies of content and readability. Journal of Accounting Literature, 13, 142.
Lee, Y.J. (2012). The effect of quarterly report readability on information efficiency of stock prices. Contemporary Accounting Research, 29(4), 1137–1170.
Lehavy, R., Li, F., & Merkley, K. (2011). The effect of annual report readability on analyst following and the properties of their earnings forecasts. The Accounting Review, 86(3), 1087–1115.
Li, F. (2008). Annual report readability, current earnings, and earnings persistence. Journal of Accounting and Economics, 45(2–3), 221–247.
Li, Y., & Zhang, L. (2015). Short selling pressure, stock price behavior, and management forecast precision: Evidence from a natural experiment. Journal of Accounting Research, 53(1), 79–117.
Lo, K., Ramos, F., & Rogo, R. (2017). Earnings management and annual report readability. Journal of Accounting and Economics, 63(1), 1–25.
Loughran, T., & McDonald, B. (2014). Measuring readability in financial disclosures. The Journal of Finance, 69(4), 1643–1671.
Luo, J.-H., Li, X., & Chen, H. (2018). Annual report readability and corporate agency costs. China Journal of Accounting Research, 11(3), 187–212.
Lv, S. (1999). Eight hundred words in modern Chinese. The Commercial Press.
Qiu, X., Zheng, X., & Deng, K. (2016). Can analysts play an effective role in professional information interpretation?: Evidence based on a complexity/readability index of Chinese corporate annual reports. China Economic Quarterly, 15, 1483–1506.
Richardson, S.A., Sloan, R.G., Soliman, M.T., & Tuna, İ. (2005). Accrual reliability, earnings persistence and stock prices. Journal of Accounting and Economics, 39(3), 437–485.
Shi, D. (2000). The flexibility of Chinese syntax and the theory of syntax. Contemporary Linguistics, 2, 18–26.
Soper, F.J., & Dolphin, R. (1964). Readability and corporate annual reports. The Accounting Review, 39(2), 358–362.
Sun, M. (2004a). A research on the understandability of annual reports of listed companies. Accounting Research, 12, 23–28.
Sun, M. (2004b). Impression management behavior in information disclosure of listed companies. Accounting Research, 3, 40–45.
Tan, H., Wang, E.Y., & Zhou, B. (2015). How does readability influence investors' judgments? Consistency of benchmark performance matters. The Accounting Review, 90(1), 371–393.
Thompson, S.B. (2011). Simple formulas for standard errors that cluster by both firm and time. Journal of Financial Economics, 99(1), 1–10.
Wang, X., Li, Y., & Xiao, M. (2017). Risk information disclosure in annual report, heterogeneous beliefs and analysts' earnings forecasts. Accounting Research, 10, 37–43.
Wang, Y., Yu, L., & An, R. (2014). Does non-financial information improve information environment? Evidence from disclosure of corporate social responsibility. Journal of Financial Research, 8, 178–191.
Wang, Y., Wu, L., & Bai, Y. (2005). Frequency and magnitude of earnings management of listed companies in China. Economic Research Journal, 12, 102–112.
Wang, Z. (1998). A dictionary of function words in modern Chinese. Shanghai Lexicographical Publishing House.
Xie, D., & Lin, L. (2015). Do management tones help to forecast firms' future performance: A textual analysis based on annual earnings communication conferences of listed companies in China. Accounting Research, 2, 20–27.
Xu, H., & Hou, Y. (2012). Information transparency and trading choice of retail investors. Journal of Financial Research, 3, 180–192.
Xu, N., Hong, T., Wu, S., & Xu, X. (2011). Information flow model, investor psychological bias and stock price comovement. Economic Research Journal, 4, 135–146.
Xue, S., Xiao, Z., & Pan, M. (2010). Does management discussion and analysis provide useful information? – Based on the empirical research of loss listed companies. Management World, 2, 130–140.
Yan, D., & Sun, M. (2002). Study on the readability of share B's annual reports in the Shenzhen exchange. Accounting Research, 5, 10–17.
You, H., & Zhang, X.-J. (2009). Financial reporting complexity and investor underreaction to 10-K information. Review of Accounting Studies, 14(4), 559–586.
Yu, Z., Tian, G., & Zhang, Y. (2012). Media coverage, institutional development and market reaction to earnings: A further investigation about market pressure hypothesis. Accounting Research, 9, 40–51.
Zhang, L., & Peng, Y. (2013). A study on measuring the language difficulty of intermediate-advanced Chinese visual-audio-oral textbooks adapted from films and TV series. Chinese Teaching in the World, 2, 254–267.
Zhang, N. (2000). A quantitative analysis of the difficulty of corpus in Chinese textbooks. Chinese Teaching in the World, 3, 83–88.
Zuo, H., & Zhu, Y. (2014). Research on Chinese readability formula of texts for intermediate level European and American students. Chinese Teaching in the World, 2, 263–276.
China Journal of Accounting Studies
– Taylor & Francis
Published: Jul 3, 2019
Keywords: Chinese annual reports; readability; measure; test