Thematic Proximity in Content Analysis

This article explains how to calculate thematic proximity within a mixed methods content analysis approach. Thematic proximity of two themes can indicate the presence of meta-themes. Meta-themes are themes which acquire their meaning through the systematic co-occurrence of two or more other themes. By combining qualitative and quantitative techniques of content analysis, the researcher can reveal these latent text patterns. Using a study on Jihadi media as a showcase, the article describes how to detect meta-themes through content analysis. To this end, the article introduces a novel theme-correlation coefficient that adds valuable information to traditional theme relation metrics. It enables researchers to make new empirical observations in text data.

Keywords: content analysis, concept mapping, qualitative content analysis, mixed methods, communication studies, communication, social sciences, human communication, Jihadi ideology, Jihadism

1 National Center for Crime Prevention (NZK), Bonn, Germany

Corresponding Author: Andreas Armborst, National Center for Crime Prevention (NZK), c/o German Federal Ministry of the Interior, Graurheindorfer Straße 198, 53117 Bonn, Germany.

Introduction

There are different ways to measure thematic proximity and code relations in content analysis. This article reviews some of them and introduces on this basis a new theme relation coefficient. A theme is a generalized and summarizing description for a set of interrelated issues. In the technical sense, “a theme is an outcome of coding [and] categorisation” (Saldana, 2013, p. 14), whereby codes, categories, and themes represent different levels of the researcher’s abstraction from the original data. “Data” in content analysis can be anything that has “content,” but this article exclusively focuses on text data. Following the hierarchical order of codes, categories, and themes, this article explains how to analyze relations between these analytical units. It refers to these relations as “thematic relations.”

Used in combination with existing theme relation coefficients, the proposed coefficient can reveal how frequently, consistently, and elaborately themes, categories, and codes relate to each other. This information helps researchers to identify meta-themes: themes that are implicitly rather than explicitly stated in textual data. The analysis of thematic proximity inquires into the subtext of verbal information in a standardized fashion. As a showcase, the article presents a content analysis study on Jihadi statements from al-Qaeda (AQ) leaders and demonstrates how the detection of meta-themes works in research practice.

Literature Review: Content Analysis and Theme Relation Metrics

In practice, many researchers combine different strands of content analysis into hybrid (inductive-deductive; Fereday & Muir-Cochrane, 2006) and mixed methods (qualitative and quantitative) designs.

Content analysis includes practices as diverse as fully automated text mining approaches (Angus, Rintel, & Wiles, 2013; A. E. Smith, 2003; A. E. Smith & Humphreys, 2006; Stockwell, Colomb, Smith, & Wiles, 2009) and hermeneutic approaches (Rantala & Hellström, 2001). Often its purpose is to summarize, retrieve, and analyze information from documents. A core task therefore is to identify meaningful clusters of information, often referred to as themes, concepts, codes, or categories. There are numerous interpretative and algorithm-based techniques to do so, but there are only two directions from which a researcher can apply these techniques: Themes can be identified following inductive (observation based) coding and deductive (theory based) coding (Glaser, 1978; Glaser & Strauss, 1967; Mayring, 2000). It is also possible to approach
the data from both directions, which is then referred to as hybrid (observation and theory based) coding of text data.

Email: andreas.armborst@bmi.bund.de

Creative Commons CC BY: This article is distributed under the terms of the Creative Commons Attribution 4.0 License (http://www.creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).

SAGE Open

Observation and Theory-Based Coding

The purpose of explorative (inductive) content analysis is to identify analyzable units (codes) in primary or secondary text data (newspapers, office documents, interview transcripts, field notes, etc.) and to summarize them under meaningful labels (categories). Depending on how complex the material under investigation is, the researcher has to decide how to organize the units of analysis. There are at least two approaches to this. According to the Coding Manual for Qualitative Researchers (Saldana, 2013), on the first level, the researcher attaches a code to certain segments of text. On the second level, he organizes interrelated codes into categories, thereby creating a taxonomy or category scheme with different categories and subcategories. On the third level, the researcher arranges groups of categories into themes. On top of this pyramid stands a “theory” about the subject as the result of the analysis.

Other approaches allow for coding themes directly into the data, without the process of extracting codes and building categories. This is called thematic coding (or “themeing” the data according to Saldana, 2013, p. 175). According to one common practice of thematic coding (Attride-Stirling, 2001), one can use three hierarchical levels, or category classes named basic, organizing, and global themes, to discriminate in different degrees between rather abstract and rather concrete content. A basic theme is “the most basic or lowest-order theme that is derived from the textual data,” an organizing theme is “a middle-order theme that organizes the Basic Themes into clusters of similar issues,” and global themes “are super-ordinate themes that encompass the principal metaphors in the data as a whole” (Attride-Stirling, 2001, p. 388).

So the main point of difference between the two approaches is whether the researcher can apply (or identify) an analytical unit from the third level directly to data. The common ground of both approaches, and this is the decisive point of the methodology proposed here, is that these analytical units have a hierarchical order.

Next to this organizational structure, the coding procedure also requires coding heuristics: standardized rules that guide the decision of the researcher about when to create a new analytical unit, how to label it, and how to separate codes, categories, and themes from each other. It is important to spell out rules and thereby make the coding and classification process as transparent and replicable as possible. For calculating the theme-correlation coefficient, use the following heuristics (Heuristics 1-3; taken from Kelle & Kluge, 2010).

Heuristic 1: Sparseness: to use as few analytical units as possible and as many as necessary to capture all content adequately.
Heuristic 2: Internal homogeneity: to maximize cohesive validity (all content of one particular unit is clearly similar).
Heuristic 3: External heterogeneity: to maximize discriminant validity (the content of two different units is clearly about two different things).
Heuristic 4: Code overlaps: Following Heuristics 2 and 3, codes overlap as sparsely as possible and as often as necessary.

In explorative content analyses, the researcher usually has to code the entire data set, or at least substantial parts of it, several times until the coding scheme becomes stable. During these iterations, the coder creates, modifies, deletes, and merges the units in accordance with the coding heuristics, thereby steadily developing the taxonomy.

When conducting theory-based (deductive) coding, the researcher starts with a given set of analytical units (codes, categories, themes), that is, the number and the label of units are fixed through the theory from which they derive. Coding Heuristic 1 therefore does not apply to theory-based coding, but Heuristics 2 to 4 do. If internal homogeneity and external heterogeneity cannot be achieved, then this indicates a mismatch between the theory and the data.

Thematic Proximity

Coding is a time-consuming and tedious process and of course not an end in itself. One purpose of coding is to reduce complex text structures to analyzable units. A fully coded set of documents enables the researcher to address a wide range of research questions with a large repertoire of (qualitative and quantitative) analytical approaches. One of them is the analysis of thematic proximity (the relation between units). Its purpose is to identify latent patterns in the content that cannot be observed by simply reading the material. The analysis of latent patterns is called relational content analysis.

Relational content analysis usually combines qualitative and statistical interpretation of verbal data into one coherent instrument (Bos & Tarnai, 1999). Still it is not a mixed methodology in the strict sense of the term, insofar as it does not necessarily require collecting “both quantitative and qualitative data” (Creswell & Plano Clark, 2011, p. 276; Onwuegbuzie & Teddlie, 2003). The “mix” is the stage of analysis where a numeric coefficient indicates how strongly two themes are related to each other. Still, the detection and interpretation of meta-themes goes beyond “quantitative analysis of qualitative information” (Fakis, Hilliam, Stoneley, & Townend, 2014), and is more than just “two separate approaches to studying the same phenomena” (Symonds & Gorad, 2008, p. 11).

There are different means available to determine the thematic proximity of two descriptive units. The work of Oleinik (2011) provides a useful overview. Cosine similarity, for example, is a vector-based method often used in automated text mining, such as Leximancer.

To a certain extent, theme relation coefficients resemble metrics for intercoder reliability, such as Krippendorff’s alpha (Krippendorff, 1995, 2004; Neuendorf, 2017). Both indicate code overlapping, however, for different purposes. It could be worthwhile to “hack” alpha coefficients in a way that they indicate thematic proximity instead of intercoder reliability.

Another means to determine thematic proximity is to analyze the frequency and pattern of theme co-occurrences within a given set of documents. The c-coefficient used in content analysis software ATLAS.ti (Friese, 2014) measures how often and consistently two codes co-occur or overlap throughout the entire text sample. The c-coefficient represents the frequencies and patterns of code co-occurrence “similar to a correlation coefficient statistics” (Friese, 2014, p. 189). It is based on the Jaccard similarity coefficient. Many content analysis software packages do not provide this function, but the researcher can use the code retrieval function “near within one paragraph” to determine the number of co-occurrences, and then calculate the c-coefficient manually using the formula (Friese, 2014, p. 190):

$$c = \frac{n_{12}}{n_1 + n_2 - n_{12}},$$

with $n_1$ = the number of occurrences of Code A, $n_2$ = the number of occurrences of Code B, and $n_{12}$ = the number of co-occurrences of both codes.

Co-occurrence means that both codes either code the same segment or overlapping segments. The coefficient can take values between 0 (indicating perfect independence) and 1 (indicating perfect relation). The greater the discrepancy between $n_1$ and $n_2$, the smaller are the highest possible values of c. For example, if Code A occurs twice as often as Code B ($n_1 = 2 \times n_2$), the maximum value of c is 0.5, indicating that Code B always occurs in combination with Code A, whereas Code A occurs in 50% of occurrences together with Code B.

The c-coefficient has two important limitations: First, it can underestimate the strength of a thematic relation when one analytical unit has considerably more codings (the number of discrete text segments that are associated with a given code) than the other. The c-coefficient does not take into consideration the proportion of overlapping content. Therefore, it can remain low although the thematic link might be quite elaborated in terms of word frequencies. Second, it is not standardized and disregards the overall coding pattern of the data set, making it difficult to compare c-coefficients from different studies.

To prevent this loss of information, the following section proposes an additional coefficient that takes into account not the frequency of code co-occurrences but the proportions of text intersections based on word frequencies. Taken together, these two coefficients can better assess the qualitative and quantitative relation of two themes.

Proposed Methodology: The t-Coefficient

The c-coefficient measures how often and consistently two codes (units) co-occur throughout the entire text sample but disregards how elaborated their relation is. This, however, is a valuable piece of information about the thematic structure of the content. The proposed coefficient therefore indicates how much content two descriptive units actually share with each other in terms of word frequencies. This can be relevant because the information about how frequently and how consistently two themes co-occur does not necessarily tell anything about how important or elaborated the thematic link is within the research context. Taken together, the two coefficients can reveal latent structures in text samples that might constitute a meaningful meta-theme. I refer to the proposed coefficient as the t-coefficient (t for theme). It is defined as

$$t = \frac{1}{2}\left(\frac{n_{12}}{n_1} + \frac{n_{12}}{n_2}\right),$$

with $n_1$ = the total number of words classified with Code 1, $n_2$ = the total number of words classified with Code 2, and $n_{12}$ = the number of intersecting words between Code 1 and Code 2.

The t-coefficient measures the average proportion of content that two descriptive units share with one another. It can take values between 0 (indicating mutual exclusiveness of coded text segments) and 1 (indicating complete overlap [congruency] of coded text segments).

A t-coefficient of 0.10, for example, provides the information that each of the two units shares on average 10% of its content with the other unit (i.e., usually none of the two categories shares exactly 10% unless they both have the same number of words). In combination with the c-coefficient we can also say how often and how consistently this co-occurrence appears. Large t-values combined with low c-values indicate that the link is elaborated but not frequent and consistent, whereas large t-values combined with large c-values indicate that the thematic link is elaborated, frequent, and consistent. Values of t greater than .5 should be interpreted with caution because they might indicate a lack of discriminant validity, that is, two themes are so closely related that they are not distinguishable and may actually represent the same theme. If this happens, then it could indicate a violation of the coding Heuristic 3 (see above).

Standardizing the Theme Relation Coefficient

It is important to note that the sample size, coding heuristics, and number of descriptive units can affect the t-coefficient. What may be a high coefficient within the scope of one study may indicate a rather weak thematic relation within the other. This obstructs comparability of the t-coefficient between two studies. To eliminate the influence of the coding practice on the results, it is necessary to calculate the standardized t-coefficient $t_s$. The standardized t-coefficient eliminates the influence stemming from different coding practices, namely, the overall proportion of multiply coded content, and the number of units in the coding scheme. The more content is coded by multiple units and the fewer units the coding scheme has, the higher are the average values of the t-coefficient (and vice versa). This is due to the fact that the unstandardized t-coefficient is based on the proportion of intersecting words between units. To compare the coefficient between different content analysis studies, it is therefore necessary to consider the net effect caused by coding practices.

The standardized t-coefficient is adjusted in regard to these two general coding patterns. It is calculated in four steps. First is to calculate the proportion of text retrievals with more than one coding in relation to the sum of all text retrievals:

$$P = \frac{p_{\text{multi}}}{p_{\text{all}}},$$

where $p_{\text{multi}}$ = the word frequency of text retrievals with more than one code in the entire sample, and $p_{\text{all}}$ = the word frequency of all retrieved text segments.

P states the extent of multiply coded text in all documents of the sample. The P value in the showcase study is P = .62 and states that 62% of all coded words are coded with more than one unit. P has to be interpreted in relation to the degrees of freedom, that is, the number of all possible (yet, not measured) bivariate correlations. The higher the number of categories within the coding scheme, and with that, the number of possible bivariate correlations, the lower is the average influence of P on any given bivariate correlation.

The degrees of freedom are determined by the number of potential correlations between the k categories. They are calculated by dividing all fields of the code-correlation matrix ($k^2$) minus the fields in the diagonal by two:

$$df = \frac{k^2 - k}{2}.$$

The next step is to calculate the adjustment coefficient U. It indicates the average bivariate correlation if all multiply coded text were equally distributed among all units of the category scheme. It works as a baseline comparison for the observed correlation t.

The adjustment coefficient U is

$$U = \frac{P}{df}.$$

The standardized t-coefficient is

$$t_s = (1 - U)\,t.$$

It is in the judgment of the researcher to decide whether to report every single standardized t-coefficient. In some cases, it might be sufficient just to report the overall adjustment coefficient U, namely, when U is so small that it hardly affects the difference between t and $t_s$.

Interpreting t-Coefficients

There are two ways to judge whether a given t-value indicates a weak, moderate, or a strong relation. First is to compare different t-values with each other. As can be seen in Table 1, the highest correlation between two categories is .129 between the theme “theological justifications for the use of force” and the narrative about the “global conflict.” Compared with other thematic links, this is strong. The c-coefficient (.058) is also comparably high, signaling that this link is also more frequent and consistent than most other links in the table.

Another way to judge the strength of the correlation is to compare observed t-values against the unobserved t-values of two mutually independent themes (here U = .0002). A t-value of .129 then indicates that the correlation is significantly different from independence. It is also possible to base this benchmark test on randomly, instead of equally, distributed content. This would in some way resemble the statistical test for significance and could be the method of choice in quantitative content analysis.

The standardized coefficient $t_s$ is always smaller than the observed coefficient t, but it has the advantage that it is not affected by coding practices, such as the number of units in the category scheme, and therefore is more suitable to compare results.

Detecting Meta-Themes, or How to “Read Between the Lines” of Qualitative Data

When two or more descriptive units systematically co-occur in the text data and when the co-occurrence is not only frequent but also elaborated in terms of word frequencies, then this can indicate the presence of a meta-theme. The two coefficients therefore are quantitative indicators for meta-themes. Meta-themes are themes which acquire their meaning through the systematic co-occurrence of two or more other themes. The prefix “meta” means that these themes are themes of a higher informational order, or in other words, they are not explicitly but implicitly communicated within the content. A meta-theme might mark subconscious communication and tells the researcher something about the source, namely, that it systematically refers to two distinct themes.

The statistical coefficients should always be interpreted in combination with a qualitative assessment of the meta-content. Not every thematic correlation is necessarily a meta-theme. Likewise, the detection of a meta-theme does not necessarily reveal the reason why the originator communicates subconsciously and not explicitly and intentionally. This question can be answered only within the context of a particular study.

To give one example of subconscious communication, we now turn to the showcase study.
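The three coefficients described in the methodology can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function and parameter names are assumptions made for this example, and the inputs to `c_coefficient` and `standardized_t` are hypothetical values chosen only to exercise the formulas. The word counts passed to `t_coefficient` are the ones reported in Table 1 for “strategic benefits” (4,973 words), “apostasy” (39,128 words), and their intersection (533 words).

```python
def c_coefficient(n1: int, n2: int, n12: int) -> float:
    """c = n12 / (n1 + n2 - n12), computed from code occurrence counts.

    n1, n2: number of occurrences of Code A and Code B.
    n12:    number of co-occurrences of both codes.
    """
    return n12 / (n1 + n2 - n12)


def t_coefficient(w1: int, w2: int, w12: int) -> float:
    """t = 1/2 * (w12 / w1 + w12 / w2), computed from word counts.

    w1, w2: total number of words classified with Code 1 and Code 2.
    w12:    number of intersecting words between the two codes.
    """
    return 0.5 * (w12 / w1 + w12 / w2)


def standardized_t(t: float, p_multi: int, p_all: int, k: int) -> float:
    """t_s = (1 - U) * t with U = P / df.

    p_multi: word frequency of text retrievals with more than one code.
    p_all:   word frequency of all retrieved text segments.
    k:       number of categories in the coding scheme.
    """
    P = p_multi / p_all        # share of multiply coded text
    df = (k * k - k) / 2       # number of possible bivariate correlations
    U = P / df                 # adjustment coefficient
    return (1 - U) * t


if __name__ == "__main__":
    # Word counts from Table 1: strategic benefits x apostasy.
    print(round(t_coefficient(4973, 39128, 533), 3))  # 0.06

    # Hypothetical occurrence counts: Code A occurs twice as often as Code B
    # and Code B always co-occurs with Code A, so c reaches its maximum of 0.5.
    print(c_coefficient(20, 10, 10))  # 0.5

    # Hypothetical standardization inputs (P = .62, k = 10 categories).
    print(standardized_t(0.129, 62, 100, 10))
```

Keeping the occurrence-based c-coefficient and the word-based t-coefficient in separate functions mirrors the distinction drawn in the text: the first counts codings, the second counts words.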
Table 1. Theme Relations.

                                         Diagnostic frame                  Reference system
Themes                                   Apostasy   Global     Secular     Factual evidence   Theological evidence
                                         (39,128)   (20,730)   (10,428)    (46,641)           (21,832)

Instrumentality of force (7,657)
  Strategic benefits (4,973)
    Intersection                         533        488        187         1,131              137
    c-coefficient                        .036       .045       .048        .031               .009
    t-coefficient                        .060       .061       .028        .126               .017
  Religious benefits (2,684)
    Intersection                         603        82         283         312                71
    c-coefficient                        .018       .005       .017        .006               .026
    t-coefficient                        .120       .017       .066        .061               .015

Justification for the use of force (13,075)
  Political justifications (3,010)
    Intersection                         15         518        69          132                51
    c-coefficient                        .011       .066       .016        .024               .003
    t-coefficient                        .003       .099       .015        .023               .010
  Theological justifications (10,065)
    Intersection                         1,925      1,753      161         1,033              1,182
    c-coefficient                        .050       .058       .023        .016               .110
    t-coefficient                        .120       .129       .016        .062               .086

Worked Example

Within Islamic studies, the “unusual combination of logic, religion, politics and violence” of Islamism has been acknowledged (Jansen, 1997, p. xvi). This “dual nature of Islamic Fundamentalism” (Cozzens, 2007; Sedgwick, 2004) is the point of departure for this showcase study. Jihadi ideology comprises not only strategic thinking, rational argument, and common sense logic but also doctrine, theological reasoning, and religious fanaticism. To date, there is no systematic empirical research on the question of how exactly both rationalities are connected. The showcase study demonstrates how the analysis of meta-themes in Jihadi ideological statements can shed light on this link. Its objective is to explore the ideological origins of religiously inspired violence through content analysis of public statements from AQ’s leadership.

Literature Review: Content Analysis of Jihadi Media

Over the last 15 years, the Jihadi movement has produced an abundance of media and propaganda material, and the academic community has not been idle in investigating this material with a great deal of interest. Despite the wealth of available data and scholarly work, systematic content analysis of this material is still the exception. It seems that the availability of highly interesting and politically relevant research material was conducive to an atmosphere in which “the terrorism studies community seems to have deviated from the guidelines of academic conduct” (Hellmich, 2008, p. 111). The availability of primary sources coincided with the “post-9/11 money surge into terrorism studies” for which Marc Sageman (2014) provocatively diagnosed “deleterious effect” (p. 566). Although there are also examples of good scientific practice, terrorism studies have not yet exploited the full potential of content analysis approaches.

Authors of studies who apply content analysis techniques often remain descriptive. Eveslage (2013), for instance, counted the number of threats against domestic and foreign targets within 23 public statements of the Nigerian Jihadi group Boko Haram. Torres, Jordán, and Horsburgh (2006) used qualitative and quantitative thematic coding to summarize a sample of 2,878 documents from AQ. Salem, Reid, and Chen (2008) classified 706 media files produced by Jihadi groups in regard to their production features, purpose and usage as documentary, propaganda, operational, hostage, executions, statement/communique, tribute/eulogy, training, and instructional videos. Pennebaker and Chung (2008) described differences in linguistic styles between bin Laden and Zawahiri, and Beutel and Ahmad (2011) inferred from their analysis of 49 bin Laden speeches that the now deceased leader of the Jihadi movement cited policy-based grievances for his militancy twice as often as religious-based ones.

Descriptive content analysis of Jihadi media gave researchers a first glance into the wealth of data, but to come to more generic conclusions about the groups who communicate these messages, more sophisticated analysis is needed. A common approach in terrorism studies therefore is to compare extremist groups who engage in violence with those who do not (A. G. Smith, 2004). For example, A. G. Smith (2008) and A. G. Smith, Suedfeld, Conway, and Winter (2008) applied three psychological measurement constructs (value reference, motive imagery, integrative complexity) to media content of violent and nonviolent Islamist groups, and identified those variables that are statistically significant predictors to distinguish between groups. Conway, Gornick, Houck, Towgood, and Conway (2011) investigated “hidden implications of radical group rhetoric” by analyzing random text samples with integrative complexity coding from violent and nonviolent Islamist groups. Pennebaker (2011) identified in a text sample of 296 documents statistically significant predictors for a violent attack in the 2 to 6 months following the statement of the group. Rieger, Frischlich, and Bente (2013) integrated ethnographic content analysis of Jihadi and right wing media into a randomized experimental design to investigate the individual’s response to ideological messaging.

Methodology of the Showcase Study

Sampling. The text documents of the showcase study (transcripts of AQ video statements) were sampled in several stages. Although desirable, representative sampling of documents was not feasible because an exhaustive register of Jihadi media does not exist. As a work-around for this problem, I sampled documents from a pool of Jihadi statements compiled by experts. The selected statements are therefore representative of the Jihadi ideology to a certain extent (although this extent is not quantifiable). The final sample consists of 31 transcripts of AQ video messages (about 178,000 words).

… hierarchical levels of analytical units. Using software MAXQDA, I combined a theory-based coding with explorative coding into a hybrid coding design. Therefore, the coding structure includes both theoretically and empirically driven units of analysis, also referred to as deductive and inductive categories (Mayring, 2000). It has five hierarchical levels:

1. Ideology as discourse (theory driven)
2. Frame (theory driven)
3. Narrative (global themes)
4. Theme (organizing themes)
5. Issues (basic themes)

Basic, organizing, and global themes (Attride-Stirling, 2001) or codes, categories, and themes/concepts (Saldana, 2013) represent the empirically driven units. To utilize these units for the particular purpose of studying ideologies, I call them “issues,” “themes,” and “narratives.” They discriminate in different degrees between rather abstract and rather concrete content within the Jihadi statements (see Figure 1). Frames and discourse are theory-driven units of analysis and represent the most general characteristics of Jihadi ideology. Discourse, frame, narrative, theme, and issue represent different hierarchical levels of the coding scheme. They represent the functional elements of ideologies (the mechanisms through which they frame the world), but they do not tell anything about the actual grievances, claims, positions, strategies, and visions of the movement that embraces this ideology. Each level therefore has a certain number of descriptive categories that summarize the actual meaning of the ideology and represent the substantial elements. In the sample of 31 video statements, I identified one discourse, four frames, 11 narratives, 26 themes, and 55 issues.

Figure 1. Category classes.
Note. AQ = al-Qaeda.

The level of “discourse” is the most comprehensive and general one. In fact, all content belongs to it. Its purpose is to acknowledge that Jihadism is not mutually exclusive from other Islamist ideologies but remains in a constant discursive relation with them, and therefore can be analyzed as such, for instance, when conducting a discourse analysis of statements published by AQ vis-à-vis statements from the Islamic State or the Muslim Brotherhood. For the purpose of this article, the analytical unit “discourse” has no further function.

The level “frames” has four descriptive units borrowed from Social Movement Theory (Snow & Benford, 1988; Wilson, 1973). Social Movement Theory has an intuitive appeal for the analysis of Islamist movements and has been used for this purpose across disciplines (Lohlker, 2013; Snow & Byrd, 2007; Wiktorowicz, 2004a, 2004b). It states that all ideologies are comprised of three principal components, also called frames: The “diagnostic frame” of an ideology describes (perceived and actual) social problems (e.g., “the war on Islam”) and specifies alleged political, economic, and social reasons for these problems. The “prognostic frame” describes the goals the movement pursues, namely, to replace the unjust status quo with an auspicious alternative (e.g., “the caliphate”), and the “motivational frame” describes strategies for how the goals can be achieved (e.g., “jihad”). For coding purposes, I used a fourth frame (reference frame) as an auxiliary unit to designate all content that is nongenuine, that is, when the authors of the statements refer to external sources to substantiate their socioreligious positions, claims, and grievances. For instance, Jihadi leaders use theological evidence (references to Quran and Sunnah) to substantiate their theological argumentation, factual evidence (references to mainstream media or governmental reports) to back up their political claims, and aesthetic “evidence” (Islamic poems and lyric) to increase the “narrative fidelity” (Snow & Benford, 1988, p. 210) of their message.

When conducting hybrid coding, one can start the coding procedure top-down by coding the most general units into the data, or bottom-up by looking for the smallest informational units first. Starting with the most general (theoretically driven) unit has the advantage that it usually requires little prior knowledge about the content. It also gives the coder a first glance into the material so that he gets a rough idea about the thematic complexity and the approximate number of empirically driven (inductive) themes present in the material. In the study of Jihadi media, it was straightforward to recognize whether the author of the statement describes the status quo, talks about his vision or utopia, or advises followers to take action. In the most simplistic manner, coding frames into ideological statements follows the ABC model (Account, Better World, Change) of Mark Sedgwick (2012). Unlike the empirically driven units, frames must be mutually exclusive. However, the empirically based subunit of frames can cut across two or even three frames.

The next task is to identify the empirically driven themes. Here the researcher starts from scratch with nothing else than the four coding heuristics (see above) to guide him. Processing one statement after the other in no specific order, I created, modified, deleted, and merged the descriptive units in accordance with the coding heuristics, thereby steadily developing the coding structure. After working through 10 statements, the coding structure began to stabilize, meaning that fewer new units emerged and that fewer modifications were necessary to satisfy the coding heuristics. At the end of the first coding iteration, the coding scheme was entirely stable and the last few documents did not trigger any more modifications. This indicates that the coding structure represents the content adequately and also that the sample is saturated. A second coding iteration was necessary to adjust the content of the first processed documents to the finally developed scheme. The final version of the scheme has four frames, 11 narratives, 26 themes, and 55 issues. To visualize the thematic structure, I created a mind map that depicts all 96 categories (Armborst, 2013).

Results: Interpreting t-Coefficients and Detecting Meta-Themes in Jihadi Media

The systematic content analysis approach has helped to clarify and dissect the otherwise rather indistinct bulk of ideological messages. The main research objective of the study was to explore the ideological origins of religiously inspired violence in Jihadism. The analysis shows that Jihadism is a complex ideology that touches on a plethora of explicit socioreligious issues. The main thematic structure of the ideology consists of four frames, 11 narratives, 26 themes, and 55 issues. It contains rigorous theological argumentation mixed with political analysis expressed in the language of journalism or even scholarly argument. It is beyond the purpose of this article to describe all these aspects in detail. The important point here is to show the application of the theme relation coefficient.

Figure 2 and Table 1 present some of the quantitative results of the showcase study. Within the motivational frame of AQ ideology, two narratives and four themes are of particular interest in regard to the research objectives. The first narrative is about the (1) “instrumentality of force,” in which the authors describe what they think the movement can actually achieve through the use of force. These expectations are further detailed within the two themes (1.1) “strategic benefits” and (1.2) “religious benefits.” The other narrative is the (2) “justification for the use of force” with its two themes: (2.1) “political justifications” and (2.2) “theological justifications.”

To operationalize the broader research objective, I formulated the following working question: Which other narratives, themes, and issues co-occur systematically with (1) and (2), and how strong are the thematic relations between them in terms of quantity and quality? Relational analysis helps to assess how the rationale of violence is embedded in the wider narrative structure of AQ’s ideology, not only in terms of statistical co-occurrence but also in terms of elaboration and meaning.

Figure 2 shows the absolute and relative word frequencies of selected categories. Beginning with the most extensive narrative (about apostasy), categories are ranked and grouped according to the hierarchy of the coding structure (frames, narratives, themes). The information about word frequencies helps to put the qualitative description of each frame, narrative, and theme into a broader perspective about the general outline and composition of Jihadi ideology. It empirically supports the observation made in other studies that Jihadism is mainly about Islamic rivalry (the near enemy) and to a lesser degree concerned with geopolitical affairs (the far enemy), but both aspects are certainly connected, as the relational analysis shows.

Figure 2. Text proportion for frames, narratives, and themes.

The coefficients in Table 1 reveal how frequently and how strongly categories are linked. The table displays c- and t-coefficients for the thematic relation between the four themes about the rationale of violence (rows) and the three narratives within the diagnostic frame (columns). The numbers in the table can be interpreted in a similar way to a crosstab with categorical variables. To give a reading example of the numbers in the table, the narrative about the instrumentality of force (7,657 words) has two subthemes: strategic benefits (4,973 words) and religious benefits (2,684 words). These subthemes correlate to different degrees with the three narratives in the diagnostic frame (apostasy, global conflict, and secular governance) and are backed up to different degrees by factual and theological evidence from the reference system (nongenuine content). To pick out one example, the two categories “strategic benefits” and “apostasy” share 533 words, which correspond to a c-coefficient of .036 and a t-coefficient of .060, indicating a moderately elaborated and rather infrequent thematic relation.

As noted before, the coefficients should always be interpreted in connection with a qualitative assessment of the thematic link. When reviewing the meta-content (533 words) cutting across the themes “strategic benefits” and the “apos-

aspect of the ideological message. It is powerful because only this way AQ can credibly claim religious supremacy over competing Islamist groups. And it is a vulnerable point because AQ can be (and actually is) criticized for being strategically and militarily ineffective and therefore not worthy of support.
Therefore, this aspect of the ideology has to be com- tasy” narrative, it reveals a tacit message: AQ asserts quite municated in a subtle way as to disguise its contradiction. plainly that jihad is as much a matter of strategic choice as it is This observation is an intriguing and important aspect of a matter of Islamic law and individual duty. What they claim AQ’s ideology, much more important than the rather moder- rather implicitly is that this distinction makes them superior to ate correlation of t = .060 would suggest. This demonstrates competing Islamist groups who act much more strategically that it is important to review the statistical results always in (“opportunistic” in the view of AQ). AQ promotes active par- combination with a qualitative assessment. ticipation in jihad, even against all strategic odds, to demon- Other themes systematically co-occur without carrying strate its pristine interpretation of Islam and to claim religious any implicit message. For example, the theme about the stra- supremacy over competing Islamist movements (often labeled tegic benefits of jihad are backed up quite strongly (t = .126) apostates) who refrain from the alleged duty of jihad for purely by factual evidence but not so strongly by theological refer- political and strategic considerations. ences to Quran and Sunnah (t = .017). There is nothing more But why is this claim communicated implicitly rather than to conclude from this observation other than the Jihadists use directly? A plausible explanation is that the strategic flaws of rational (factual) instead of theological reasoning when Jihadi military doctrine are both a powerful and vulnerable describing the strategic utility of Jihadi warfare. 
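Returning to the reading example of “strategic benefits” and “apostasy,” the reported t-value can be unpacked with a short worked equation. This is only a sketch: it assumes the t-coefficient is the mean of the two overlap shares, as defined in the methodology section, with n₁ the word count of “strategic benefits,” n₂ that of “apostasy,” and n₁₂ the shared words. The word count of the apostasy narrative is not stated in this passage, so the second share is inferred from the rounded t-value and is approximate.

$$
\frac{n_{12}}{n_1} = \frac{533}{4{,}973} \approx .107,
\qquad
t = \frac{1}{2}\left(\frac{n_{12}}{n_1} + \frac{n_{12}}{n_2}\right) = .060
\;\Rightarrow\;
\frac{n_{12}}{n_2} \approx 2(.060) - .107 = .013
$$

Read this way, the shared passage makes up roughly 11% of “strategic benefits” but only about 1% of the much larger “apostasy” narrative, which is why the averaged coefficient stays moderate even though the overlap is substantial from the smaller theme’s point of view.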
Finally I want to use the showcase study to give an example of how to interpret the unstandardized coefficient t together with its standardized version and how both are affected by the coding pattern. The adjustment coefficient in this study is almost 0 (p/df = .62/4,560), indicating that there are few code overlaps in relation to the overall number of units. As much as 62% of all retrieved content is coded with more than one code, but there are also many degrees of freedom (categories among which the multiply coded text can freely distribute). Between the 96 units of the coding scheme, there are df = 4,560 possible (though not observed) bivariate correlations to accommodate the 132,732 words that appear multiple times in the text retrieval. If all 96 units were perfectly independent from each other (in other words, if all multiply coded text were equally distributed among all 96 units), then any bivariate correlation would be close to .001, indicating that an observed coefficient t, for example t = 0.1, is significantly higher than the average correlation between two units. Therefore, the standardized coefficient (0.9998 × t) will take almost the same values as t and need not be reported.

Discussion

The most important limitation in the use of the proposed coefficient is that the statistical “facts” it produces are ultimately contingent upon coding decisions. Despite the use of clearly spelled out coding heuristics, there remains some interpretative leeway. It is therefore good scientific practice to involve several coders and then test intercoder reliability.

Without the aid of content analysis software, it is not possible to systematically read between the lines of large text samples and to detect latent structures. The proposed theme relation coefficient enables researchers to discover subtle patterns in verbal content. It allows the researcher to draw analytical conclusions about his study object through a transparent and replicable methodology. To substantiate this claim, this article uses an empirical study on Jihadi media to demonstrate how the application of the coefficients has produced more generic information about the ideology of Jihadism as it is communicated in a sample of Jihadi media.

Unlike conventional co-occurrence (Friese, 2014) or code relation metrics that show how often and how consistently two themes co-occur within the text sample, the newly proposed coefficient indicates how much content two units actually share with each other and how elaborated their thematic link is. The combined use of both coefficients can add important information to conventional analysis because the observation of how often and how consistently two themes co-occur in the data is not necessarily an indicator of how important, relevant, and meaningful this thematic relation is within the research context.

The methodology proposed in this article is applicable in various scenarios of content analysis and with different types of data (interviews, field notes, public documents, and other text data). The standardized version of the t-coefficient makes the results from different studies comparable. This is important because differences in sample sizes and researchers’ coding practice can affect the values of conventional theme relation metrics. The standardized coefficient offsets this potential bias and enables researchers to compare results regardless of sample size, number of extracted categories, and extent of code overlaps.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Notes

1. Thanks to the anonymous reviewers who gave me instructive feedback.
2. It is important to note that these coding heuristics deliberately allow two units to overlap. The purpose is to designate text passages that relate to more than one theme and text passages that lead from one theme to the next. This content is of particular interest for the detection of meta-themes. Allowing for overlapping codes also eases decision making: Coders are not forced to make potentially arbitrary either/or choices in cases where content is equivocally associated with more than one theme. But depending on the research context, it may also be appropriate to rule out code overlapping.
3. “The c-index (structurally resembling the Tanimoto and Jaquard coefficient . . .) assumes separate non-overlapping text entities” (Friese, 2013, p. 291).
4. ATLAS.ti software notifies the user if the ratio between two codes exceeds a certain threshold (i.e., when one code has been used five times as often as the other). Thanks to an anonymous reviewer for this hint.
5. To determine the word frequencies in MAXQDA, the user can use the MAXDictio module. To determine the word frequencies of text that two units share, proceed as follows: Use the text retrieval function “intersection” or “intersection (Set)” and then use “code the results with new code” to delete multiply coded text passages in the retrievals (credits to Stefan Rädiker for giving me this decisive hint in the support forum). Then retrieve the new code and let Dictio count the word frequencies of the retrieval. I recommend creating a copy of the file once the coding procedure is finished and performing all subsequent analysis (including the creation of new codes as described above) with this file.
6. To obtain this number in MAXQDA, calculate (total word frequencies in retrieved segment) MINUS (proportion of content with only one code).
7. Again one has to improvise to determine this number in MAXQDA. MAXDictio does not count multiply coded text retrievals (except for code–subcode intersections). To obtain the total word frequencies in retrieved segment (i.e., to deliberately count multiply coded text), retrieve the text of all units and copy the retrievals into a new document. Then, activate only this document and choose “word frequencies” and “only for activated documents.”
8. 132,732 / 213,414 (approximately 62%).
9. For details about the sampling strategy of the showcase study, see Armborst (2013, 64).
10. Popular alternatives to this software are ATLAS.ti (Bell, 2013; Gibbs, 2007), QDA Miner, WordStat, NVivo, or Ethnograph. There are also a number of open source products such as the Coding Analysis Toolkit (CAT).
11. In the showcase study, two methods for code validation were used: second coder and automated text mining. The results are discussed in Armborst (2013).

References

Angus, D., Rintel, S., & Wiles, J. (2013). Making sense of big text: A visual-first approach for analysing text data using Leximancer and Discursis. International Journal of Social Research Methodology, 16, 261-267. doi:10.1080/13645579.2013.774186
Armborst, A. (2013). Jihadi violence: A study of al-Qaeda’s media. Berlin, Germany: Duncker & Humblot.
Attride-Stirling, J. (2001). Thematic networks: An analytic tool for qualitative research. Qualitative Research, 1, 385-405.
Bell, D. (2013). Book review: Susanne Friese, Qualitative data analysis with ATLAS.ti. Qualitative Research, 13, 382-384.
Beutel, A., & Ahmad, I. a. D. (2011). Examining Bin Ladin’s statements: A quantitative content analysis from 1996 to 2011. Bethesda, MD: Minaret of Freedom Institute.
Bos, W., & Tarnai, C. (1999). Content analysis in empirical social research. International Journal of Educational Research, 31, 659-671.
Conway, L. G., III, Gornick, L. J., Houck, S., Towgood, K. H., & Conway, K. R. (2011). The hidden implications of radical group rhetoric: Integrative complexity and terrorism. Dynamics of Asymmetric Conflict, 4, 155-165.
Cozzens, J. B. (2007). Approaching al-Qaeda’s warfare: Function, culture and grand strategy. In M. Ranstorp (Ed.), Mapping terrorism research (pp. 127-163). New York, NY: Routledge.
Creswell, J. W., & Plano Clark, V. L. (2011). Designing and conducting mixed methods research. London, England: Sage.
Eveslage, B. S. (2013). Clarifying Boko Haram’s transnational intentions, using content analysis of public statements in 2012. Perspectives on Terrorism, 7(5), 47-67.
Fakis, A., Hilliam, R., Stoneley, H., & Townend, M. (2014). Quantitative analysis of qualitative information from interviews: A systematic literature review. Journal of Mixed Methods Research, 8, 139-161.
Fereday, J., & Muir-Cochrane, E. (2006). Demonstrating rigor using thematic analysis: A hybrid approach of inductive and deductive coding and theme development. International Journal of Qualitative Methods, 5, 80-92.
Friese, S. (2013). Atlas.ti 7 user guide and reference. ATLAS.ti Scientific Software Development, Berlin. http://atlasti.com/de/handbuecher/
Friese, S. (2014). Qualitative data analysis with ATLAS.ti. London, England: Sage.
Gibbs, G. R. (2007). Media review: Atlas.ti software to assist with the qualitative analysis of data. Journal of Mixed Methods Research, 1, 103-104.
Glaser, B. G. (1978). Theoretical sensitivity: Advances in the methodology of grounded theory. Mill Valley, CA: Sociology Press.
Glaser, B. G., & Strauss, A. L. (1967). The discovery of grounded theory. Chicago: Aldine Transaction.
Hellmich, C. (2008). Creating the ideology of al Qaeda: From hypocrites to Salafi-Jihadists. Studies in Conflict & Terrorism, 31, 111-124.
Jansen, J. J. (1997). The dual nature of Islamic fundamentalism. Ithaca, NY: Cornell University Press.
Kelle, U., & Kluge, S. (2010). Vom Einzelfall zum Typus. Fallvergleich und Fallkontrastierung in der qualitativen Sozialforschung [From case to type. Case comparison in qualitative research]. Wiesbaden, Germany: Springer.
Krippendorff, K. (1995). On the reliability of unitizing continuous data. Sociological Methodology, 25, 47-76.
Krippendorff, K. (2004). Measuring the reliability of qualitative text analysis data. Quality & Quantity, 38, 787-800.
Lohlker, R. (2013). Jihadism: Online discourses and representations (Vol. 2). Vienna: Vienna University Press.
Mayring, P. (2000). Qualitative content analysis. Qualitative Sozialforschung, 1, 1-10.
Neuendorf, K. A. (2017). The content analysis guidebook. Thousand Oaks, CA: Sage.
Oleinik, A. (2011). Mixing quantitative and qualitative content analysis: Triangulation at work. Quality & Quantity, 45, 859-873. doi:10.1007/s11135-010-9399-4
Pennebaker, J. W. (2011). Using computer analyses to identify language style and aggressive intent: The secret life of function words. Dynamics of Asymmetric Conflict, 4, 92-102.
Pennebaker, J. W., & Chung, C. K. (2008). Computerized text analysis of Al-Qaeda transcripts. In K. Krippendorff & M. A. Bock (Eds.), The content analysis reader (pp. 453-465). Thousand Oaks, CA: Sage.
Rantala, K., & Hellström, E. (2001). Qualitative comparative analysis and a hermeneutic approach to interview data. International Journal of Social Research Methodology, 4, 87-100.
Rieger, D., Frischlich, L., & Bente, G. (2013). Propaganda 2.0: Psychological effects of right-wing and Islamic extremist internet videos. Munich, Germany: Luchterhand.
Sageman, M. (2014). The stagnation in terrorism research. Terrorism and Political Violence, 26, 565-580.
Saldana, J. (2013). The coding manual for qualitative researchers (2nd ed.). Thousand Oaks, CA: Sage.
Salem, A., Reid, E., & Chen, H. (2008). Multimedia content coding and analysis: Unraveling the content of Jihadi extremist groups’ videos. Studies in Conflict & Terrorism, 31, 605-626.
Sedgwick, M. (2004). Al-Qaeda and the nature of religious terrorism. Terrorism and Political Violence, 16, 795-814.
Sedgwick, M. (2012). Jihadist ideology, Western counter-ideology, and the ABC model. Critical Studies on Terrorism, 5, 359-372.
Smith, A. E. (2003). Automatic extraction of semantic networks from text using Leximancer. Paper presented at the Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology June 2003, Demonstrations-Volume 4. Edmonton, Canada.
Smith, A. E., & Humphreys, M. S. (2006). Evaluation of unsupervised semantic mapping of natural language with Leximancer concept mapping. Behavior Research Methods, 38, 262-279.
Smith, A. G. (2004). From words to action: Exploring the relationship between a group’s value references and its likelihood of engaging in terrorism. Studies in Conflict & Terrorism, 27, 409-437.
Smith, A. G. (2008). The implicit motives of terrorist groups: How the needs for affiliation and power translate into death and destruction. Political Psychology, 29, 55-75.
Smith, A. G., Suedfeld, P., Conway, L. G., III, & Winter, D. G. (2008). The language of violence: Distinguishing terrorist from nonterrorist groups by thematic content analysis. Dynamics of Asymmetric Conflict, 1, 142-163.
Snow, D. A., & Benford, R. D. (1988). Ideology, frame resonance, and participant mobilization. International Social Movement Research, 1, 197-217.
Snow, D. A., & Byrd, S. (2007). Ideology, framing processes, and Islamic terrorist movements. Mobilization: An International Quarterly, 12, 119-136.
Stockwell, P., Colomb, R. M., Smith, A. E., & Wiles, J. (2009). Use of an automatic content analysis tool: A technique for seeing both local and global scope. International Journal of Human-Computer Studies, 67, 424-436.
Symonds, J. E., & Gorad, S. (2008). The death of mixed methods: Research labels and their casualties. British Educational Research Association Annual Conference, Edinburgh. https://www.leeds.ac.uk/educol/documents/174130.pdf
Torres, M. R., Jordán, J., & Horsburgh, N. (2006). Analysis and evolution of the global Jihadist movement propaganda. Terrorism and Political Violence, 18, 399-421.
Wiktorowicz, Q. (2004a). Framing jihad: Intramovement framing contests and al-Qaeda’s struggle for sacred authority. International Review of Social History, 49(Suppl. 12), 159-177.
Wiktorowicz, Q. (2003). Islamic activism: A social movement theory approach. Bloomington, IN: Indiana University Press.
Wilson, J. (1973). Introduction to social movements. New York, NY: Basic Books.

Author Biography

Andreas Armborst is a criminologist and head of the National Center for Crime Prevention in Bonn, Germany. Previously he was a Marie Curie Fellow at the School of Law, University of Leeds, and a researcher at the Max Planck Institute for Foreign and International Criminal Law.

Thematic Proximity in Content Analysis:

SAGE Open , Volume 7 (2): 1 – Jun 16, 2017

Publisher
SAGE
Copyright
Copyright © 2022 by SAGE Publications Inc, unless otherwise noted. Manuscript content on this site is licensed under Creative Commons Licenses.
ISSN
2158-2440
eISSN
2158-2440
DOI
10.1177/2158244017707797

Abstract

This article explains how to calculate thematic proximity within a mixed methods content analysis approach. Thematic proximity of two themes can indicate the presence of meta-themes. Meta-themes are themes which acquire their meaning through the systematic co-occurrence of two or more other themes. By combining qualitative and quantitative techniques of content analysis, the researcher can reveal these latent text patterns. Using a study on Jihadi media as a showcase, the article describes how to detect meta-themes through content analysis. To this end, the article introduces a novel theme-correlation coefficient that adds valuable information to traditional theme relation metrics. It enables researchers to make new empirical observations in text data.

Keywords
content analysis, concept mapping, qualitative content analysis, mixed methods, communication studies, communication, social sciences, human communication, Jihadi ideology, Jihadism

National Center for Crime Prevention (NZK), Bonn, Germany

Corresponding Author: Andreas Armborst, National Center for Crime Prevention (NZK), c/o German Federal Ministry of the Interior, Graurheindorfer Straße 198, 53117 Bonn, Germany. Email: andreas.armborst@bmi.bund.de

Creative Commons CC BY: This article is distributed under the terms of the Creative Commons Attribution 4.0 License (http://www.creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).

Introduction

There are different ways to measure thematic proximity and code relations in content analysis. This article reviews some of them and introduces on this basis a new theme relation coefficient. A theme is a generalized and summarizing description for a set of interrelated issues. In the technical sense, “a theme is an outcome of coding [and] categorisation” (Saldana, 2013, p. 14), whereby codes, categories, and themes represent different levels of the researcher’s abstraction from the original data. “Data” in content analysis can be anything that has “content,” but this article exclusively focuses on text data. Following the hierarchical order of codes, categories, and themes, this article explains how to analyze relations between these analytical units. It refers to these relations as “thematic relations.”

Used in combination with existing theme relation coefficients, the proposed coefficient can reveal how frequently, consistently, and elaborately themes, categories, and codes relate to each other. This information helps researchers to identify meta-themes: themes that are implicitly rather than explicitly stated in textual data. The analysis of thematic proximity inquires into the subtext of verbal information in a standardized fashion. As a showcase, the article presents a content analysis study on Jihadi statements from al-Qaeda (AQ) leaders and demonstrates how the detection of meta-themes works in research practice.

Literature Review: Content Analysis and Theme Relation Metrics

In practice, many researchers combine different strands of content analysis into hybrid (inductive-deductive; Fereday & Muir-Cochrane, 2006) and mixed methods (qualitative and quantitative) designs.

Content analysis includes practices as diverse as fully automated text mining approaches (Angus, Rintel, & Wiles, 2013; A. E. Smith, 2003; A. E. Smith & Humphreys, 2006; Stockwell, Colomb, Smith, & Wiles, 2009) and hermeneutic approaches (Rantala & Hellström, 2001). Often its purpose is to summarize, retrieve, and analyze information from documents. A core task therefore is to identify meaningful clusters of information often referred to as themes, concepts, codes, or categories. There are numerous interpretative and algorithm-based techniques to do so but there are only two directions from which a researcher can apply these techniques: Themes can be identified following inductive (observation based) coding and deductive (theory based) coding (Glaser, 1978; Glaser & Strauss, 1967; Mayring, 2000). It is also possible to approach the data from both directions, which is then referred to as hybrid (observation and theory based) coding of text data.

Observation and Theory-Based Coding

The purpose of explorative (inductive) content analysis is to identify analyzable units (codes) in primary or secondary text data (newspapers, office documents, interview transcripts, field notes, etc.) and to summarize them under meaningful labels (categories). Depending on how complex the material under investigation is, the researcher has to decide how to organize the units of analysis. There are at least two approaches to this: According to the Coding Manual for Qualitative Researchers (Saldana, 2013), on the first level, the researcher attaches a code to certain segments of text. On the second level, he organizes interrelated codes into categories, thereby creating a taxonomy or category scheme with different categories and subcategories. On the third level, the researcher arranges groups of categories into themes. On top of this pyramid stands a “theory” about the subject as the result of the analysis.

Other approaches allow for coding themes directly into the data, without the process of extracting codes and building categories. This is called thematic coding (or “themeing” the data according to Saldana, 2013, p. 175).

According to one common practice of thematic coding (Attride-Stirling, 2001), one can use three hierarchical levels, or category classes named basic, organizing, and global themes, to discriminate in different degrees between rather abstract and rather concrete content. A basic theme is “the most basic or lowest-order theme that is derived from the textual data,” an organizing theme is “a middle-order theme that organizes the Basic Themes into clusters of similar issues,” and global themes “are super-ordinate themes that encompass the principal metaphors in the data as a whole” (Attride-Stirling, 2001, p. 388).

So the main point of difference between the two approaches is whether the researcher can apply (or identify) an analytical unit from the third level directly to data. The common ground of both approaches, and this is the decisive point of the methodology proposed here, is that these analytical units have a hierarchical order.

Next to this organizational structure, the coding procedure also requires coding heuristics: standardized rules that guide the decision of the researcher about when to create a new analytical unit, how to label it, and how to separate codes, categories, and themes from each other. It is important to spell out rules and thereby make the coding and classification process as transparent and replicable as possible. For calculating the theme-correlation coefficient, use the following heuristics (Heuristics 1-3; taken from Kelle & Kluge, 2010).

Heuristic 1: Sparseness: to use as few analytical units as possible and as many as necessary to capture all content adequately.
Heuristic 2: Internal homogeneity: to maximize cohesive validity (all content of one particular unit is clearly similar).
Heuristic 3: External heterogeneity: to maximize discriminant validity (the content of two different units is clearly about two different things).
Heuristic 4: Code overlaps: Following Heuristics 2 and 3, codes overlap as sparsely as possible and as often as necessary.

In explorative content analyses, the researcher usually has to code the entire data set, or at least substantial parts of it, several times until the coding scheme becomes stable. During these iterations, the coder creates, modifies, deletes, and merges the units in accordance with the coding heuristics, thereby steadily developing the taxonomy.

When conducting theory-based (deductive) coding, the researcher starts with a given set of analytical units (codes, categories, themes), that is, the number and the labels of units are fixed through the theory from which they derive. Coding Heuristic 1 therefore does not apply to theory-based coding, but Heuristics 2 to 4 do. If internal homogeneity and external heterogeneity cannot be achieved, then this indicates a mismatch between the theory and the data.

Thematic Proximity

Coding is a time-consuming and tedious process and of course not an end in itself. One purpose of coding is to reduce complex text structures to analyzable units. A fully coded set of documents enables the researcher to address a wide range of research questions with a large repertoire of (qualitative and quantitative) analytical approaches. One of them is the analysis of thematic proximity (the relation between units). Its purpose is to identify latent patterns in the content that cannot be observed by simply reading the material. The analysis of latent patterns is called relational content analysis.

Relational content analysis usually combines qualitative and statistical interpretation of verbal data into one coherent instrument (Bos & Tarnai, 1999). Still it is not a mixed methodology in the strict sense of the term, insofar as it does not necessarily require collecting “both quantitative and qualitative data” (Creswell & Plano Clark, 2011, p. 276; Onwuegbuzie & Teddlie, 2003). The “mix” is the stage of analysis where a numeric coefficient indicates how strongly two themes are related to each other. Still, the detection and interpretation of meta-themes goes beyond “quantitative analysis of qualitative information” (Fakis, Hilliam, Stoneley, & Townend, 2014), and is more than just “two separate approaches to studying the same phenomena” (Symonds & Gorad, 2008, p. 11).

There are different means available to determine the thematic proximity of two descriptive units. The work of Oleinik (2011) provides a useful overview. Cosine similarity, for example, is a vector-based method often used in automated text mining, such as Leximancer.

To a certain extent, theme relation coefficients resemble metrics for intercoder reliability, such as Krippendorff’s alpha (Krippendorff, 1995, 2004; Neuendorf, 2017). Both indicate code overlapping, however, for different purposes. It could be worthwhile to “hack” alpha coefficients in a way that they indicate thematic proximity instead of intercoder reliability.

Another means to determine thematic proximity is to analyze the frequency and pattern of theme co-occurrences within a given set of documents. The c-coefficient used in the content analysis software ATLAS.ti (Friese, 2014) measures how often and consistently two codes co-occur or overlap throughout the entire text sample. The c-coefficient represents the frequencies and patterns of code co-occurrence “similar to a correlation coefficient statistics” (Friese, 2014, p. 189). It is based on the Jaccard similarity coefficient. Many content analysis software packages do not provide this function, but the researcher can use the code retrieval function “near within one paragraph” to determine the number of co-occurrences, and then calculate the c-coefficient manually using the formula (Friese, 2014, p. 190):

$$c = \frac{n_{12}}{n_1 + n_2 - n_{12}},$$

with $n_1$ = the number of occurrences of Code A, $n_2$ = the number of occurrences of Code B, and $n_{12}$ = the number of co-occurrences of both codes.

Co-occurrence means that both codes either code the same segment or overlapping segments. The coefficient can take values between 0 (indicating perfect independence) and 1 (indicating perfect relation). The greater the discrepancy between $n_1$ and $n_2$, the smaller the highest possible value of c. For example, if Code A occurs twice as often as Code B ($n_1 = 2 \times n_2$), the maximum value of c is 0.5, indicating that Code B always occurs in combination with Code A, whereas Code A occurs together with Code B in 50% of its occurrences.

The c-coefficient has two important limitations: First, it can underestimate the strength of a thematic relation when one analytical unit has considerably more codings (the number of discrete text segments that are associated with a given code) than the other. The c-coefficient does not take into consideration the proportion of overlapping content. Therefore, it can remain low although the thematic link might be quite elaborated in terms of word frequencies. Second, it is not standardized and disregards the overall coding pattern of the data set, making it difficult to compare c-coefficients from different studies.

To prevent this loss of information, the following section proposes an additional coefficient that takes into account not

Proposed Methodology: The t-Coefficient

The c-coefficient measures how often and consistently two codes (units) co-occur throughout the entire text sample but disregards how elaborated their relation is. This, however, is a valuable piece of information about the thematic structure of the content. The proposed coefficient therefore indicates how much content two descriptive units actually share with each other in terms of word frequencies. This can be relevant because the information about how frequently and how consistently two themes co-occur does not necessarily say anything about how important or elaborated the thematic link is within the research context. Taken together, the two coefficients can reveal latent structures in text samples that might constitute a meaningful meta-theme. I refer to the proposed coefficient as the t-coefficient (t for theme). It is defined as

$$t = \frac{1}{2}\left(\frac{n_{12}}{n_1} + \frac{n_{12}}{n_2}\right),$$

with $n_1$ = the total number of words classified with Code 1, $n_2$ = the total number of words classified with Code 2, and $n_{12}$ = the number of intersecting words between Code 1 and Code 2.

The t-coefficient measures the average proportion of content that two descriptive units share with one another. It can take values between 0 (indicating mutual exclusiveness of coded text segments) and 1 (indicating complete overlap [congruency] of coded text segments).

A t-coefficient of 0.10, for example, provides the information that each of the two units shares on average 10% of its content with the other unit (i.e., usually neither of the two categories shares exactly 10% unless they both have the same number of words). In combination with the c-coefficient we can also say how often and how consistently this co-occurrence appears. Large t-values combined with low c-values indicate that the link is elaborated but not frequent and consistent, whereas large t-values combined with large c-values indicate that the thematic link is elaborated, frequent, and consistent. Values of t greater than .5 should be interpreted with caution because they might indicate a lack of discriminant validity, that is, two themes are so closely related that they are not distinguishable and may actually represent the same theme. If this happens, then it could indicate a violation of coding Heuristic 3 (see above).

Standardizing the Theme Relation Coefficient

It is important to note that the sample size, coding heuristics, and number of descriptive units can affect the t-coefficient.
the frequency of code co-occurrences but the proportions of What may be a high coefficient within the scope of one study text intersections based on word frequencies. Taken together, may indicate a rather weak thematic relation within the other. these two coefficients can better assess the qualitative and This obstructs comparability of the t-coefficient between two quantitative relation of two themes. studies. To eliminate the influence of the coding practice on 4 SAGE Open the results, it is necessary to calculate the standardized Interpreting t-Coefficients t-coefficient t . The standardized t-coefficient eliminates the There are two ways to judge whether a given t-value indi- influence stemming from different coding practices, namely, cates a weak, moderate, or a strong relation. First is to com- the overall proportion of multiply coded content, and the pare different t-values with each other. As can be seen in number of units in the coding scheme. The more content is Table 1, the highest correlation between two categories is coded by multiple units and the fewer the number of units in .129 between the theme “theological justifications for the use the coding scheme are, the higher are the average values of of force” and the narrative about the “global conflict.” the t-coefficient (and vice versa). This is due to the fact that Compared with other thematic links, this is strong. The the unstandardized t-coefficient is based on the proportion of c-coefficient (.058) is also comparably high signaling that intersecting words between units. To compare the coefficient this link is also more frequent and consistent than most other between different content analysis studies, it is therefore nec- links in the table. essary to consider the net effect caused by coding practices. 
Another way to judge the strength of the correlation is to The standardized t-coefficient is adjusted in regard to compare observed t-values against the unobserved t-values these two general coding patterns. It is calculated in four of two mutually independent themes (here U = .0002). A steps. First is to calculate the proportion of text retrievals t-value of .129 then indicates that the correlation is signifi- with more than one coding in relation to the sum of all text cantly different from independence. It is also possible to base retrievals: this benchmark test on randomly, instead of equally distrib- uted content. This would in some way resemble the statistical P = , test for significance and could be the method of choice in quantitative content analysis. The standardized coefficient t is always smaller than the where p = the word frequency of text retrievals with more 6 observed coefficient t, but it has the advantage that it is not than one code in the entire sample, and p = the word fre- 7 affected by coding practices, such as the number of units in quency of all retrieved text segments. the category scheme, and therefore is more suitable to com- P states the extent of multiply coded text in all docu- pare results. ments of the sample. The P value in the showcase study is P = .62 and states that 62% of all coded words are coded with more than one unit. P has to be interpreted in rela- Detecting Meta-Themes, or How to “Read tion to the degrees of freedom, that is, the number of all Between the Lines” of Qualitative Data possible (yet, not measured) bivariate correlations. The higher the number of categories within the coding scheme, When two or more descriptive units systematically co-occur and with that, the number of possible bivariate correla- in the text data and when the co-occurrence is not only fre- tions, the lower is the average influence of P on any given quent but also elaborated in terms of word frequencies, then bivariate correlation. 
this can indicate the presence of a meta-theme. The two coef- The degrees of freedom are determined by the number of ficients therefore are quantitative indicators for meta-themes. potential correlations between the k categories. They are cal- Meta-themes are themes which acquire their meaning culated by dividing all fields of the code-correlation matrix through the systematic co-occurrence of two or more other ( k ) minus the fields in the diagonal by two. themes. The prefix “meta” means that these themes are themes of a higher informational order, or in other words, kk − they are not explicitly but implicitly communicated within () df = . the content. A meta-theme might mark subconscious com- munication and tells the researcher something about the The next step is to calculate the adjustment coefficient U. source, namely, that it systematically refers to two distinct It indicates the average bivariate correlation if all multiply themes. coded text were equally distributed among all units of the The statistical coefficients should always be interpreted in category scheme. It works as a baseline comparison for the combination with a qualitative assessment of the meta-con- observed correlation t. tent. Not every thematic correlation is necessarily a meta- The adjustment coefficient U is UP = / df . theme. Likewise, the detection of a meta-theme does not The standardized t-coefficient is: tU =− () 1 t . necessarily reveal the reason why the originator communi- It is in the judgment of the researcher to decide whether to cates subconsciously and not explicitly and intentional. This report every single standardized t-coefficient. In some cases, question can be answered only within the context of a par- it might be sufficient just to report the overall adjustment ticular study. coefficient U, namely, when U is so small that it hardly To give one example of subconscious communication, we affects the difference between t and t . now turn to the showcase study. 
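The coefficient definitions in this section (c, t, the adjustment coefficient U, and the standardized $t_s$) can be sketched in a few lines of Python. This is only an illustrative sketch, not the author's implementation: the function and parameter names are hypothetical, the first two calls use made-up toy counts, and the last call uses the word counts and number of units that the article reports for the showcase study. Inputs are assumed to be positive counts; no guarding against empty codes is included.

```python
def c_coefficient(n1: int, n2: int, n12: int) -> float:
    """Co-occurrence coefficient c = n12 / (n1 + n2 - n12).

    n1, n2: number of occurrences (codings) of each code;
    n12: number of co-occurrences of both codes.
    """
    return n12 / (n1 + n2 - n12)


def t_coefficient(w1: int, w2: int, w12: int) -> float:
    """Theme coefficient t = 1/2 * (w12 / w1 + w12 / w2).

    w1, w2: words classified with each code; w12: intersecting words.
    """
    return 0.5 * (w12 / w1 + w12 / w2)


def degrees_of_freedom(k: int) -> int:
    """Possible bivariate correlations among k units: (k^2 - k) / 2."""
    return (k * k - k) // 2


def adjustment_coefficient(words_multi_coded: int,
                           words_all_retrieved: int,
                           k: int) -> float:
    """U = P / df, where P is the share of multiply coded words."""
    P = words_multi_coded / words_all_retrieved
    return P / degrees_of_freedom(k)


def standardized_t(t: float, U: float) -> float:
    """Standardized coefficient t_s = (1 - U) * t."""
    return (1 - U) * t


# Toy counts (hypothetical): Code A occurs twice as often as Code B and
# every occurrence of B overlaps with A, so c reaches its maximum of 0.5.
print(c_coefficient(n1=10, n2=5, n12=5))

# Toy word counts (hypothetical): 20 shared words are 10% of one unit and
# 40% of the other; t is the average of these two shares.
print(t_coefficient(w1=200, w2=50, w12=20))

# Figures reported for the showcase study: 132,732 multiply coded words,
# 213,414 retrieved words, 96 units in the coding scheme.
U = adjustment_coefficient(132_732, 213_414, k=96)
print(degrees_of_freedom(96), U, standardized_t(0.129, U))
```

With the showcase figures, df comes out at 4,560 and U is on the order of 10^-4, so the standardized value stays within rounding distance of the observed t = .129.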
Table 1. Theme Relations.

Columns: narratives of the diagnostic frame (Apostasy, Global issues, Secular) and units of the reference system (Factual evidence, Theological evidence). Word counts are given in parentheses.

| Narrative | Theme | Measure | Apostasy (39,128) | Global issues (20,730) | Secular (10,428) | Factual evidence (46,641) | Theological evidence (21,832) |
|---|---|---|---|---|---|---|---|
| Instrumentality of force (7,657) | Strategic benefits (4,973) | Intersection | 533 | 488 | 187 | 1,131 | 137 |
| | | c-coefficient | .036 | .045 | .048 | .031 | .009 |
| | | t-coefficient | .060 | .061 | .028 | .126 | .017 |
| | Religious benefits (2,684) | Intersection | 603 | 82 | 283 | 312 | 71 |
| | | c-coefficient | .018 | .005 | .017 | .006 | .026 |
| | | t-coefficient | .120 | .017 | .066 | .061 | .015 |
| Justification for the use of force (13,075) | Political justifications (3,010) | Intersection | 15 | 518 | 69 | 132 | 51 |
| | | c-coefficient | .011 | .066 | .016 | .024 | .003 |
| | | t-coefficient | .003 | .099 | .015 | .023 | .010 |
| | Theological justifications (10,065) | Intersection | 1,925 | 1,753 | 161 | 1,033 | 1,182 |
| | | c-coefficient | .050 | .058 | .023 | .016 | .110 |
| | | t-coefficient | .120 | .129 | .016 | .062 | .086 |

Worked Example

Within Islamic studies, the “unusual combination of logic, religion, politics and violence” of Islamism has been acknowledged (Jansen, 1997, p. xvi). This “dual nature of Islamic Fundamentalism” (Cozzens, 2007; Sedgwick, 2004) is the point of departure for this showcase study. Jihadi ideology comprises not only strategic thinking, rational argument, and common sense logic but also doctrine, theological reasoning, and religious fanaticism. To date, there is no systematic empirical research on the question of how exactly both rationalities are connected. The showcase study demonstrates how the analysis of meta-themes in Jihadi ideological statements can shed light on this link. Its objective is to explore the ideological origins of religiously inspired violence through content analysis of public statements from AQ’s leadership.

Literature Review: Content Analysis of Jihadi Media

Over the last 15 years, the Jihadi movement has produced an abundance of media and propaganda material, and the academic community has not been idle in investigating this material with a great deal of interest. Despite the wealth of available data and scholarly work, systematic content analysis of this material is still the exception. It seems that the availability of highly interesting and politically relevant research material was conducive to an atmosphere in which “the terrorism studies community seems to have deviated from the guidelines of academic conduct” (Hellmich, 2008, p. 111). The availability of primary sources coincided with the “post-9/11 money surge into terrorism studies” for which Marc Sageman (2014) provocatively diagnosed “deleterious effect” (p. 566). Although there are also examples of good scientific practice, terrorism studies have not yet exploited the full potential of content analysis approaches.

Authors of studies who apply content analysis techniques often remain descriptive. Eveslage (2013), for instance, counted the number of threats against domestic and foreign targets within 23 public statements of the Nigerian Jihadi group Boko Haram. Torres, Jordán, and Horsburgh (2006) used qualitative and quantitative thematic coding to summarize a sample of 2,878 documents from AQ. Salem, Reid, and Chen (2008) classified 706 media files produced by Jihadi groups in regard to their production features, purpose and usage as documentary, propaganda, operational, hostage, executions, statement/communique, tribute/eulogy, training, and instructional videos. Pennebaker and Chung (2008) described differences in linguistic styles between bin Laden and Zawahiri, and Beutel and Ahmad (2011) inferred from their analysis of 49 bin Laden speeches that the now deceased leader of the Jihadi movement cited policy-based grievances for his militancy twice as often as religious-based ones.

Descriptive content analysis of Jihadi media gave researchers a first glance into the wealth of data, but to come to more generic conclusions about the groups who communicate these messages, more sophisticated analysis is needed. A common approach in terrorism studies therefore is to compare extremist groups who engage in violence with those who do not (A. G. Smith, 2004). For example, A. G. Smith (2008) and A. G. Smith, Suedfeld, Conway, and Winter (2008) applied three psychological measurement constructs (value reference, motive imagery, integrative complexity) to media content of violent and nonviolent Islamist groups, and identified those variables that are statistically significant predictors to distinguish between groups. Conway, Gornick, Houck, Towgood, and Conway (2011) investigated “hidden implications of radical group rhetoric” by analyzing random text samples with integrative complexity coding from violent and nonviolent Islamist groups. Pennebaker (2011) identified in a text sample of 296 documents statistically significant predictors for a violent attack in the 2 to 6 months following the statement of the group. Rieger, Frischlich, and Bente (2013) integrated ethnographic content analysis of Jihadi and right wing media into a randomized experimental design to investigate the individual’s response to ideological messaging.

Methodology of the Showcase Study

Sampling. The text documents of the showcase study (transcripts of AQ video statements) were sampled in several stages. Although desirable, representative sampling of documents was not feasible because an exhaustive register of Jihadi media does not exist. As a work-around for this problem, I sampled documents from a pool of Jihadi statements compiled by experts. The selected statements are therefore representative of the Jihadi ideology to a certain extent (although this extent is not quantifiable). The final sample consists of 31 transcripts of AQ video messages (about 178,000 words).

[…] hierarchical levels of analytical units. Using software MAXQDA, I combined a theory-based coding with explorative coding into a hybrid coding design. Therefore, the coding structure includes both theoretically and empirically driven units of analysis, also referred to as deductive and inductive categories (Mayring, 2000). It has five hierarchical levels:

1. Ideology as discourse (theory driven)
2. Frame (theory driven)
3. Narrative (global themes)
4. Theme (organizing themes)
5. Issues (basic themes)

Basic, organizing, and global themes (Attride-Stirling, 2001) or codes, categories, and themes/concepts (Saldana, 2013) represent the empirically driven units. To utilize these units for the particular purpose of studying ideologies, I call them “issues,” “themes,” and “narratives.” They discriminate in different degrees between rather abstract and rather concrete content within the Jihadi statements (see Figure 1). Frames and discourse are theory-driven units of analysis and represent the most general characteristics of Jihadi ideology. Discourse, frame, narrative, theme, and issue represent different hierarchical levels of the coding scheme. They represent the functional elements of ideologies (the mechanisms through which they frame the world), but they do not tell anything about the actual grievances, claims, positions, strategies, and visions of the movement that embraces this ideology. Each level therefore has a certain number of descriptive categories that summarize the actual meaning of the ideology and represent the substantial elements. In the sample of 31 video statements, I identified one discourse, four frames, 11 narratives, 26 themes, and 55 issues.

Figure 1. Category classes.
Note. AQ = al-Qaeda.

The level of “discourse” is the most comprehensive and general one. In fact, all content belongs to it. Its purpose is to acknowledge that Jihadism is not mutually exclusive from other Islamist ideologies but remains in a constant discursive relation with them, and therefore can be analyzed as such, for instance, when conducting a discourse analysis of statements published by AQ vis-à-vis statements from the Islamic State or the Muslim Brotherhood. For the purpose of this article, the analytical unit “discourse” has no further function.

The level “frames” has four descriptive units borrowed from Social Movement Theory (Snow & Benford, 1988; Wilson, 1973). Social Movement Theory has an intuitive appeal for the analysis of Islamist movements and has been used for this purpose across disciplines (Lohlker, 2013; Snow & Byrd, 2007; Wiktorowicz, 2004a, 2004b). It states that all ideologies are comprised of three principal components, also called frames: The “diagnostic frame” of an ideology describes (perceived and actual) social problems (i.e., “the war on Islam”) and specifies alleged political, economic, and social reasons for these problems. The “prognostic frame” describes the goals the movement pursues, namely, to replace the unjust status quo with an auspicious alternative (i.e., “the caliphate”), and the “motivational frame” describes strategies for achieving these goals (e.g., “jihad”). For coding purposes, I used a fourth frame (reference frame) as an auxiliary unit to designate all content that is nongenuine, that is, when the authors of the statements refer to external sources to substantiate their socioreligious positions, claims, and grievances. For instance, Jihadi leaders use theological evidence (references to Quran and Sunnah) to substantiate their theological argumentation, factual evidence (references to mainstream media or governmental reports) to back up their political claims, and aesthetic “evidence” (Islamic poems and lyric) to increase the “narrative fidelity” (Snow & Benford, 1988, p. 210) of their message.

When conducting hybrid coding, one can start the coding procedure top-down by coding the most general units into the data, or bottom-up by looking for the smallest informational units first. Starting with the most general (theoretically driven) unit has the advantage that it usually requires little prior knowledge about the content. It also gives the coder a first glance into the material so that he gets a rough idea about the thematic complexity and the approximate number of empirically driven (inductive) themes present in the material. In the study of Jihadi media, it was straightforward to recognize whether the author of the statement describes the status quo, talks about his vision or utopia, or advises followers to take action. In the most simplistic manner, coding frames into ideological statements follows the ABC model (Account, Better World, Change) of Mark Sedgwick (2012). Unlike the empirically driven units, frames must be mutually exclusive. However, the empirically based subunit of frames can cut across two or even three frames.

The next task is to identify the empirically driven themes. Here the researcher starts from scratch with nothing else than the four coding heuristics (see above) to guide him. Processing one statement after the other in no specific order, I created, modified, deleted, and merged the descriptive units in accordance with the coding heuristics, thereby steadily developing the coding structure. After working through 10 statements, the coding structure began to stabilize, meaning that fewer new units emerged and that fewer modifications were necessary to satisfy the coding heuristics. At the end of the first coding iteration, the coding scheme was entirely stable and the last few documents did not trigger any more modifications. This indicates that the coding structure represents the content adequately and also that the sample is saturated. A second coding iteration was necessary to adjust the content of the first processed documents to the finally developed scheme. The final version of the scheme has four frames, 11 narratives, 26 themes, and 55 issues. To visualize the thematic structure, I created a mind map that depicts all 96 categories (Armborst, 2013).

Results: Interpreting t-Coefficients and Detecting Meta-Themes in Jihadi Media

The systematic content analysis approach has helped to clarify and dissect the otherwise rather indistinct bulk of ideological messages. The main research objective of the study was to explore the ideological origins of religiously inspired violence in Jihadism. The analysis shows that Jihadism is a complex ideology that touches on a plethora of explicit socioreligious issues. The main thematic structure of the ideology consists of four frames, 11 narratives, 26 themes, and 55 issues. It contains rigorous theological argumentation mixed with political analysis expressed in the language of journalism or even scholarly argument. It is beyond the purpose of this article to describe all these aspects in detail. The important point here is to show the application of the theme relation coefficient.

Figure 2 and Table 1 present some of the quantitative results of the showcase study. Within the motivational frame of AQ ideology, two narratives and four themes are of particular interest in regard to the research objectives: the narrative about the (1) “instrumentality of force” in which the authors describe what they think the movement can actually achieve through the use of force. These expectations are further detailed within the two themes (1.1) “strategic benefits” and (1.2) “religious benefits.” The other narrative is the (2) “justification for the use of force” with its two themes: (2.1) “political justifications” and (2.2) “theological justifications.”

To operationalize the broader research objective, I formulated the following working question: Which other narratives, themes, and issues co-occur systematically with (1) and (2), and how strong are the thematic relations between them in terms of quantity and quality? Relational analysis helps to assess how the rationale of violence is embedded in the wider narrative structure of AQ’s ideology, not only in terms of statistical co-occurrence but also in terms of elaboration and meaning.

Figure 2 shows the absolute and relative word frequencies of selected categories. Beginning with the most extensive narrative (about apostasy), categories are ranked and grouped according to the hierarchy of the coding structure (frames, narratives, themes). The information about word frequencies helps to put the qualitative description of each frame, narrative, and theme into a broader perspective about the general outline and composition of Jihadi ideology. It empirically supports the observation made in other studies that Jihadism is mainly about Islamic rivalry (the near enemy) and to a lesser degree concerned with geopolitical affairs (the far enemy), but both aspects are certainly connected, as the relational analysis shows.

Figure 2. Text proportion for frames, narratives, and themes.

The coefficients in Table 1 reveal how frequently and how strongly categories are linked. It displays c- and t-coefficients for the thematic relation between the four themes about the rationale of violence (rows) and the three narratives within the diagnostic frame (columns). The numbers in the table can be interpreted in a similar way to a crosstab with categorical variables. To give a reading example of the numbers in the table, the narrative about the instrumentality of force (7,657 words) has two subthemes: strategic benefits (4,973 words) and religious benefits (2,684 words). These subthemes correlate to different degrees with the three narratives in the diagnostic frame (apostasy, global conflict, and secular governance) and are backed up to different degrees by factual and theological evidence from the reference system (nongenuine content). To pick out one example, the two categories “strategic benefits” and “apostasy” share 533 words, which corresponds to a c-coefficient of .036 and a t-coefficient of .060, indicating a moderately elaborated and rather infrequent thematic relation.

As noted before, the coefficients should always be interpreted in connection with a qualitative assessment of the thematic link. A review of the meta-content (533 words) cutting across the theme “strategic benefits” and the “apostasy” narrative reveals a tacit message: AQ asserts quite plainly that jihad is as much a matter of strategic choice as it is a matter of Islamic law and individual duty. What they claim rather implicitly is that this distinction makes them superior to competing Islamist groups who act much more strategically (“opportunistic” in the view of AQ). AQ promotes active participation in jihad, even against all strategic odds, to demonstrate its pristine interpretation of Islam and to claim religious supremacy over competing Islamist movements (often labeled apostates) who refrain from the alleged duty of jihad for purely political and strategic considerations.

But why is this claim communicated implicitly rather than directly? A plausible explanation is that the strategic flaws of Jihadi military doctrine are both a powerful and vulnerable aspect of the ideological message. It is powerful because only this way AQ can credibly claim religious supremacy over competing Islamist groups. And it is a vulnerable point because AQ can be (and actually is) criticized for being strategically and militarily ineffective and therefore not worthy of support. Therefore, this aspect of the ideology has to be communicated in a subtle way so as to disguise its contradiction. This observation is an intriguing and important aspect of AQ’s ideology, much more important than the rather moderate correlation of t = .060 would suggest. This demonstrates that it is important to review the statistical results always in combination with a qualitative assessment.

Other themes systematically co-occur without carrying any implicit message. For example, the theme about the strategic benefits of jihad is backed up quite strongly (t = .126) by factual evidence but not so strongly by theological references to Quran and Sunnah (t = .017). There is nothing more to conclude from this observation other than that the Jihadists use rational (factual) instead of theological reasoning when describing the strategic utility of Jihadi warfare.
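As a quick arithmetic check, the t-values quoted in this section can be recomputed from the word counts and intersections printed in Table 1. The sketch below is illustrative only: the dictionary layout and names are hypothetical, and it covers just the four theme pairs mentioned in the running text.

```python
def t_coefficient(w1: int, w2: int, w12: int) -> float:
    """t = 1/2 * (w12 / w1 + w12 / w2), with all counts in words."""
    return 0.5 * (w12 / w1 + w12 / w2)


# Word counts per unit, copied from Table 1.
WORDS = {
    "strategic benefits": 4_973,
    "theological justifications": 10_065,
    "apostasy": 39_128,
    "global": 20_730,
    "factual evidence": 46_641,
    "theological evidence": 21_832,
}

# (row unit, column unit) -> (intersecting words, t reported in Table 1)
PAIRS = {
    ("strategic benefits", "apostasy"): (533, 0.060),
    ("strategic benefits", "factual evidence"): (1_131, 0.126),
    ("strategic benefits", "theological evidence"): (137, 0.017),
    ("theological justifications", "global"): (1_753, 0.129),
}

# Recompute each coefficient and print it next to the published value.
recomputed = {}
for (row, col), (shared, reported) in PAIRS.items():
    t = t_coefficient(WORDS[row], WORDS[col], shared)
    recomputed[(row, col)] = t
    print(f"{row} x {col}: t = {t:.3f} (reported {reported:.3f})")
```

Rounded to three decimals, the recomputed values agree with the table, which also confirms that the t-coefficient is the plain average of the two overlap shares.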
Finally, I want to use the showcase study to give an example of how to interpret the unstandardized coefficient t together with the standardized coefficient $t_s$ and how both are affected through the coding pattern. The adjustment coefficient in this study is almost 0 (P = .62, df = 4,560), indicating that there are few code overlaps in relation to the overall number of units. As much as 62% of all retrieved content ($p_r$) is coded with more than one code ($p_s$), but there are also many degrees of freedom (categories among which the multiply coded text can freely distribute). Between the 96 units of the coding scheme, there are df = 4,560 possible (though not observed) bivariate correlations to accommodate the $p_s$ = 132,732 words that appear multiple times in the text retrieval. If all 96 units were perfectly independent from each other (in other words, if all multiply coded text were equally distributed among all 96 units), then any bivariate correlation would be close to .001, indicating that an observed coefficient t, for example t = 0.1, is significantly higher than the average correlation between two units. Therefore, the standardized coefficients $t_s = 0.9998 \times t$ will take almost the same values as t and need not be reported.

Discussion

The most important limitation to keep in mind when using the proposed coefficient is that the statistical “facts” it produces are ultimately contingent upon coding decisions. Despite the use of clearly spelled out coding heuristics, there remains some interpretative leeway. It is therefore good scientific practice to involve several coders and then test intercoder reliability.

Without the aid of content analysis software, it is not possible to systematically read between the lines of large text samples and to detect latent structures. The proposed theme relation coefficient enables researchers to discover subtle patterns in verbal content. It allows the researcher to draw analytical conclusions about his study object through a transparent and replicable methodology. To substantiate this claim, this article uses an empirical study on Jihadi media to demonstrate how the application of the coefficients has produced more generic information about the ideology of Jihadism as it is communicated in a sample of Jihadi media.

Unlike conventional co-occurrence (Friese, 2014) or code relation metrics that show how often and how consistently two themes co-occur within the text sample, the newly proposed coefficient indicates how much content two units actually share with each other and how elaborated their thematic link is. The combined use of both coefficients can add important information to conventional analysis because the observation of how often and how consistently two themes co-occur in the data is not necessarily an indicator for how important, relevant, and meaningful this thematic relation is within the research context.

The methodology proposed in this article is applicable in various scenarios of content analysis and with different types of data (interviews, field notes, public documents, and other text data). The standardized version of the t-coefficient makes the results from different studies comparable. This is important because differences in sample sizes and researchers’ coding practice can affect the values of conventional theme relation metrics. The standardized coefficient offsets this potential bias and enables researchers to compare results regardless of sample size, number of extracted categories, and extent of code overlaps.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Notes

1. Thanks to the anonymous reviewers who gave me instructive feedback.
2. It is important to note that these coding heuristics deliberately allow two units to overlap. This has the purpose to designate text passages that relate to more than one theme and to designate text passages that lead from one theme to the next. This content is of particular interest for the detection of meta-themes. Allowing for overlapping codes also alleviates decision making: Coders are not forced to make potentially arbitrary either/or choices in cases where content is equivocally associated with more than one theme. But depending on the research context, it may also be appropriate to rule out code overlapping.
3. “The c-index (structurally resembling the Tanimoto and Jaquard coefficient . . .) assumes separate non-overlapping text entities” (Friese, S. [2013], p. 291. Atlas.ti 7 user guide and reference. ATLAS.ti Scientific Software Development, Berlin. http://atlasti.com/de/handbuecher/).
4. ATLAS.ti software notifies the user if the ratio between two codes exceeds a certain threshold (i.e., when one code has been used five times as often as the other). Thanks to an anonymous reviewer for this hint.
5. To determine the word frequencies in MAXQDA, the user can use the MAXDictio module. To determine the word frequencies of text that two units share, proceed as follows: Use text retrieval function “intersection” or “intersection (Set)” and then use “code the results with new code” to delete multiply coded text passages in the retrievals (credits to Stefan Rädiker for giving me this decisive hint in the support forum). Then retrieve the new code and let Dictio count the word frequencies of the retrieval. I recommend creating a copy of the file once the coding procedure is finished and performing all subsequent analysis (including the creation of new codes as described above) with this file.
6. To obtain this number in MAXQDA, calculate (total word frequencies in retrieved segment) MINUS (proportion of content with only one code).
7. Again one has to improvise to determine this number in MAXQDA. The MAXDictio does not count multiply coded text retrievals (except for code–subcode intersections). To obtain the total word frequencies in retrieved segment (i.e., to deliberately count multiply coded text), retrieve the text of all units and copy the retrievals into a new document. Then, activate only this document and choose “word frequencies” and “only for activated documents.”
8. $p_s$ = 132,732 / $p_r$ = 213,414.
9. For details about the sampling strategy of the showcase study, see Armborst (2013, p. 64).
10. Popular alternatives to this software are ATLAS.ti (Bell, 2013; Gibbs, 2007), QDA Miner, WordStat, NVivo, or Ethnograph. There are also a number of open source products such as the Coding Analysis Toolkit (CAT).
11. In the showcase study, two methods for code validation were used: second coder and automated text mining. The results are discussed in Armborst (2013).

References

Angus, D., Rintel, S., & Wiles, J. (2013). Making sense of big text: A visual-first approach for analysing text data using Leximancer and Discursis. International Journal of Social Research Methodology, 16, 261-267. doi:10.1080/13645579.2013.774186
Armborst, A. (2013). Jihadi violence: A study of al-Qaeda’s media. Berlin, Germany: Duncker & Humblot.
Attride-Stirling, J. (2001). Thematic networks: An analytic tool for qualitative research. Qualitative Research, 1, 385-405.
Bell, D. (2013). Book review: Susanne Friese, Qualitative data analysis with ATLAS.ti. Qualitative Research, 13, 382-384.
Beutel, A., & Ahmad, I. a. D. (2011). Examining Bin Ladin’s statements: A quantitative content analysis from 1996 to 2011. Bethesda, MD: Minaret of Freedom Institute.
Friese, S. (2014). Qualitative data analysis with ATLAS.ti. London, England: Sage.
Gibbs, G. R. (2007). Media review: Atlas.ti software to assist with the qualitative analysis of data. Journal of Mixed Methods Research, 1, 103-104.
Glaser, B. G. (1978). Theoretical sensitivity: Advances in the methodology of grounded theory. Mill Valley, CA: Sociology Press.
Glaser, B. G., & Strauss, A. L. (1967). The discovery of grounded theory. Chicago: Aldine Transaction.
Hellmich, C. (2008). Creating the ideology of al Qaeda: From hypocrites to Salafi-Jihadists. Studies in Conflict & Terrorism, 31, 111-124.
Jansen, J. J. (1997). The dual nature of Islamic fundamentalism. Ithaca, NY: Cornell University Press.
Kelle, U., & Kluge, S. (2010). Vom Einzelfall zum Typus. Fallvergleich und Fallkontrastierung in der qualitativen Sozialforschung [From case to type. Case comparison in qualitative research]. Wiesbaden, Germany: Springer.
Krippendorff, K. (1995). On the reliability of unitizing continuous data. Sociological Methodology, 25, 47-76.
Krippendorff, K. (2004). Measuring the reliability of qualitative text analysis data. Quality & Quantity, 38, 787-800.
Lohlker, R. (2013). Jihadism: Online discourses and representations (Vol. 2). Vienna: Vienna University Press.
Mayring, P. (2000). Qualitative content analysis. Qualitative Sozialforschung, 1, 1-10.
Neuendorf, K. A. (2017). The content analysis guidebook. Thousand Oaks, CA: Sage.
Oleinik, A. (2011). Mixing quantitative and qualitative content analysis: Triangulation at work. Quality & Quantity, 45, 859-873. doi:10.1007/s11135-010-9399-4
Pennebaker, J. W. (2011). Using computer analyses to identify language style and aggressive intent: The secret life of function words. Dynamics of Asymmetric Conflict, 4, 92-102.
Pennebaker, J. W., & Chung, C. K. (2008).
Computerized text anal- Bos, W., & Tarnai, C. (1999). Content analysis in empirical social ysis of Al-Qaeda transcripts. In K. Krippendorf & M. A. Bock research. International Journal of Educational Research, 31, (Eds.), The content analysis reader (pp. 453-465). Thousan 659-671. Oaks, CA: Sage Conway, L. G., III, Gornick, L. J., Houck, S., Towgood, K. H., Rantala, K., & Hellström, E. (2001). Qualitative comparative analy- & Conway, K. R. (2011). The hidden implications of radical sis and a hermeneutic approach to interview data. International group rhetoric: Integrative complexity and terrorism. Dynamics Journal of Social Research Methodology, 4, 87-100. of Asymmetric Conflict, 4, 155-165. Rieger, D., Frischlich, L., & Bente, G. (2013). Propaganda 2.0: Cozzens, J. B. (2007). Approaching al-Qaeda’s warfare: Function, Psychological effects of right-wing and Islamic extremist inter- culture and grant strategy. In M. Ranstorp (Ed.), Mapping ter- net videos. Munich, Germany: Luchterhand. rorism research (pp. 127-163). New York, NY: Routledge. Sageman, M. (2014). The stagnation in terrorism research. Creswell, J. W., & Plano Clark, V. L. (2011). Designing and con- Terrorism and Political Violence, 26, 565-580. ducting mixed methods research. London, England: Sage. Saldana, J. (2013). The coding manual for qualitative researchers Eveslage, B. S. (2013). Clarifying Boko Haram’s transnational (2nd ed.). Thousand Oaks, CA: Sage. intentions, using content analysis of public statements in 2012. Salem, A., Reid, E., & Chen, H. (2008). Multimedia content cod- Perspectives on Terrorism, 7(5), 47-67. ing and analysis: Unraveling the content of Jihadi extremist Fakis, A., Hilliam, R., Stoneley, H., & Townend, M. (2014). groups’ videos. Studies in Conflict & Terrorism, 31, 605-626. Quantitative analysis of qualitative information from inter- Sedgwick, M. (2004). Al-Qaeda and the nature of religious terror- views: A systematic literature review. Journal of Mixed ism. 
Terrorism and Political Violence, 16, 795-814. Methods Research, 8, 139-161. Sedgwick, M. (2012). Jihadist ideology, Western counter-ideology, Fereday, J., & Muir-Cochrane, E. (2006). Demonstrating rigor using and the ABC model. Critical Studies on Terrorism, 5, 359-372. thematic analysis: A hybrid approach of inductive and deduc- Smith, A. E. (2003). Automatic extraction of semantic networks tive coding and theme development. International Journal of from text using Leximancer. Paper presented at the Proceedings Qualitative Methods, 5, 80-92. of the 2003 Conference of the North American Chapter of Friese, S. (2013). Atlas.ti 7 user guide and reference. ATLAS.ti the Association for Computational Linguistics on Human Scientific Software Development, Berlin. http://atlasti.com/de/ Language Technology June 2003, Demonstrations-Volume 4. handbuecher/ Edmonton, Canada. Armborst 11 Smith, A. E., & Humphreys, M. S. (2006). Evaluation of unsuper- Symonds, J. E., & Gorad, S. (2008). The death of mixed meth- vised semantic mapping of natural language with Leximancer ods: Research labels and their casualties British Educational concept mapping. Behavior Research Methods, 38, 262-279. Research Association Annual Conference, Edinburgh. https:// Smith, A. G. (2004). From words to action: Exploring the relation- www.leeds.ac.uk/educol/documents/174130.pdf ship between a group’s value references and its likelihood of Torres, M. R., Jordán, J., & Horsburgh, N. (2006). Analysis and engaging in terrorism. Studies in Conflict & Terrorism, 27, evolution of the global Jihadist movement propaganda. 409-437. Terrorism and Political Violence, 18, 399-421. Smith, A. G. (2008). The implicit motives of terrorist groups: How Wiktorowicz, Q. (2004a). Framing jihad: Intramovement fram- the needs for affiliation and power translate into death and ing contests and al-Qaeda’s struggle for sacred author- destruction. Political Psychology, 29, 55-75. ity. 
International Review of Social History, 49(Suppl. 12), Smith, A. G., Suedfeld, P., Conway, L. G., III, & Winter, D. G. 159-177. (2008). The language of violence: Distinguishing terrorist from Wiktorowicz, Q. (2003). Islamic activism: A social movement nonterrorist groups by thematic content analysis. Dynamics of theory approach. Bloomington, IN: Indiana University Asymmetric Conflict, 1, 142-163. Press. Snow, D. A., & Benford, R. D. (1988). Ideology, frame resonance, Wilson, J. (1973). Introduction to social movements. New York, and participant mobilization. International Social Movement NY: Basic Books. Research, 1, 197-217. Snow, D. A., & Byrd, S. (2007). Ideology, framing processes, and Author Biography Islamic terrorist movements. Mobilization: An International Quarterly, 12, 119-136. Andreas Armborst is a criminologist and head of the National Stockwell, P., Colomb, R. M., Smith, A. E., & Wiles, J. (2009). Use Center for Crime Prevention in Bonn, Germany. Previously he has of an automatic content analysis tool: A technique for seeing been A Marie Curie Fellow at the School of Law, University Leeds, both local and global scope. International Journal of Human- and a researcher at the Max Planck Institute for Foreign and Computer Studies, 67, 424-436. International Criminal Law.

Journal

SAGE Open (SAGE)

Published: Jun 16, 2017

Keywords: content analysis; concept mapping; qualitative content analysis; mixed methods; communication studies; communication; social sciences; human communication; Jihadi ideology; Jihadism

References