{"id":16693,"date":"2022-12-22T14:59:23","date_gmt":"2022-12-22T13:59:23","guid":{"rendered":"https:\/\/surveyinsights.org\/?p=16693"},"modified":"2023-01-05T12:38:19","modified_gmt":"2023-01-05T11:38:19","slug":"an-overview-of-the-scales-characteristics-for-10-well-established-face-to-face-social-science-surveys","status":"publish","type":"post","link":"https:\/\/surveyinsights.org\/?p=16693","title":{"rendered":"An overview of the scales&#8217; characteristics for 10 well-established face-to-face social science surveys"},"content":{"rendered":"<h1>1. Introduction<\/h1>\n<p>For decades, surveys have been the main source of data in many studies (Saris and Gallhofer, 2014). How surveys are designed matters 1) for respondents, since a bad experience participating in one survey can lead to break-off and\/or discourage respondents from participating in future surveys; and 2) for researchers, since it affects the representativeness (who participates, who breaks off) and data quality (item non-response, non-differentiation, etc.), and may therefore affect the substantive conclusions reached (Saris and Gallhofer, 2014).<\/p>\n<p>Therefore, a large body of literature has been produced to help researchers design high-quality surveys (e.g. Schuman and Presser, 1981; Sudman and Bradburn, 1982; Alwin and Krosnick, 1991; Alwin, 2007; Dillman, 2011; Saris and Gallhofer, 2014). 
This literature discussed the numerous choices that have to be made when designing a questionnaire (number of answer categories, use of labels, etc.), and used empirical data (mainly survey experiments) to study the impact of some of these decisions on data quality, measured in different ways.<\/p>\n<p>Previous literature also focused on providing practical recommendations and specifying best practices on how to design questionnaires to maximize data quality, representativeness, and respondents&#8217; satisfaction with their survey participation.<\/p>\n<p>Even when these recommendations are clear and there is broad agreement about what the best practices are, what is done in practice can diverge from what the literature recommends. For instance, many studies recommend avoiding agree\/disagree (from now on A\/D) items, in which a statement is presented and respondents are asked to what extent they agree\/disagree with it (e.g. Krosnick, 1991; Alwin, 2007; Saris et al., 2010; Revilla and Ochoa, 2015; H\u00f6hne, Revilla and Lenzner, 2018). However, Revilla (2017) found that 39.2% of the surveys implemented in an opt-in online panel in Spain included A\/D items. Thus, there is a gap between the recommendations of the academic literature and the way questions are designed in practice, which might be due to the need to make trade-offs between data quality, comparability (across waves or countries), and costs.<\/p>\n<p>In this paper, our main goal is to provide information about the response scales used in practice by 10 well-established social science surveys conducted face-to-face, mainly in Europe and America. We focus on five aspects: the response scales&#8217; evaluative dimension (A\/D or item specific), the number of answer categories, the use of grids, of check-all-that-apply format (CATA), and of fixed-reference points. 
We selected these aspects because 1) the agreement in the literature about what is preferable is high and 2) their impact on data quality seems to be important (Smyth et al., 2006; Saris and Gallhofer, 2007; DeCastellarnau, 2018). Section 2 presents the main recommendations that the literature provides for each of these five aspects. Then, Section 3 presents the data used in this study, Section 4 explains how the analyses were done, and Section 5 reports the main results. Finally, Section 6 concludes.<\/p>\n<h1>2. Recommendations from the literature<\/h1>\n<h2>2.1 Avoid A\/D scales<\/h2>\n<p>The first aspect is the scales\u2019 evaluative dimension. The literature differentiates between A\/D and item-specific (IS) questions. The format of A\/D questions is always the same: first a statement is presented (e.g. &#8220;I am satisfied with my life&#8221;) accompanied by an agree\/disagree scale, commonly featuring different levels of intensity. In contrast, IS questions are usually formed by a request for an answer (e.g. &#8220;Are you satisfied with your life?&#8221;) and an answer scale matching the concept under evaluation (e.g. dissatisfied to satisfied).<\/p>\n<p>A\/D questions are quicker to design, since all questions use a similar scale, whatever the concept to be measured. Only the statement needs to be formulated. A\/D questions often present the request for an answer only once, followed by several items for which the respondents need to say to what extent they agree\/disagree.<\/p>\n<p>However, the literature recommends avoiding A\/D questions and using IS ones instead, for several reasons. First, the cognitive process is more complex for A\/D than IS questions (Saris et al., 2010; Revilla, Saris and Krosnick, 2014; Kunz, 2017). Second, the use of A\/D questions leads to weariness (H\u00f6hne, Schlosser and Krebs, 2017) because the format remains the same except for the statement to be judged. 
Furthermore, acquiescence bias (a tendency to agree with any statement) is expected with A\/D scales (Schuman and Presser, 1981). Moreover, results differ depending on the statement used (e.g. &#8220;I am satisfied with my life&#8221;, versus &#8220;I am dissatisfied with my life&#8221;). Finally, A\/D questions have lower measurement quality (Saris et al., 2010; Revilla and Ochoa, 2015).<\/p>\n<h2>2.2 Use 6 to 11 answer categories for IS scales and maximum 5 for A\/D scales<\/h2>\n<p>The decision about the number of scale points depends on the type of variables and the scales\u2019 evaluative dimension (A\/D or IS).<br \/>\nConcerning the type of variables, following Dillman (1978), we distinguish between questions measuring behaviors, attitudes, beliefs and attributes. Behavioral questions are about people\u2019s actions (what they have done, currently do, or plan to do). Attitude questions are about what people (dis)like, requiring them to indicate whether they have positive or negative feelings. Belief questions are about what people think is true or false, eliciting their perceptions of past, present, or future reality. Finally, attribute questions relate to personal or demographic characteristics.<br \/>\nOne of the most important rules for designing an effective answer scale is to make it complete (Krosnick and Presser, 2010): the answer scale has to cover all possible options. When measuring attributes or behaviors, the adequate number of scale points depends on the object measured. For example, if the interest is in which region the respondent lives, all regions in a given country have to be offered as response categories. When it comes to attitude or belief questions, we have to differentiate between A\/D and IS questions to decide about the scale length (Revilla et al., 2014).<\/p>\n<p>Literature concerning attitude or belief IS questions suggests using six to 11 points (Alwin and Krosnick, 1991; Alwin, 1997; Asensio and Revilla, 2021; Revilla et al. 
2014; Saris and Gallhofer, 2014; Revilla and Ochoa, 2015). On the one hand, the theory of information (Garner, 1960) states that for bipolar concepts, while 2-point scales measure the direction of an attitude or belief, 3-point scales (and longer scales with odd numbers) also measure the neutral middle point, and scales with more points additionally measure the intensity of the attitude or belief. Moreover, Alwin and Krosnick (1991) suggest that enough answer categories are needed to allow scale differentiation. With too few categories, people with different true values may have to give the same answer because there are no more options to choose from. On the other hand, the literature does not recommend more than 11 points because more categories can lead to ambiguity and non-discrimination between categories (Schaeffer and Presser, 2003; Krosnick and Presser, 2010).<\/p>\n<p>For A\/D scales, even though such scales are not recommended (see section 2.1), Revilla et al. (2014) found that five points is better than seven or 11 points. Furthermore, for A\/D scales, having five points instead of more reduces the extreme response style bias, understood as the tendency to disproportionately use the extreme response categories in a rating scale (Weitjers, Cabooter and Schillewaert, 2010).<\/p>\n<h2>2.3 Avoid grid format<\/h2>\n<p>Most questions are presented using an item-by-item format. However, when a set of questions shares the same response scale, it is possible to use a grid format. Then, the main request for an answer (e.g. \u201cHow much do you trust the following institutions?\u201d) and the response scale (e.g. 0 \u201cNo trust\u201d to 5 \u201cComplete trust\u201d) are usually presented only once for a series of items (e.g. the parliament, the legal system, the police).<\/p>\n<p>In visual modes of data collection (e.g. 
web or face-to-face using showcards), the items are usually presented in rows, while the common set of response options is usually presented in columns (Couper et al., 2013, p. 322).<br \/>\nGrid format allows the information to be condensed (Revilla, Toninelli and Ochoa, 2016), which can save time for respondents. Indeed, completion times are lower for grids than for item-by-item formats (Couper, Traugott and Lamias, 2001; Bell, Mangione and Kahn, 2001; Tourangeau, Couper and Conrad, 2004). Furthermore, higher inter-item correlation has been found when using grids (Tourangeau et al., 2004). This is often seen as an indicator of higher quality, as it improves the reliability (Ferketich, 1991).<\/p>\n<p>However, many studies recommend avoiding grids for different reasons (Poynter, 2001; Wojtowicz, 2001; Dillman, 2011). First, shorter completion times seem to be due to respondents not putting enough effort into answering the questions, and not to the task being easier. For instance, Couper et al. (2013) suggested that shorter completion times may be a consequence of choosing the same answer category for all items, which is a suboptimal responding strategy known as non-differentiation or straightlining. Other studies showed that grid formats tend to increase the number of missing items (Iglesias, Birks and Torgerson, 200; Manfreda, Batageli and Vehovar, 2002), which, in turn, can also reduce completion times. Toepoel et al. (2009) showed that this item-missing trend systematically increased with the number of items inside the same grid. Second, the higher inter-item correlations in grids, which were sometimes considered an indicator of higher quality, seem to be due to higher non-differentiation and systematic measurement errors (Peytchev, 2005). Third, respondents\u2019 satisfaction with their survey experience decreases when grids are used (Thorndike et al., 2009; Toepoel et al., 2009). 
This, in turn, can lead those respondents to refuse to participate in future surveys. Finally, the use of grids increases respondents\u2019 break-off rates (Puelston and Sleep, 2008).<\/p>\n<h2>2.4 Avoid check-all-that-apply formats<\/h2>\n<p>Check-all-that-apply (CATA) is a question format where respondents are asked to select from a list all the options applying to them (Smyth et al., 2006). For instance, \u201cPlease, indicate all the devices you use to go online: PC, tablet, smartphone\u201d. The alternative to this format is usually called \u201cforced-choice\u201d: in this case, the respondents are asked to provide an answer (typically yes or no) for each item in the list (Smyth et al., 2006). For instance, \u201cDo you use the following devices to go online: PC? Yes\/No; Tablet? Yes\/No; Smartphone? Yes\/No\u201d. However, it is possible to add an option \u201cprefer not to answer\u201d or even to let respondents continue without providing an answer when using a \u201cforced-choice\u201d format.<\/p>\n<p>In oral modes like face-to-face, it might be difficult to distinguish between CATA and forced-choice. However, when the face-to-face interviews use showcards, the distinction is clearer.<\/p>\n<p>CATA items are common in surveys mainly for two reasons: completion times are shorter (Sudman and Bradburn, 1982) and the design is efficient, in the sense that it allows respondents to select several options (Smyth et al., 2006). However, the literature recommends avoiding the CATA format because, first, its nature encourages weak satisficing by respondents, who tend to report fewer items than in forced-choice formats (Smyth et al., 2006; Jaeger et al., 2014). Even if the higher reporting in forced-choice formats could be partly due to acquiescence bias, forced-choice has a higher external validity (Revilla, 2015). 
Lau and Kennedy\u2019s (2019) results, based on undesirable but common events (losing a job or being arrested), also suggest that respondents report fewer items than they should in the CATA format. Second, the longer response times for forced-choice formats can be positive, if due to a more careful answer process. Finally, the meaning of not selecting an option in the CATA format is unclear: it could be that the option effectively does not apply to the respondents, that respondents are neutral, or that they overlooked it (Sudman and Bradburn, 1982). In contrast, forced-choice formats allow better differentiation because options are marked negatively (Smyth et al., 2006).<\/p>\n<h2>2.5 Use fixed reference points<\/h2>\n<p>Fixed reference points are labels that \u201cset no doubt about the position of the reference point on the subjective scale in the mind of the respondent\u201d (Saris and Gallhofer, 2014, p.110). For example, words such as \u201ccompletely\u201d or \u201cextremely\u201d define an abstract object to its maximum value, whereas other words such as \u201cslightly\u201d or \u201csomewhat\u201d can be interpreted in different ways by the respondents.<br \/>\nIn behavioral questions, it is often possible to use fixed reference points for each answer category, for instance when measuring a frequency (e.g., \u201conce a week\u201d, \u201ctwice a week\u201d, etc.), or a duration (e.g. \u201c0h00 to 5h00\u201d, \u201c5h01 to 10h00\u201d, etc.), instead of using vague quantifiers (e.g. \u201coften\u201d or \u201ca lot\u201d). For attitude and belief questions, fixed reference points are usually possible for the end points and, for bipolar concepts, the midpoint. Attribute questions usually use fixed reference points for all answer categories (e.g. regions, education, income).<\/p>\n<p>The literature recommends the use of fixed reference points whenever possible. First, when fixed reference points are not provided, respondents might interpret the answer categories differently. 
Then, the assumption of equality of the response function does not hold, i.e. respondents with different true values can choose the same answer category, and respondents with similar true values can choose different answer categories (Saris and De Rooij, 1988). These variations in the response function can be prevented to a large extent by using at least fixed reference points for the end points of the scale (Saris et al., 1988; Batista-Foguet and Saris, 1988).<\/p>\n<p>Second, when the response function has been equalized due to the use of fixed reference points, the measurement error that may arise from its variation is reduced (Scherpenzeel, 2003). In line with this, Revilla and Ochoa (2015) found that the use of two fixed reference points in the end points slightly increases measurement quality.<\/p>\n<h1>3. Data<\/h1>\n<p>We selected 10 well-established surveys, recognized for their quality and contribution to the production of knowledge across the world. All surveys cover broad social sciences topics and are implemented as face-to-face interviews with showcards across different countries, mainly in Europe, Latin America, and the USA.<\/p>\n<p>We focus on face-to-face surveys because they are usually considered to offer the highest data quality and because most of the studies that examined the impact of changing the response scales, and on which the recommendations are based, were conducted face-to-face. Moreover, we focus on Europe and America because literature concerning survey methods has been mostly produced using data from these areas. It is not clear whether the current recommendations from the survey methodology literature apply in a similar way for other areas using different languages, alphabets, and linguistic rules (e.g. 
Arab or Asian countries).<br \/>\nThe selected surveys are the following: Las Am\u00e9ricas y el Mundo (AMERICAS), European Social Survey (ESS), Eurobarometer (EURO), European Values Study (EVS), General Social Survey (GSS), Proyecto de Opini\u00f3n P\u00fablica de Am\u00e9rica Latina (LAPOP), Latinobar\u00f3metro (LATINOBAR), Quality of life in European Cities \u2013 Flash Eurobarometer (QEC), Survey of Health, Ageing and Retirement in Europe (SHARE), and World Values Survey (WVS). Table 1 provides information about each of them.<\/p>\n<p><strong>Table 1. Overview of the key characteristics of the selected surveys and their questionnaires<\/strong><\/p>\n<p><a href=\"https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/12\/Table_1.pdf\">To the table<\/a><\/p>\n<p>Most of these surveys have existed for many years. However, we are interested in the current state of their questionnaires, to compare it with the current recommendations of the literature. Thus, we analyzed the most recent questionnaire available in each case when this research was conducted (2019).<\/p>\n<h1>4. Analyses<\/h1>\n<p>First, we retrieved the most recent questionnaires available from the surveys&#8217; websites. From those questionnaires, we coded the following characteristics for each question:<\/p>\n<ul>\n<li>Type of variables according to Dillman\u2019s (1978) distinction, i.e. attributes, behaviors, attitudes or beliefs. This was necessary as some recommendations apply only to some question types.<\/li>\n<li>Scales\u2019 evaluative dimension: for behaviors, attitudes, and beliefs, we coded if the questions used an A\/D or IS format. Some questions were also open-ended, asking for a number or a text answer. In those cases, we coded the question as a text or number question. 
We report the percentage of A\/D items out of all behavioral, attitude, and belief questions in each survey, as well as the percentage of A\/D items when excluding behavioral questions, since the A\/D format is often not suited to behaviors.<\/li>\n<li>Number of answer categories: for closed questions (A\/D and IS), we studied the number of answer categories. Only the answers predefined in the scale were counted. For example, in the LAPOP questionnaire, for the question \u201cHow do you think your economic situation will be next year?\u201d (S8), we counted two answer categories (Better\/Worse) even though spontaneous answers such as \u201cAs good as now\u201d were accepted as valid answers. We report the percentage of attitude and belief questions (since these are the questions for which the literature recommendations hold) with a given number of scale points (from two to \u201c12 or more\u201d). We distinguish between A\/D and IS since the literature provides different recommendations for each.<\/li>\n<li>Use of grid formats: if two or more questions were presented in rows (as depicted in the questionnaire documents) with the same scale in columns, we considered that set of questions as belonging to a grid. We report, for each questionnaire, the total number of grids, the mean and maximum number of items per grid, and the percentage that grid items represented out of all questionnaire items. All surveys used showcards, implying that respondents got a visual stimulus for answering those grids. However, this visual stimulus varied across surveys and questions: in some cases, it was the scale that applied to all items in the grid; in others, the list of items to which a given scale applied; in still others, both.<\/li>\n<li>Check-all-that-apply format: if two or more items were presented together in one multiple choice question where respondents had to select all the items applying to them, we considered that set of items as being a CATA. 
The showcards normally presented all the items for a given CATA question. We report, for each questionnaire, the number of CATA, the mean and maximum number of response options per CATA, and the percentage that CATA items represented out of all questionnaire items.<\/li>\n<li>Fixed reference points: here we focused on scales where the survey designer has the choice between fixed and non-fixed reference points for at least some of the answer categories, such as scales measuring the intensity of an abstract object or the frequency of a behavior. We do not report the number of fixed reference points because the maximum possible varies across questions: for many behavioral questions, each category can be a fixed reference point, whereas, for attitude or belief questions, commonly only the extremes and the middle neutral category (for bipolar scales) can be fixed reference points. Therefore, we report the percentage of attitude, belief, and behavioral items with no fixed reference points, at least one fixed reference point but not all possible ones, and all possible fixed reference points. Attributes are not considered since there is usually no choice.<\/li>\n<\/ul>\n<h1>5. Results<\/h1>\n<h2>5.1 A\/D and IS questions<\/h2>\n<p>First, Table 2 shows, per survey, the percentages of A\/D items out of all survey items, and out of all attitude and belief items, because A\/D items are not appropriate for measuring attributes and behaviors.<\/p>\n<p><strong>Table 2. 
Presence of A\/D items<\/strong><\/p>\n<p><a href=\"https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/12\/Table_2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17863\" src=\"https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/12\/Table_2.png\" alt=\"\" width=\"550\" height=\"276\" srcset=\"https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/12\/Table_2.png 906w, https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/12\/Table_2-300x150.png 300w, https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/12\/Table_2-768x385.png 768w\" sizes=\"auto, (max-width: 550px) 100vw, 550px\" \/><\/a><\/p>\n<p>Table 2 confirms that A\/D items are right now a reality in well-established face-to-face surveys. Indeed, no survey has less than 5% of A\/D questions overall, and less than 8% when focusing on attitude and belief questions. However, their presence varies across the surveys: it ranges from 5.8% in the ESS to 15.7% in the WVS overall, and from 8.7% in the ESS to 27.4% in SHARE when focusing only on attitudes and beliefs.<\/p>\n<h2>5.2 Number of answer categories<\/h2>\n<p>Tables 3 and 4 present the proportions of scales with different numbers of answer categories for attitude and belief closed-ended questions, both for IS and A\/D scales.<\/p>\n<p><strong>Table 3. 
Percentage of scales with different numbers of answer categories in attitude\/belief IS questions<\/strong><\/p>\n<p><a href=\"https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/01\/Table_3.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17849\" src=\"https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/01\/Table_3.png\" alt=\"\" width=\"550\" height=\"224\" srcset=\"https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/01\/Table_3.png 932w, https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/01\/Table_3-300x122.png 300w, https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/01\/Table_3-768x313.png 768w\" sizes=\"auto, (max-width: 550px) 100vw, 550px\" \/><\/a><\/p>\n<p>Table 3 shows that, in the 10 surveys studied, most IS attitude or belief questions use between two and six answer categories.<\/p>\n<p>Seven answer categories are used in less than 5% of the questions in eight out of the 10 surveys, and not at all in two of them. The LAPOP is the only survey using seven categories to a moderate extent (15.7%). Similarly, the use of 11 answer categories is lower than 5% for all surveys except the ESS (30.2%). Nine answer categories are even less common, with a maximum of 8.3% in the WVS. Ten answer categories are commonly used in the QEC (50.0%), the WVS (38.0%) and the EVS (26.5%), but other surveys (e.g., EUROBAR) never use them.<\/p>\n<p><strong>Table 4. 
Percentage of scales with different numbers of answer categories in attitude\/belief A\/D questions<\/strong><\/p>\n<p><a href=\"https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/01\/Table_4.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17850\" src=\"https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/01\/Table_4.png\" alt=\"\" width=\"550\" height=\"219\" srcset=\"https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/01\/Table_4.png 944w, https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/01\/Table_4-300x119.png 300w, https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/01\/Table_4-768x306.png 768w\" sizes=\"auto, (max-width: 550px) 100vw, 550px\" \/><\/a><\/p>\n<p>Moving to A\/D questions, Table 4 shows that the number of answer categories is more in line with the recommendations of the literature: more than five answer categories are only used in the LAPOP (100% of the A\/D questions use seven answer categories), in the WVS (15% of the A\/D questions use 10 answer categories) and in the QEC (41.7% of the A\/D scales have 10 points). However, the WVS data was collected between 2010 and 2014, and the QEC in 2015. These are the two oldest questionnaires. Thus, they were prepared before a lot of studies about A\/D scales were published. All other surveys use A\/D scales of five points or fewer. 
Many surveys use the same number of answer categories for (almost) all the A\/D scales.<\/p>\n<p>The most used number of answer categories in practice for A\/D scales is four, which means that most A\/D scales have no middle neutral category (\u201cneither agree nor disagree\u201d) trying to obtain a substantive answer by pushing people to choose one side.<\/p>\n<h2>5.3 Grids<\/h2>\n<p>Third, Table 5 provides information regarding the usage of grids in the 10 surveys studied: the number of grids in each questionnaire, the maximum number of items in a grid, the average number of items per grid and the percentage of total survey items belonging to grids.<\/p>\n<p><strong>Table 5. Grid usage in the 10 surveys<\/strong><\/p>\n<p><a href=\"https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/01\/Table_5.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17852\" src=\"https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/01\/Table_5.png\" alt=\"\" width=\"550\" height=\"250\" srcset=\"https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/01\/Table_5.png 870w, https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/01\/Table_5-300x137.png 300w, https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/01\/Table_5-768x350.png 768w\" sizes=\"auto, (max-width: 550px) 100vw, 550px\" \/><\/a><\/p>\n<p><em>Notes. Because the available questionnaires for GSS and SHARE are presented in scripting files, we could not infer whether some of the questions belong or not to a grid.<\/em><\/p>\n<p>Grid usage is very common in practice. However, important differences exist across surveys: the number of grids varies between 11 (LAPOP) and 50 (QEC). Overall, items belonging to grids represent between 43.3% (ESS) and 77.7% (QEC) of all survey items.<\/p>\n<p>Moreover, the grids are quite large, with an average number of items per grid varying between 4 and 7, but a maximum number of items per grid that goes up until 23 (AMERICAS). 
Overall, most surveys use a lot of grids, and furthermore, these grids sometimes include many items.<\/p>\n<h2>5.4 Check-all-that-apply (CATA)<\/h2>\n<p>Table 6 presents, per survey, the total number of CATA, the maximum and mean number of items in a single CATA, and the percentage of CATA items over the total survey items.<\/p>\n<p><strong>Table 6. CATA usage in the 10 surveys<\/strong><\/p>\n<p><a href=\"https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/01\/Table_6.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17853\" src=\"https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/01\/Table_6.png\" alt=\"\" width=\"550\" height=\"238\" srcset=\"https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/01\/Table_6.png 920w, https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/01\/Table_6-300x130.png 300w, https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/01\/Table_6-768x332.png 768w\" sizes=\"auto, (max-width: 550px) 100vw, 550px\" \/><\/a><\/p>\n<p>The use of CATA format in the analyzed questionnaires is very limited. Most surveys have few or no CATA format. Thus, CATA items represent a small percentage of the total questionnaire, ranging from 0% (GSS) to 5.7% (SHARE).<br \/>\nEven though few CATA items are used, some of those are especially large, with a maximum of 36 items in a single CATA (EUROBAR). This particular case refers to a nationality item. Thus, all the choices are possible nationalities of the respondent. 
Considering this, the problems regarding the use of CATA explained in Section 2.4 may not apply.<\/p>\n<p>Moreover, information about the mean number shows that this is exceptional: questionnaires using CATA have a mean number of items per CATA of 7 (WVS) to 9 (ESS, EVS and SHARE), except the EUROBAR (mean = 18).<br \/>\nFinally, the residual use of forced-choice formats in the analyzed questionnaires suggests that the questionnaires simply do not require this kind of format.<\/p>\n<h2>5.5 Fixed reference points<\/h2>\n<p>Table 7 shows the proportions of scales without fixed reference points, with at least one fixed reference point but not all possible ones and with all possible fixed reference points.<\/p>\n<p><strong>Table 7. Usage of fixed reference points in the 10 surveys<\/strong><\/p>\n<p><a href=\"https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/01\/Table_7.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-17854\" src=\"https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/01\/Table_7.png\" alt=\"\" width=\"550\" height=\"243\" srcset=\"https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/01\/Table_7.png 906w, https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/01\/Table_7-300x132.png 300w, https:\/\/surveyinsights.org\/wp-content\/uploads\/2022\/01\/Table_7-768x339.png 768w\" sizes=\"auto, (max-width: 550px) 100vw, 550px\" \/><\/a><\/p>\n<p>Except for the Latino Barometer and SHARE, all questionnaires have more scales with at least one fixed reference point than with none. However, the surveys more often use some but not all possible fixed reference points, except for the Eurobarometer, the Latino Barometer and the QEC.<\/p>\n<p>Again, large differences across surveys exist. For instance, while the QEC uses all possible fixed reference points to a moderate extent (47.2% of the items), most other surveys use them in less than 25% of the cases. 
In particular, the AMERICAS and SHARE rarely use all fixed reference points (1.3% and 3.0%, respectively).<\/p>\n<h1>6. Conclusion and discussion<\/h1>\n<p>This paper provides information about the response scales used by 10 well-established social science face-to-face surveys, focusing on five aspects: (1) For the scales\u2019 evaluative dimension, all questionnaires use A\/D items in a non-negligible proportion of items: between 5.8% and 15.7% depending on the survey when considering all survey items; between 8.7% and 27.4% when considering only the attitude\/belief items. (2) For the number of answer categories, attitude\/belief IS questions often propose six or fewer answer categories and A\/D questions five or fewer. (3) Regarding the use of grids, results vary widely across surveys, with some including many grids, sometimes with many items. (4) In line with the literature recommendations, we found overall low proportions of CATA (lower than 2% in all surveys, except SHARE).<\/p>\n<p>(5) Finally, eight of the 10 analyzed questionnaires included more items with at least one fixed reference point than without any. However, we still observed an important number of scales without any fixed reference point, and even more without all possible fixed reference points. The variations across surveys are large.<\/p>\n<p>This study has several limitations. First, it focused on only five response scale characteristics. These were selected because there is general agreement in the literature about them and it has been shown that they have an important impact on data quality. Further research could consider other aspects. Moreover, according to Alwin (2007) the polarity of the measured concept is key for understanding the scales\u2019 length. This suggests that a distinction between bipolar and unipolar concepts should be made in order to establish an optimal length for a scale. However, this aspect is not taken into account in most of the current literature about scale length. 
Therefore, we have not included this distinction, but further research distinguishing between unipolar and bipolar scales when studying the number of answer categories would be useful. In addition, the use of different formats depends heavily on the concepts to be measured. For instance, CATA formats might be rarely used in the surveys not because researchers designing the questionnaires take into account the literature recommendations, but simply because the concepts they want to measure in the surveys studied are not suited to CATA formats. The low number of forced-choice questions that could alternatively be asked using CATA formats suggests that this is indeed the case. Finally, mainly for the CATA items, the way these items were implemented was not always clear, in particular due to the presence of an interviewer. Even if showcards were used, the administration of the CATA items might in practice have been very similar to a forced-choice format.<\/p>\n<p>Even if the study has limitations, the results can help researchers by shedding light on the current survey practice in key social science surveys, also showing how it relates to the current state of the art. In particular, the results suggest that there are some deviations from the literature recommendations observed in practice in the 10 surveys considered. Thus, researchers should not simply copy questions from these surveys when designing their own questionnaire. Instead, before deciding to use them, they should evaluate not only their quality but also the survey\u2019s design, content and context. Although quality is an important aspect to consider, compromises are made to meet the aim of the surveys &#8211; i.e. making a question comparable across countries, targeting a specific population sub-group or complementing another database.<\/p>\n<p>A key question, then, is why scales that are not recommended by the literature are still used in practice. This might be a problem of dissemination of the results. 
However, well-established surveys such as the ones considered in this paper usually rely on international experts who do know the literature recommendations and best practices. Thus, we believe that other reasons play an important role. First, even if there is high agreement in the literature for the recommendations studied in this paper, it is always important to weigh the recommendations against the specific case of interest. Final decisions have to be made on a case-by-case basis because what is usually recommended might not be the best solution in some situations. These recommendations often interact with each other (e.g. the number of points depends on the scale\u2019s evaluative dimension) and not all interactions have been tested in the literature (e.g. polarity and number of answer categories). Moreover, all these surveys have been running for years, and often try to maintain data comparability across waves to allow longitudinal analyses. Therefore, they minimize changes in their questionnaires. Furthermore, we have to take into account that not all the surveys have the necessary resources to stay continuously up to date with the literature or to apply all the recommendations in practice. There is usually a trade-off between data quality, comparability (across time or countries), and costs. Still, it is crucial that surveys keep improving their questionnaires using the latest (evidence-based) recommendations from the literature as often as possible.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>1. Introduction Since decades, surveys have been the main source of data in many studies (Saris and Gallhofer, 2014). 
How surveys are designed matters 1) for respondents, a bad experience participating in one survey can lead to break-off and\/or discourage respondents to participate in future surveys; and 2) for researchers, since it affects the representativeness [&hellip;]<\/p>\n","protected":false},"author":3175,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[41],"tags":[852,494,619,851,853],"class_list":["post-16693","post","type-post","status-publish","format-standard","hentry","category-questionnaire_design","tag-face-to-face-mode","tag-practical-implementation","tag-questionnaire-design","tag-response-scales","tag-social-science-surveys"],"acf":[],"_links":{"self":[{"href":"https:\/\/surveyinsights.org\/index.php?rest_route=\/wp\/v2\/posts\/16693","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/surveyinsights.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/surveyinsights.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/surveyinsights.org\/index.php?rest_route=\/wp\/v2\/users\/3175"}],"replies":[{"embeddable":true,"href":"https:\/\/surveyinsights.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=16693"}],"version-history":[{"count":25,"href":"https:\/\/surveyinsights.org\/index.php?rest_route=\/wp\/v2\/posts\/16693\/revisions"}],"predecessor-version":[{"id":17825,"href":"https:\/\/surveyinsights.org\/index.php?rest_route=\/wp\/v2\/posts\/16693\/revisions\/17825"}],"wp:attachment":[{"href":"https:\/\/surveyinsights.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=16693"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/surveyinsights.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=16693"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/surveyinsights.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=16693"}],"curies
":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}