{"id":6899,"date":"2015-11-20T13:30:09","date_gmt":"2015-11-20T12:30:09","guid":{"rendered":"http:\/\/surveyinsights.org\/?p=6899"},"modified":"2015-11-24T12:59:56","modified_gmt":"2015-11-24T11:59:56","slug":"what-do-web-survey-panel-respondents-answer-when-asked-do-you-have-any-other-comment","status":"publish","type":"post","link":"https:\/\/surveyinsights.org\/?p=6899","title":{"rendered":"What do web survey panel respondents answer when asked \u201cDo you have any other comment?\u201d"},"content":{"rendered":"<h2>1. Introduction<\/h2>\n<p>\u201cDo you have any other comments?\u201d This or a similar question is often asked near the end of a web survey and routinely in web survey panels such as the LISS panel (Longitudinal Internet Studies for the Social Sciences). We call such an open-ended question a \u201cfinal comment\u201d. Final comments are the respondents\u2019 only opportunity in the survey to give feedback about the survey or to say anything else that is on their mind after filling out the questionnaire. But what are respondents trying to communicate?\u00a0 Nobody really knows. \u00a0Analyzing respondents\u2019 final comments is potentially very useful because they may shed additional light on data quality or other aspects of survey operation. This is particularly important in long-running probability survey panels such as the LISS panel and the Dutch Immigrant panel.<\/p>\n<p>There is an extensive literature on open-ended questions (e.g., Emde &amp; Fuchs, 2012; Geer, 1991; Holland &amp; Christian, 2009; Zuell, Menold, &amp; K\u00f6rber, 2015). To our knowledge, there is no literature specifically related to final comments. While some researchers may peruse final comments, studies make no systematic attempt to analyze them (Aldridge &amp; Rowley, 1998; Bell &amp; Tang, 1998; Kingston, Carver, Evans, &amp; Turton, 2000). \u00a0\u00a0Borg and Zuell (2012) analyze 75,000 write-in comments from a survey of 25,000 employees. 
They find that 40% of employees provide write-in comments and most are negative in tone. Employees with low job satisfaction are more likely to write comments. Negative comments tend to be longer.<\/p>\n<p>Final comments may be particularly important in web survey panels. Because of their longitudinal dimension, web survey panels have an opportunity to react to previous comments in subsequent survey waves.<\/p>\n<p>In this paper we categorize a random sample of final comments from the LISS panel and the Dutch Immigrant panel, two probability-based web survey panels in the Netherlands.\u00a0 In section 2 we introduce the two survey panels and describe how final comments are categorized. Section 3 gives results and section 4 concludes with a discussion.<\/p>\n<p>&nbsp;<\/p>\n<h2>2. Data and methods<\/h2>\n<p>We categorize final comments from the LISS and Immigrant panels. We first describe these panels and then the categorizations.<\/p>\n<p>&nbsp;<\/p>\n<h3>2.1. Data<\/h3>\n<p>The LISS panel is an open-access Internet panel based on a probability sample of households drawn from the Dutch population register in 2007. Households that could not otherwise participate were provided a computer and Internet connection.\u00a0 In 2009 and again in 2010\/2011 refreshment samples were drawn.\u00a0 Respondents are paid an incentive of 15 Euro per hour (and proportionally less for shorter surveys).\u00a0 The number of respondents in the LISS panel has varied over time with attrition and replenishment. Between 6,000 and 10,000 respondents participate in monthly Internet surveys.<\/p>\n<p>The Immigrant panel is an open-access Internet panel proportionally representative of the Dutch immigrant population with an additional Dutch control group. It was drawn from the population register by Statistics Netherlands in 2010. 
Almost 20% of the Dutch population are 1<sup>st<\/sup> or 2<sup>nd<\/sup> generation immigrants.\u00a0 Broadly speaking, the immigrant panel contains equal numbers of 1<sup>st<\/sup> generation immigrants, 2<sup>nd<\/sup> generation immigrants and Dutch members. Among immigrants, immigrants from western countries form the largest group. Major non-western immigrant groups in the panel are persons with Moroccan, Turkish, Surinamese and Antillean origin. The immigrant panel uses the same incentive structure as the LISS panel.\u00a0 The immigrant panel has about 1400 respondents.<\/p>\n<p>Both panels ask the same final comment question in Dutch: \u00a0\u201cDo you have any remarks about the questionnaire?\u201d The original Dutch version of this question and the routing are given in Appendix A. There are no differences in terms of how the question is asked or in the size of the answer box in either panel. \u00a0Of course, the overall length of the questionnaires varies.<\/p>\n<p>&nbsp;<\/p>\n<h3>2.2. Methods<\/h3>\n<p>Final comments were categorized by up to three raters into one of nine non-overlapping categories as described in a manual created for this purpose. \u00a0The categories were developed from a sample of responses. Initially, categories were developed based on frequently occurring comments in that sample. After the quality of the initial categorization scheme proved too low, the categorization scheme was revised and the present scheme with positive, neutral, and negative comments was developed. Because negative comments were numerous and thought to be potentially important, the category \u201cnegative comments\u201d had a number of subcategories. \u00a0The data were then re-categorized based on the improved categorization scheme (Krippendorff, 2013) which is reported here.<\/p>\n<p>The raters were students at the University of Waterloo in Canada who were fluent in both Dutch and English. 
The nine categories are: positive comments, neutral comments, trivial comments and six types of negative comments. The six types of negative comments were unclear questions, difficult questions, the survey was too long, respondent perceived that the question(s) did not apply to him\/her, a programming or technical error, and \u201cother negative comment\u201d. \u00a0 Positive comments could not be split into subcategories as there were too few of them.<\/p>\n<p>This paragraph describes negative comment categories in more detail.\u00a0 Comments in the category \u201cdifficult questions\/survey\u201d often directly contain a Dutch word for difficult (\u201clastig\u201d or \u201cmoeilijk\u201d).\u00a0 A typical example is \u201cI find it difficult and I don\u2019t think I did a good job in filling in the answers\u201d. The negative category \u201ctoo long\u201d refers to survey length relative to the incentive rather than absolute survey length: if the incentive corresponds to a 15-minute survey length, respondents may complain when the survey is 30 minutes long. The negative category \u201ctechnical error\u201d was intended to catch web survey programming errors. For example, a question asks for number of hours and the question\u2019s input validation refuses to accept \u201c0.5\u201d hours as an answer. Since 0.5 hours is a valid answer, this is considered a survey programming error.\u00a0 In another example, a survey asked questions about an image, but some respondents reported that the image would not load. While this is not necessarily a programming error, it is also a technical problem. \u00a0In a third example, an older respondent tried to enter the year 1942, but the input validation refused to accept that year as valid.\u00a0 While not certain, from the context of the comment it seemed reasonable that 1942 should have been a valid input. \u00a0The negative category \u201cunclear\u201d refers to something being unclear, including questions and answer choices. 
A typical example is \u201cThe questions asked were unclear\u201d. \u00a0The negative category \u201cdoes not apply to me\u201d applies when the respondent feels that question(s) or the survey do not apply to him or her. A typical example is \u201cI didn\u2019t like the questions about foreign roots. My father was born in Indonesia, but I am not part of the Indonesian community. An answer choice \u2018does not apply to me\u2019 would have been useful\u201d.\u00a0 \u00a0The negative category \u201cother negative comment\u201d was meant to include all negative comments that did not specifically fit into one of the other categories. It includes a diverse set of comments, including comments like \u201cI&#8217;ve never experienced something so ridiculous. Just nonsense!\u201d but also more specific negative comments such as comments related to missing answer choices in multiple choice questions.<\/p>\n<p>Examples of positive comments are \u201cthis is interesting\u201d and \u201cit made me think about the topic\u201d. Trivial comments refer to comments without content like \u201ctffff\u201d, \u201c\u2014\u201d\u00a0 or \u201cNo comment\u201d. Neutral comments comprise a wide range of comments: \u00a0comments related to the survey topic (\u201cI think politicians are [\u2026]\u201d), comments containing personal information (\u201cI was on vacation last week and couldn\u2019t answer\u201d, \u201cI had to go to the hospital\u201d), requests or questions for the survey panel, and clarification of answers given earlier in the questionnaire.\u00a0 Neutral comments were not further divided into different types for two reasons. First, subcategories of negative comments were considered to be more important than subcategories of neutral comments. Second, during pre-testing we found that the categorisation of neutral comments was more difficult than that of negative comments.<\/p>\n<p>Comments were categorized into a single category. 
In the rare case that two categories applied to a comment, the category corresponding to the comment\u2019s main theme was chosen. If ambiguity persisted, the first theme mentioned was chosen.<\/p>\n<p>For the LISS panel, a random sample of 1700 comments made in surveys between April 2007 and September 2013 was categorized: 450 comments were categorized by 3 raters, 400 comments were categorized by 2 raters, and the remaining 850 comments were categorized by a single rater.\u00a0 For the immigrant panel, a random sample of 850 comments made in surveys between the panel\u2019s inception in 2010 and September 2013 was categorized; 450 of these comments were categorized by two raters.<\/p>\n<p>Inter-rater reliabilities (\u201ckappa\u201d; Fleiss, Levin, &amp; Paik, 2003) were then computed. For the LISS panel, where 3 raters were available, inter-rater reliabilities could be broken down by category.\u00a0 For computing frequencies, different ratings of the same final comment had to be reconciled into a single \u201cgold standard\u201d rating. Where ratings differed, the rating of the most experienced (\u201cexpert\u201d) rater was chosen.<\/p>\n<p>For the LISS panel, the inter-rater agreement was 0.481; for the Immigrant panel the inter-rater agreement was 0.495.\u00a0 This indicates a moderate strength of\u00a0 agreement (Fleiss et al., 2003). 
Table 1 gives kappa values by individual category.\u00a0 Inter-rater agreement was high for question\/survey difficulty, positive comments, and trivial comments.\u00a0 Inter-rater agreement was very low for whether the comment pointed out a technical error or not and for the category \u201cdoes not apply\u201d.<\/p>\n<p>&nbsp;<\/p>\n<div align=\"center\">\n<table width=\"296\" border=\"1\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td valign=\"bottom\" nowrap=\"nowrap\" width=\"232\"><strong>Category<\/strong><\/td>\n<td valign=\"bottom\" nowrap=\"nowrap\" width=\"64\"><strong>kappa<\/strong><\/td>\n<\/tr>\n<tr>\n<td valign=\"bottom\" nowrap=\"nowrap\" width=\"232\">difficult questions\/survey<\/td>\n<td valign=\"bottom\" nowrap=\"nowrap\" width=\"64\">0.736<\/td>\n<\/tr>\n<tr>\n<td valign=\"bottom\" nowrap=\"nowrap\" width=\"232\">positive comment<\/td>\n<td valign=\"bottom\" nowrap=\"nowrap\" width=\"64\">0.733<\/td>\n<\/tr>\n<tr>\n<td valign=\"bottom\" nowrap=\"nowrap\" width=\"232\">trivial comment<\/td>\n<td valign=\"bottom\" nowrap=\"nowrap\" width=\"64\">0.721<\/td>\n<\/tr>\n<tr>\n<td valign=\"bottom\" nowrap=\"nowrap\" width=\"232\">neutral comment<\/td>\n<td valign=\"bottom\" nowrap=\"nowrap\" width=\"64\">0.545<\/td>\n<\/tr>\n<tr>\n<td valign=\"bottom\" nowrap=\"nowrap\" width=\"232\">too long<\/td>\n<td valign=\"bottom\" nowrap=\"nowrap\" width=\"64\">0.514<\/td>\n<\/tr>\n<tr>\n<td valign=\"bottom\" nowrap=\"nowrap\" width=\"232\">other negative comment<\/td>\n<td valign=\"bottom\" nowrap=\"nowrap\" width=\"64\">0.393<\/td>\n<\/tr>\n<tr>\n<td valign=\"bottom\" nowrap=\"nowrap\" width=\"232\">unclear questions\/survey<\/td>\n<td valign=\"bottom\" nowrap=\"nowrap\" width=\"64\">0.368<\/td>\n<\/tr>\n<tr>\n<td valign=\"bottom\" nowrap=\"nowrap\" width=\"232\">questions do not apply to me<\/td>\n<td valign=\"bottom\" nowrap=\"nowrap\" width=\"64\">0.193<\/td>\n<\/tr>\n<tr>\n<td valign=\"bottom\" nowrap=\"nowrap\" width=\"232\">technical 
error<\/td>\n<td valign=\"bottom\" nowrap=\"nowrap\" width=\"64\">0.028<\/td>\n<\/tr>\n<tr>\n<td valign=\"bottom\" nowrap=\"nowrap\" width=\"232\"><strong>combined<\/strong><\/td>\n<td valign=\"bottom\" nowrap=\"nowrap\" width=\"64\">0.481<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Table 1: Kappa value for the LISS panel overall and by individual category. Categories are sorted by the kappa value.<\/p>\n<p>Because most subcategories of negative comments had a lower than average inter-rater reliability, we combined all subcategories of negative comments and computed kappa again. \u00a0For the categorization into positive, neutral, trivial and negative comments the kappa values rise to 0.559 in the LISS panel and 0.547 in the Immigrant panel.<\/p>\n<p>&nbsp;<\/p>\n<h2>3. Results<\/h2>\n<p>We first comment on the frequency and distribution of comment types. Most respondents did not make a final comment. In the LISS panel 3.6% of surveys contain a final comment. In the immigrant panel 5.7% of surveys contain a final comment, an increase by a factor of 1.6.<\/p>\n<p>Figure 1 shows the distribution of different comment types by panel.\u00a0 Both panels contain few positive comments (1.8% in both panels). The remainder of the comments are split roughly equally between neutral comments (LISS panel 50.2%; Immigrant panel 54.5%) and different types of negative comments (LISS panel 43.5%; Immigrant panel 47.7%). Among negative comments, in both panels \u201cother negative comments\u201d is the largest component, followed by \u201cunclear\u201d and \u201cdifficult\u201d.\u00a0\u00a0 Trivial comments, questionnaires that are \u201ctoo long\u201d, comments about technical errors, and \u201cdoes not apply to me\u201d complaints occur less frequently.<\/p>\n<p>The categories \u201cunclear\u201d and \u201cdifficult\u201d relate to problems with the questionnaire. 
This is also true to some extent of \u201cother negative comments\u201d, which includes problems with answer choices.\u00a0\u00a0 Comment types unrelated to the questionnaire include technical problems (\u201cerror\u201d), the questionnaire is too long (\u201ctoo long\u201d) and the questions do not apply to me (\u201cnot apply\u201d). Overall, problems with the questionnaire outweigh problems unrelated to the questionnaire.<\/p>\n<p>Final comments in the immigrant panel are significantly more often about unclear questions than in the LISS panel (11.2% vs 6.6%, chi squared=14.0, 1 d.f., p&lt;0.001).\u00a0 Because respondents in the immigrant panel make more comments to begin with, the absolute number of comments related to unclear questions is 2.7 times larger (11.2%\/6.6% * 5.7%\/3.6% = 2.7) in the immigrant panel. \u00a0Also, final comments in the immigrant panel significantly less often complain about survey length (0.5% vs 2.0%, chi squared=8.7, 1 d.f., p=0.003). \u00a0Accounting for the larger number of comments in the immigrant panel, the absolute number of comments complaining about survey length in the LISS panel is 2.7 times (2.00%\/0.47% * 3.6%\/5.7% = 2.7) larger than that of the immigrant panel. (It is a coincidence that both factors are 2.7.)<\/p>\n<p>&nbsp;<\/p>\n<p><a href=\"https:\/\/surveyinsights.org\/wp-content\/uploads\/2015\/04\/figure_prevalence.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-6922\" src=\"https:\/\/surveyinsights.org\/wp-content\/uploads\/2015\/04\/figure_prevalence.png\" alt=\"\" width=\"775\" height=\"564\" srcset=\"https:\/\/surveyinsights.org\/wp-content\/uploads\/2015\/04\/figure_prevalence.png 775w, https:\/\/surveyinsights.org\/wp-content\/uploads\/2015\/04\/figure_prevalence-300x218.png 300w\" sizes=\"auto, (max-width: 775px) 100vw, 775px\" \/><\/a><\/p>\n<p>Figure 1: Distribution of comment types and approximate 95% confidence intervals for the Immigrant and LISS panels. 
\u00a0The bars of the six negative comment categories have the same colour. The percentages sum to 100% in each panel.<\/p>\n<p>&nbsp;<\/p>\n<p>Finally, what respondents are not commenting on can be just as revealing as what respondents are commenting on.\u00a0 There were no comments that might have required follow-up by a psychologist (e.g. comments revealing suicidal tendencies) or other professionals (e.g. threats).\u00a0 Privacy concerns were scarcely mentioned.\u00a0 Very few respondents inappropriately used final comments to direct questions to the panel administration (\u201cCan you call me, my payment didn\u2019t go through\u201d). While respondents used final comments in very different ways, nearly all answers were very thoughtful.<\/p>\n<p>&nbsp;<\/p>\n<h2>4. Discussion<\/h2>\n<p>To our knowledge, this is the first study to systematically categorize final comments. In both the LISS and immigrant panels we found that neutral and negative comments far outweigh positive comments. Among negative comments, those related to questionnaire wording (survey difficulty and something being unclear) were more prevalent than negative comments unrelated to the questionnaire wording (survey length, technical errors and questions perceived as not applicable to the respondent).\u00a0 The category \u201cother negative comments\u201d was most prevalent. This speaks to the large diversity among the comments and to the difficulty of categorizing comments into a small number of categories.<\/p>\n<p>Like all studies, this study has limitations. First, the kappa values measuring inter-rater reliability are lower than generally accepted. However, to our knowledge there is no literature on what respondents say in final comments. While not perfect, this categorization is the first one, and a categorization based on a moderate kappa value is preferable to no information at all. 
There are many studies with moderate kappa values in the literature (Goodman, 2007; Liu et al., 2014; Paik et al., 2004). Additionally, to the extent that a moderate-kappa x-variable is used in a subsequent regression, measurement error can be accounted for in analyses. This is briefly discussed below.<\/p>\n<p>Second, final comments were categorized in two Dutch-language web survey panels and therefore the findings cannot necessarily be generalized to other panels.\u00a0 However, both panels are high-quality panels based on a probability sample of the target population. The two panels focus on very different populations (Dutch population vs. an immigrant population). It is surprising how similar the distributions of different comment types are.<\/p>\n<p>The categorization of final comments was challenging. The inter-rater reliability of categorization was particularly low for \u201ctechnical error\u201d and \u201cquestions do not apply to me\u201d. \u00a0\u00a0Raters\u2019 difficulty in ascertaining a \u201ctechnical error\u201d is particularly striking. Raters had no survey programming experience. \u201cTechnical error\u201d had a lower reliability because non-expert raters tended to also include final comments reporting alleged errors.\u00a0 For example, some respondents referred to allegedly incorrect question wording as an error. A preference for different question wording is not a technical error and should instead have been classified as \u201cother negative comment\u201d.\u00a0 The category \u201cdoes not apply to me\u201d had a lower reliability because raters sometimes inferred the meaning of \u201cdoes not apply\u201d from the text without the respondent directly stating the questions did not apply to them. \u00a0For future categorization of final comments, the categories \u201ctechnical error\u201d and \u201cquestion\/survey does not apply to me\u201d should be removed.<\/p>\n<p>What are the implications of this analysis? 
First, panel owners should be relieved to hear that no mission-critical information appears to have been missed in the final comments. Second, researchers might want to consider additional pretesting of questions to be fielded in the immigrant panel, as respondents were far more likely to perceive questions as unclear. Third, for the future categorization of final comments, an initial assessment of whether a comment is positive, neutral or negative appears useful. If desired, these three categories can then be broken down into subcategories. Fourth, for using final comment categories as x-variables in regression analysis, analysts might want to incorporate measurement error into the regression model rather than using a single gold standard categorization. This is particularly important because the moderate value of kappa indicates a non-trivial amount of measurement error. \u00a0One popular approach for generalized linear regression models with x-variables subject to measurement errors is the SIMEX method (Carroll, K\u00fcchenhoff, Lombard, &amp; Stefanski, 1996). SIMEX has been implemented in software packages like Stata (Hardin, Schmiediche, &amp; Carroll, 2003) and R (Lederer &amp; K\u00fcchenhoff, 2006).<\/p>\n<p>In this paper we have investigated what respondents in the LISS and immigrant panels say in final comments. It is unclear whether the content of final comments correlates with measures of data quality. 
For example, respondents who are negative might be more likely to attrit.\u00a0 This and related questions will be addressed in a follow-up paper.<\/p>\n<p><strong>\u00a0<\/strong><\/p>\n<h2><strong>Appendix A <\/strong><\/h2>\n<p>This appendix gives the exact wording of the final comment question in both the LISS panel and the immigrant panel.\u00a0 The original question in Dutch is:\u00a0 \u201cHebt u nog opmerkingen over deze vragenlijst?\u201d or in English translation \u201cDo you have any remarks about the questionnaire?\u201d The routing (for both the LISS and Immigrant panel) in all questionnaires is as follows:<\/p>\n<p><strong>\u00a0<\/strong><\/p>\n<p><strong>opm<\/strong><\/p>\n<p>Hebt u nog opmerkingen over deze vragenlijst?<\/p>\n<p>1 Ja<\/p>\n<p>2 Nee<\/p>\n<p>&nbsp;<\/p>\n<p><em>if opm = 1<\/em><\/p>\n<p><strong>evaopm<\/strong><\/p>\n<p>U kunt uw opmerking hieronder invullen.<\/p>\n<p><em>open<\/em><\/p>\n<p><strong>\u00a0<\/strong><\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>1. Introduction \u201cDo you have any other comments?\u201d This or a similar question is often asked near the end of a web survey and routinely in web survey panels such as the LISS panel (Longitudinal Internet Studies for the Social Sciences). We call such an open-ended question a \u201cfinal comment\u201d. 
Final comments are the respondents\u2019 [&hellip;]<\/p>\n","protected":false},"author":417,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[322,324,323],"class_list":["post-6899","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-final-comment","tag-further-comment","tag-open-ended-question"],"acf":[],"_links":{"self":[{"href":"https:\/\/surveyinsights.org\/index.php?rest_route=\/wp\/v2\/posts\/6899","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/surveyinsights.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/surveyinsights.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/surveyinsights.org\/index.php?rest_route=\/wp\/v2\/users\/417"}],"replies":[{"embeddable":true,"href":"https:\/\/surveyinsights.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=6899"}],"version-history":[{"count":84,"href":"https:\/\/surveyinsights.org\/index.php?rest_route=\/wp\/v2\/posts\/6899\/revisions"}],"predecessor-version":[{"id":7524,"href":"https:\/\/surveyinsights.org\/index.php?rest_route=\/wp\/v2\/posts\/6899\/revisions\/7524"}],"wp:attachment":[{"href":"https:\/\/surveyinsights.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=6899"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/surveyinsights.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=6899"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/surveyinsights.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=6899"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}