Stefanie Wulff and Ryan K. Boettger (with the help of research assistant Chad Hammock): Collaborative Research: Evaluating a data-driven approach to teaching technical writing to STEM majors
(research project; funded by NSF #1708360/#1708362)
Overview. This research project seeks to improve the quality of writing instruction for undergraduates majoring in science, technology, engineering, and mathematics (STEM). Understanding disciplinary differences in writing has become increasingly relevant as instruction moves from literature-based composition courses in English departments to include technical writing and content-based courses taught by scholars in different disciplines. One effect of these changes is that students need to write in a way that conforms to the practices of a discipline they may not (yet) be familiar with. However, STEM undergraduates have little access to customized, discipline-specific writing instruction. A solution to this problem is engaging students in a form of data-driven learning (DDL) that teaches them how to write in their discipline rather than apply generalist writing principles that contradict how professionals actually communicate. An interdisciplinary team of researchers will develop a series of DDL instructional units for STEM undergraduates in both multi-major writing-intensive courses and STEM-focused content courses in physiology and ecology. Unit content and students' application of the instruction will be validated through peer review and revised via a control-group quasi-experimental design. Results and instructional materials will be disseminated through publications, workshops, and publicly available web tutorials.
Intellectual Merit. Introductory technical writing courses provide a great service to STEM departments, but it is not uncommon for instructors to have 20 different majors represented in their classroom. This project includes an innovative combination of characteristics designed to help writing and discipline-specific instructors customize their curriculum to meet the needs of all students: (i) It introduces modern corpus-linguistic methods that make large-scale studies possible, covering more text types and more language features, rather than case studies of a small number of individuals, classes, or texts. (ii) The DDL environment provides STEM students an accessible forum for applying these techniques and learning to overcome writing deficiencies that are prevalent in their disciplines. (iii) The project's personnel encompass content, language, and methodological expertise and represent three disciplines: biology, linguistics, and technical communication. (iv) The effects of DDL will be assessed in four diverse populations at a major public institution that reflects the global demographics and instructional challenges for teaching technical writing. The inclusion of multiple instructional settings will address how DDL transfers to diverse STEM settings and influences how students learn technical writing.
Broader Impacts. The proposed project advances discovery and understanding of how STEM students learn to write in their disciplines. Additionally, the project fosters new interdisciplinary collaborations focused on a fundamental component of STEM education—technical writing. STEM undergraduates need customized writing instruction and enhanced communication skills to prepare for the workforce. To help these students and their instructors, the team will disseminate the following for public use: (i) the Technical Writing Project (TWP), an online corpus of student technical writing previously compiled by the lead researchers; (ii) materials for the instructional units; and (iii) a series of web tutorials for audiences engaged in STEM writing practices on how to use the TWP and the instructional materials for individual and classroom learning purposes. The team will also disseminate the research findings through conference presentations, workshops, and peer-reviewed research within linguistics, technical communication, and STEM education. These venues attract academics and practitioners as well as national and international audiences.
Ethan Kutlu: Factors impacting native speakers’ FAS judgments
(Ph.D. dissertation project)
In this dissertation, we aim to understand and identify factors that can affect a rater's judgments when hearing foreign-accented speech (FAS). Many L2 learners face daily discrimination because their speech may be accented and thus considered incomprehensible. In comparison to a regional accent, which is generally found to be more acceptable, FAS is often regarded as problematic (Gluszek & Dovidio, 2010). Since the early 1970s, FAS has been examined in the (related) fields of linguistics, second language acquisition, and more recently, social psychology (Munro & Derwing, 1995; Ferguson et al., 2010; Van Engen & Peelle, 2014). Meanwhile, linguistic studies of accentedness and speech perception agree that speech perception is variable, and that humans can identify sounds even with minimal acoustic cues (Hillenbrand, Clark & Baer, 2011). This raises two questions: What makes FAS different from other kinds of speech variation? Why is FAS judged so negatively by so many native speakers?
Alexandra Lavrentovich: Using classifier features to determine cross-linguistic influence on the developmental trajectory of English morphemes
(Ph.D. dissertation project)
One prevailing position in second language acquisition (SLA) research is that learners of another language (L2) follow a predictable, fixed path in the acquisition of morphosyntactic structures (Goldschneider & DeKeyser, 2001; VanPatten & Williams, 2007), regardless of their dominant language (L1) background (Ellis, 1994; Ortega, 2009). For example, grammatical morpheme studies propose a so-called natural order for English learners (Krashen, 1987). However, recent literature reviews, experimental studies, and corpus approaches have cast doubt on the fixed nature of developmental sequences (Hulstijn et al., 2015; Weitze et al., 2011; Murakami & Alexopoulou, 2016). For example, Luk and Shirai (2009) find that Japanese and Spanish learners of English show different hierarchies of accurate use of three morphemes, which may be explained by the presence or absence of the equivalent morpheme in the L1. In a longitudinal corpus study, Murakami (2016) shows individual variation and non-linearity in trends for accurate use across proficiency levels. Hence, L1 background and proficiency can reorganize the predicted order of morpheme acquisition. Aligning with the current research, this dissertation investigates cross-linguistic influence in the developmental trajectory of English grammatical morphemes. The research aims to model the dynamic and emergent nature of morpheme production by using a longitudinal learner corpus and computational methodology. The research has the following goals:
1. Quantitatively detect the under/overuse of grammatical morphemes between learners with different L1 backgrounds and qualitatively examine what underlies these patterns to determine cross-linguistic influence.
2. Model the absence and presence of grammatical morphemes for individual learners across different proficiency levels to determine the extent of individual variation in morpheme accuracy development.
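As a rough illustration of the first goal, the quantitative side of under/overuse detection can be sketched as a comparison of normalized frequencies between two learner groups. The sketch below is not the project's actual procedure: the function names, the smoothing constant, and the two toy samples are assumptions made purely for illustration.

```python
import math
from collections import Counter

def rate_per_thousand(tokens, targets):
    """Occurrences of each target form per 1,000 tokens."""
    counts = Counter(t.lower() for t in tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {form: 1000 * counts[form] / total for form in targets}

def log_ratio(rate_a, rate_b, smoothing=0.5):
    """log2 ratio of two rates; positive means group A uses the form more often.
    The smoothing constant is an arbitrary choice that keeps zero rates finite."""
    return math.log2((rate_a + smoothing) / (rate_b + smoothing))

# Invented toy samples, not drawn from any corpus
group_a = "i bought book yesterday and read book at home".split()
group_b = "i bought a book yesterday and read the book at home".split()
targets = ["a", "the"]

rates_a = rate_per_thousand(group_a, targets)
rates_b = rate_per_thousand(group_b, targets)
for form in targets:
    # Negative values indicate underuse by group A relative to group B
    print(form, round(log_ratio(rates_a[form], rates_b[form]), 2))
```

A measure of this kind only flags where groups differ; the qualitative step named in the goal is still needed to decide whether a difference reflects L1 influence.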
The data will come from the EF-Cambridge Open Language Database (EFCamDat), a 33-million-word longitudinal corpus of English learner scripts written by students enrolled in a virtual learning environment (Geertzen et al., 2014). From the data, I include Chinese, Spanish, Portuguese, Arabic, Russian, and German learner groups, as they are the most represented in the corpus, accounting for over 70% of the data (Alexopoulou et al., 2015; Nisioi, 2015). The learners pass through 16 proficiency levels in the online curriculum that correspond to the language proficiency guidelines A1 through C2 set forth by the Common European Framework of Reference.
The main goal is to determine how cross-linguistic influence (CLI) might reorganize the predicted morpheme order at different proficiency levels of a learner’s developmental trajectory. To demonstrate L1 influence, I will follow criteria from a detection-based approach (Jarvis & Crossley, 2012), which uses frequency differences between English writing patterns and the selected L1 backgrounds. The criteria for determining CLI are as follows: (1) intragroup homogeneity: where learners with the same L1 show similar morpheme developmental trajectories; (2) intergroup heterogeneity: where learners with different L1 backgrounds show different trajectories; (3) cross-language congruity: where learners use an English pattern that is similar to one they have in their L1; and (4) intralingual contrasts: where learners differentially use an English feature depending on how congruent that feature is in their L1. One way to meet the criteria is to carry out a Native Language Identification (NLI) task where a machine classifier identifies a learner’s L1 based solely on the learner’s English writing. An NLI analysis identifies the specific English features most likely to be affected by the L1, which we may not detect from more subjective, manual, surface-level analyses (Crossley, 2012). A computational approach to CLI has the advantage of being able to deal with a large quantity of very similar data points (e.g., the distribution of function words across all learner essays) and estimating the probability of a learner's L1 given subtle patterns in the data (e.g., the overuse or underuse of function words). I will use a support vector machine classifier with features such as part-of-speech n-grams and function words. The findings from the classification task will be used to determine patterns of the presence or absence of specific linguistic features between L1 groups and how these patterns may change across proficiency levels.
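To make the NLI setup more concrete, the following minimal sketch turns texts into function-word frequency vectors and assigns an L1 label to an unseen text. It is a simplification under stated assumptions: a nearest-centroid rule stands in for the support vector machine so that the example needs only the standard library, the function-word list is a tiny hypothetical one, part-of-speech n-grams are omitted, and the training sentences and the labels L1_A and L1_B are invented.

```python
from collections import Counter

# Hypothetical, deliberately tiny function-word list
FUNCTION_WORDS = ["the", "a", "of", "in", "to", "is"]

def featurize(text):
    """Relative frequency of each function word in one text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    n = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / n for w in FUNCTION_WORDS]

def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def train(labelled_texts):
    """Map each L1 label to the centroid of its texts' feature vectors."""
    by_label = {}
    for label, text in labelled_texts:
        by_label.setdefault(label, []).append(featurize(text))
    return {label: centroid(vecs) for label, vecs in by_label.items()}

def predict(model, text):
    """Return the label whose centroid is nearest (squared Euclidean distance).
    Stand-in for the SVM decision rule; not the classifier named in the text."""
    vec = featurize(text)

    def dist(center):
        return sum((x - y) ** 2 for x, y in zip(vec, center))

    return min(model, key=lambda label: dist(model[label]))

# Invented toy examples; the labels are placeholders, not real learner groups
training = [
    ("L1_A", "he go to school in morning"),
    ("L1_A", "she is student of university"),
    ("L1_B", "he goes to the school in the morning"),
    ("L1_B", "she is a student of the university"),
]
model = train(training)
print(predict(model, "the teacher is in the room"))
```

In a setup of this kind, the per-feature differences between group centroids (or, with an SVM, the feature weights) are what point to the English features most associated with each L1.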
To further explore longitudinal factors and individual variation, I will use generalized additive mixed models on individuals in the corpus.
The intellectual merit of this research will be in its triangulation of learner corpora, computational methods, and qualitative analysis to show how differences between learners can be approached in a data-driven way. The study looks at the emergence and distributional frequencies of grammatical morphemes for English learners with different L1 backgrounds across increasing proficiency levels. The NLI approach improves on manual comparisons or learner case studies because we can use larger data sets, make more objective decisions about where L1-specific language transfer effects may occur, and perform more semi-automatic analyses on other available corpora. There is also evidence that classifiers outperform human experts in detecting L1 background (Malmasi et al., 2015). The broader impact of this research is to exploit the findings on cross-linguistic transfer and individual variation in hypothesis-making in SLA and pedagogy. For example, the NLI task contributes to SLA research by adding quantitative data to known transfer effects that an otherwise manual inspection could miss and may help with hypothesis-making as to why these transfer effects exist. For direct applications in language teaching and learning, L1-specific transfer effects can be used informatively to tailor instruction, feedback, and methods in the classroom and curriculum as well as be applied to language teaching technology.
Stefanie Wulff and Stefan Th. Gries (with the help of lab volunteers Anna Bjorklund, Steven Critelli, Erica Drayer, Corinne Futch, Hali Lindsay, and Noah Rucker): Cognitive determinants of oral and written blend formation
In this research project, we aim to take a first step towards addressing this gap by conducting an experimental study in which native speakers of English are asked to blend source words together. The source word stimuli will be systematically controlled for the different cognitive determinants mentioned above. In a crucial extension of our previous research with Dylan Attal (see below), we will elicit blends both orally and in writing from our participants. The results will be statistically evaluated both monofactorially (means, interquartile ranges, and exact tests) and multifactorially by means of a linear model that identifies which factors contribute to an increasing distance of the chosen cut-off points to the ideal ones as determined by the predictors (Gries 2006). The findings of this study stand to make a valuable contribution to our understanding of subtractive word formation processes by providing us with first clues regarding an online production and comprehension model of blending and by informing our understanding of the differences between creative and conscious word formation processes such as intentional blending compared to involuntary and unconscious word formation processes such as speech errors.
Stefanie Wulff and Stefan Th. Gries (with the help of research assistant Corinne Futch): Particle placement in L2 learner language
(research project; funded by a Language Learning Small Research Grant)
In this project, we are carrying out the first large-scale, corpus-based analysis of particle placement in learner language. Particle placement is a word-order alternation that involves the variable position of the particle in English transitive phrasal verbs (The squirrel picked up the nut vs. The squirrel picked the nut up). While particle placement has been researched intensively in native language, it has not yet been examined on this scale in learner language. The study includes data from three L1 backgrounds (Chinese, German, and Spanish) as well as native English speakers; data from the spoken and written modes; and a statistical model integrating a large number of variation parameters known to influence alternations in general, in particular under-researched phonological constraints.
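The two word orders in the alternation can be told apart mechanically once a clause is part-of-speech tagged. The sketch below is only an illustration of that idea, not the project's extraction procedure: it assumes Penn Treebank-style tags (VB* for verbs, RP for particles) that were assigned by hand here, and the function name and the labels "joined" and "split" are hypothetical.

```python
def particle_order(tagged):
    """Classify a phrasal-verb clause by particle position.

    `tagged` is a list of (word, tag) pairs. Returns "joined" when the
    particle directly follows the verb, "split" when other material
    intervenes, and None when no verb + particle pair is found.
    """
    for i, (_, tag) in enumerate(tagged):
        if not tag.startswith("VB"):
            continue
        # Search rightwards from the verb for the first particle
        for j in range(i + 1, len(tagged)):
            next_tag = tagged[j][1]
            if next_tag == "RP":
                return "joined" if j == i + 1 else "split"
            if next_tag.startswith("VB"):
                break  # another verb begins; stop searching for this one
    return None

# The two example sentences from the text, tagged by hand
joined = [("The", "DT"), ("squirrel", "NN"), ("picked", "VBD"),
          ("up", "RP"), ("the", "DT"), ("nut", "NN")]
split = [("The", "DT"), ("squirrel", "NN"), ("picked", "VBD"),
         ("the", "DT"), ("nut", "NN"), ("up", "RP")]

print(particle_order(joined))
print(particle_order(split))
```

A heuristic this simple does not check that the intervening material is a direct object, so real learner data would need manual verification of the retrieved hits.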