Protocol Analysis in Engineering Design Education Research: Observations, Limitations, and Opportunities

Background: One of the most popular methods for studying the cognitive processes of design and problem-solving activity is Protocol Analysis (PA). As such, PA has been widely used in engineering design education research. Purpose: The aim of this work is to describe how PA has been used in engineering design education contexts, understanding the range of research questions that can be addressed by the method as well as providing some commentary on the strengths, limitations, and future directions of the method. Scope/Method: We conduct a systematic review of the literature following the PRISMA method. A search combining key terms – protocol analysis, design, engineering, student – and their variants in the Scopus database resulted in 126 articles, which were further reduced to 45 through two rounds of abstract and full-text screening. The main inclusion criterion was that the work use PA as the method to investigate design activities in an engineering educational setting. Conclusions: The use of PA has significantly contributed to understanding the cognition of students engaged in design activities and to improving engineering design education. Technological advances enable new efficiencies in protocol collection and analysis, offering promising new directions in the use of PA in more authentic learning environments.


Introduction
Design is central to engineering practice. As such, since the mid-1990s there has been a considerable amount of research advocating for the teaching of design to engineering undergraduates and for the implementation of new teaching practices that would enable students to develop the skills necessary to take on the complex and dynamic problems of the world.
Improving engineering design education practices requires deep insight into student design behaviours and cognition. One of the most popular methods for studying cognitive processes is Protocol Analysis (PA). First disseminated by Anders Ericsson and Herbert Simon in their 1984 book Protocol Analysis: Verbal Reports as Data, PA has been widely used in psychology studies to understand the cognitive processes of participants while performing a variety of tasks. Participants' verbalizations are recorded, transcribed, segmented, and coded by human judges. Coded segments then form the basis for statistical analyses. Verbal reports "can be of the greatest value in providing an integrated and full account of cognitive processes and structures" (p. 373).
PA is a tool capable of supporting the assessment of new teaching practices by providing insights into the design practices of students and the effects of any interventions. As such, it has been widely used in research on engineering design teaching and learning. The most notable contributions to the development of PA as a way to characterize students' design processes can be attributed to the work of Cynthia Atman and her colleagues (Adams et al., 2003; Atman & Bursic, 1998; Atman et al., 2005; Cardella et al., 2002; Cardella et al., 2006; Cardella et al., 2008; Chimka & Atman, 1998). Multiple other researchers have also used PA to investigate various elements of the design cognition of students. Other notable contributions come from John Gero and colleagues, who have completed a number of protocol studies using the Function-Behaviour-Structure ontology (Kannengiesser & Gero, 2004). Overall, these studies have explored the design cognition of different types of novice designers and showed that PA is a flexible method that can be adapted to each researcher's needs through the application of different coding schemes to protocol data. Recent publications provide evidence that this method continues to be relevant and useful for studying engineering students' design practices (Bailey & McFarland, 2018; Dixon & Bucknor, 2019; Patel et al., 2018).
In a recent comprehensive overview of three decades of research, Atman explains that the motivation for this work has been to develop descriptive models of how engineers do design (Atman, 2019). These can be used alongside existing prescriptive models to inform better ways of teaching engineering design. Overall, the developed data sets and analyses demonstrate the usefulness of the PA method, the statistical power of the data generated from protocols, and, most importantly, the impact of these studies on improving engineering design education. Cash (2018) highlights the importance of rigour in design research for answering key research questions and producing real impact in design teaching and practice. Given the widespread use PA has found in engineering design education research, a comprehensive review of this research can facilitate a critical assessment of the methodology and contribute to improved rigour of future protocol studies in engineering education. As such, in this paper we systematically review the use of PA to study engineering design processes, taking a special interest in novice designers in learning contexts. Our goal is to not only understand how PA has been used but also assess its usefulness and limitations and to identify opportunities for improving the method and/or extending its use in new design learning contexts. Specifically, this study has three objectives:

Aims
1. Describe how the PA methodology has been used and adapted in engineering design education research, focusing on protocol creation and analysis.
2. Evaluate the range of research questions in engineering design education that PA has been used to answer.
3. Identify any drawbacks and limitations to using PA in this setting and reflect on opportunities and new research directions.
The rest of the paper is organized as follows. First, we describe the systematic literature review methodology that was followed (section 2). We then provide a detailed overview of how verbal protocols have been created and analyzed (section 3) and the main research questions the methodology has been used to answer (section 4). Finally, section 5 offers our commentary on the limitations of the method and future directions of PA in engineering education.

Method
Systematic reviews offer a method to synthesize the available knowledge on a variety of topics. The goal for a systematic review should be to review relevant literature in a way that is reproducible and transparent (Lame, 2019). In this study, we conducted a systematic review of the literature following the PRISMA model (Moher et al., 2009), as highlighted in Figure 1.
The search was conducted in the Scopus database in April 2020. As the focus is on the method of PA, the searches included the "protocol analysis" term in conjunction with nine other keywords. To evaluate the effectiveness of PA as a method to understand design processes of students, it was essential to include both the word "design" and "problem solving", as these are common terms used to describe the types of activities and problems related to the research objectives. In order to flag relevant records from an educational context, words like "student" and "novice" were also added to the query. The final query was (("protocol analysis" AND (design* OR "problem solv*") AND (teach* OR educat* OR learn* OR novice* OR student* OR junior*) AND engineer*)).
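To make the structure of the search string explicit, the following minimal sketch assembles it from its keyword groups. The group names (METHOD, ACTIVITY, CONTEXT, DOMAIN) and the helper functions are illustrative assumptions and not part of the original search procedure; only the terms themselves are taken from the query reported above.

```python
# Keyword groups taken from the query reported in the text.
# The group names are hypothetical labels chosen for this sketch.
METHOD = ['"protocol analysis"']
ACTIVITY = ['design*', '"problem solv*"']
CONTEXT = ['teach*', 'educat*', 'learn*', 'novice*', 'student*', 'junior*']
DOMAIN = ['engineer*']


def or_group(terms):
    """Join alternatives with OR; parenthesize only when there are several."""
    joined = " OR ".join(terms)
    return f"({joined})" if len(terms) > 1 else joined


def build_query(*groups):
    """AND the OR-groups together into a single boolean search string."""
    return "(" + " AND ".join(or_group(g) for g in groups) + ")"


query = build_query(METHOD, ACTIVITY, CONTEXT, DOMAIN)
print(query)
```

The resulting string is logically equivalent to the reported query (it omits one redundant pair of outer parentheses), and the three non-method groups together contain the nine additional keywords mentioned above.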
Given the focus of this study on the use of PA in engineering design education research, the following inclusion criteria (ICs) were developed:
IC 1: Only peer-reviewed journal articles, conference papers, book sections, and books were sought out, in order to obtain the most accurate and useful information. In instances where a study was published first as a conference paper and later as a journal article, the former record was excluded.
IC 2: PA was the main research method used in the study.
IC 3: The study used a design or complex problem-solving task.
IC 4: The study was conducted in an engineering education setting, where participants were design novices belonging to one or more of these categories: high school students (in the context of pre-engineering courses), undergraduate students, graduate students.
This initial search produced a total of 126 records, which were screened based on their abstracts alone. Both authors conducted this screening independently first. Inter-rater agreement between the two was very high (Cohen's kappa = 0.86, p < 0.01), and any disagreements were later jointly discussed and resolved. This screening reduced the dataset to 59 records. This set also included any records the authors could not definitively exclude based on their abstracts alone.
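For readers unfamiliar with the agreement statistic reported above, the sketch below computes Cohen's kappa for two raters from first principles. The screening decisions in the example are invented for illustration and are not the data from this review.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in labels) / n**2
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical include/exclude decisions for ten abstracts.
author_1 = ["include"] * 5 + ["exclude"] * 5
author_2 = ["include"] * 4 + ["exclude"] * 6
kappa = cohens_kappa(author_1, author_2)
print(round(kappa, 2))
```

In this toy example the raters agree on nine of ten abstracts (observed agreement 0.9) while chance agreement is 0.5, which gives a kappa of 0.8.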
Once the abstract-based screening was completed, the first author alone completed a full-text scan of the 59 records. This scan flagged 10 records that did not meet all inclusion criteria. Four more records could not be found through our institution's library and were also removed. The remaining 45 records (20 journal articles and 25 conference papers) were included and were used for the qualitative synthesis. A detailed overview of all 45 records is provided in Table 1. For each record, the table summarizes the purpose, design task used in the study, characteristics of study participants and sample size, and the coding scheme and number of coders performing the protocol analysis.
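The record counts at each screening stage can be checked with simple arithmetic. The short sketch below does so using only the numbers reported above; the variable names are illustrative.

```python
# Record counts as reported in the text.
identified = 126              # records returned by the Scopus search
after_abstract_screening = 59
excluded_full_text = 10       # did not meet all inclusion criteria
not_retrievable = 4           # full text unavailable

excluded_at_abstract = identified - after_abstract_screening
included = after_abstract_screening - excluded_full_text - not_retrievable

journal_articles, conference_papers = 20, 25

print(excluded_at_abstract, included)
```

The result is consistent with the reported flow: 67 records were removed at the abstract stage, and the 45 included records match the sum of journal articles and conference papers.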

Creating and analyzing verbal protocols
In this section we review how PA has been used to study student and novice engineering designers, focusing in particular on methodological considerations including how data is typically collected (protocol creation) and analyzed (coded) and how study results are visualized.

Participant selection
Participant recruitment is based on a number of factors, including research objectives and study design. In the studies included in this review, participants ranged widely in their design experience and included high school students (4 studies), engineering students in undergraduate and graduate programs (36 and 7 studies respectively), engineering faculty members (2 studies), and practicing engineers with a number of years of experience working in industry (6 studies). For example, Dixon and Bucknor (2019) were interested in the design heuristics used by novices and experts, so both engineering students (novices) and professional engineers (experts) were recruited. In order to produce robust results, efforts are made to collect protocols from a large number of participants, although this is difficult and time-consuming. In the reviewed studies, PA has been used with as few as two (Kavakli & Gero, 2002) and as many as 128 participants (Morozov et al., 2007). A typical range found in the reviewed literature was about 10-30 participants. A further criterion applied when selecting participants is their specific engineering discipline. Participants might be required to have background knowledge in a particular engineering domain (electrical, mechanical, civil, etc.). In the reviewed studies, mechanical engineering students were the most common, explicitly mentioned in 13 of the 45 articles. Since many of the reviewed articles used first-year engineering students, many of those studies did not specify the discipline. This is likely a result of engineering programs with a common first year, a program design in which students do not declare their engineering major until second year.

Design task
Another critical aspect of designing studies that investigate novice design cognition and behaviour is the selection of a suitable design task/problem. Akinci-Ceylan et al. (2018) provide a thorough discussion of how they developed the problems used in their protocol studies. In their case, a research team developed ideas for an ill-structured problem related to civil engineering, narrowed the scope down to four problems, and further evaluated them using a survey. The two problems that were eventually selected for their study were deemed to have high relevance to the discipline and a sufficiently large number of solutions.
The process described above offers an example of how a research team might spend time developing the problems used in a protocol study of students' design processes. In many of the reviewed papers, however, a number of "standard" problems have also been used, as described below:
• As a more general problem-solving activity, the Midwest Flood Problem (MWF) presents participants with the following description and task: Over the summer, the Midwest experienced massive flooding of the Mississippi River. What factors would you take into account in designing a retaining wall system for the Mississippi? (Adams et al., 2003; Cardella et al., 2008; Coso et al., 2010; Morozov et al., 2007). This problem is meant to direct participants to a particular goal, encouraging them to think of constraints and factors of a proposed solution (Morozov et al., 2007).
• The jar opener task has participants develop a potential jar opening device that can assist people who only have the use of one hand to open jars (Lemons et al., 2009, 2010a, 2010b). In some studies, participants were able to build physical prototypes of their jar opener using LEGO™ pieces.
• The ping pong problem asks participants to design a device capable of launching a ping pong ball accurately at a target. The goal is to have the ping pong ball land as close to the middle of the target as possible while also maintaining significant flight time (Adams et al., 2003; Atman et al., 2005; Cardella et al., 2002; Cardella et al., 2008). The design should be accompanied by a detailed diagram and any calculations that were performed.
• The playground design problem asks participants to come up with a design of a playground while meeting a number of requirements (Atman & Bursic, 1998; Cardella et al., 2006; Cardella & Tolbert, 2014; Chimka & Atman, 1998; Mentzer et al., 2015). Examples of those requirements include the safety of the playground, the longevity of the design, cost, and compliance with accessibility regulations.
• The street crossing problem has participants work on developing a cost-efficient and safe method to help students cross a busy intersection on campus. Participants are typically presented with an intersection they are familiar with, and they are asked to include any relevant diagrams and assumptions they made during their design (Adams et al., 2003; Cardella et al., 2002; Cardella et al., 2008; Roberts et al., 2007).
• In the window problem, participants design a device that could help a person with a disability open a stuck, double-hung window without relying on electric power (Kannengiesser & Gero, 2017; Lammi & Gero, 2011; Lammi, 2011; Song & Becker, 2014; Wells et al., 2016; Williams et al., 2012). This specific task can be attempted by fairly novice designers.
Finally, in one study participants were asked to improve an already existing product (Dixon & Bucknor, 2019). In this example, participants were provided with a description of an existing hybrid motorcycle and asked to generate some conceptual improvements to the motorcycle so that it would, for example, have a greater carrying capacity while still maintaining enough power to climb steep hills.

Study logistics
Reviewed studies provided insight into the general methodological steps that are used in a protocol study. Once participants are recruited and a design task is selected, collecting the protocols can begin. Typically, the experiment is set up in a quiet room with minimal distractions (Ball et al., 2004; Carberry et al., 2009; Lammi, 2011; Lemons et al., 2010a; Patel et al., 2018; Song & Becker, 2014). This ensures that participants are able to focus on the design task and the recordings are of sufficient quality for accurate transcription. An important decision in study design is whether verbal reports will be concurrent or retrospective (Ericsson & Simon, 1984, p. 16). In concurrent verbal reports (also known as talk aloud or think aloud reports) participants verbalize what they are doing as they complete a task. In contrast, retrospective verbal reports ask participants after they complete their task, sometimes with a video prompt of the activity, to describe exactly what they were doing at that moment. In an ideal situation, retrospective protocols are taken immediately after the activity, so participants can more accurately describe their actions. Concurrent verbal reports were the most common in the studies we reviewed, with only four of the 45 using retrospective reports.
In almost all of the reviewed studies, verbalizations were both audio and video recorded, and were sometimes supplemented with additional materials (e.g., screen capturing software, sketches, white board use) that were created during the session (Cardella & Tolbert, 2014; Kavakli & Gero, 2002; Lammi, 2011; Patel et al., 2018; Sriram et al., 2015; Williams et al., 2013). Multiple cameras, positioned at various angles, and individual microphones are used to gather the most comprehensive recording of the session (Kannengiesser & Gero, 2017; Lemons et al., 2009; Mentzer et al., 2015; Wells et al., 2016; Williams et al., 2012; Williams et al., 2013). Using multiple video cameras and recording devices also provides redundancy, safeguarding against potential technological issues that may be encountered when using a single device (Roberts et al., 2007; Wells et al., 2016).
The study design may require participants to work on the design task individually or in groups. The former was the most common study design in our review (36 of 45 studies). In concurrent protocol studies where participants are working individually, they are usually asked to first describe their actions as they complete a simple task (e.g., solving a puzzle or a simple mathematics problem), in order to familiarize themselves with the think aloud protocol (Atman & Bursic, 1998; Christiaans & Dorst, 1992; Dixon & Bucknor, 2019; Moore et al., 2014; Sutcliffe & Maiden, 1992). In cases where participants are working in groups, this step is unnecessary, as pairs and groups have been found to naturally promote authentic verbalisations among participants (Wells et al., 2016).
Participants are then provided with the description of the design task and any other information (e.g., location of prototyping tools, information seeking procedures) they may need in order to successfully complete the task. Experimental sessions can range in length from 25 minutes (Daly et al., 2018) to as long as three hours (Bailey & McFarland, 2018) depending on the design task used, although participants do not need to use all the time allotted (Mentzer et al., 2015).
A facilitator or observer is typically present for the duration of the activity and has a number of potential responsibilities. Their most common purpose is to prompt participants to continue speaking if they fall silent during the activity (Akinci-Ceylan et al., 2018;Atman & Bursic, 1998). This ensures that as much of the thought process and cognitive effort as possible is captured by the recordings in order to be analyzed later. In Sriram et al. (2015), an observer was present in order to ask questions and take observations at regular intervals. Mentzer et al. (2015) used a facilitator who would, when asked by the participants, provide relevant information about the task. Similarly, Roberts et al. (2007) had a faculty team mentor present at tables for groups to interact with and ask questions of.

Protocol analysis
Once audio/video recordings are collected, the next step in the process is transcription. Ericsson and Simon (1984) recommend that all verbalizations of a session are recorded and later transcribed verbatim to keep the data in its most raw form (p. 4). While transcribed text facilitates the analysis, it is widely accepted that access to the audio/video recording throughout the analysis can be beneficial as it allows researchers to capture the most comprehensive account of the design session. For example, Wells et al. (2017) used a video to provide a time-stamped recording of the entire session which was later used for easy access to particular parts of the design session. High-quality transcripts together with source audio/video data form the dataset on which PA can be performed.

Segmenting and coding protocols
The first step in the analysis is typically segmentation, which is the process of breaking the verbal text into units (or segments) that can be coded using a pre-defined scheme. Depending on the approach taken, segmentation can occur either before or in conjunction with the application of codes. Akinci-Ceylan et al. (2018) segmented transcripts by utterance, defined as a string of words followed by a period of silence. Transcripts have also been segmented based on the function a team was working on or the structure of the group (Bailey & McFarland, 2018). In protocol studies where participants are working together, segmentation can be a difficult process because team members may be talking over one another and a number of incomplete ideas can be found in the transcripts. As a solution, Cheong et al. (2014) used 10-second units of the transcript to create codable segments.
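The two segmentation strategies mentioned above (utterances bounded by silence, and fixed 10-second units) can be sketched as follows. This is an illustrative sketch only: the Word record, the word-level timestamps, and the one-second silence threshold are assumptions made for the example and are not taken from the cited studies.

```python
from dataclasses import dataclass


@dataclass
class Word:
    start: float  # onset time in seconds
    end: float    # offset time in seconds
    text: str


def segment_by_silence(words, min_gap=1.0):
    """Start a new segment whenever the pause before a word is at least min_gap seconds."""
    segments, current = [], []
    for w in words:
        if current and w.start - current[-1].end >= min_gap:
            segments.append(current)
            current = []
        current.append(w)
    if current:
        segments.append(current)
    return [" ".join(w.text for w in seg) for seg in segments]


def segment_by_window(words, window=10.0):
    """Group words into fixed-length time windows based on their onset time."""
    buckets = {}
    for w in words:
        buckets.setdefault(int(w.start // window), []).append(w.text)
    return [" ".join(buckets[k]) for k in sorted(buckets)]


# Hypothetical time-stamped transcript fragment.
words = [
    Word(0.0, 0.4, "so"), Word(0.5, 0.9, "the"), Word(1.0, 1.5, "wall"),
    Word(3.2, 3.6, "maybe"), Word(3.7, 4.2, "concrete"),
    Word(11.0, 11.5, "cost"),
]
utterances = segment_by_silence(words)
windows = segment_by_window(words)
print(utterances)
print(windows)
```

On this fragment the silence-based rule yields three utterances, while the 10-second rule yields two units. The same transcript can therefore produce different numbers of codable segments depending on the segmentation rule chosen.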
Once segmentation of the transcripts is complete, the coding scheme can be applied to the data. When the intended coding scheme is the one derived from the Function-Behaviour-Structure (FBS) ontology (Gero & Kannengiesser, 2004), the application of codes to the text by the first coder determines its segmentation, with subsequent coders then applying codes to the segmented text (and re-segmenting if necessary). Codes are usually assigned to the segments by two or more independent coders, but this task can also be completed by a single coder (e.g., Lane and Seery (2011)). Codes are then checked for reliability between coders (a specific agreement level is usually reached) and any discrepancies are resolved in an arbitration session, resulting in one final set of coded segments (Williams et al., 2012).
Many of the reviewed studies (14 of 45) implemented the same coding scheme. This scheme was based on a content analysis by Moore et al. (1995) and included the following 10 codes, each mapping onto a different design activity (Atman & Bursic, 1998):
• Need - identify basic needs (purpose, reason for design)
• Problem Definition (PD) - define what the problem really is, identify constraints and criteria, read the problem statement, information sheets, and questions
• Gather Information (GATH) - search for and collect needed information
• Generate Ideas (GEN) - develop possible ideas for a solution, brainstorm, list different alternatives
• Modeling (MOD) - describe how to build an idea, how to make it, measurements, dimensions, calculations
• Feasibility Analysis (FEAS) - determine workability, verification of workability, does it meet constraints, criteria, etc.
• Evaluation (EVAL) - compare alternatives, make judgements about various options
• Decision (DEC) - select one idea or solution among alternatives
• Communication (COM) - define the design to others, write down a solution or instructions
• Implementation - produce or construct a physical device, product, or system
The codes have been shown to effectively capture the elements of the design process of students in a number of empirical studies (Adams et al., 2003; Atman & Bursic, 1998; Bailey & McFarland, 2018; Cardella et al., 2002; Cardella et al., 2006; Cardella et al., 2008; Chimka & Atman, 1998; Coso et al., 2010; Roberts et al., 2007). In practice, the first and last codes are often excluded from the coding process because the problem statements include the need for the design and the conceptual solutions are rarely carried through to implementation. Atman and Bursic (1998) used this coding scheme to identify the design steps, in addition to two other coding schemes which included the activity (reading, assumptions, constraints, calculations, and other) and particular objects of interest to their study. Using more than one coding scheme allows for multiple comparisons of design activities and other elements present in the session (e.g., the co-occurrence of design evaluation and reading). Mentzer et al. (2015) used this coding scheme and grouped codes together to create a problem scoping stage, a developing alternative ideas stage, and a project realization stage that can be used to make more general claims.
Another set of the reviewed studies (9 of 45) use the FBS ontology to analyze the protocols (Cash & Maier, 2016; Gero et al., 2013; Kannengiesser & Gero, 2017; Lammi & Gero, 2011; Moore et al., 2014; Song & Becker, 2014; Wells et al., 2016; Williams et al., 2012; Williams et al., 2013). According to this ontology, the activity of designing is a transformation of a set of requirements and functions into a set of design descriptions (Gero & Kannengiesser, 2004). The FBS coding scheme uses six codes, each associated with a different design issue relating to the function, behaviour, or structure of the object. The codes are typically denoted as follows: Requirements (R), which come from outside the designer; Function (F), the teleology of an object or what it is for; expected Behaviour (Be), what you expect the behaviour to be; Behaviour derived from structure (Bs), the actual behaviour of the artefact; Structure (S), representing the components of an artefact and their composition; and finally Description (D), any documentation of designing. In addition to the six design issues, several moves between design issues, described as design processes, can also be examined: Formulation (R → F, F → Be), Synthesis (Be → S), Analysis (S → Bs), Documentation (S → D), Evaluation (Be ↔ Bs), Reformulation I (S → S), Reformulation II (S → Be), and Reformulation III (S → F). Further, the FBS ontology can be combined with another approach, linkography (Goldschmidt, 1990), to graphically represent the interconnections among the various design issues/processes (or "design moves") across the entire design session (Gero, 2010; Kan & Gero, 2017).
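The mapping from pairs of design issues to design processes listed above lends itself to a simple lookup. The sketch below derives processes from consecutive coded segments; treating each adjacent pair of issues as a candidate process is an assumption made for illustration, and the sequence of issues is invented.

```python
from collections import Counter

# Issue-to-issue transitions and the design process each one denotes,
# as listed in the text. Evaluation is bidirectional (Be <-> Bs).
FBS_PROCESSES = {
    ("R", "F"): "Formulation",
    ("F", "Be"): "Formulation",
    ("Be", "S"): "Synthesis",
    ("S", "Bs"): "Analysis",
    ("S", "D"): "Documentation",
    ("Be", "Bs"): "Evaluation",
    ("Bs", "Be"): "Evaluation",
    ("S", "S"): "Reformulation I",
    ("S", "Be"): "Reformulation II",
    ("S", "F"): "Reformulation III",
}


def processes_from_issues(issues):
    """Label each pair of consecutive issues; pairs outside the table map to None."""
    return [FBS_PROCESSES.get(pair) for pair in zip(issues, issues[1:])]


# Hypothetical sequence of coded segments from one design session.
issues = ["R", "F", "Be", "S", "Bs", "Be", "S", "S", "D"]
processes = processes_from_issues(issues)
process_counts = Counter(p for p in processes if p is not None)
print(processes)
print(process_counts)
```

Counting the resulting labels gives a frequency distribution of design processes for the session, which can then be compared across participants or conditions.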
In many of the examined studies (27 of 45), researchers employed other coding schemes, developed and adapted to the context of their research. Dixon and Bucknor (2019) used a coding scheme to identify local, transitional, and process heuristics for their study on the use of design heuristics by experts and novices. Bailey and McFarland (2018) used a combination of Atman's coding scheme and their own. The latter identifies the structure of the design team (i.e., working individually or working as a group) during a design session in order to characterize the role of prototyping in design teams. Cheong et al. (2014) used a coding scheme that identifies types of analogies (entity, function, strategy) and different design activity modes (problem analysis, biological phenomenon, existing solution, new solution, and evaluation). These codes were used to understand the conceptual design process when students were encouraged to use biological analogies as the foundation for their solutions. Lemons et al. (2010a) developed a coding scheme to identify processes of idea generation, evaluation, metacognitive strategies, clarification, and model building limitations. These codes were assigned to protocols of eight students working on a model building task, in a study which sought to identify the benefits of model building for students. Finally, one study used an extension of the collaborative stimulation model to characterize the shared design entities and questions asked of collaborators. Its codes identified when collaborators were prompting memories, correcting, and clarifying the design ideas of other group members. Overall, coding schemes must map accurately onto the data that have been collected. As such, a practically unlimited number of code combinations can be developed depending on the context of the study.

Analyzing and visualizing coded protocols
One of the benefits of using PA is to generate quantitative results from qualitative data (Atman, 2019). Frequencies of codes are used to generate timelines for design sessions, in order to show how much time designers are spending in a particular design phase. In addition, statistical analyses can be used on coded protocol data to test hypotheses. Examples of these include comparisons between two different participant groups using t-tests (Cardella et al., 2002), correlation analyses to determine the relationship between quality of solutions and other design behaviour that is captured by protocol data (Cardella et al., 2002), and linear regression models to check the effect of potential confounding variables on the experimental condition.
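As an illustration of the kind of group comparison mentioned above, the sketch below computes a two-sample t statistic on the proportion of session time spent in one design activity. It uses Welch's unequal-variance form, which is an assumption made here; the cited studies do not specify which variant they used, and the data are invented. The sketch returns only the statistic and degrees of freedom, not a p-value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("Both samples have zero variance; t is undefined.")
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a**2 / (n_a - 1) + se2_b**2 / (n_b - 1))
    return t, df


# Hypothetical fraction of session time coded as problem definition.
first_years = [0.10, 0.12, 0.14, 0.16, 0.18]
seniors = [0.20, 0.22, 0.24, 0.26, 0.28]
t_stat, dof = welch_t(first_years, seniors)
print(round(t_stat, 2), round(dof, 2))
```

For these invented data the statistic is -5.0 with 8 degrees of freedom. In an actual study the statistic would be compared against the t distribution, typically with a statistics package, to obtain a p-value.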
Protocol data requires significant manipulation in order to perform statistical analyses. This process is time-consuming and often tedious, so programs have been developed to streamline it. Using the macro functionality in Excel, Roberts et al. (2007) developed I-PACE, a program which allowed for real-time coding of a video recording. Another tool, MacSHAPA, was used in early protocol studies to assist with data analysis (Sanderson, 1995). The program offers easy linking between video segments and associated spreadsheet cells to increase the speed of data analysis. It also generates descriptive statistics of the protocol data including the total time spent in each activity, the number of transitions, and timelines. More recently, Pourmohamadi and Gero (2011) introduced LINKOgrapher, which is an analysis tool to study design protocols based on the FBS coding scheme. LINKOgrapher has been used to reduce the time and effort needed to analyse coded design protocols.
Chimka and Atman (1998) provide a description of four ways in which protocol data can be graphically represented. First, design timelines provide a temporal overview of the designers' process and the transitions between design activities. Similar to timelines, cumulative time charts show when designers divide their time equally between design steps and/or when a particular step overtakes the process. Individual curves for cumulative time charts composed of many different lines indicate a higher transition rate of design activities. Stacked bar charts can be used to make comparisons between different aspects of the design process by showing the percentages of the total time spent in each phase of the design process. Finally, three-dimensional bar charts are good representations of design activity and another activity (e.g., what the student is physically doing, such as reading or writing). Other good examples can be found in Adams et al. (2003) and Cardella et al. (2008).
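The descriptive statistics and chart inputs described above can be derived directly from a coded timeline. The following sketch is not a reimplementation of I-PACE, MacSHAPA, or LINKOgrapher; the (start, end, code) tuple format and the example session are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical coded timeline: (start_seconds, end_seconds, design_activity_code).
timeline = [
    (0, 30, "PD"), (30, 50, "GATH"), (50, 90, "GEN"),
    (90, 100, "PD"), (100, 160, "MOD"),
]


def time_per_code(segments):
    """Total seconds spent in each activity."""
    totals = defaultdict(float)
    for start, end, code in segments:
        totals[code] += end - start
    return dict(totals)


def percent_time(segments):
    """Share of total session time per activity (input for a stacked bar chart)."""
    totals = time_per_code(segments)
    session = sum(totals.values())
    return {code: 100 * t / session for code, t in totals.items()}


def count_transitions(segments):
    """Number of changes from one activity to a different one."""
    codes = [code for _, _, code in segments]
    return sum(a != b for a, b in zip(codes, codes[1:]))


def cumulative_time(segments, code):
    """(time, cumulative seconds in `code`) points for a cumulative time chart."""
    running, points = 0.0, []
    for start, end, c in segments:
        if c == code:
            running += end - start
        points.append((end, running))
    return points


totals = time_per_code(timeline)
shares = percent_time(timeline)
transitions = count_transitions(timeline)
pd_curve = cumulative_time(timeline, "PD")
print(totals, shares, transitions, pd_curve)
```

Plotting the cumulative points for every code on shared axes would give a cumulative time chart of the kind described above, and the percentage shares correspond to one bar of a stacked bar chart.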
Protocols coded with the FBS ontology (Gero & Kannengiesser, 2004) can be analyzed with a number of measures of meta-level design behaviour. For example, the Problem-Solution (P-S) index can characterize the overall cognitive patterns of a design session as designers move between the problem and solution spaces. It can be calculated for both the design issues and the syntactic processes (links between issues) as a ratio of the codes related to the problem and the codes related to the solution. If a protocol is divided into time segments with an equal number of design moves, a P-S index time series graph can be created to identify the dynamic element of design sessions.
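A minimal sketch of the issue-level P-S index and its time series follows. The assignment of R, F, and Be to the problem space and of Bs and S to the solution space, with D excluded, is an assumption made for this example, since the text above does not spell out the grouping; the coded sequence is invented.

```python
# Assumed grouping of FBS design issues (not specified in the text above).
PROBLEM_ISSUES = {"R", "F", "Be"}
SOLUTION_ISSUES = {"Bs", "S"}


def ps_index(issues):
    """Ratio of problem-related to solution-related issue codes."""
    problem = sum(i in PROBLEM_ISSUES for i in issues)
    solution = sum(i in SOLUTION_ISSUES for i in issues)
    if solution == 0:
        return float("inf")  # no solution-related codes in this span
    return problem / solution


def ps_series(issues, n_chunks):
    """P-S index for consecutive chunks with an equal number of design moves.

    Any leftover moves at the end that do not fill a whole chunk are ignored.
    """
    size = len(issues) // n_chunks
    return [ps_index(issues[k * size:(k + 1) * size]) for k in range(n_chunks)]


# Hypothetical coded session of twelve design moves.
session = ["R", "F", "Be", "S", "Be", "S", "Bs", "S", "S", "Bs", "Be", "S"]
overall = ps_index(session)
series = ps_series(session, 3)
print(overall, series)
```

In this invented session the index falls from 3.0 in the first third to about 0.33 in the remaining two thirds, the kind of shift from problem-focused to solution-focused activity that a P-S time series graph is meant to reveal.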

Attributes of novices' design processes
PA has been used to answer a wide range of research questions related to student design processes. A considerable number (25 of 45) of the reviewed studies use PA to understand how novice designers behave at a specific stage of the design process (be it problem formulation, idea generation, modeling, or testing) and to investigate how these processes might be affected under different conditions or interventions.
Some of the research (8 of 45 studies) has focused on the problem formulation stage and the impact of the design task description and complexity on students' design work. For example, Lemons et al. (2010b) were interested in learning whether students can effectively interpret a design task. Protocols were coded and analysed to follow the path each student took to design their solution, finding that the interpretation of the problem would greatly affect their ability to produce a useful solution. Another study investigated whether students implicitly and explicitly decompose the problem during the idea generation phase (Liikkanen & Perttula, 2009). Codes were assigned to protocols that identified instances of recognition, implicit decomposition, searching memory, explicit decomposition, analogical inference, evaluation, and output. The study found that students rarely needed to explicitly decompose a problem, as they were able to successfully and implicitly decompose the problem in a previous step. A related factor under investigation has been design problem complexity. For example, Cardella et al. (2002) evaluated students' demonstrated design processes under different problem complexity conditions and found that more complex problems may be related to an increased level of evaluation of solutions generated by the students. Similarly, Williams et al. (2013) used the FBS coding scheme to investigate how small changes (e.g., changing a single word) in the description of the design task affect the design process of student designers.
Following problem formulation, students transition to the concept generation phase of the design process. Moore et al. (2017) aimed to determine which kind of thinking (Type I or Type II) has the greatest influence on idea generation. They hypothesised that Type I processes (those associated with fast, impulsive, and intuitive thinking) would produce more novel ideas than Type II processes (those associated with slow, analytic, and methodical thinking). Using the FBS and two other coding schemes, they associated design actions with one of the two thinking types, and found that designers generate novel ideas through a combination of Type I and Type II thinking, rather than through either process on its own. Gero et al. (2013) compare the design cognition of student designers using three different idea generation techniques (brainstorming, morphological analysis, and TRIZ), which differed in their degree of structuredness. In their study, pairs of students participated in an activity where they were asked to use only one of the techniques to generate solutions. Using the FBS coding scheme, they found that the structuredness of the technique affects the design cognition of student designers, with the more structured techniques (such as TRIZ) influencing the amount of time students spend reasoning about the problem compared to reasoning about their solution. Pertulla and Liikane (2006) examine a similar research question: does showing examples affect the idea generation phase of students? PA was used to evaluate the connections (i.e., following up on an idea, modifying an idea, or combining ideas) between ideas generated by individuals who were shown four examples of a design prior to the activity, compared to individuals who were not. Their analysis showed no significant differences between the two groups, aside from the number of new categories that were evaluated.
Two of the reviewed studies use PA to investigate the use of analogies in design. Ball et al. (2004) look at the role of spontaneous analogising with professional engineers and graduate engineering students. Their research used PA to identify the portion of time that study participants devoted to using schema-driven and case-driven analogies as they designed an Automated Rent-A-Car facility. In another example, Cheong et al. (2014) assess the use of biological analogies in creating a solution to an open design problem. The students were given a biological analogy and asked to design a solution, using the analogy as the foundation.
Protocol analysis has also been used to study how students use prototyping, modelling, and sketching in their design process. For example, Lemons et al. (2010a) use both content analysis and protocol analysis to identify the benefits of model building on students' design solutions. They find that models help students generate and evaluate their ideas, expose flaws, and act as visual aids for their solution. In their study, students were explicitly encouraged to build models of their solutions (using LEGO blocks) to an open-ended design task. Initial content analysis helped create the coding scheme for the protocol analysis, which was used to identify students' idea generation, metacognitive strategies and other cognitive actions. Cardella et al. (2006) investigate the use of external representations (problem statements, diagrams, equations, etc.) during a design task. In this case, each protocol was coded for instances of looking at a provided diagram, writing, reading their own writing, looking at sketches, or making calculations. Results were used to understand how students were interacting with these important components of designing. Taking an interest in the role of prototyping in student design teams, Bailey and McFarland (2018) use PA to capture student design teams' characteristics, namely what phase of the design process they were in (using Atman's coding scheme) and the extent of collaboration in the team, as they worked on a design task. They find that student teams typically work together to think of multiple solutions and then split up to develop those solutions further.
As designers rarely work alone, some research has also examined the collaborative element of the design process in students. One of the reviewed studies, examining senior mechanical engineering students working in groups, investigates and measures the relationship between collaborative stimulus and cognitive processes. The coding scheme identifies instances of prompting, seeding, correcting, and clarification of ideas to the group working on the design challenge. The authors find that participants' past experience influences the process of developing shared designed components, and how those components are explored together in the group. Along a similar line, Cash and Maier (2016) use the FBS coding scheme and additional codes to categorize the kinds of gestures used during a design session. Analysis of recordings and protocols of graduate students designing a remote-controlled camera cradle shows that gestures contribute to a shared understanding of a concept or idea. Gestures were coded as being reflective (towards themselves) or directed (towards others), with both kinds having an influence on how the group generated ideas and eventually made important design decisions.

Comparisons with and within novice designers
Many of the reviewed studies (16 of 45) aim to understand how novice designers compare to each other (based on, for example, discipline or academic level) and to expert (or professional) designers. Using PA, differences have been identified in a variety of studies that highlight important gaps in novices' understanding of design. Based on these differences, educators are able to make changes to curriculum and learning activities to improve novice performance in a way that resembles expert design thinking (Mentzer et al., 2015).
One of the earliest examples of a protocol study using this design is the work by Atman and colleagues, which investigated whether reading a design textbook influenced students' design process. While the study found no differences in the students' solution quality, those who read the design text did spend more time solving the problem, generated more solutions, and transitioned between design stages more than those who did not. The same group later published an in-depth case study of four participants in their protocol studies, who participated both as first-year students and later as seniors (Cardella et al., 2008). The case studies provided rich insight into the effect of an undergraduate engineering education on the students' design processes. In a similar fashion, Williams et al. (2012) use two different design tasks to investigate the differences that might exist in the same set of students before and after they complete an introductory design course. Using the FBS coding scheme to code their protocols, they found significant differences between the performances of students, specifically related to the number of instances of identified functions and developed descriptions of design artefacts.
Some studies (4 of 45) in particular have focused on the design behaviour of first-year engineering students compared to that of high school students, with the goal of assessing the needs of incoming students and the effectiveness of existing design training at both levels. For example, Lammi and Gero (2011) compare the design cognition of mechanical engineering undergraduates to that of high school seniors and find that the latter were able to generate more ideas but did not analyze them further. In contrast, the undergraduate students placed more emphasis on the process of design analysis. Using a previous dataset (from one of Atman's studies), Mentzer et al. (2015) make a similar comparison between high school seniors and experts, as well as between high school seniors and high school freshmen. The researchers tallied the total number of references to a variety of design activities and identified several gaps and opportunities for potential improvements in engineering education, including interventions at the secondary education level. More recently, Wells et al. (2016) use PA to show that there is no significant difference in the design cognition of high school students who had formal pre-engineering education compared to those who did not.
Finally, some studies (6 of 45) have directly compared student and expert designers. For example, Song and Becker (2014) compare novices and experts during the problem decomposition phase and find that experts are better able to decompose and recompose the problem. Their study also highlights the need for students to be given the opportunity to work collaboratively in design projects, as first-year students were observed to struggle with this work environment. More recently, Dixon and Bucknor (2018) aim to understand the design heuristics used by both novice and expert designers. The protocols were analysed to reveal the types and frequencies of design heuristics that were used during the task.

Overview of review findings
Our review of the literature shows PA to be an adaptable method that can accurately capture the design processes followed by engineering students. In the reviewed studies, the use of PA has contributed to significant insights for engineering education, including how engineering students approach a design problem (Cardella et al., 2002; Lemons et al., 2010a; Liikkanen & Perttula, 2009; Williams et al., 2013), how they generate ideas and model/prototype design solutions (Bailey & McFarland, 2018; Ball et al., 2004; Cardella et al., 2006; Cheong et al., 2014; Gero et al., 2013; Lemons et al., 2010a, 2010b; Moore et al., 2017; Pertulla & Liikane, 2006) and how they work collaboratively in design teams (Cash & Maier, 2016). The method has also been used to evaluate the effectiveness of specific teaching interventions, design courses, or even entire undergraduate degrees (Cardella et al., 2008; Williams et al., 2012). These and other studies have served to compare the design processes of engineering students in different disciplines, as well as with high school students and expert designers (Dixon & Bucknor, 2018; Lammi & Gero, 2011; Mentzer et al., 2015; Song & Becker, 2014; Wells et al., 2016). In particular, Atman and colleagues' data sets have shown that, when collected correctly, protocol data can be analysed from a number of diverse perspectives, providing both quantitative and case-based accounts of design (Atman, 2019; Cardella et al., 2014) and advancing our understanding of students' design cognition.
We note that a considerable amount of research related to novice designers using PA has been done by a small number of networks of collaborators. This is also evidenced by the two dominant approaches to analyzing protocols: Atman's (Atman & Bursic, 1998) and FBS (Gero & Kannengiesser, 2004). As such, many of the reviewed studies are based on similar datasets, problems, and student populations. We thus recognize these datasets, authors, and their findings are over-represented in our sample of papers and consequently in our discussion. This realization leads to a number of questions and future research opportunities. First, what is the overall research methods landscape in engineering design education research? And how prominent is PA in this landscape? Second, and following from these questions, we wonder if there are instances when PA is used but another research method might be more suitable, and vice-versa.
While answering the questions posed above is beyond the scope of this review, we nevertheless can comment on a number of limitations that PA poses for engineering design education researchers, as well as identify some research opportunities to address those limitations.

Limitations in collecting and analyzing verbal protocols
Despite efforts to improve the efficiency of collecting and analyzing verbal protocols, doing so remains a lengthy and time-consuming process (Lemons et al., 2010a). As such, PA is not an appropriate tool for routine student assessment or for large populations in a single study (Atman & Bursic, 1998). Researchers may opt to limit the number of participants in their study population, thus reducing the study's statistical power (Coso et al., 2010; Daly et al., 2018; Moore et al., 2014; Patel et al., 2018). In addition, conclusions from the data become less generalizable, requiring further research efforts to validate the claims.
Protocol studies are often hosted in small, quiet rooms away from distractions. Unfortunately, this constraint, which ensures quality audio/video for later transcription, also limits the diversity of study designs that can be used in large group settings. For example, hackathon-like design activities, which expose students to ill-structured and open-ended problems (Rennick et al., 2018), offer a significant opportunity to observe student design behaviour (Hurst et al., 2019); however, their format poses a number of challenges to a protocol study, mainly that the number of students, in conjunction with a loud and hectic team design environment, makes it difficult to collect useful verbal recordings.
Because of constraints related to the conditions under which verbal protocols can be collected, the design tasks studied with PA may not be representative of those that engineering novices commonly engage with. As noted earlier, many of the reviewed studies used problems that were related to mechanical engineering. These problems do not necessarily require participants to use an interdisciplinary approach to problem solving, which is more typical in an authentic setting. Further, setting time limits on activities does not always allow participants to progress through an entire design process (Roberts et al., 2007). Although there is an attempt at making these problems as realistic as possible, there is still a call for future investigations into more complex design situations (Cash & Maier, 2016). Professional and practicing engineers may work on design projects over long periods of time, so design sessions that are limited to a few hours may not be representative of actual design behaviour. Similarly, using an experimental design in which participants are encouraged to use a specific process, for example by providing prototyping material to them (Bailey & McFarland, 2018), inevitably influences their design approach. As such, in many of the reviewed studies the interventions used may prevent researchers from observing natural design behaviour.

Opportunities and future research directions
Researchers can take advantage of technological advancements that can improve the efficiency of collecting, processing, and analyzing verbal protocols. Smartphones and tablets can record both audio and video files, which can later be sent to researchers and used for analysis. This reduces the need for the research team to facilitate recording sessions and increases the amount of data that can be collected in a single setting. As technology becomes more accessible, there is potential for students themselves to capture their own design process. Further, many transcription services now use AI to turn audio and video files into transcribed text. Using such tools reduces the number of hours required to transcribe protocol data manually and significantly speeds up data processing. Finally, advances in natural language processing provide exciting new opportunities for automating (at least in part) the coding of protocols, as shown for example by Nespoli et al. (n.a.).
PA studies in engineering design education research have used a number of common coding schemes, most commonly Atman's (Atman & Bursic, 1998) and FBS (Gero & Kannengiesser, 2004), but there have also been cases when researchers developed new coding schemes or combined and adapted existing ones. Gero (2010) argued for the implementation of common coding schemes (as discussed in section 3), which would allow for comparisons across different research studies. Although Mentzer et al. (2015) did not use the FBS ontology argued for by Gero, their work is an example of how this approach can be used. By applying Atman's coding scheme to their own protocol data and to a previous data set collected and coded by Atman, they showed that a comparison across different data sets is possible. In future research studies, using this approach can provide more in-depth findings from protocol studies, as we will build on the available research rather than starting anew.
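As a minimal sketch of why a shared coding scheme makes such cross-study comparisons straightforward, assume each protocol has already been segmented and each segment assigned exactly one code. The code labels, function names, and data below are hypothetical and chosen for illustration only; they do not reproduce Atman's scheme, the FBS ontology, or the analysis of any reviewed study.

```python
from collections import Counter

# Hypothetical shared coding scheme (illustrative labels, not a published scheme).
CODES = ["problem_definition", "idea_generation", "modeling", "evaluation"]


def code_proportions(segments):
    """Return the share of coded segments assigned to each code in CODES."""
    if not segments:
        raise ValueError("no coded segments")
    counts = Counter(segments)
    unknown = set(counts) - set(CODES)
    if unknown:
        # Segments coded with a different scheme cannot be compared directly.
        raise ValueError(f"codes outside the shared scheme: {sorted(unknown)}")
    total = len(segments)
    return {code: counts[code] / total for code in CODES}


def compare_datasets(dataset_a, dataset_b):
    """Return, per code, the difference in proportions (dataset_a minus dataset_b)."""
    prop_a = code_proportions(dataset_a)
    prop_b = code_proportions(dataset_b)
    return {code: prop_a[code] - prop_b[code] for code in CODES}


# Invented example data: one code per protocol segment.
new_study = ["problem_definition", "idea_generation", "idea_generation", "evaluation"]
prior_study = ["problem_definition", "problem_definition", "modeling", "evaluation"]

differences = compare_datasets(new_study, prior_study)
for code, diff in differences.items():
    print(f"{code}: {diff:+.2f}")
```

Because both data sets are expressed in the same codes, the comparison reduces to comparing proportions per code; with different schemes, a mapping between codes would first have to be agreed upon, which is the difficulty a common scheme avoids.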
Finally, although the research that has been done using PA has yielded a number of important insights, there is a need for triangulation of those results with other methods (Atman, 2019). Hay et al. (2017) note that one challenge for the field moving forward is to test the findings of protocol studies with other methods that are more suited to large population sizes (e.g., interviews, surveys, etc.). Using this approach, we can ensure that we are understanding the processes used by students and practitioners alike in the most comprehensive way possible.

Conclusions
PA is widely used and is generally regarded as one of the best ways to study design processes. In this paper we have used a systematic literature review process to examine the use of PA as a method for understanding how novices design. We have outlined the methodological approach of PA studies and provided examples of common coding schemes used and research questions that have been investigated using the method. We have also described the method's strengths and limitations and highlighted opportunities for improving data collection and processing and for applying the method to new learning contexts and environments.
Overall, PA has been shown to be an adequate approach to understanding the cognition of students engaged in design activities. The insights from these studies have been used to improve engineering design education and we are confident that future studies using this method will continue to expand our knowledge in this domain.