Introduction

Collaboration and cohesion are critical underpinnings of teamwork. This is especially true for high reliability teams such as long duration (LD) space flight crews that have to perform in extreme environments. Effective teamwork is essential for minimizing errors and supporting team performance, and it is reflective of good psychosocial adaptation to the stresses of LD space missions.

Specific Aims. This ground-based research extended prior NASA supported research to address:

PRD Risk of Performance Decrements Due to Inadequate Cooperation, Coordination, Communication, and Psychosocial Adaptation within a Team.

IRP Gap - Team2: Given the context of long duration missions, what are the best tools to effectively monitor and measure task performance, teamwork, and psychosocial performance?

The overall goals of the research were to: (a) collect benchmark data to provide an evidentiary foundation for the dynamics of teamwork during long duration missions in isolated, confined, and extreme (ICE) environments that serve as analogs for space flight and (b) extend validation and development of a wireless sensor technology designed to monitor collaborative teamwork, team functioning, and psycho-social health. Both goals are extensions of previously funded NASA research (NNX09AK47G). With respect to goal (a), although there is substantial research documenting a relationship between team cohesion and team effectiveness, that research foundation is based on cross-sectional, static data; there is remarkably little research that documents the dynamics of collaboration, teamwork, and cohesion over lengthy time periods (Cronin, Weingart, & Todorova, 2011; Kozlowski & Ilgen, 2006). With respect to goal (b), a key component of the system is a monitoring technology (a wearable badge housing a sensor network) that assesses the frequency, duration, distance, and quality of collaboration as team members work together.
The research extended the development of the monitoring, measurement, and regulation system by accomplishing three specific aims to: (1) benchmark the dynamic interaction and collaboration patterns of teams in ICE environments using longitudinal diary studies with teams deployed in NASA analog environments, (2) extend validation research of the monitoring technology to capture successively more complex and naturalistic team interaction dynamics, and (3) initiate the development of feedback and data fusion models to display and integrate multi-modal data streams (i.e., collaboration frequency, duration, and distance; physiological arousal) that can be used to diagnose the effectiveness of collaboration and teamwork. Note that in the Task Book version of this report, Figures and Tables that document primary findings are excluded. However, those details of the research are contained in the project final report.

LD Benchmarking Data for ICE Teams

Overview. The first specific aim for the project was to benchmark the collaboration patterns of teams in an ICE analog environment using experience sampling methodology (ESM; daily diary reports of team functioning). As noted previously, although there is considerable research on the relationship between team cohesion and other “team processes” that reflect the quality of member interactions, most of the data are collected using cross-sectional designs and are therefore static. It is likely that collaboration patterns and interaction quality vary as a team forms, develops, and interacts over lengthy time frames. Long duration teams have to manage work-related problems and occasional social friction (Tekleab, Quigley, & Tesluk, 2009), yet little is known about such dynamics because the majority of team process research has been conducted at just one or two time-points (Casey-Campbell & Martens, 2009; Cronin et al., 2011).
Our prior research collected ESM data from two science team missions during their 2010-2011 and 2011-2012 deployments to the Antarctic (6 weeks camped on the ice) (Kozlowski, Biswas, & Chang, 2013). Those preliminary data indicated that patterns of cohesion varied over time across teams, and that different triggering events were associated with either positive or negative changes in team members’ interaction reports. Those findings indicated that team cohesion (and other “team process”) perceptions were sensitive indicators of variation in the quality of team functioning. Our goal for this project was to continue the research effort to collect individual differences (i.e., personality factors), team process, and team outcome data from ICE analog teams to better understand the factors that influence the collaboration patterns in these teams, and how changes in collaboration relate to the team performance. Benchmarking data of team interaction dynamics are important for calibrating the interpretation of data streams that will be captured by the monitoring technology (Specific Aim 2) and feedback (Specific Aim 3) systems. Thus, we continued the ESM data collections with Antarctic science teams during their 2012-2013 deployment. Methods and findings from this research are reported below. In addition, we submitted a proposal to the Australian Antarctic Division (AAD) for parallel research to be conducted at Mawson, Davis, and Casey AAD Stations for winter over missions. AAD station teams offered the potential for long duration missions (Austral winter) and a larger sample of individuals and teams. That research was approved by AAD. We initiated data collection under this grant and our collaboration with AAD is ongoing. Findings from the AAD LD ICE research will be reported under a superseding NASA grant. Methods. Eight people took part in the 2013-2014 Antarctic mission (six members and two leaders). 
Prior to deployment, the scientific team members completed a pre-expedition survey that included items assessing their background and individual differences (e.g., personality, teamwork skills). During the six-week mission when the team deployed to the Antarctic ice, members completed daily diary surveys that asked them to reflect on their feelings and thoughts with regard to their team and personal experiences. Finally, members completed a post-expedition survey to evaluate their overall experience after they returned from the mission. A full report of these data has been provided to the Antarctic science team leaders. In the findings reported below, we concentrate on the pattern of daily cohesion ratings reported by team members across their deployment to the ice fields, as these data are most relevant to NASA. Findings. Cohesion was relatively high and stable throughout the course of the mission. There was some significant variability in cohesion over the first few days, although it quickly leveled out. The relatively high and stable pattern of cohesion (after the first week) is particularly interesting given reports in the open-ended comments of multiple conflicts occurring among teammates. This suggests that cohesion may provide a “buffer” such that teammates remain committed to one another and to their mission even when interpersonal issues arise. We are currently investigating the potential causes of this “buffer” effect through quantitative and qualitative analyses. The team cohesion pattern of early volatility followed by strength and stability was a somewhat atypical pattern relative to our prior data collections, although it is consistent with theoretical frameworks in the literature. 
By comparing the personal characteristics, expectations, outcomes, and daily experiences of this team with those of prior Antarctic science teams for whom we have collected data, we will be able to learn more about why certain teams experience strong, stable cohesion and others do not. This work is ongoing as we need to construct a large enough database to determine which types of team cohesion profiles are generalizable and which are unique to a particular team and experience. Since we collect data for one team (or two at most) per season with this group, the process of constructing a normative data profile takes time.

Concurrent correlations between cohesion and the other factors assessed by the ESM ratings were examined. On days when team members reported stronger (overall) cohesion, they also tended to report greater physical and mental workload, fairer workload distribution and more helping among teammates, better adaptability and coordination, more liking of one another, stronger performance, less social conflict, and more active and positive (happy) affect. Understanding these correlates of cohesion helps to identify potential areas for improvement. In addition to these bivariate analyses, as the database develops to include more teams we will examine lagged correlations to determine which daily diary variables predict and/or result from team cohesion.

Team Interaction Technology Validation

Overview. The second specific aim was to extend the validation research of the monitoring technology. Validation is a process (not a single evaluation event) that is coupled with development of the technology, measurement calibration, and multi-stream data interpretation. We engaged in a validation process that was designed to subject the technology system to increasingly complex patterns of team interaction dynamics, with evaluation focused on the effectiveness of the monitoring technology to capture critical aspects of team interaction and collaboration.
Prior research conducted a rigorous laboratory validation of the monitoring technology and its ability to assess collaboration among team members in a highly structured setting (i.e., interactions were largely prescribed by the research design). This provided the foundation for validation evidence by demonstrating that the monitoring technology could reliably capture prescribed patterns of team interaction. The goal of this phase of validation reported here was to evaluate the ability of the technology to capture more complex patterns of interaction. This was accomplished using rigorous laboratory research to systematically manipulate factors designed to influence team interaction patterns by selectively stressing team members and then evaluating the ability of the technology to capture shifts in interaction dynamics. Methods. Initial validation evidence demonstrated that the badge technology was a highly reliable and accurate instrument for capturing team member interaction characteristics. Moreover, a time pressure manipulation designed to differentially induce stress on interactions between teams (i.e., experimental vs. control) was effectively detected by badge indicators (Kozlowski et al., 2013). This next phase of validation was designed to differentially induce stress on interactions both within teams (i.e., the interactions between certain team members were selectively stressed) and between teams (i.e., the number of team members whose interactions were stressed was differential across conditions). The team interaction task employed was one that we had used in the prior validation efforts. It was an adaptation of the NASA Space Flight Resource Management task “Moon Base” to serve as the simulation for collaborative interaction among members. To provide an appropriate research platform, the simulation was redesigned to provide a task context that necessitated frequent structured interactions to facilitate the validation efforts. 
In essence, team members were required to interact with other members to exchange “resource” tokens to acquire a specified pattern across members. Each interaction had a time limit (i.e., 25 seconds) to make the exchange challenging. The goal of the task was for team members to accomplish the necessary team-level resource redistribution within a designated time limit (i.e., 20 minutes) so they could successfully return from the mission. The task was designed to be difficult to accomplish within the allotted time window: if the team was unable to fully redistribute resources to accomplish “launch” within the time limit, the mission failed. The redesigned simulation was designated “Mars Base.”

The validation experiment included three between-team conditions that differentially stressed team members. We used a short cognitive test that was delivered prior to selected resource exchanges to induce stress. The designated team member had to count backward by 7 from a random starting number (e.g., 312) 12 times within 25 seconds to pass the test and continue with the resource exchange. A test failure precluded a successful resource exchange for the dyad, required the team members to repeat the exchange, and thus added time pressure, hence stress, to accomplishing the team goal. Approximately 78% of tested interactions failed to achieve a resource exchange. Condition 1 tested a single team member 3 times with each of the other 2 members. Condition 2 tested two team members 3 times each with the other team member. Condition 3 tested each of the three team members 2 times. Thus, the total number of tests per condition was constant at 6, but the pattern of tests for dyads was unique to each condition. The experiment involved 133 teams of three team members each (n = 399) across the 3 conditions.

Data Structure.
The experimental design was complex, with nested levels of data. Note that teams are nested under condition, but that team and condition carry redundant information (i.e., the condition manipulation directly causes patterns in the dependent variables at the team level). Reported results focus on key validation relationships.

• Level 3 – Condition
  o 1 person stressed with the cognitive test (2 people not)
    * Alpha stressed 3 times when approaching Bravo, 3 times when approaching Charlie
  o 2 people stressed with the cognitive test (1 person not)
    * Bravo and Charlie stressed 3 times each when approaching Alpha
  o 3 people stressed with the cognitive test (0 not)
    * Each person was stressed 2 times randomly during experiment
• Level 3 – Team Dependent Variables
  o Time to task completion (where all resources distributed)
  o How many resources were exchanged
  o How many interactions occurred
  o By what interaction did the team “give up”
• Level 2 – Person Dependent Variables
  o Baseline Heart Rate (BHR; Mean)
  o BHR Variability (BHR SD)
  o Control variables (e.g., GPA, Gender, Age)
• Level 1 – Interaction Sequence (Time) Dependent Variables
  o Affect (Positive Affectivity [PA] and Negative Affectivity [NA])
    * High PA (Very happy, excited, enthusiastic, interested)
      • Positive valence, high activation
    * Low PA (Very sad, low spirited, dissatisfied)
      • Negative valence, low activation
    * High NA (Very nervous, tense, anxious, stressed)
      • Negative valence, high activation
    * Low NA (Very relaxed, calm, composed, comfortable)
      • Positive valence, low activation
  o Heart Rate (HR Mean)
  o HR Variability (HRV)
  o Cognitive test (pattern based on condition)
  o Interaction initiation
  o Resource exchange success
  o Interacting Dyad (Alpha/Bravo, Alpha/Charlie, Bravo/Charlie)

Findings. Intensive longitudinal data create complex structures that entail layers of nesting or dependencies among the observations.
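One common way to quantify this kind of dependency is the intraclass correlation, ICC(1): the share of total variance attributable to differences between persons. The sketch below is illustrative only. It uses a one-way ANOVA estimator on balanced synthetic data in Python, whereas the analyses reported here used multilevel models in R; the function name `icc1` and the simulated scores are assumptions, not project code or data.

```python
import numpy as np


def icc1(scores):
    """ICC(1) for a balanced persons-by-occasions array (one-way ANOVA estimator)."""
    scores = np.asarray(scores, dtype=float)
    n_persons, n_occasions = scores.shape
    person_means = scores.mean(axis=1)
    grand_mean = scores.mean()
    # Mean square between persons and mean square within persons.
    ms_between = (
        n_occasions * np.sum((person_means - grand_mean) ** 2) / (n_persons - 1)
    )
    ms_within = np.sum((scores - person_means[:, None]) ** 2) / (
        n_persons * (n_occasions - 1)
    )
    return (ms_between - ms_within) / (ms_between + (n_occasions - 1) * ms_within)


# Synthetic example (hypothetical values): 200 persons, 20 occasions each,
# with equal between-person and within-person variance.
rng = np.random.default_rng(0)
person_effect = rng.normal(0.0, 1.0, size=(200, 1))
occasion_noise = rng.normal(0.0, 1.0, size=(200, 20))
demo_icc = icc1(person_effect + occasion_noise)
print(f"Share of variance between persons: {demo_icc:.2f}")
```

With equal between- and within-person variance, the estimate should land near .50. Values well above zero indicate that observations from the same person are not independent, which is the situation multilevel models are designed to handle.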
To account for the dependency, data were analyzed using multilevel random coefficient modeling (MRCM) implemented in R. Given the nested data structure and the intensive repeated measures design, an initial null model was run to determine the pattern of variance decomposition. Results indicated that there was significant and substantial within- and between-person variance, thus necessitating MRCM analyses. Specifically, differences between individuals accounted for:

• PA - 28% of the variance, F(391) = 9.992, p < .001;
• NA - 41% of the variance, F(398) = 17.57, p < .001;
• HR - 67% of the variance, F(384) = 44.61, p < .001;
• HRV - 46% of the variance, F(384) = 16.77, p < .001.

The first set of MRCM analyses examined the extent to which the patterns of cognitive testing, which was the manipulation that varied across the three conditions, had effects on participants’ affect reports and physiological indicators. The results indicated that cognitive testing was effective in causing participant stress. Specifically, it had a positive effect on reports of NA (negative valence, high activation; “very nervous, tense, anxious, stressed;” t = 13.13, df = 8177, p < .001) and a negative effect on PA (low PA - negative valence, low activation; t = -22.05, df = 8013, p < .001) such that when a test was given individuals reported being more “sad, low spirited, dissatisfied.” Thus, the cognitive testing had the desired effects of inducing greater NA and lower PA such that individuals’ psychological reactions evidenced stress. In addition, cognitive testing also had a positive effect on HR SD (t = 4.43, df = 7744, p < .001) as a physiological indicator of stress. The relationship with HR Mean was not significant.

The next set of MRCM analyses focused on the relationship between HR Mean and HR SD, and affect – PA and NA. This was an important step in badge validation in that it is desirable to use the physiological indicators to infer positive and negative psychological states.
For this set of models, repeated observations were nested within the individual, but not the team as this was confounded with the condition. An autoregressive component to account for repeated measures was incorporated in the model. Results show that HR Mean was significantly and positively associated with both positive and negative affect, while HR SD was positively associated with negative affect (although this relationship was not significant). These results suggest that higher heart rates are associated with increased arousal and happiness. It is unclear whether this is due to the complex relationship between PA and NA observed in this study (i.e., individuals varied considerably in how similar they rated PA and NA). As individuals engaged in more interactions (IntNum), they had significantly lower negative affect (i.e., they became less “stressed, tense, anxious” over time), but they did not significantly decrease their positive affect. This suggests that they were more consistent in the sadness/happiness level, but the anxiety they felt initially declined over time which is consistent with a practice effect. Follow up analyses investigating the effect of the three conditions on the three team members indicated that when Alpha was in the condition where they were the only one tested, HR SD was higher compared to those in the other conditions (t = 2.01, df = 128, p < .05). One would expect that being the only person tested on the team, frequently failing, and thus putting the team goal in jeopardy to be more stressful than the other conditions. Thus, this finding is consistent with expectations. Follow up analyses on the other positions (Bravo and Charlie) did not yield a consistent interpretation. This is likely because, compared to Alpha, the Bravo and Charlie positions did not entail the same degree of isolation (being the only team member stressed) and extremity (enduring all 6 cognitive tests). 
In sum, there is promising evidence that the physiological indicators have the potential to provide information relevant to inferring team member psychological states.

Finally, the last set of analyses examined the relationship between the psychological states and physiological indicators and team effectiveness. Those results indicated that experience (IntNum) was positively associated with resource exchange (t = 23.661, df = 7965, p < .001), meaning that individuals were more effective at completing the task within the time window as they gained experience. This is an expected learning or practice effect. Performance was, however, negatively impacted by the presence of a cognitive test (t = -61.500, df = 7965, p < .001). In addition, higher PA (i.e., happy, excited, enthusiastic, interested) was associated with higher performance (t = 9.996, df = 7965, p < .001), whereas higher NA (i.e., nervous, tense, anxious, stressed) was associated with lower performance (t = -4.112, df = 7965, p < .001). Thus, the psychological states evidenced the expected association with task performance. There was an interaction between PA and NA such that as NA increased, the relationship between PA and performance was enhanced (t = 2.800, df = 7965, p < .01). This suggests that when NA was low (i.e., relaxed, calm, composed, comfortable), PA did not have as strong a relationship with performance; however, when NA was high (i.e., nervous, tense, anxious, stressed), PA had a stronger positive relationship with performance. In other words, PA evidenced a buffering effect on performance in the presence of high NA. This is an important finding because it suggests that countermeasures designed to boost PA could have a long term buffering effect on NA – performance relationships. With regard to the interaction between the cognitive test and the affect variables, cognitive test did not interact with NA (t = -.324, df = 7963, p = .745).
However, cognitive test did interact with PA (t = 5.631, df = 7963, p < .001), indicating that when a cognitive test was given, the relationship between PA and performance was stronger. This indicates that individuals with higher PA (i.e., happy, excited, interested) were more likely to obtain a successful resource exchange, whereas those lower in PA (i.e., sad, low spirited, dissatisfied) were less likely to succeed. Again, this provides important evidence for the potential for PA to be a key countermeasure for teams under duress.

Team Interaction Dashboard

This project has been developing a team interaction monitoring technology – a wireless “badge” – that is designed to capture dynamic team collaboration patterns and to infer the status of team psycho-social health. At the start of this phase of work, we had demonstrated proof of concept in prior laboratory research. To accomplish the third specific aim, we initiated the development of a feedback “dashboard” to display multiple real-time data streams, including collaboration patterns, as well as team members’ physiological arousal information (e.g., heart rate, voice intensity) that are collected by a sensor package. The sensor package is roughly the size of a smart phone, although the sensors could be sewn directly into clothing. The sensors monitor the intensity of physical movement, vocal intensity, heart rate, and face-time (interaction distance) with teammates who are also wearing sensors. Over time, one can identify the sequence, frequency, duration, and degree of arousal associated with patterns of interactions among team members. The dashboard provides an interface between potential users (e.g., team leader, team members) and the multimodal data streams collected by the badge technology.
This dashboard allows the user to monitor six different types of data collected by the monitoring badge at different timeframes, including (a) physical distance between badges, measured in centimeters and sampled once every 200 * N ms, where N is the number of badges (including the base station); (b) heart rate, measured in beats per minute and sampled once every 50 ms; (c) acceleration, as the combined accelerometer reading (both X- and Y-axis) of the badge, measured in mg per second and sampled once every 100 ms; (d) acoustic intensity, measured in ADC counts and sampled once every 100 ms; (e) light, measured in ADC counts and sampled once every 50 ms; and (f) temperature, measured in Fahrenheit and sampled once every 50 ms.

In addition to displaying multi-modal feedback, the long term goal is to develop a data fusion model to provide an integrated psychological interpretation of the multi-modal data stream. This will provide an integrated assessment of the frequency and quality of team collaboration, and derive meaning from the multi-modal data for determining the effectiveness of team interaction. Although full development of the fusion model was beyond the scope of this short project, we initiated identification of statistical / data analytic models needed to accomplish fusion. There are a variety of candidate analyses that can be applied to the type of intensive, multimodal, longitudinal time series data collected by the badges. One promising candidate examines the phenomena as a network of relationship behaviors (e.g., Borsboom & Cramer, 2013). Another is the vector autoregressive model for handling multivariate time series data (DeShon, 2012). Besides description and forecasting, this approach is also useful for examining the causal impacts of “shocks” to specific variables in the time series and the effects of those shocks on the other variables in the system.
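To illustrate the shock analysis described for the vector autoregressive approach, the following minimal sketch fits a first-order model by least squares and traces a one-time shock through the system. It is a toy example under stated assumptions: the two variables (labeled here as heart rate and vocal intensity), the coefficient matrix, and the function names `fit_var1` and `impulse_response` are hypothetical and are not drawn from the badge data or from DeShon (2012).

```python
import numpy as np


def fit_var1(series):
    """Least-squares estimate of A in x[t] = A @ x[t-1] + e[t] (no intercept)."""
    series = np.asarray(series, dtype=float)
    past, present = series[:-1], series[1:]
    # Solve past @ A.T ~= present in the least-squares sense.
    coef_transposed, *_ = np.linalg.lstsq(past, present, rcond=None)
    return coef_transposed.T


def impulse_response(coef, shock, horizon):
    """Propagate a one-time shock through the system for `horizon` steps."""
    responses = [np.asarray(shock, dtype=float)]
    for _ in range(horizon):
        responses.append(coef @ responses[-1])
    return np.vstack(responses)


# Hypothetical system: column 0 = heart rate, column 1 = vocal intensity
# (both treated as mean-centered). Heart rate feeds into later vocal intensity.
true_coef = np.array([[0.5, 0.0],
                      [0.3, 0.4]])

# Simulate a long series from the assumed system.
rng = np.random.default_rng(1)
x = np.zeros((2000, 2))
for t in range(1, len(x)):
    x[t] = true_coef @ x[t - 1] + rng.normal(0.0, 1.0, size=2)

est_coef = fit_var1(x)

# Trace a unit shock to heart rate over three subsequent steps.
irf = impulse_response(true_coef, shock=[1.0, 0.0], horizon=3)
print(np.round(est_coef, 2))
print(irf)
```

Each row of `irf` is the system's response one step further from the shock, so the second column shows how a shock to the first variable carries over to the other. A full analysis would more likely rely on an established time series package and would need to address lag selection, stationarity, and unequal sampling rates across sensors.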
Relational events analysis (Butts, 2008) is “a highly flexible framework for modeling actions within social settings which permits likelihood-based inference for behavioral mechanisms with complex dependence…. [and which can examine] base activity levels, recency, persistence, preferential attachment, transitive / cyclic interaction, and participation shifts with the relational event framework” (p. 155). Moreover, such data are amenable to a range of machine learning and data mining techniques as a way to identify non-linear patterns arising from the multimodal data that are predictive of effective team functioning. When fully developed, the integrated data fusion model will monitor team collaboration (real time and post-processing) to evaluate the status of team processes and to provide developmental feedback for team members concerning their interaction styles.

Research Impact and Conclusion

Team cohesion is not just a critical factor for astronaut teams and ground crews; cohesion is important to the effectiveness of all teams and especially those that operate in critical, high reliability settings. Of the many team process factors that support team effectiveness, team cohesion is the most studied with over a half century of research. Yet, remarkably, very little is known about the characteristics that promote its development and maintenance. For example, we know that experience working together is associated with cohesion formation and maintenance, but what are the mechanisms? Teams that do not cohere replace problematic members or disintegrate so experience only reveals those teams that survive, but that does not tell us why or how.
This research stream, which is uncovering the dynamics of collaboration, cohesion, and effective team functioning and creating technologies to monitor team cohesion and guide interventions to restore it, has the potential for wide utility in aviation, military, medical, industrial, and other environments where society depends on the effective performance of high reliability teams.

Report References

Borsboom, D., & Cramer, A. O. J. (2013). Network analysis: An integrative approach to the structure of psychopathology. Annual Review of Clinical Psychology, 9, 91-121.

Butts, C. T. (2008). A relational event framework for social action. Sociological Methodology, 38(1), 155-200.

Casey-Campbell, M., & Martens, M. L. (2009). Sticking it all together: A review of the group cohesion-performance literature. International Journal of Management Reviews, 11, 223-246.

Cronin, M. A., Weingart, L. R., & Todorova, G. (2011). Dynamics in groups: Are we there yet? The Academy of Management Annals, 5, 571-612.

DeShon, R. P. (2012). Multivariate dynamics in organizational science. In S. W. J. Kozlowski (Ed.), The Oxford handbook of organizational psychology (Vol. 1, pp. 117-142). New York, NY: Oxford University Press.

Kozlowski, S. W. J. (in press). Advancing research on team process dynamics: Theoretical, methodological, and measurement considerations. Organizational Psychology Review.

Kozlowski, S. W. J., Biswas, S., & Chang, C.-H. (2013). Developing, maintaining, and restoring team cohesion. Final Report, National Aeronautics and Space Administration (NNX09AK47G). Houston, TX.

Kozlowski, S. W. J., & Ilgen, D. R. (2006). Enhancing the effectiveness of work groups and teams (Monograph). Psychological Science in the Public Interest, 7, 77-124.

Tekleab, A. G., Quigley, N. R., & Tesluk, P. E. (2009). A longitudinal study of team conflict, conflict management, cohesion, and team effectiveness. Group & Organization Management, 34, 170-205.