Research Foci, Findings, and Conclusions
1. NASA Analogs Longitudinal Data Collections:
The primary focus of the analog data collection was to cultivate access to analog teams (i.e., teams operating in isolated, confined, and extreme environments for long durations) to conduct benchmarking research examining the dynamic relationships among individual differences; team collaboration, cohesion, and conflict; and team effectiveness.
2. Technology Development and Prototyping:
The primary focus of the monitoring technology research stream was to develop hardware and software to enable wireless, unobtrusive monitoring of team member collaboration and interaction during team task performance.
3. Validation of the Sensor Technology and Metrics:
The primary focus of the validation research stream was to first demonstrate the value of behavioral metrics as a means to assess team cohesion and collaboration. This step allowed us to establish the potential of the behavioral monitoring technology as an assessment platform. Subsequent work used team simulation as a research platform to establish the reliability, validity, and utility of the monitoring technology (i.e., wearable badge) as a team interaction measurement system.
NASA Analogs Longitudinal Data Collections
Overview. The goal of the NASA analog data collection effort is to benchmark team functioning in long duration missions conducted in isolated, confined, and extreme (ICE) environments. As noted at the beginning of this report, there is a substantial amount of research examining the relationship between team cohesion and effectiveness, but the vast majority of the research is based on static data (i.e., cross-sectional research designs). Although such data are useful, they do not allow inferences regarding the dynamics of team cohesion and team effectiveness. Thus, as part of our effort to develop protocols for effective team functioning, it is necessary to establish benchmarks regarding team cohesion fluctuations (i.e., variation in mean levels, degree of member consensus), potential cohesion cycles, and the susceptibility of cohesion to shocks due to events internal and external to the team. Such data are necessary for the development of protocols to determine whether a team is functioning within normative parameters or is exhibiting anomalous patterns indicating the need for countermeasure activation.
As noted in progress reports, the initial effort to gain access to scientific teams operating in a NASA analog environment, spearheaded by a subset of the original research team (Hough, Schmitt, & Locke), met with challenges and was ultimately unsuccessful.
Antarctic Science Search Teams. Responsibility for developing this line of analog research was shifted back to the primary research team under the direction of the Principal Investigator. Through subsequent efforts to explore potential analog sites, and with facilitation provided by our NASA element scientist, we gained access to scientific teams who operated in Antarctica during the summer to search for, collect, and catalog samples. Starting with the 2010 season, we have collected data every season through this analog site. In the general study design, scientific team members complete a pre-expedition survey prior to deployment that includes items assessing their background and individual difference traits (e.g., personality). During the six-week mission, members complete daily diary surveys that ask them to reflect on their feelings and thoughts about their team and personal experiences. Finally, after they return from the mission, members complete a post-expedition survey to evaluate their overall experience.
The 2010-2011 season of data collection included two teams that covered different areas for searching and collecting samples. The blue team and the red team each consisted of 4 members. Cohesion varied over the duration of the mission for both teams, but there were substantial differences in cohesion variability across the two teams, with considerably more change over time in the red team than in the blue team. Specifically, while the blue team’s cohesion stayed relatively consistent, the red team’s cohesion was more volatile throughout the six-week mission. Based on team members’ open-ended daily diary responses, we coded specific incidents that may have triggered cohesion variability in the two teams. Overall, positive triggers (e.g., engaging in fun activities together, being productive as a team) and negative triggers (e.g., stress and fatigue, bad weather conditions, uneven distribution of work among team members) may be key factors influencing team cohesion over time. Those factors were associated with the peaks and valleys of team cohesion.
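To make the benchmarking quantities concrete, the following minimal sketch shows one way daily mean cohesion, member consensus, and team-level volatility could be computed from daily diary ratings. It is an illustration only, not the analysis code used in this research; the function names, the use of the standard deviation as a dispersion index, and the ratings themselves are hypothetical.

```python
from statistics import mean, pstdev

def daily_profile(ratings_by_day):
    """Summarize each day's member ratings as a mean level and a dispersion.

    ratings_by_day: list of per-day lists of member cohesion ratings.
    Lower dispersion is read here as greater member consensus (assumption).
    """
    profile = []
    for day, ratings in enumerate(ratings_by_day, start=1):
        profile.append({
            "day": day,
            "mean": mean(ratings),
            "dispersion": pstdev(ratings),
        })
    return profile

def volatility(profile):
    """Standard deviation of the daily means across the mission."""
    return pstdev([p["mean"] for p in profile])

# Hypothetical ratings on a 1-5 scale for two four-member teams over three days.
steady_team = [[4, 4, 4, 4], [4, 4, 5, 4], [4, 4, 4, 4]]
volatile_team = [[5, 5, 4, 5], [2, 3, 2, 4], [5, 4, 5, 5]]

steady_profile = daily_profile(steady_team)
volatile_profile = daily_profile(volatile_team)
```

Comparing `volatility(steady_profile)` with `volatility(volatile_profile)` gives a simple index of the kind of between-team difference in cohesion variability described above.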
We were fortunate to have the opportunity to collect another round of data from this source in 2011-2012. Another team of scientists participated in a 48-day mission camped on the ice in Antarctica. We gathered data similar to those collected in the previous season in an effort to extend our benchmarking results. That is, using pre-expedition and post-expedition surveys, as well as daily diaries for team members and leaders, we assessed a number of key variables, including individual differences (i.e., demographics and educational experience), daily thoughts and reactions regarding taskwork and teamwork, and member satisfaction and performance. As in the previous data collection effort, we used qualitative and quantitative measures in order to gain the most detail possible.
Consistent with our findings in 2010-2011, this team’s cohesion also varied substantially over time. While cohesion ratings tended to be favorable overall, the pattern became more stable toward the latter half of the season. In addition, members tended to disagree about how cohesive the team was nearer the beginning of the mission, whereas toward the middle of the mission, they started consistently converging that the team was cohesive. In order to examine the underlying causes of this pattern of cohesion, we read through team members' open-ended diary responses to identify its potential influencers. One important influencer of cohesion, particularly during the first half of the mission, was the boredom and frustration experienced as a result of being confined to tents for long stretches of time due to weather. This prevented team members from completing their sample collection mission and from working together toward common goals. Many times, it seemed this resulted in sub-teams working on separate tasks or individual team members engaging in activities on their own. This may be one reason why members provided relatively weaker ratings of cohesion – and disagreed about how cohesive their team was – during the first half of the mission.
To further investigate the variables related to the team's cohesion pattern over time, we calculated cross-correlations between average daily cohesion (among team members) and each of the other teamwork constructs we assessed in team members’ daily diary surveys. Team members tended to rate the team as more cohesive on days when they also rated more physical workload (i.e., there was more mission-related work to do), a more even distribution of work among members, more liking of team mates, greater team coordination, more helping of team mates, and better team performance, as well as stronger happy affect. Overall, our benchmarking research demonstrates that teams are unique ecosystems that experience variation in psycho-social functioning over time.
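The cross-correlation analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in this research: the function names, the lag convention, and the example series are hypothetical, and the Pearson coefficient is assumed as the correlation index.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def cross_correlation(cohesion, other, max_lag=2):
    """Return {lag: r}. A positive lag pairs other[t] with cohesion[t + lag]."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = cohesion[lag:], other[:len(other) - lag]
        else:
            a, b = cohesion[:lag], other[-lag:]
        result[lag] = pearson(a, b)
    return result

# Hypothetical daily team means; coordination leads cohesion by one day here.
cohesion = [3, 4, 2, 5, 4, 3, 5, 4]
coordination = [4, 2, 5, 4, 3, 5, 4, 3]
lagged = cross_correlation(cohesion, coordination, max_lag=2)
```

The lag-0 entry corresponds to the same-day associations reported above; nonzero lags would indicate whether one construct tends to precede the other.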
Station-Based Winter-Over Teams. Finally, in addition to continuing to collect data from the scientific teams performing sample collection missions during the summer (2012-2013 data were collected for a related NASA grant and 2013-2014 data collection is in progress), we have also explored other analog sites for benchmarking team cohesion among members operating in ICE settings. As part of this ongoing effort, we submitted a proposal to the Australian Antarctic Division (AAD) to collect data on long duration team functioning. That proposal was approved and we collected data during the 2012-2013 winter-over deployment from winter-over teams at Mawson, Davis, and Casey Stations. This ongoing research assesses daily teamwork processes over a period of nine months to a year (i.e., approximately 270 to 360 measurement periods). That data collection is just now (post-award) coming to a conclusion, and another data collection for the 2013-2014 winter-over deployment is being initiated. Results of this effort will be reported at the conclusion of data collection in related NASA grant final reports.
Technology Development and Prototyping
Wireless Sensor Monitoring Technology Development. The goal of the monitoring technology development effort was to build hardware and software to enable unobtrusive monitoring of team member collaboration and interaction during team task performance (e.g., Biswas & Quwaider, 2008). We have developed a prototype wearable sensor system using multi-modal data to capture teamwork interaction and process dynamics. The sensor system consists of two primary components, namely a wearable sensor array and a laptop computer / receiver that records the data streams. The wearable sensor is a small package (6 cm x 10 cm x 1.5 cm, weighing approximately 25 grams) that contains a sensor subsystem, processor, and radio subsystem.
The sensor package can be clipped on, worn as a badge, or sewn directly into clothing. Once activated, the technology monitors its wearer’s intensity of physical movement, verbal activity level, heart rate, and face-time (distance between persons facing one another) with other sensor-wearing individuals. Data are sent to a nearby (within 50 meters) laptop computer server using a 400 MHz wireless link between individual sensor packages and an access point that is connected to the computer via a USB link. Team interaction analytics can be performed on the laptop either in real-time or via post-processing.
Several modalities are available in the monitoring badge for sensing dynamic team member interaction. For example, face time detects whether a sensor-equipped team member is facing another sensor-equipped team member at any given time and measures the distance between them. Dyadic face time between any number of individuals comprising a team, or a larger social entity, can be simultaneously detected and reported to the access point. Face time detection currently works up to a distance of 7 meters, although most collaborative interactions are likely to be much closer. Distance per se is not sufficient to characterize the nature of an interaction; therefore, this data stream has to be filtered (e.g., with regard to distance and interval) and combined with other data (e.g., movement, vocalization, arousal) to produce meaningful information. Physical movement captures the level of physical motion of a sensor-equipped individual. These data are collected from a dual-axis accelerometer integrated within the sensor subsystem. The sensor measures acceleration in the range of -2g to +2g along each of the two axes. Vocal activity captures an individual’s vocalizations, which are parameterized as the duration, interval, and intensity of the acoustic signal recorded by a tiny microphone attached to the sensor package. Note that vocal activity does not include speech and word recognition, which are too computationally heavy for the wearable sensors used presently. A number of correlational algorithms are implemented for distinguishing the vocal activity of a sensor-equipped individual from acoustic energy created by other sources. Finally, the sensor collects physiological data via a heart rate monitor, which captures beat frequency and sends it to a computer server in real time via the wearable sensor package. Heart rate information is sent to a logger unit in the sensor node via a 5.4 kHz radio link.
The processor in the sensor node performs a certain amount of pre-processing of the data before sending it to the computer server through the 400 MHz radio link.
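As an illustration of the filtering by distance and interval mentioned above, the following minimal sketch converts a stream of dyadic face-time samples into interaction episodes. It is not the badge firmware or the analytics software developed in this project; the function name, the sample format, and all threshold values are hypothetical assumptions chosen for the example.

```python
def face_time_episodes(samples, max_distance_m=2.0, min_duration_s=5, max_gap_s=2):
    """Group face-time samples for one dyad into interaction episodes.

    samples: list of (t_seconds, distance_m or None), sorted by time, where
    None means the two badges were not facing each other at that moment.
    Returns a list of (start_s, end_s) episodes. All thresholds are
    illustrative assumptions, not values taken from the badge system.
    """
    episodes = []
    start = last = None
    for t, distance in samples:
        facing = distance is not None and distance <= max_distance_m
        if not facing:
            continue
        if start is None:
            start = last = t                      # open the first episode
        elif t - last <= max_gap_s:
            last = t                              # extend the current episode
        else:
            if last - start >= min_duration_s:    # close it if long enough
                episodes.append((start, last))
            start = last = t                      # open a new episode
    if start is not None and last - start >= min_duration_s:
        episodes.append((start, last))
    return episodes

# Hypothetical 1 Hz samples: a close interaction, a gap, a brief glance,
# and a stretch where the dyad faces each other from too far away.
samples = (
    [(t, 1.0) for t in range(0, 10)]
    + [(t, None) for t in range(10, 15)]
    + [(15, 1.5), (16, 1.5)]
    + [(t, 6.0) for t in range(20, 31)]
)
episodes = face_time_episodes(samples)
```

Episodes produced this way could then be combined with the movement, vocal activity, and heart rate streams recorded over the same time window.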
Prototyping Face Time Interaction Sensing. Having developed a working badge prototype, the first step of research evaluation was to determine whether the sensor modalities functioned such that basic patterns of team member interaction could be inferred from the sensor data streams. To accomplish this initial prototyping, we developed a highly structured sequence of dyadic interactions for a three-person team. The purpose of the tight structure was to enable relatively straightforward evaluation of the extent to which the sensor data corresponded with the structured sequence of dyadic interactions. Thus, prior to the phased validation (see next section), the sensor data modalities were first evaluated by a team of graduate students from the Psychology and Engineering Departments at Michigan State University. Three members took turns interacting closely with each other following a predetermined script during a 30-minute session.
Results show that team members A and B engaged in several minutes of proximal collaboration as they worked to solve the problem, whereas A and C exhibited relatively little collaboration. In addition, members B and C had about the same total collaboration time, but only a small cluster of it was proximal and most of it was more distal. The face time data generated by the monitoring devices matched the scripted interaction pattern, thereby providing initial prototyping evidence for the accuracy of this modality of sensor data.
In addition to the development of the basic sensor modalities that capture face time interactions, sensor extensions have focused on developing audio processing algorithms for speaker detection. The key challenge of developing such algorithms is to be able to perform experimental acoustic diarization for social conversation monitoring in the presence of low acoustic data sampling rates, noise, and variable inter-personal distances. The key idea is to collect acoustic sensor data from strategically placed wireless sensors over conversing human subjects’ bodies, and to collate and process that data using an Acoustic Comparator based Diarization (ACD) algorithm in order to identify the speaker in a conversation at any given time. Through experiments with body mounted sensors, it is demonstrated that ACD algorithms can be used for effective speaker identification with high precision. Controlled experiments using human subjects were carried out for evaluating the performance of the developed ACD algorithms.
We have conducted additional validation studies to evaluate the algorithms by varying ambient noise level and distance between participants. The overall algorithmic accuracies for different inter-personal distance and ambient noise conditions showed that for up to 2 meter distance and with ambient noise, the ACD approach can achieve accuracy of around 84% and above.
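The general idea of comparing acoustic signals across body-mounted sensors can be illustrated with the following minimal sketch. It is not the ACD algorithm developed in this project, whose details are not given here; it simply attributes each time frame to the wearer whose microphone records the most energy. The function names, the noise floor, the margin, and the energy values are all hypothetical.

```python
def attribute_speaker(frame_energy, noise_floor=0.1, margin=1.5):
    """Pick the likely speaker for one time frame, or None if unclear.

    frame_energy: dict mapping wearer id -> acoustic energy at that wearer's
    microphone. A wearer is chosen only if their energy exceeds the noise
    floor and is at least `margin` times the next-highest energy.
    Both thresholds are illustrative assumptions.
    """
    ranked = sorted(frame_energy.items(), key=lambda kv: kv[1], reverse=True)
    top_wearer, top_energy = ranked[0]
    if top_energy <= noise_floor:
        return None                      # silence or ambient noise only
    if len(ranked) > 1 and top_energy < margin * ranked[1][1]:
        return None                      # too close to call
    return top_wearer

def diarize(frames, **thresholds):
    """Apply attribute_speaker to a sequence of frames."""
    return [attribute_speaker(frame, **thresholds) for frame in frames]

def accuracy(predicted, truth):
    """Proportion of frames where the predicted label matches the truth."""
    hits = sum(1 for p, t in zip(predicted, truth) if p == t)
    return hits / len(truth)

# Hypothetical per-frame energies for three wearers.
frames = [
    {"A": 0.9, "B": 0.3, "C": 0.1},     # A clearly loudest
    {"A": 0.05, "B": 0.04, "C": 0.02},  # below the noise floor
    {"A": 0.4, "B": 0.5, "C": 0.1},     # ambiguous between A and B
    {"A": 0.2, "B": 0.8, "C": 0.2},     # B clearly loudest
]
labels = diarize(frames)
```

Scoring such labels against known speaker turns with `accuracy` is one way the effects of inter-personal distance and ambient noise on identification could be summarized.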
Validation of the Sensor Technology and Behavioral Metrics
Prototyping Behavioral Metrics. Our initial research focused on developing and validating behavioral metrics for assessing team cohesion and identification, and exploring individual differences as predictors of team performance. Two specific studies were conducted. Study 1 examined familiar and unfamiliar teams who worked on a complex team simulation twice a week across a five-week period. We assessed members’ individual differences at the beginning of the five-week period, and collected members’ self-reports of team cohesion, identification, and performance twice weekly. Complete data from 19 three-member teams were collected. We tested relationships between members’ individual differences and indicators of team functioning, and examined the fluctuations in team identification over time and its relationship to team performance. Results showed that members’ personality (i.e., extraversion, agreeableness) and team efficacy (i.e., collective task confidence) had significant, positive relationships with members’ identification with their team and team cohesion. Moreover, results showed that team identity was far less consensual and fluctuated over time for teams composed of members who were unfamiliar with each other at the beginning of the five-week session.
Finally, teams with high team identity performed better than teams with low identity (Cohen’s d ranged from -.09 to 1.11 across the 10 data collection days), and performance varied more for teams whose members reported low versus high identification with their teams.
These findings accentuate the importance of examining the dynamics of markers of team functioning, such as cohesion and member identification with the team. In addition, fluctuations in indicators of team processes had important implications for team effectiveness.
Study 2 was based on a team task simulation—MARSim (Mars Acquisition of Resources Simulation). The simulation was designed to emulate a 3-person team with distinct role responsibilities (i.e., life-science, geology, and geography); the team had to navigate a rover to collect resources/specimens relevant to their distinct role. In addition, the team had to compile a collection of specimens to a target proportion within their allotted time. To accomplish their mission, they had to work collaboratively to determine a collection strategy and navigate a course, while balancing each individual responsibility to meet their overall team objective.
We used MARSim as an experimental platform to develop, prototype, and validate behavioral metrics as indicators of team cohesion. Specifically, we assessed behavioral cohesion or collaboration effectiveness indicators through task-based measures (e.g., coherent responses, response latencies) as well as with self-report measures at the individual and team level. Complete data were collected from 42 three-person teams who performed the MARSim task. Results indicated that team cohesion (self-reports) accounted for 5% of team performance variance above and beyond experience (trial accounted for 27% of the variance). Importantly, the behavioral indicators of cohesion accounted for an additional 17% of performance variance beyond experience and self-reported cohesion. Thus, team performance variance was explained incrementally by member experience (i.e., trial), self-reported cohesion, and behavioral metrics of team cohesion. These findings provide evidence for the viability of assessing team cohesion / collaboration behaviorally.
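The logic of incremental variance explained can be sketched as a hierarchical regression in which predictors are entered in blocks and the gain in R-squared is recorded at each step. The sketch below is an illustration under stated assumptions, not the analysis performed in this study: it uses ordinary least squares on simulated data, and the function names, variable names, and generated values are hypothetical.

```python
import numpy as np

def r_squared(predictors, y):
    """R-squared from an ordinary least squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return 1.0 - residuals.var() / y.var()

def incremental_r2(steps, y):
    """Enter predictor blocks in order; return (label, gain in R-squared)."""
    entered, previous, gains = [], 0.0, []
    for label, block in steps:
        entered.extend(block)
        current = r_squared(entered, y)
        gains.append((label, current - previous))
        previous = current
    return gains

# Simulated (hypothetical) data, not the MARSim results.
rng = np.random.default_rng(0)
n = 200
trial = rng.normal(size=n)
self_report = rng.normal(size=n)
behavioral = rng.normal(size=n)
performance = (0.5 * trial + 0.2 * self_report + 0.4 * behavioral
               + rng.normal(size=n))

steps = [
    ("experience (trial)", [trial]),
    ("self-reported cohesion", [self_report]),
    ("behavioral cohesion metrics", [behavioral]),
]
gains = incremental_r2(steps, performance)
```

Because each model nests the previous one, the gains are non-negative and sum to the R-squared of the full model, which is the sense in which each block explains variance above and beyond the blocks entered before it.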
Sensor Technology: Phased Validation. The research findings substantiated the dynamic nature and value of behavioral indicators of team cohesion and interaction. In addition, the findings from our initial prototyping of the sensor technology were promising, indicating that the sensor array could effectively capture dyadic interaction patterns. Thus, our subsequent research efforts shifted to a phased validation of the wireless sensor data streams. The basic logic of this effort is to begin with simulation tasks that allow us to structure team member interactions (to facilitate validation). Then, as we accumulate support for the validity of the sensor data, we incrementally relax the task structure constraining interactions and introduce manipulations designed to perturb interactions, to further validate the ability of the sensor data to capture interaction quality.
We adapted the NASA Space Flight Resource Management (SFRM) task “Moon Base” to serve as the simulation for collaborative interaction among members. To provide an appropriate research platform, Moon Base was redesigned to provide a task context that necessitated highly structured interactions to facilitate the validation efforts. In other words, making team member interactions highly structured enabled us to better align sensor data and video to establish convergence of the sensor and video data streams. The redesigned simulation was designated “Mars Base.”
Phase 1. The first phase of the validation effort was to assess the reliability and consistency of the interaction data streamed by the wireless monitoring technology. To this end, we collected interaction sensor data from, and video-recorded, 46 three-person teams engaging in the Mars Base simulation. We then coded the video data from each simulation session to identify the initiation and disengagement of specific pairs of team members who engaged in face-to-face interaction, and the duration of the interaction. Raters who were independent of the research team were trained and then coded the video. Face-to-face interactions were identified separately from the distance and acceleration data collected by the badges, and the coded video data were then compared to these sensor-identified interactions.
Overall, 1841 interactions were identified by both the monitoring devices and the video coding, indicating 96.1% agreement between the two independent methods for identifying face-to-face interaction episodes during the Mars Base simulation. The average length of the interactions was 19.88 seconds (SD = 8.64) based on the monitoring device and 17.46 seconds (SD = 7.98) based on coding. The average difference in interaction length was 2.7 seconds (SD = 3.44). This is understandable given that the video system recorded time only in full seconds. As such, coders had to use their judgment about which second was the best estimate of when the interaction began. This could result in a 2-3 second difference from the true interaction length. Careful inspection of discrepancies between the video codes and sensor data indicated that most departures were due to human error related to the video coding.
Finally, the monitoring devices correctly identified the specific pair of members involved in 1839 of the 1841 face-to-face interactions, resulting in 99.8% accuracy. Based on the acceleration data, the monitoring badges correctly identified the person who initiated the interaction in 99% of cases (1823 out of 1841 interactions). The monitoring badges also correctly identified the individual who disengaged from the interaction in 97% of cases (1795 out of 1841 interactions). Taken together, this first phase validation study demonstrates that the monitoring sensors and algorithms are well calibrated and accurate in identifying face time collaborations between dyads.
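One way the comparison between sensor-identified and video-coded interaction episodes could be carried out is sketched below. This is a hedged illustration rather than the procedure actually used: it matches episodes for the same pair of members by temporal overlap and summarizes the difference in duration. The function names, the matching rule, and the episode data are hypothetical.

```python
def overlap(a, b):
    """Seconds of overlap between two (start_s, end_s) intervals."""
    return min(a[1], b[1]) - max(a[0], b[0])

def compare_episodes(sensor, video):
    """Match video-coded episodes to sensor episodes for the same dyad.

    Each list holds (pair, start_s, end_s), where pair is a frozenset of two
    member ids. Returns (number matched, mean absolute duration difference).
    Matching by greatest overlap is an illustrative assumption.
    """
    unused = list(sensor)
    differences = []
    for pair, v_start, v_end in video:
        candidates = [
            s for s in unused
            if s[0] == pair and overlap((s[1], s[2]), (v_start, v_end)) > 0
        ]
        if not candidates:
            continue
        best = max(candidates, key=lambda s: overlap((s[1], s[2]), (v_start, v_end)))
        unused.remove(best)
        differences.append(abs((best[2] - best[1]) - (v_end - v_start)))
    matched = len(differences)
    mean_difference = sum(differences) / matched if matched else None
    return matched, mean_difference

# Hypothetical episodes for a three-person team (members A, B, C).
AB, BC, AC = frozenset("AB"), frozenset("BC"), frozenset("AC")
sensor_episodes = [(AB, 0, 20), (BC, 30, 48), (AC, 60, 70)]
video_episodes = [(AB, 1, 19), (BC, 31, 47), (AB, 100, 110)]
matched, mean_difference = compare_episodes(sensor_episodes, video_episodes)
```

Dividing the number of matched episodes by the total number of episodes identified by either method would give a percent-agreement figure of the kind reported above.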
Phase 2. The second phase of laboratory research incremented the technology validation effort with respect to affective responses. The study design included a “stressor” manipulation, such that team members were under time pressure to complete the task. We also relaxed the interaction structure constraints so that participants had more control over the sequence of their interactions during the task. Three-person teams were randomly assigned to the experimental (i.e., time pressure to complete the task) versus control group. In addition to the monitoring data, the pair of members who were involved in specific face time collaborations recorded their affective states immediately after each interaction. We collected data from 28 teams (15 in the experimental condition and 13 in the control condition), which yielded a total of 978 interactions (587 in the experimental condition and 391 in the control condition).
Results indicated that there was a significant difference in the affective states reported by participants in the experimental versus control condition (χ²(4) = 97.14, p < .001; f = .22). Specifically, participants in the experimental condition were more likely to select anxious and less likely to select bored after their interactions compared to those in the control condition. These self-reported affective data provided support that our stress manipulation was effective.
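For readers less familiar with this kind of test, the following minimal sketch computes a Pearson chi-square statistic for a condition-by-affect contingency table. The counts and affect labels are hypothetical and do not reproduce the study data; a table with two conditions and five affect categories is assumed only because it yields the 4 degrees of freedom reported above.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a 2-D table.

    table: list of rows of observed counts (rows = conditions,
    columns = affect categories).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical counts of selected affective states by condition.
# Columns (assumed labels): anxious, bored, calm, happy, frustrated.
counts = [
    [120, 20, 60, 70, 30],   # experimental (time pressure)
    [40, 90, 70, 60, 20],    # control
]
statistic, df = chi_square(counts)
```

The statistic would then be compared against the chi-square distribution with `df` degrees of freedom to obtain a p value.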
Heart rate data recorded by the monitoring devices indicated that participants in the experimental condition showed significantly higher maximum heart rates (F(1, 1843) = 4.35, p < .05), higher minimum heart rates (F(1, 1843) = 26.37, p < .001), and higher average heart rates (F(1, 1843) = 10.25, p < .001). Taken together, these physiological data suggest that, consistent with the self-reported affective states, participants who received the stress manipulation experienced higher levels of arousal.
Overall, our phased validation effort supports the conclusion that the monitoring devices and algorithms are well calibrated to capture face-time collaboration between team members. In addition, the devices can detect the arousal level of participants during the interaction. Our results provide evidence that the monitoring devices can be used as an unobtrusive measure to accurately capture team collaboration and interaction frequency, and point to the possibility that they can identify the quality of the collaborative episodes.
Team Cohesion is Dynamic. A substantial amount of cross-sectional (i.e., static) research has shown a strong relationship between team cohesion and team performance. Although team cohesion is generally assumed to be relatively stable, little is known about its potential dynamics. Across different types of benchmarking data, including simulation teams (across hours of performance) and analog teams (6 weeks deployed in an ICE environment), we observed considerable variation in team cohesion and team collaboration effectiveness. This convergent evidence across very different teams, tasks, and timeframes makes it evident that the assumed stable relationship between team cohesion and team effectiveness needs to be examined closely and unpacked. This research is developing metrics, tools, and techniques to do just that.
Team Interaction Can Be Assessed Unobtrusively. We have developed and prototyped a wearable wireless sensor technology that captures multiple behavioral data streams in real time. The technology assesses face-time collaboration, physical movement, vocal activity, and heart rate. Prototyping indicates that the data streams have promise for monitoring and diagnosing the nature and quality of team interactions. Additional sensors have the potential to enhance this capability.
Behavioral Cohesion Metrics Are Promising. Initial research demonstrated that behavioral metrics of cohesion accounted for a substantial increment of team performance variance above and beyond that captured by self-reports. The sensor data demonstrate reliability and validity. Moreover, the validation phases demonstrate the potential for the sensor data to provide meaningful inferences regarding the nature and quality of team functioning. By examining patterns across these data streams, inferences regarding the nature of team member interactions can be drawn. For example, the technology can assess: Who initiated an interaction? Who disengaged? What was the intensity of vocal activity during the interaction? Across multiple interactions, the measurement system can determine aggregate interaction frequencies, durations, and clusters. In combination with normative or baseline data, we anticipate that the sensor system and metrics can discriminate normal team functioning from anomalous patterns of interaction, which can be used to trigger countermeasures. Thus, the combination of benchmark data in ICE analogs, technology development, and validation of metrics shows substantial promise for further development as a system for aiding team members to build, maintain, and restore effective collaboration and cohesion for long duration missions.
Beal, D. J., Cohen, R. R., Burke, M. J., & McLendon, C. L. (2003). Cohesion and performance in groups: A meta-analytic clarification of construct relations. Journal of Applied Psychology, 88, 989-1004.
Biswas, S., & Quwaider, M. (2008). Remote monitoring of soldier safety through body posture identification using wearable sensor networks. SPIE Defense and Security Symposium, Multisensor, multisource information fusion: Architectures, algorithms, and applications. Orlando, FL.
Gully, S. M., Devine, D. J., & Whitney, D. J. (1995). A meta-analysis of cohesion and performance: Effects of levels of analysis and task interdependence. Small Group Research, 26, 497-520.
Kozlowski, S. W. J., & Ilgen, D. R. (2006). Enhancing the effectiveness of work groups and teams (Monograph). Psychological Science in the Public Interest, 7, 77-124.
Mullen, B., & Copper, C. (1994). The relation between group cohesiveness and performance: An integration. Psychological Bulletin, 115, 210-227.
NASA. (2008). Behavioral health and performance element: Risk of performance errors due to poor team cohesion and performance, inadequate selection/team composition, inadequate training, and poor psychosocial adaptation. Houston, TX: NASA.