This proposal takes a novel approach to two of the top technical challenges in the NASA RTA Roadmap. For a high-dexterity robot equipped with multiple visual and haptic sensors, such as Robonaut 2, the two challenges "Object Recognition and Pose Estimation" and "Fusing vision, tactile and force control for manipulation" are closely related. In many real-world situations where objects are occluded, 3D perception alone is not sufficient for accurate object recognition and pose estimation. My research goal is to fuse the continuous information acquired from multiple sensors and from the robot's own actions to achieve better recognition and understanding of the environment. Although many computer vision algorithms have recently achieved some success in recognizing real-world images, they are not designed to run directly on robots, for two main reasons. First, these algorithms are trained on static 2D images, whereas a robot perceives a continuously changing stream of input. Motion blur degrades recognition performance, but consecutive observations also carry strong temporal relationships, correlated with the robot's actions, that most vision algorithms do not exploit. Second, a robot may be equipped with multiple sensors, such as laser sensors, cameras, inertial measurement units, and haptic sensors. Although some vision algorithms can accept combined sensor data as input, they neither exploit cases in which one sensor stream is strongly correlated with another nor handle situations in which some sensor information results from the robot's actions.
In the field of computer vision, two types of object models are commonly used to identify objects: one represents the object in 2D, the other in 3D. However, neither type of model incorporates information about how the perception of an object responds to actions. A robot that uses such object models knows nothing more about an object than its label.
Humans clearly have a different kind of understanding of objects, and incorporating action into the object model will allow robots to interact with objects and predict the outcomes of their actions. While a 3D object model stores visual features at their exact 3D positions, such a model is harder to obtain and is limited to rigid objects. Specifying how manipulation changes an object's appearance can be simpler and more descriptive than specifying a full 3D model. Previous work in the Laboratory for Perceptual Robotics at the University of Massachusetts Amherst has already shown promising results in combining manipulative actions with object recognition. The view-action-view object model proposed in that work will serve as the basis of my object model. In addition to the object features observed from one viewpoint, the model stores every view transition reachable from that view through an action. The result is a graph representation of an object composed of viewpoint nodes and action edges. This proposal will extend these ideas from toy objects with fiducial markers to real objects and will further explore the temporal relationships between views.
The overall aim is to design a robotic perception system that combines continuous information from multiple sensors and actions to achieve a better understanding of the environment. The system is intended to assist Robonaut 2 in assembly or maintenance tasks that require handling a variety of tools and structures. A perception system that learns about the world through actions is also crucial to many other space robotics tasks, such as retrieving sample caches, robotic inspection, assembly, servicing, and repair. Beyond space missions, this technique has a broader impact on robotic tasks ranging from manufacturing and disaster rescue to caring for patients and seniors.
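To make the view-action-view object model more concrete, the following is a minimal Python sketch of an object stored as a graph of viewpoint nodes and action edges, plus a simple consistency check that uses an action to tell apart objects that look identical from a single view. It rests on assumptions that come from neither this proposal nor the earlier Laboratory for Perceptual Robotics work: features are reduced to discrete labels, views are matched by Jaccard similarity with an arbitrary threshold of 0.5, transitions are deterministic, and every class, function, object, feature, and action name (`ViewActionViewModel`, `recognize`, `"tilt"`, and so on) is hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ViewActionViewModel:
    """Hypothetical sketch of a view-action-view object model.

    Nodes are viewpoint IDs, each holding the (assumed discrete) feature labels
    seen from that view; edges map (view, action) to the view that follows.
    """
    name: str
    features: dict = field(default_factory=dict)  # view -> frozenset of feature labels
    transitions: dict = field(default_factory=lambda: defaultdict(dict))  # view -> {action: next_view}

    def add_view(self, view, feats):
        self.features[view] = frozenset(feats)

    def add_transition(self, src, action, dst):
        self.transitions[src][action] = dst

    def predict(self, view, action):
        """Return the view expected after `action` from `view`, or None if unknown."""
        return self.transitions.get(view, {}).get(action)


def jaccard(a, b):
    """Assumed similarity measure between two feature-label sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def matching_views(model, observed, threshold=0.5):
    """Views of `model` whose stored features resemble the observed features."""
    return {v for v, f in model.features.items() if jaccard(f, observed) >= threshold}


def track(model, observations, actions, threshold=0.5):
    """Views still consistent with an observation, action, observation, ... sequence."""
    candidates = matching_views(model, observations[0], threshold)
    for action, obs in zip(actions, observations[1:]):
        # Follow each candidate's action edge, then keep views that match the new observation.
        predicted = {model.predict(v, action) for v in candidates}
        predicted.discard(None)
        candidates = predicted & matching_views(model, obs, threshold)
    return candidates


def recognize(models, observations, actions, threshold=0.5):
    """Names of the models that can explain the whole sequence."""
    return [m.name for m in models if track(m, observations, actions, threshold)]


def make_model(name, views, edges):
    model = ViewActionViewModel(name)
    for view, feats in views.items():
        model.add_view(view, feats)
    for src, action, dst in edges:
        model.add_transition(src, action, dst)
    return model


# Two hypothetical objects that are indistinguishable from the side.
mug = make_model("mug", {"side": {"handle", "logo"}, "top": {"opening", "rim"}},
                 [("side", "tilt", "top")])
pitcher = make_model("pitcher", {"side": {"handle", "logo"}, "top": {"spout", "rim"}},
                     [("side", "tilt", "top")])
models = [mug, pitcher]

print(recognize(models, [{"handle", "logo"}], []))  # ['mug', 'pitcher']
print(recognize(models, [{"handle", "logo"}, {"opening", "rim"}], ["tilt"]))  # ['mug']
```

From the side alone both hypothetical objects remain candidates; after the `tilt` action, the observed top view rules out the pitcher because its predicted view no longer matches. A real system would replace the label sets with learned visual and haptic features and would need to cope with noisy, probabilistic transitions.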