Vision for Robotics
By Danica Kragic and Markus Vincze
Contents
1 Introduction
  1.1 Scope and Outline
2 Historical Perspective
  2.1 Early Start and Industrial Applications
  2.2 Biological Influences and Affordances
  2.3 Vision Systems
3 What Works
  3.1 Object Tracking and Pose Estimation
  3.2 Visual Servoing – Arms and Platforms
  3.3 Reconstruction, Localization, Navigation, and Visual SLAM
  3.4 Object Recognition
  3.5 Action Recognition, Detecting, and Tracking Humans
  3.6 Search and Attention
4 Open Challenges
  4.1 Shape and Structure for Object Detection
  4.2 Object Categorization
  4.3 Semantics and Symbol Grounding: From Robot Task to Grasping and HRI
  4.4 Competitions and Benchmarking
5 Discussion and Conclusion
Acknowledgments
References
Foundations and Trends® in Robotics Vol. 1, No. 1 (2010) 1–78 © 2009 D. Kragic and M. Vincze DOI: 10.1561/2300000001
Vision for Robotics
Danica Kragic1 and Markus Vincze2
1 Centre for Autonomous Systems, Computational Vision and Active Perception Lab, School of Computer Science and Communication, KTH, Stockholm, 10044, Sweden, dani@kth.se
2 Vision for Robotics Lab, Automation and Control Institute, Technische Universität Wien, Vienna, Austria, vincze@acin.tuwien.ac.at
Abstract
Robot vision refers to the capability of a robot to visually perceive the environment and use this information for the execution of various tasks. Visual feedback has been used extensively for robot navigation and obstacle avoidance. In recent years, there have also been examples that include interaction with people and manipulation of objects. In this paper, we review some of the work that goes beyond the use of artificial landmarks and fiducial markers for the purpose of implementing vision-based control in robots. We discuss different application areas,