ActIPret

Interpreting and Understanding Activities of Expert Operators for Teaching and Education

Project Summary

Project Objectives	The objective of ActIPret is to develop a cognitive vision methodology that interprets and records the activities of people handling tools. The focus is on active observation and interpretation of activities, on parsing the sequences into constituent behaviour elements, and on extracting the essential activities and their functional dependence. By providing this functionality, ActIPret will enable observation of experts executing intricate tasks such as repairing machines and maintaining plants. The expert activities are interpreted and stored using natural language expressions in an activity plan. The activity plan is an indexed manual in the form of 3D reconstructed scenes, which can be replayed at any time and location to many users using Augmented Reality equipment.

Long Term Objective	The long term goal is to devise a system that is able to teach and train many users in the activities of expert operators. While experts can demonstrate their knowledge only to a small group of students and on limited occasions, the proposed system interprets and understands the expert's activities and enables the repetitive and user-driven reproduction of the task. Using demonstration alone, the system can store task knowledge in a 3D reconstructed teaching and maintenance manual. Figure 1 exemplifies the envisioned uses of the ActIPret developments. During recording, the expert's activities are observed and an activity plan with the reconstructed scenes is obtained. During replay, the trainee/user searches for the activities using a conceptual language. The user is then able to choose between two options: (1) she/he replays the sequence from arbitrary viewpoints and at a level of detail that depends on the training level (which requires only AR/VR equipment), or (2) she/he uses the ActIPret system in the form of a personal teach assistant: the activities executed are compared with the activities recorded, and improvements or corrections are suggested by the personal teach assistant, which results in a superior training effect compared to repetition without feedback.
Figure 1: Using the ActIPret system to record and retrieve activities

Target Applications	ActIPret is an initial step towards improving the teaching of trainees in intricate tasks such as open surgery, repairing machines and maintaining plants. The system enables learning by observation and the indexing of specific activities, uncoupled from time and place. In the future, teaching can be done with inexpensive equipment (PC, Head Mounted Display) and can use complete ActIPret-like systems with trainee supervision and expert documentation capabilities.

Market Potential	Training material: the sequences represent real world examples for teaching trainees at schools/colleges/universities (practical experiences) and for employee training at companies.
Documentation: the teaching material can be indexed based on activities and context to enable long term documentation and user-friendly retrieval.
Maintenance: the system acts as a long term memory for maintenance of machines and plants over extended periods of time.
Quality Control: immediate feedback assists the person during training to obtain correctness of work (personal teach assistant).

Description of the Work	The project is organised into eight interlaced technical work packages to build the cognitive vision framework and its purposive and reactive processing components. In the first year the framework and its constituent parts are designed and a first prototype is implemented. The approach involves associating attentional pragmatic interpretation with specific phases of tasks and context to zoom in on the relevant objects and activities. The four components of visual processing are all task- and context-driven and report visual evidence with confidence measures. These components are: the extraction of cues and features; the detection of context-dependent relationships between cues/features; the recognition of the objects handled (taking into account potential occlusion) and of activities; and the synthesis of behaviours and tasks that bias the context at the other components. These levels of visual interpretation are interlaced with the attentive and investigative behaviours that provide the feedback to purposively focus processing. Robust interpretation results will be achieved with methods that actively seek good viewpoints and obtain disambiguating information for detection, recognition and synthesis. Robustness is also enhanced using context-dependent information integration between the components.
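As a rough illustration of components that "report visual evidence with confidence measures" and of context-dependent integration between them, the following Python sketch may help. It is not the ActIPret implementation: the `Evidence` record, the `integrate` function, the component names and the weighted-mean fusion rule are all assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Hypothetical report emitted by one visual processing component."""
    component: str     # which component produced the report (assumed names)
    label: str         # hypothesis, e.g. an object or activity name
    confidence: float  # confidence measure in [0, 1]


def integrate(reports, context_weights):
    """Fuse reports per label with a context-weighted mean (assumed rule).

    context_weights maps a component name to how much the current task
    context trusts that component; unknown components get weight 1.0.
    """
    totals = {}
    weights = {}
    for r in reports:
        w = context_weights.get(r.component, 1.0)
        totals[r.label] = totals.get(r.label, 0.0) + w * r.confidence
        weights[r.label] = weights.get(r.label, 0.0) + w
    return {
        label: totals[label] / weights[label]
        for label in totals
        if weights[label] > 0
    }


# Illustrative data only: two components report on the same scene.
reports = [
    Evidence("object_recognition", "cd", 0.8),
    Evidence("relationship_detection", "cd", 0.4),
    Evidence("object_recognition", "cup", 0.3),
]
# In this assumed context, object recognition is trusted more.
weights = {"object_recognition": 3.0, "relationship_detection": 1.0}

fused = integrate(reports, weights)
best = max(fused, key=fused.get)
```

Changing `weights` is how, in this sketch, a task phase or context would bias which component's evidence dominates the fused result.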

Milestones and Expected Results	Year 1: Prototype framework implemented; recognition of single activities and objects with occlusion handling; conceptual language defining activities.
Year 2: Interpretation of one-handed activities with objects (placing a CD in a player); qualitative description of spatial relations between objects and activities; conceptual activity description for activity plans.
Year 3: Interpretation of two-handed activity sequences with objects (changing a wheel of a car or another industrial task); temporal relations between objects and activities; activity plan synthesis and replay.

The Partnership	ACIN - Institute of Automation and Control (former INFA) at the Vienna University of Technology, A, http://www.acin.tuwien.ac.at
CMP - Center for Machine Perception at the Czech Technical University, CZ, http://cmp.felk.cvut.cz
COGS - School of Cognitive and Computing Sciences at the University of Sussex, GB, http://www.cogs.susx.ac.uk/lab/vision
FORTH - Foundation for Research and Technology - Hellas, Computer Vision and Robotics Laboratory at the Institute of Computer Science, GR, http://www.ics.forth.gr/proj/cvrl/
PROFACTOR - Produktionsforschungs GmbH, A, http://www.profactor.at

Contact	Markus Vincze, ACIN
Tel.: +43 1 5041446 / 11
E-Mail: vincze@acin.tuwien.ac.at