A developmental approach of imitation to study the emergence of mirror neurons in a sensory-motor controller

Mirror neurons have often been considered as the explanation of how primates can imitate. In this paper, we show that a simple neural network architecture that learns visuo-motor associations can be enough to let low level imitation emerge without a priori mirror neurons. Adding sequence learning mechanisms and action inhibition allows deferred imitation of gestures demonstrated visually or by body manipulation. With the building of a cognitive map giving the capability of learning plans, we can study in our model the emergence of both the low level and the high level resonances highlighted by Rizzolatti et al.


Introduction
Imitation is an important feature providing individuals with the ability to learn from others but also (and perhaps mainly) to communicate with others. Rizzolatti et al. [6] [13] discovered in monkeys some neurons activated both when a monkey performs an action and when it observes another monkey or a human doing the same action. These mirror neurons were originally found in area F5 of the premotor cortex. Yet the mirror system spreads over several cortical structures. The Inferior Parietal Lobule (IPL) also presents some neurons with a mirror activation pattern, whereas in the Superior Temporal Sulcus (STS) some neurons are more sensitive to the observation of actions performed by others and thus support learning by imitation. In monkeys, mirror neuron activities are related to actions like grasping and placing objects. An observed action can produce activation even if it is partially occluded.
It has been hypothesized that the role of mirror neurons is to enable the recognition of actions that are performed by others. The development of this feature would be an evolutionary adaptation, as it allows better social collaboration and imitation. Several models have considered this hypothesis as a justification for action recognition capabilities and learning by imitation in robots [5]. Yet the role of mirror neurons in task learning can be questioned, since it is not clear whether these neurons are the cause or the consequence of skill learning. Even if some proto-imitation of tongue protrusion was observed in neonates [9], it cannot be argued that it reflects inherited imitation capabilities [8]. How can mirror neurons learn associations between visual information and the correct action? If mirror neurons have to be learned, the imitated actions need to be already known. In contrast to these models, we will advocate the hypothesis that mirror neurons are a side effect of some sensory-motor learning. They can next be used for social interactions (communication) but not primarily for learning by imitation.
Recently, several works have proposed models that aim at explaining how mirror neurons can emerge from the learning of sensory-motor associations [10] [12]. These works focus on the emergence of action recognition capabilities. In early work [7], we showed how some imitation behaviors can be obtained from simple sensory-motor associations. Without the capability of recognizing the actions of others, and without even the capability of discriminating others from itself, the robot could already exhibit a low level imitation behavior.
In this paper, we start from a minimal neural network [2] to study from a developmental perspective how low level imitation (mimicry), sequence learning and then action selection behaviors can be exhibited without a priori mirror neurons. Starting from a sensory-motor controller that lets low level imitation emerge, the model was then enhanced by including a module for sequence learning. The robot can reproduce sequences observed (Sec. 2) or demonstrated through physical manipulation (Sec. 3). This system can also learn and plan different actions according to different goals. In Sec. 4, we conclude on the "mirror"-like properties of the neurons present in the different structures of the model.

Learning tasks on the basis of low level visuo-motor associations
Starting from the coupling of a homeostatic system with perceptual ambiguity, we can obtain the emergence of a low level imitation behavior [7]. Initially, the robot learns visuo-motor associations between the detected position of its hand in its visual field and the motor configuration of its arm. The associative learning would be done in the supplementary motor area (SMA) and the premotor (PM) cortex (Fig. 1). Because of the limitations of the robot's perception, there is an ambiguity between the robot hand and the human hand in front of the robot. The robot can look at the human hand and believe that it is its own hand. As the homeostatic system tries to maintain an equilibrium between the visual information and the motor information, the robot arm is moved into the motor configuration that corresponds to the visual input. An external observer sees a robot imitating the gestures of the human partner. Low level imitation can thus be a side effect of the perceptual ambiguity [2]. This visuo-motor controller has been implemented on a robot composed of an electrical arm (Katana from Neuronics AG) and a monocular camera mounted on pan-tilt servo motors (see [3] for details about the model).
With this simple model, the robot can only reproduce meaningless gestures directly observed. By adding two other mechanisms, some deferred imitation of a demonstrated task can be obtained. First, the robot can inhibit its own movements while it is observing the demonstration of the task. Next, the robot can memorize an observed sequence. The visual field of the robot is categorized into discrete areas or states. The observed sequence is the succession of the different visual states. Given an activated visual state, an internal rehearsal of the memorized sequence predicts the next visual state in the sequence. As a result of the visuo-motor learning, the predicted visual position is transformed into a predicted motor position that attracts the robot arm into the stored motor configuration. The robot reproduces the observed demonstration according to its learning (Fig. 2). The correspondence problem [11] does not limit our system in that experiment. As only the arm end effector position is considered, the visuo-motor map and the regulation system directly find a motor configuration that corresponds to the final desired position.
Figure 2: In a first phase, the robot inhibits its movement and memorizes the visual sequence of the perceived positions of the hand of a human who demonstrates the action of picking and placing a can. The experiment was realized with a small number of adequately chosen states.
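
The deferred imitation loop described in this section (memorize a sequence of visual states, rehearse it, and convert each predicted visual state into a motor target) can be illustrated with a minimal sketch. It is not the neural implementation used on the robot: the class names, the dictionary-based memories, the proportional `gain` and the first-order chaining of states are all assumptions made for illustration.

```python
import numpy as np


class VisuoMotorMap:
    """Hypothetical lookup associating a visual state with a joint configuration."""

    def __init__(self):
        self.motor_for_state = {}

    def learn(self, visual_state, joint_angles):
        # Store the arm configuration seen together with this visual state.
        self.motor_for_state[visual_state] = np.asarray(joint_angles, dtype=float)

    def motor_target(self, visual_state):
        return self.motor_for_state[visual_state]


class SequenceMemory:
    """Hypothetical first-order memory: each state predicts a single successor."""

    def __init__(self):
        self.next_state = {}

    def observe(self, states):
        # Called while movement is inhibited: only the succession is stored.
        for current, following in zip(states, states[1:]):
            self.next_state[current] = following

    def predict(self, state):
        return self.next_state.get(state)


def reproduce(start_state, memory, vm_map, joints, gain=0.5, steps=20, max_len=100):
    """Rehearse the memorized sequence and attract the arm to each predicted target."""
    joints = np.asarray(joints, dtype=float)
    visited = [start_state]
    state = memory.predict(start_state)
    for _ in range(max_len):  # guard against cyclic sequences
        if state is None:
            break
        target = vm_map.motor_target(state)
        for _ in range(steps):
            # Simple proportional attraction toward the stored configuration.
            joints = joints + gain * (target - joints)
        visited.append(state)
        state = memory.predict(state)
    return visited, joints
```

Because each state stores only one successor, a demonstration that passes twice through the same visual state would overwrite its earlier successor; handling such sequences would need additional context, which this sketch leaves out.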
The prefrontal cortex can detect and maintain the goals of the system; thus it can generate the necessary signals to inhibit and select the actions and also to bias the attention of the system (Fig. 1). Novelty detection in the Hippocampus can trigger the encoding of the states and the transitions used to memorize sequences (Fig. 3). Neurons in the visuo-motor map are activated during the observation and the reproduction of the sequences. Their activation pattern is similar to the activation of the "low level resonance" mirror neurons. However, as their activities are not related to goals, these neurons cannot explain "high level resonance" mirror neurons. The only goal of the robot is to reproduce the demonstrated visual sequence (Fig. 2). The robot arm catches the can only because of a reflex behavior that is triggered when an object is detected within reach of the gripper.

Arm gesture planning
In Fig. 4, the robot possesses two different contextual goals associated with two types of objects (red or green). The focal vision of the robotic system categorizes the visual scene using color detection. When a red can is presented to the visual system, the corresponding goal is activated and stays active until another object is presented. If the new can is green, the red goal will be inhibited and the green goal will stay active.

BIO Web of Conferences
These goals can be associated with sequences of proprioceptive states acquired through passive manipulation of the arm [4]. When the robotic arm moves, its proprioception (joint angles, gripper infrared and force sensor values) is categorized into distinct states. Each state is associated with a prototypical joint/gripper configuration. The categorization of the states is based on the recruitment of new states depending on a vigilance threshold. The states are encoded on the weights of the input links of the recruited neurons. The information about the newly entered state and the former state is received and processed in the Hippocampus (see Fig. 3). The possible transitions between states are then learned when the arm moves from one state to the next. The transition activity is transmitted to a cognitive map spreading over the prefrontal cortex and the premotor area. A graph of the different transitions is created. The cognitive map encodes information about all possible paths by linking subsequent transitions.
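
The recruitment of proprioceptive states under a vigilance threshold, and the learning of transitions between them, can be sketched as follows. The paper does not give the similarity measure or the threshold value, so the similarity `1 / (1 + distance)`, the default `vigilance` and all names below are assumptions chosen only to make the idea concrete.

```python
import numpy as np


class StateRecruiter:
    """Hypothetical recruiter: one prototype (weight vector) per recruited state."""

    def __init__(self, vigilance=0.8):
        self.vigilance = vigilance
        self.prototypes = []

    @staticmethod
    def similarity(prototype, sample):
        # Assumed measure: 1 for identical vectors, decreasing with distance.
        return 1.0 / (1.0 + np.linalg.norm(prototype - sample))

    def categorize(self, sample):
        """Return the index of the matching state, recruiting a new one if needed."""
        sample = np.asarray(sample, dtype=float)
        if self.prototypes:
            sims = [self.similarity(p, sample) for p in self.prototypes]
            best = int(np.argmax(sims))
            if sims[best] >= self.vigilance:
                return best
        # No existing state is similar enough: recruit a new neuron whose
        # input weights store the current proprioceptive configuration.
        self.prototypes.append(sample.copy())
        return len(self.prototypes) - 1


def learn_transitions(recruiter, trajectory):
    """Record a (former_state, new_state) pair each time the state changes."""
    transitions = set()
    previous = None
    for sample in trajectory:
        state = recruiter.categorize(sample)
        if previous is not None and state != previous:
            transitions.add((previous, state))
        previous = state
    return transitions
```

A higher vigilance recruits more states for the same demonstrated trajectory, giving a finer but larger graph of transitions.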
The association between an active goal and the last performed transitions is made when a reward is received by the system. When a goal is set, the associated transition is also activated. There is a diffusion of this activation in the cognitive map from node to node, with a max operator on the propagated activities and links with weights w < 1. As a result, transition neurons in the cognitive map have an activity that decreases with their distance (in terms of number of transitions) to the transitions directly associated with the activated goals. The map gives the shortest path to the strongest or closest goal. In any given state, the system predicts all possible transitions. The activity in the cognitive map is used to bias the selection of a particular transition in the Basal Ganglia (Fig. 3). The transition that has the highest potential, i.e. that corresponds to the shortest path to get the reward, is selected. When a transition is selected after competition, the prototypical motor configuration learned with the final state of this transition is given as the target position to the motor regulation (M1 and subcortical structures involved in muscular control).
This system has been tested on the same robot as in Section 2 to learn a pick-and-place task through body manipulation (Fig. 4). Depending on the detected object (e.g. red can), a gradient of activity is propagated from the transition corresponding to dropping the can at the correct place to the current state of the robot. The task can also be adapted on-line. A new sequence can be demonstrated (e.g. putting red cans in place 2 instead of place 1) and rewarded. Each time a reward is received, the current association is reinforced whereas the previous ones decay. The stability of a policy depends on how much it was rewarded. Typically, learning can be done with about 2 trials and rule switching can be as fast.
Figure 4: A cognitive map can be used to learn a pick-and-place task with different goals. Left: Experimental setup. Right: Projection of the cognitive map in the Cartesian space. The gradient toward the goal place follows the blackness and the thickness of the links between state nodes. The links are ...
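
The diffusion of goal activity through the cognitive map and the resulting selection of a transition can be illustrated with a short sketch. Transitions are represented here as hypothetical `(from_state, to_state)` pairs, the goal transition receives an activity of 1, and the value `w = 0.9` is an arbitrary choice satisfying w < 1; none of these details come from the paper. Under these assumptions a transition that is d links away from the goal transition ends up with activity w^d.

```python
def diffuse(transitions, goal_transition, w=0.9):
    """Propagate goal activity backward through linked transitions (max operator)."""
    activity = {t: 0.0 for t in transitions}
    activity[goal_transition] = 1.0
    # A transition t is linked to every transition s that starts where t ends.
    for _ in range(len(activity)):
        changed = False
        for t in activity:
            for s in activity:
                if s[0] == t[1]:
                    candidate = w * activity[s]
                    if candidate > activity[t]:  # max over incoming activities
                        activity[t] = candidate
                        changed = True
        if not changed:
            break
    return activity


def select_transition(current_state, activity):
    """Competition among the transitions that are possible from the current state."""
    candidates = [t for t in activity if t[0] == current_state]
    if not candidates:
        return None
    return max(candidates, key=activity.get)


# Usage: four states, two routes from state 0 to the rewarded transition (2, 3).
transitions = [(0, 1), (1, 2), (2, 3), (0, 2)]
activity = diffuse(transitions, goal_transition=(2, 3))
chosen = select_transition(0, activity)  # the direct route (0, 2) wins
```

Because every link attenuates the activity by w < 1 and only the maximum is kept, the most active transition leaving a state is the first step of a shortest path to the goal, which is the property the text relies on. The final state of the chosen transition would then provide the motor target.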
In the cognitive map, the transition neurons are activated when a goal is recognized. Their activities are similar to the activities of canonical neurons, which are activated by just the perception of the object that determines the goal (here, the presence of the red can activates the different transitions expected to be done). As a result of this activity diffusion in the cognitive map, the action is facilitated: the robot can choose the correct transitions among the possible ones. However, these neurons lack selectivity, as they can be activated by objects that afford actions and not only by actions.

Conclusion
In this paper, we have followed a developmental approach to imitation validated on a real robot. Without embedding any a priori mirror neurons, our model enables the robot to display low level imitation of meaningless gestures. These gestures can be learned together as sequences. With a reward based associative learning, the sequences can become simple goal directed sequences (e.g. sorting cans according to their colors), still without involving the use of "high level" mirror neurons. Mirror neurons should not be used in robotics to justify some ad hoc skill transfer.
The International Conference SKILLS 2011
The sensory-motor structures involved in the development of imitative capabilities can explain the mirror neuron properties observed in the premotor cortex of primates. The visuo-motor associative maps display the low level resonance that was found in humans [14]. In our model, premotor neurons can be activated when a specific object is presented, thus matching the canonical neurons found in the premotor areas. Yet, in our model, there is no neuron selective to the observation of high level actions. In order to learn more complex behaviors, the system should build chunks merging information about both recognizing an object and determining the context of the task, i.e. the expected use of the object. According to our model (Fig. 1), the chunks would first be learned at the level of the prefrontal cortex. As the behavior becomes more automated, the management of the control should move to the Basal Ganglia.
If our approach is correct, mirror neurons are first a side effect of previous learning. Yet, we believe mirror neurons are certainly very important for synchronization and communication during social interactions. Taking into account the resonance between two interacting partners could provide good feedback to modulate learning in an interactive and autonomous way.

Figure 1: Model of the brain structures and their connections corresponding to our implemented models for the development of imitative behaviors. The prefrontal (PF) structures are important for the control of the different phases of the behavior (observation, imitation). Mirror neuron activities are induced by reentrant learning first between visual and proprioceptive motor signals, next between objects and actions. This representation is based on [1].