'
&
$
%
Conditionnement animal
Psychologie expérimentale. Thorndike (1911) :
“Of several responses made to the same situation, those which are
accompanied or closely followed by satisfaction to the animal will,
other things being equal, be more firmly connected with the
situation, so that, when it recurs, they will be more likely to recur ;
those which are accompanied or closely followed by discomfort to
the animal will, other things being equal, have their connections
with that situation weakened, so that, when it recurs, they will be
less likely to occur. The greater the satisfaction or discomfort, the
greater the strengthening or weakening of the bond”
Loi des effets.
Apprentissage par renforcement -Master Mathématiques Vision Apprentissage- Rémi Munos 5
'
&
$
%
Les débuts de l’A/R computationnel
– Shannon 1950 : Programming a computer for playing chess.
– Minsky 1954 : Theory of Neural-Analog Reinforcement Systems.
– Samuel 1959 : Studies in machine learning using the game of checkers.
– Michie 1961 : Trial and error. -> joueur de tic-tac-toe.
– Michie et Chambers 1968 : Adaptive control -> pendule inversé.
– Widrow, Gupta, Maitra 1973 : Punish/reward : learning with a critic in
adaptive threshold systems -> règles neuronales.
– Sutton 1978 : Théories d’apprentissage animal : règles dirigées par des
modifications dans prédictions temporelles successives.
– Barto, Sutton, Anderson 1983 : règles neuronales Actor-Critic pour le
pendule inversé.
– Sutton 1984 : Temporal Credit Assignment in Reinforcement Learning.
– Klopf 1988 : A neuronal model of classical conditioning.
– Watkins 1989 : Q-learning.
– Tesauro 1992 : TD-Gammon
Apprentissage par renforcement -Master Mathématiques Vision Apprentissage- Rémi Munos 6