Reinforcement Learning and Information Access

or

What is the Real Learning Problem in Information Access?


by Rich Sutton
University of Massachusetts
rich@cs.umass.edu


Presented at the AAAI Stanford Spring Symposium on
Machine Learning and Information Access
March 26, 1996

with many thanks to Rik Belew and Jude Shavlik


Conclusions (in advance)


Reinforcement Learning


Classical Machine Learning - Supervised Learning

	situation1  --->  action1     then correct-action1
	situation2  --->  action2     then correct-action2
		      .
		      .
		      .






Reinforcement Learning

	        situation1  --->  action1
	reward2	situation2  --->  action2 
	reward3	situation3  --->  action3 
	                     .
	                     .
	                     .
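
The interaction pattern above can be sketched as a loop. This is a minimal illustration, not anything from the talk: the toy environment, the random policy, and all function names are assumptions made for the example. The point is that the learner only ever sees a reward after acting and is never told the correct action.

```python
import random

def run_episode(policy, step, start, n_steps, rng):
    """Interact for n_steps; return the (situation, action, reward) trace.

    The reward in each triple is the one that arrives after the action,
    together with the next situation. No correct action is ever revealed.
    """
    trace = []
    situation = start
    for _ in range(n_steps):
        action = policy(situation, rng)
        next_situation, reward = step(situation, action)
        trace.append((situation, action, reward))
        situation = next_situation
    return trace

# Hypothetical toy world: positions 0..4 on a line, reward 1.0 at position 4.
def toy_step(situation, action):
    next_situation = max(0, min(4, situation + action))
    reward = 1.0 if next_situation == 4 else 0.0
    return next_situation, reward

def random_policy(situation, rng):
    """Placeholder policy: step left or right at random."""
    return rng.choice([-1, 1])

trace = run_episode(random_policy, toy_step, start=0, n_steps=20,
                    rng=random.Random(0))
total_reward = sum(r for _, _, r in trace)
```

In the supervised setting the trace would instead carry a correct action for every situation; here the only training signal is the reward column.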

It's not just a harder problem, it's a real problem


Applications of RL


Key Ideas of RL Algorithms

Value Functions

TD Methods


A Large Space of RL Algorithms


Major Components of an RL Agent

Policy - what to do

Reward - what is good

Value - what is good because it predicts reward

Model - what follows what
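
A small sketch of how the four components fit together, assuming a hypothetical five-state chain, a fixed always-step-right policy, and tabular TD(0) as one example of a TD method. The discount factor, step size, and all names are assumptions for illustration. Reward is nonzero only on arrival at the end, yet earlier states acquire value because they predict that reward.

```python
GAMMA = 0.9    # discount factor (assumed)
ALPHA = 0.5    # step size (assumed)
TERMINAL = 4   # hypothetical chain of states 0..4

def policy(state):
    """Policy: what to do. Here, always step right."""
    return +1

def reward(next_state):
    """Reward: what is good. Here, only arriving at the end pays."""
    return 1.0 if next_state == TERMINAL else 0.0

def model(state, action):
    """Model: what follows what. Assumed exact, so it doubles as the world."""
    return state + action

def td0(episodes):
    """Value: a learned prediction of discounted future reward, via TD(0)."""
    value = {s: 0.0 for s in range(TERMINAL + 1)}  # terminal value stays 0
    for _ in range(episodes):
        state = 0
        while state != TERMINAL:
            nxt = model(state, policy(state))
            target = reward(nxt) + GAMMA * value[nxt]
            value[state] += ALPHA * (target - value[state])
            state = nxt
    return value

value = td0(200)
```

Under these assumptions the learned values approach 1.0, 0.9, 0.81, and 0.729 for states 3, 2, 1, and 0: each state is worth the discounted value of what follows it.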


Info-Access Applications of RL

Anytime you have decisions to be made

Anytime you want to make long-term predictions

Classical IR Querying/Routing/Filtering as RL

	Situation = Query or user model + Documents
	Actions	  = Present document?  Rankings
	Reward	  = User feedback on presented docs
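
One way to read this mapping as code, offered as a rough sketch only: a one-step learner that keeps a running mean of user feedback per (query, document) pair and presents the document with the highest estimate. The class name, the epsilon-greedy exploration, and the sample queries and document ids are all assumptions, not part of any cited system. Feedback is recorded only for the document actually presented.

```python
import random
from collections import defaultdict

class PresentationLearner:
    """Estimates expected user feedback for each (query, document) pair."""

    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon                 # exploration rate (assumed)
        self.rng = random.Random(seed)
        self.estimate = defaultdict(float)     # mean feedback so far
        self.count = defaultdict(int)          # times each pair was shown

    def choose(self, query, documents):
        """Action: pick the document to present for this query."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(documents)  # explore
        return max(documents, key=lambda d: self.estimate[(query, d)])

    def update(self, query, document, feedback):
        """Reward: fold in feedback for the one document that was shown."""
        key = (query, document)
        self.count[key] += 1
        self.estimate[key] += (feedback - self.estimate[key]) / self.count[key]

# Usage with made-up data; epsilon=0.0 makes the choice purely greedy.
learner = PresentationLearner(epsilon=0.0)
learner.update("td learning", "doc_a", 1.0)
learner.update("td learning", "doc_b", 0.0)
shown = learner.choose("td learning", ["doc_a", "doc_b", "doc_c"])
```

Because each query is handled in a single step, this is closer to a bandit problem than to a sequential one.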

Pro RL:
	Feedback is selective
	and does not exactly fit SL framework

Con RL:
	Feedback does not exactly fit RL framework
	Problem is not sequential

e.g.,
Bartell, Cottrell & Belew, 1995
Boyan, Freitag & Joachims, 1996
Schütze, Hull & Pedersen, 1995

MultiStep Info-Access Problems

But in a sense all these are the same

Learning a complex, interactive, goal-directed, input-sensitive sequence of steps

That's exactly what RL is good for.

The Multi-Step, Sequential Nature of IA


Imagine an Ideal Info-Access System


Shortcutting


Compare...

The classical context

No way the queries can be used to learn about the docs

The Web

There will always be more readings than writings

Thus, we can learn about the docs

Popularity Ratings, Priors on Documents

Q. How do you decide what to access today?

A. Recommendations:


"It's hard to find the good stuff on the web"

But in classical IR there is no concept of good stuff

Differences and Similarities between Users


Summary