Final Exam, Spring 2013
Instructions
You have 8 hours to return the answers to me by email or in my office. You are not allowed to communicate with others about your solutions, approaches, ideas, etc. If such unauthorized sharing is detected, you will receive a “0” on the final and I will take disciplinary action according to the Simon Academic Honesty Policy. By returning your solutions to the exam, you agree to follow the Simon Academic Honesty Policy. Please bear this in mind before you violate the honesty code. Also, I will not try to decipher your handwriting. If I cannot read your handwriting, I will assume your answer is wrong.
The exam consists of 3 questions. Some may be more difficult than others, but each has equal weight. Feel free to email me if you need clarification on a question. Good luck.
Question 1:
Consider the infinite horizon discounted problem with $n$ states. Let $A_i$ denote the available actions in state $i$. The cost per stage is $g(i, u)$, the discount factor is $\alpha$, and the transition probabilities are $p_{ij}(u)$. For each $j = 1, \ldots, n$, let
$$m_j = \min_{i=1,\ldots,n} \; \min_{u \in A_i} p_{ij}(u). \quad (1)$$
For all states $i$ and $j$ and possible actions $u \in A_i$, let
$$\tilde{p}_{ij}(u) = \frac{p_{ij}(u) - m_j}{1 - \sum_{k=1}^{n} m_k}. \quad (2)$$
a. Show that $\tilde{p}_{ij}(u)$ are transition probabilities.
b. Consider the discounted problem with cost per stage $g(i, u)$, discount factor $\alpha(1 - \sum_{j=1}^{n} m_j)$, and transition probabilities $\tilde{p}_{ij}(u)$. Show that this problem has the same optimal policies as the original and that its optimal cost vector $\tilde{J}$ satisfies
$$J^* = \tilde{J} + \frac{\alpha \sum_{j=1}^{n} m_j \tilde{J}(j)}{1 - \alpha} \, e, \quad (3)$$
where $J^*$ is the optimal cost vector of the original problem and $e$ is the unit vector (all components equal to 1).
c. What is the advantage of using the transformed MDP?
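The transformation in (1)–(3) can also be checked numerically. The following Python sketch is purely illustrative and is not part of the required proofs: the random MDP data, the problem sizes, the seed, and the helper value_iteration are assumptions made up for this example. It applies (1)–(2) to a small MDP, solves both problems by value iteration, and verifies the row sums from part (a) and relation (3).

    # Illustrative numerical check of (1)-(3); all data below are made-up assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    n, n_actions, alpha = 4, 3, 0.9

    # p[u, i, j] = original transition probabilities, g[i, u] = cost per stage
    p = rng.random((n_actions, n, n))
    p /= p.sum(axis=2, keepdims=True)
    g = rng.random((n, n_actions))

    # (1): m_j = min over i and u of p_ij(u); (2): transformed probabilities
    m = p.min(axis=(0, 1))                  # shape (n,)
    scale = 1.0 - m.sum()
    p_tilde = (p - m) / scale               # m_j broadcast over the last index j
    alpha_tilde = alpha * scale             # transformed discount factor

    # Part (a): rows of p_tilde are nonnegative and sum to 1
    assert (p_tilde >= 0).all() and np.allclose(p_tilde.sum(axis=2), 1.0)

    def value_iteration(p, g, alpha, iters=5000):
        """Iterate J(i) <- min_u [ g(i,u) + alpha * sum_j p_ij(u) J(j) ]."""
        J = np.zeros(n)
        for _ in range(iters):
            J = (g + alpha * np.einsum('uij,j->iu', p, J)).min(axis=1)
        return J

    J_star = value_iteration(p, g, alpha)
    J_tilde = value_iteration(p_tilde, g, alpha_tilde)

    # Relation (3): J* = J_tilde + alpha * (sum_j m_j J_tilde(j)) / (1 - alpha) * e
    J_reconstructed = J_tilde + alpha * (m @ J_tilde) / (1.0 - alpha)
    print(np.max(np.abs(J_star - J_reconstructed)))   # should be numerically ~0
    print(alpha, alpha_tilde)   # alpha_tilde < alpha whenever sum_j m_j > 0

Note that the transformed discount factor printed at the end is smaller than the original whenever some $m_j > 0$, which is the kind of observation part (c) is asking about.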
Solutions:
Fix $u$ and $i$. We have
$$\sum_{j=1}^{n} \tilde{p}_{ij}(u) = \sum_{j=1}^{n} \frac{p_{ij}(u) - m_j}{1 - \sum_{k=1}^{n} m_k} = \frac{1}{1 - \sum_{k=1}^{n} m_k} \left(1 - \sum_{j=1}^{n} m_j \right) = 1. \quad (4)$$
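For a concrete instance of (4), with numbers chosen only for illustration: take $n = 2$, $m = (0.2, 0.1)$, and a row $(p_{i1}(u), p_{i2}(u)) = (0.6, 0.4)$, so that
$$\sum_{j=1}^{2} \tilde{p}_{ij}(u) = \frac{(0.6 - 0.2) + (0.4 - 0.1)}{1 - 0.3} = \frac{0.7}{0.7} = 1.$$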
Since $p_{ij}(u) \geq 0$ for all $i$,