Questions tagged [value-iteration]

16 questions
112 votes, 5 answers

What is the difference between value iteration and policy iteration?

In reinforcement learning, what is the difference between policy iteration and value iteration? As far as I understand, in value iteration you use the Bellman equation to solve for the optimal policy, whereas in policy iteration you randomly…
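Here is a minimal sketch of the two algorithms side by side. It assumes a small tabular MDP stored as NumPy arrays `P[a, s, s2]` (transition probabilities) and `R[a, s]` (expected rewards). The toy MDP and all names are hypothetical.

- Value iteration repeatedly applies the Bellman optimality backup to V and reads off a greedy policy at the end.
- Policy iteration starts from an arbitrary policy, evaluates it exactly by solving a linear system, and improves it greedily. It stops when the policy no longer changes.

```python
import numpy as np

# Hypothetical toy MDP (not from the question): 2 states, 2 actions.
# P[a, s, s2] = probability of moving from s to s2 under action a.
# R[a, s]     = expected immediate reward for taking action a in state s.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],   # action 0: stay where you are
    [[0.0, 1.0], [1.0, 0.0]],   # action 1: switch state
])
R = np.array([
    [0.0, 1.0],                 # staying pays 1, but only in state 1
    [0.5, 0.0],                 # switching pays 0.5, but only from state 0
])
GAMMA = 0.9

def value_iteration(P, R, gamma, tol=1e-10):
    """Apply the Bellman optimality backup until V stops changing, then act greedily."""
    V = np.zeros(P.shape[1])
    while True:
        Q = R + gamma * (P @ V)          # Q[a, s] = R[a, s] + gamma * sum_s2 P[a, s, s2] V[s2]
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

def policy_iteration(P, R, gamma):
    """Alternate exact policy evaluation and greedy policy improvement."""
    n_states = P.shape[1]
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)   # arbitrary starting policy
    while True:
        # Evaluation: solve V = R_pi + gamma * P_pi V as a linear system.
        V = np.linalg.solve(np.eye(n_states) - gamma * P[policy, states],
                            R[policy, states])
        # Improvement: choose the best action in every state under the current V.
        new_policy = (R + gamma * (P @ V)).argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return V, policy
        policy = new_policy

V_vi, pi_vi = value_iteration(P, R, GAMMA)
V_pi, pi_pi = policy_iteration(P, R, GAMMA)
print(pi_vi, pi_pi)   # [1 0] [1 0]
```

On this toy problem both methods settle on the same policy: switch out of state 0, stay in state 1.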
4 votes, 1 answer

Dynamic Programming of Markov Decision Process with Value Iteration

I am learning about MDPs and value iteration through self-study, and I hope someone can improve my understanding. Consider the problem of a 3-sided die with the numbers 1, 2, 3. If you roll a 1 or a 2 you get that value in $, but if you roll a 3 you lose…
2 votes, 1 answer

Is there a clever way to get rid of these loops using numpy?

I'm reaching the maximum recursion depth. I've been trying to use np.tensordot(), but I couldn't really work out how to use it in this case. def stopping_condtion(a,V,V_old,eps): return np.max(la.norm(V - V_old)) < ((1 - a) * eps) /…
Max
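One way to drop the loops is sketched below. It assumes transitions are stored as an array `P[a, s, s2]` and rewards as `R[a, s]`. The layout and names are hypothetical, since the question's code is cut off.

`np.tensordot` with `axes=([2], [0])` sums over the successor state, which is exactly the inner loop of the Bellman backup. Replacing the recursion with a `while` loop also avoids Python's recursion limit.

```python
import numpy as np

def bellman_backup_loops(P, R, V, gamma):
    """Reference version: explicit Python loops over actions, states and successors."""
    n_actions, n_states, _ = P.shape
    Q = np.empty((n_actions, n_states))
    for a in range(n_actions):
        for s in range(n_states):
            Q[a, s] = R[a, s] + gamma * sum(P[a, s, t] * V[t] for t in range(n_states))
    return Q.max(axis=0)

def bellman_backup_vectorized(P, R, V, gamma):
    """Same backup without Python loops: contract P's successor axis with V."""
    expected_next_value = np.tensordot(P, V, axes=([2], [0]))   # shape (n_actions, n_states)
    return (R + gamma * expected_next_value).max(axis=0)

def value_iteration(P, R, gamma, eps=1e-8):
    """Iterate with a while loop instead of recursion, so there is no recursion limit."""
    V = np.zeros(P.shape[1])
    while True:
        V_new = bellman_backup_vectorized(P, R, V, gamma)
        if np.max(np.abs(V_new - V)) < eps:
            return V_new
        V = V_new

# Hypothetical random MDP: 3 actions, 4 states; each row P[a, s, :] sums to 1.
rng = np.random.default_rng(0)
P = rng.random((3, 4, 4))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((3, 4))
V = rng.random(4)
print(np.allclose(bellman_backup_loops(P, R, V, 0.9),
                  bellman_backup_vectorized(P, R, V, 0.9)))   # True
```

`P @ V` produces the same `(n_actions, n_states)` array. `np.einsum("ast,t->as", P, V)` spells out the contraction explicitly. Use whichever reads best.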
2 votes, 1 answer

Why is Policy Iteration faster than Value Iteration?

We know that policy iteration gives us the policy directly and hence is faster. But can anyone explain this with some examples?
shmi
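Here is a small experiment that counts iterations on a hypothetical 2-state, 2-action MDP with discount 0.99.

Policy iteration evaluates each policy exactly. A finite MDP has finitely many deterministic policies, and each improvement step never makes the policy worse, so policy iteration stops after a few rounds. Value iteration only moves V part of the way per sweep, so with a discount close to 1 it needs many sweeps.

Each policy-iteration round costs a linear solve, though. So fewer iterations does not automatically mean less wall-clock time.

```python
import numpy as np

# Hypothetical MDP: P[a, s, s2] transition probabilities, R[a, s] expected rewards.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],    # action 0: stay
              [[0.0, 1.0], [1.0, 0.0]]])   # action 1: switch
R = np.array([[0.0, 1.0],
              [0.5, 0.0]])

def value_iteration_sweeps(P, R, gamma, tol=1e-8):
    """Number of Bellman optimality sweeps until max |V_new - V| < tol."""
    V = np.zeros(P.shape[1])
    sweeps = 0
    while True:
        sweeps += 1
        V_new = (R + gamma * (P @ V)).max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return sweeps
        V = V_new

def policy_iteration_steps(P, R, gamma):
    """Number of evaluate-then-improve rounds until the policy stops changing."""
    n = P.shape[1]
    states = np.arange(n)
    policy = np.zeros(n, dtype=int)
    steps = 0
    while True:
        steps += 1
        # Exact evaluation: one linear solve per round (cubic in the number of states).
        V = np.linalg.solve(np.eye(n) - gamma * P[policy, states], R[policy, states])
        new_policy = (R + gamma * (P @ V)).argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return steps
        policy = new_policy

print(value_iteration_sweeps(P, R, 0.99), policy_iteration_steps(P, R, 0.99))
```

Here policy iteration finishes in 2 rounds. Value iteration needs well over a thousand sweeps to reach the 1e-8 tolerance.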
2 votes, 1 answer

Is Monte Carlo learning policy or value iteration (or something else)?

I am taking a Reinforcement Learning class and I didn’t understand how to combine the concepts of policy iteration/value iteration with Monte Carlo (and also TD/SARSA/Q-learning). In the table below, how can the empty cells be filled: Should/can it…
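Here is one common way to place Monte Carlo in this picture. It can serve as the policy-evaluation step of generalized policy iteration, estimating V(s) by averaging sampled returns instead of using a known transition model.

The sketch below uses first-visit Monte Carlo. It assumes each episode is a hypothetical list of `(state, reward)` pairs, where the reward is the one received on leaving that state.

```python
from collections import defaultdict

def first_visit_mc_evaluation(episodes, gamma=1.0):
    """First-visit Monte Carlo estimate of V(s) for the policy that generated the episodes.

    Each episode is a list of (state, reward) pairs, where reward is the reward
    received on the transition out of that state.
    """
    returns = defaultdict(list)
    for episode in episodes:
        G = 0.0
        first_visit_return = {}
        # Walk the episode backwards so G is the discounted return from each step;
        # later overwrites leave the return from the first visit to each state.
        for state, reward in reversed(episode):
            G = reward + gamma * G
            first_visit_return[state] = G
        for state, G_first in first_visit_return.items():
            returns[state].append(G_first)
    # The estimate is the average of sampled returns; no transition model is used.
    return {state: sum(gs) / len(gs) for state, gs in returns.items()}

episodes = [
    [("A", 0.0), ("B", 1.0)],   # A -> B -> terminal, reward 1 at the end
    [("A", 0.0), ("B", 0.0)],   # same path, no reward
]
V = first_visit_mc_evaluation(episodes)
print(V["A"], V["B"])   # 0.5 0.5
```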
2 votes, 2 answers

How to solve reinforcement learning grid world examples using value iteration?

I can only find either theory or Python examples, which are not satisfying for a beginner. I just need a simple example to understand the step-by-step iterations. Could anyone please show me the 1st and 2nd iterations for the image that I…
Ahasan Ratul
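The image from the question is not available, so here is a hand-checkable stand-in: a hypothetical 1 x 4 corridor. Its rules are:

- The rightmost cell is terminal, and entering it pays +1.
- Every other move pays 0.
- Moves are deterministic (left or right), and bumping into the wall leaves you in place.
- The discount is 0.9.

Each sweep updates every non-terminal cell from the previous sweep's values (a synchronous update).

```python
def corridor_value_iteration(n_iterations, n_cells=4, gamma=0.9):
    """Synchronous value iteration on a hypothetical 1 x n_cells corridor.

    Returns the value table after every sweep, starting with the all-zero table.
    """
    terminal = n_cells - 1
    V = [0.0] * n_cells
    history = [V[:]]
    for _ in range(n_iterations):
        new_V = V[:]                      # read only from the previous sweep
        for s in range(n_cells):
            if s == terminal:
                continue                  # terminal value stays 0
            candidates = []
            for step in (-1, +1):         # left, right
                s_next = min(max(s + step, 0), n_cells - 1)   # walls keep you in place
                reward = 1.0 if s_next == terminal else 0.0
                candidates.append(reward + gamma * V[s_next])
            new_V[s] = max(candidates)    # Bellman optimality: best action wins
        V = new_V
        history.append(V[:])
    return history

for k, values in enumerate(corridor_value_iteration(2)):
    print(f"iteration {k}: {values}")
# iteration 0: [0.0, 0.0, 0.0, 0.0]
# iteration 1: [0.0, 0.0, 1.0, 0.0]
# iteration 2: [0.0, 0.9, 1.0, 0.0]
```

In iteration 1, only the cell next to the goal sees the reward (1 + 0.9 · 0 = 1). In iteration 2, that value propagates one cell further (0 + 0.9 · 1 = 0.9). Iteration 3 would give 0.9 · 0.9 = 0.81 for the leftmost cell.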
2 votes, 0 answers

Modelling profitability of credit cards by Markov Decision Process

This is with reference to a published paper on modelling the profitability of credit cards by Markov Decision Processes. I am trying to implement the same in Python using Mdptoolbox, but I am not getting the output in the expected format. My states are the…
1 vote, 2 answers

Population growth math issue in C

I have looked this over and am wondering where my math issue is. I believe it should be calculating correctly, but the floats do not round up (.75 to 1) to add to the count for births/deaths. I am a novice at C. Here is the code I have so…
Lee
1 vote, 1 answer

Why are policy iteration and value iteration methods giving different results for the optimal values and optimal policy?

I am currently studying dynamic programming in reinforcement learning, in which I came across two concepts: value iteration and policy iteration. To understand them, I am implementing the gridworld example from Sutton, which says: The…
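The question's code is cut off, so this is only one frequent explanation, not a diagnosis. When two actions are equally good in some state, value iteration and policy iteration can return different policies that are both optimal. A loose stopping tolerance in value iteration also leaves its values only approximately equal to policy iteration's.

Evaluating each policy exactly shows whether the two policies really differ. Here is a sketch on a hypothetical MDP with a built-in tie:

```python
import numpy as np

# Hypothetical MDP with a tie: from state 0, both actions lead to state 1 with reward 0.
# P[a, s, s2] transition probabilities, R[a, s] expected rewards.
P = np.array([[[0.0, 1.0], [0.0, 1.0]],    # action 0
              [[0.0, 1.0], [0.0, 1.0]]])   # action 1
R = np.array([[0.0, 1.0],                  # action 0 pays 1 in state 1
              [0.0, 0.0]])                 # action 1 pays nothing
GAMMA = 0.9

def evaluate_policy(P, R, gamma, policy):
    """Exact value of a deterministic policy: solve (I - gamma * P_pi) V = R_pi."""
    n = P.shape[1]
    states = np.arange(n)
    return np.linalg.solve(np.eye(n) - gamma * P[policy, states], R[policy, states])

# Two policies that differ only in the tied state 0.
V_a = evaluate_policy(P, R, GAMMA, np.array([0, 0]))
V_b = evaluate_policy(P, R, GAMMA, np.array([1, 0]))
print(np.allclose(V_a, V_b))   # True: different actions, identical (optimal) values
```

Comparing the methods by the values their policies achieve, rather than by the action chosen in each cell, separates a harmless tie from a real bug.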
1 vote, 0 answers

Faster access: 2D NumPy array or large 1D NumPy array?

I am performing prioritized sweeping, for which I have a matrix of 1000*1000 cells (a gridworld) whose cells I have to access repeatedly in a while True loop for assignment (I am not actually iterating over the list, but all cells are accessed…
SH_V95
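For a C-contiguous NumPy array, a 2D index `(row, col)` and the flat index `row * n_cols + col` refer to the same element in the same memory. The two layouts therefore store identical data. Here is a sketch of the mapping, with hypothetical names (`grid_2d`, `grid_1d`, `flat_index`, `grid_coords`):

```python
import numpy as np

N_ROWS, N_COLS = 1000, 1000
grid_2d = np.zeros((N_ROWS, N_COLS))   # C-contiguous (row-major) by default
grid_1d = grid_2d.ravel()              # a view, not a copy, for a contiguous array

def flat_index(row, col):
    """Row-major position of (row, col) in the flattened grid."""
    return row * N_COLS + col

def grid_coords(index):
    """Inverse of flat_index: recover (row, col) from a flat position."""
    return divmod(index, N_COLS)

# A write through the 2D view is visible through the 1D view (shared memory).
grid_2d[3, 7] = 42.0
print(grid_1d[flat_index(3, 7)])                                        # 42.0
print(np.ravel_multi_index((3, 7), grid_2d.shape) == flat_index(3, 7))  # True
```

Because the data is shared, the choice mostly affects how states are keyed. A flat integer is convenient as an entry in a priority queue for prioritized sweeping. Timing both forms on the actual loop with `timeit` is a safer guide than a general rule.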
0 votes, 2 answers

Declare a JavaScript object between brackets to choose only the element corresponding to its index

I found this sample in a book, and this is the first time I have seen this notation. Obviously it's a thousand times shorter than writing a switch, but what is it? When I do typeof(status) it returns undefined. I would like to understand what it is so…
0 votes, 0 answers

RL value iteration, gridworld multi-action problem

I am just starting to study reinforcement learning and trying to get my head around the basics. I understand the policy evaluation, policy iteration and value iteration algorithms, and I can solve a simple gridworld optimisation problem with two terminal states (-5 or +5).…
0 votes, 1 answer

Are these two different formulas for Value-Iteration update equivalent?

While studying MDPs from different sources, I came across two different formulas for the value update in the value iteration algorithm. The first one (the one on Wikipedia and in a couple of books) is: [formula not shown]. The second one is (in some questions here on…
jaja360
0 votes, 6 answers

Iterate through all distinct dictionary values in a list of dictionaries

Assuming a list of dictionaries, the goal is to iterate through all the distinct values in all the dictionaries. Example: d1={'a':1, 'c':3, 'e':5} d2={'b':2, 'e':5, 'f':6} l=[d1,d2] The iteration should be over 1, 2, 3, 5, 6; it does not matter if it is a…
Krzysztof Słowiński
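Here are two sketches using the `d1`, `d2` and `l` from the question, assuming the values are hashable. The first collects the values into a set, for when order does not matter. The second is a generator that yields each value the first time it is seen, for when order does matter.

```python
d1 = {'a': 1, 'c': 3, 'e': 5}
d2 = {'b': 2, 'e': 5, 'f': 6}
l = [d1, d2]

# Option 1: a set of all values (duplicates collapse, order is not guaranteed).
distinct = set().union(*(d.values() for d in l))

# Option 2: a lazy generator that keeps first-seen order and builds no full list.
def iter_distinct_values(dicts):
    seen = set()
    for d in dicts:
        for value in d.values():
            if value not in seen:
                seen.add(value)
                yield value

print(sorted(distinct))                  # [1, 2, 3, 5, 6]
print(list(iter_distinct_values(l)))     # [1, 3, 5, 2, 6]
```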
0 votes, 4 answers

How to avoid creating unnecessary lists?

I keep coming across situations where I pull some information from a file or wherever, then have to massage the data to the final desired form through several steps. For example: def insight_pull(file): with open(file) as in_f: lines =…
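One common pattern for this is a pipeline of generators. Each stage consumes the previous one lazily, so no intermediate list is built. The sketch below uses hypothetical stage names, since the body of `insight_pull` is cut off, and it assumes a comma-separated text file.

```python
def clean_lines(lines):
    """Strip whitespace and drop blank lines, one line at a time."""
    for line in lines:
        line = line.strip()
        if line:
            yield line

def parse_fields(lines, sep=","):
    """Split each cleaned line into its fields."""
    for line in lines:
        yield line.split(sep)

def pull_records(path):
    """Yield parsed records from a file without building intermediate lists."""
    with open(path) as in_f:
        # yield from inside the with block keeps the file open until the
        # caller has consumed every record.
        yield from parse_fields(clean_lines(in_f))
```

Call `list(...)` on the result only when you actually need all records at once. Otherwise, a plain `for record in pull_records(path):` loop processes one line at a time.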