# What is the Q-function in an MDP?

A Q-value function (Q) tells us how good a certain action is, given a state, for an agent following a policy. The optimal Q-value function (Q*) gives the maximum expected return achievable from a given state-action pair under any policy.
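In symbols, these two definitions can be written as follows. The notation is the conventional one and is an assumption here, since the text above does not fix it: $\gamma \in [0, 1)$ is the discount factor, $R_{t+k+1}$ the reward received $k+1$ steps after time $t$, and $G_t$ the discounted return.

$$
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}, \qquad
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s,\ A_t = a \right], \qquad
Q^{*}(s, a) = \max_{\pi} Q^{\pi}(s, a)
$$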

## What is the V function?

The V-function gives the value of a state. More formally, the V-function, also called the state-value function, the value function, or simply V, measures how good each state is. In other words, it tells us how good or bad it is to be in a particular state, measured by the expected return G when following a policy 𝜋.
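Written out, with the same return G and policy 𝜋 (the subscripts and the symbol $\pi(a \mid s)$ for the probability of choosing action $a$ in state $s$ are assumed standard notation, not taken from the text):

$$
V^{\pi}(s) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s \right] = \sum_{a} \pi(a \mid s)\, Q^{\pi}(s, a)
$$

The second equality says that the value of a state is the policy-weighted average of the Q-values of the actions available in it.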

There are two important characteristic utilities of an MDP: values of a state, and q-values of a chance node. The * in any MDP or RL value denotes an optimal quantity. The q-value of a (state, action) pair is the optimal expected sum of discounted rewards obtainable after taking that action in that state.
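To make the two utilities concrete, here is a minimal sketch of value iteration that computes both the optimal state values and the optimal q-values. The two-state MDP, its rewards, and the discount factor are invented for illustration; nothing here comes from a specific library.

```python
# Hypothetical two-state MDP, invented for illustration.
# P[s][a] is a list of (probability, next_state, reward) outcomes.
P = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(0.8, "B", 1.0), (0.2, "A", 0.0)]},
    "B": {"stay": [(1.0, "B", 2.0)], "go": [(1.0, "A", 0.0)]},
}
GAMMA = 0.9  # assumed discount factor


def q_from_v(V, s, a):
    """Expected one-step reward plus discounted value of the next state."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def value_iteration(tol=1e-10):
    """Repeat the Bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        Q = {s: {a: q_from_v(V, s, a) for a in P[s]} for s in P}
        new_V = {s: max(Q[s].values()) for s in P}  # V*(s) = max_a Q*(s, a)
        if max(abs(new_V[s] - V[s]) for s in P) < tol:
            return new_V, Q
        V = new_V


V_star, Q_star = value_iteration()
for s in P:
    print(s, round(V_star[s], 3), {a: round(q, 3) for a, q in Q_star[s].items()})
```

The state value falls out of the q-values as a maximum over actions, which is why the two are usually discussed together.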

Q(K, A) only grows until it approaches its actual value; it does not grow without bound. Once it stops growing (that is, once it has approximated its actual value), the Q(K, A) estimates for the other actions A can catch up.
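The bounded growth can be seen in a minimal sketch. It assumes the standard tabular Q-learning update and a deliberately trivial setting (one state, one action, a constant reward of 1), none of which is specified in the text; the constants are illustrative.

```python
GAMMA = 0.9   # discount factor (assumed)
ALPHA = 0.5   # learning rate (assumed)
REWARD = 1.0  # constant reward on every step (assumed)

q = 0.0
history = []
for _ in range(500):
    # Q-learning update. With a single state and a single action,
    # the max over next-state actions is just q itself.
    target = REWARD + GAMMA * q
    q += ALPHA * (target - q)
    history.append(q)

# With discounting, the estimate is capped by the geometric series REWARD / (1 - GAMMA).
limit = REWARD / (1.0 - GAMMA)
print(history[0], history[9], q, limit)
```

Each update moves the estimate a fraction of the way toward a target that is itself bounded, so the estimate rises toward the limit and then levels off.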

From a sampling perspective, Q(s, a) has higher dimensionality than V(s), so it can be harder to collect enough (s, a) samples than s samples. If you have access to the transition function, V alone is sometimes good enough. There are also other uses where both are combined.
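The trade-off can be sketched as follows: choosing an action from V requires a one-step lookahead through the transition function, while choosing from Q is a plain lookup, at the cost of a larger table. The deterministic two-state model and the value tables below are hypothetical and chosen to be mutually consistent.

```python
GAMMA = 0.9  # assumed discount factor

# Hypothetical model: model[s][a] -> list of (probability, next_state, reward).
model = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 2.0)], "go": [(1.0, "A", 0.0)]},
}

# Illustrative tables: V has one entry per state, Q one entry per (state, action).
V = {"A": 19.0, "B": 20.0}
Q = {
    "A": {"stay": 17.1, "go": 19.0},
    "B": {"stay": 20.0, "go": 17.1},
}


def greedy_from_v(V, model, s):
    """Needs the transition model: one-step lookahead over every action."""
    def backup(a):
        return sum(p * (r + GAMMA * V[s2]) for p, s2, r in model[s][a])
    return max(model[s], key=backup)


def greedy_from_q(Q, s):
    """Needs no model: pick the action with the largest stored q-value."""
    return max(Q[s], key=Q[s].get)


for s in model:
    print(s, greedy_from_v(V, model, s), greedy_from_q(Q, s))
print("V entries:", len(V), "Q entries:", sum(len(q) for q in Q.values()))
```

Both routes select the same actions here, but only the Q table can be used when the transition function is unknown.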

In post 2 we extended the definition of the state-value function to state-action pairs, defining a value for each state-action pair. This is called the action-value function, also known as the Q-function or simply Q.