Reinforcement Learning problem by vassilito

A reinforcement learning agent has two actions (a1, a2) and three states (S1, S2, S3). After a period of interacting with the environment, we have the following values of the Q function: Q1(S1,a1) = -2 … (Budget: $30-$250 USD, Jobs: Algorithm)