Implementing My Own Algorithm In Python
Today, I'm implementing my own reinforcement learning algorithm in Python! In reinforcement learning, a branch of AI, we have a model, more commonly called an agent. The model is the thing that takes actions and learns from what happens as a result.
Imagine you are the model here, and you're locked inside a town. You can never get out of there, whatever you do. Luckily, there are a lot of restaurants and a lovely place to sleep. You've slept for a while, and now you're hungry. You head to restaurant A and eat the food. It was decent. Not too good, but decent.
Now, the next day, you can either visit restaurant B, which might or might not be better than restaurant A, or you can stick with restaurant A. This is known as the tradeoff between exploration and exploitation. If you explore by heading over to restaurant B, you might get better food, or you might not. But if you exploit what you already know and stick with restaurant A, you'll always get the same "decent" food.
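To make the tradeoff concrete, here is a minimal sketch of one common way to balance the two, usually called epsilon-greedy. This is not the algorithm I'm building in this post, just a standard baseline for comparison. The function name `choose_restaurant`, the `epsilon` parameter, and the quality numbers are all made up for illustration.

```python
import random

def choose_restaurant(estimates, epsilon, rng):
    """Return a restaurant index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Explore: pick any restaurant at random, even one we know little about.
        return rng.randrange(len(estimates))
    # Exploit: pick the restaurant with the highest estimated quality so far.
    return max(range(len(estimates)), key=lambda i: estimates[i])

rng = random.Random(0)
# Hypothetical estimates: restaurant A (index 0) was "decent", B (index 1) is untried.
estimates = [0.5, 0.0]
choice = choose_restaurant(estimates, epsilon=0.1, rng=rng)
print("Tonight's restaurant index:", choice)
```

With `epsilon=0.0` you would never leave restaurant A, and with `epsilon=1.0` you would pick at random every night. Everything in between is a mix of the two.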
But you remember your robotics classes back at school! You build a robot that goes to each restaurant and assigns it a quality value, a number that represents how good the food is. The robot visits all the restaurants and tells you which one has the highest quality value. You go there immediately, and you simply love the food.
Here, we spin up a new model that always explores. Its goal isn't to eat at the best restaurant but to tell you which restaurant is the best, so you can stuff your face there. With this, you always find the best restaurant without having to taste the disgusting food from the bad ones.
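The robot idea can be sketched in a few lines of Python. Treat this as a rough sketch under my own assumptions rather than a finished implementation: the name `scout_restaurants`, the `true_quality` numbers, the number of visits, and the noisy ratings (a Gaussian wobble around the true quality) are all hypothetical choices made for illustration.

```python
import random

def scout_restaurants(true_quality, visits_per_restaurant, rng, noise=0.1):
    """Visit every restaurant equally often and average the noisy ratings."""
    estimates = {}
    for name, quality in true_quality.items():
        # Each visit yields the true quality plus some random noise (assumed Gaussian).
        ratings = [quality + rng.gauss(0, noise) for _ in range(visits_per_restaurant)]
        estimates[name] = sum(ratings) / len(ratings)
    # The robot never "eats for pleasure"; it only reports the highest estimate.
    best = max(estimates, key=estimates.get)
    return best, estimates

# Hypothetical town: the robot does not know these numbers, it only sees ratings.
true_quality = {"A": 0.5, "B": 0.8, "C": 0.2}
best, estimates = scout_restaurants(true_quality, visits_per_restaurant=20, rng=random.Random(0))
print("Robot recommends restaurant", best)
```

The robot spends all of its visits exploring, and you spend all of yours exploiting its report. The more visits per restaurant, the less the noise in any single rating matters.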