Our project will involve training an agent to navigate an environment filled with dangerous tripwire traps. The agent's goal is to traverse the environment safely, which will involve jumping over the tripwire traps as necessary to escape. Using Malmo along with some reinforcement learning algorithms, the agent will take in a state consisting of a coordinate as input and produce an action in response. The pool of actions currently includes moving forward, jumping, and waiting. This pool may gain or lose actions in the future as we learn what creative features we can implement in the agent's course.
To make our agent smart, we will implement reinforcement learning algorithms such as Q-learning so that our agent learns how to successfully traverse the environment in front of it.
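As a rough sketch of what the Q-learning piece could look like, the snippet below shows a tabular Q-learning update over our current action pool. The action names, state representation (a coordinate tuple), and hyperparameter values are illustrative assumptions, not final design decisions; the real agent will get its states and rewards from Malmo mission observations.

```python
import random
from collections import defaultdict

# Illustrative hyperparameters (not yet tuned for our course)
ALPHA = 0.1    # learning rate
GAMMA = 0.95   # discount factor
EPSILON = 0.1  # exploration rate

# Current action pool from the proposal
ACTIONS = ["move_forward", "jump", "wait"]

# Q-table: maps (state, action) -> estimated value.
# States here are assumed to be coordinate tuples.
q_table = defaultdict(float)

def choose_action(state):
    """Epsilon-greedy selection over the action pool."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])

def update(state, action, reward, next_state):
    """Standard one-step Q-learning update."""
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    q_table[(state, action)] += ALPHA * (
        reward + GAMMA * best_next - q_table[(state, action)]
    )
```

The episode loop would repeatedly call `choose_action`, send the chosen command to Malmo, observe the reward and next coordinate, and call `update`.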
Our project's success will be based on observing how well our AI Minecraft agent is able to avoid tripwire traps. Therefore, the most important metric is accuracy. Our quantitative analysis will measure accuracy by calculating the success rate of the AI avoiding the tripwires presented in a constant environment. The data will be gathered by running 1000+ trials of a constant environment with varying numbers of tripwires. Our qualitative analysis will observe whether our AI agent can at the very least avoid a trap with only one wire. Another metric that will become important once the agent is accurate is speed. The baseline accuracy will be 0, since the agent has not learned anything yet, and the baseline speed will also be very low. An appropriate speed baseline could be determined by having a human control the character and measuring the difference; this would allow us to understand whether the AI has any practical use. Our moonshot case would be to create an AI agent that finds the shortest path with the fewest tripwires to escape.
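The accuracy metric above reduces to a simple success-rate calculation over recorded trials. A minimal sketch, assuming each trial is logged as a boolean (True when the agent escaped without triggering a tripwire):

```python
def success_rate(trial_results):
    """Fraction of trials in which the agent avoided all tripwires.

    trial_results: list of booleans, True = successful escape.
    Hypothetical helper; the real results will come from Malmo mission logs.
    """
    if not trial_results:
        return 0.0  # no trials yet, matching our baseline accuracy of 0
    return sum(trial_results) / len(trial_results)
```

For example, `success_rate([True, True, False, True])` returns 0.75, which we would report as 75% accuracy for that batch of trials.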
We will monitor whether the agent is able to perform the task without any external assistance. We think it will be relatively clear how the algorithm is functioning internally by following the agent and tracking its behavior.
Tuesday, January 19, 2021 at 11:45 AM
Our group will meet every week on Wednesdays at 9 PM PST.
For our status report, we hope to have a working version of our agent traversing a course with one or two points of interest. These points of interest are where the agent must learn how to intelligently interact with traps that may help or hurt the agent’s chances of reaching the goal.