OpenAI gym specsheet

Author: bili_44957714512 | Published: 2024-05-18


Homework 6 specsheet 
-Extra Credit (replaces lowest HW)- 
 
In this homework, we apply an RL framework to environments available in OpenAI Gym. 
 
Mission command approach: As per §4.5 of the Sittyba, we will tell you what to do, not how to do it. 
That is up to you. However, we want you to: 
a) Do this homework yourself. Do not copy answers or code from someone else. 
b) Restrict your methods (for now) to what was covered in the lecture/lab (in other words, basic 
reinforcement learning involving Q-learning, policy gradients, multi-armed bandits, etc.) 
 
Here is what we would like you to do: 
1) Go to https://gymnasium.farama.org/index.html 
2) Pick one of the available environments – we recommend one of the classic Atari 2600 games: 
https://gymnasium.farama.org/environments/atari/ [Make sure to pick one we did not already 
cover in lecture or lab, but you can pick any environment that is not an Atari game too] 
3) Train an agent to achieve a reasonable level of performance in this environment. 
4) Write a brief statement explaining how you trained the agent, how you managed the explore / exploit 
tradeoff, and any other choices you made. 
5) Also make sure to comment on how the training went: what was challenging for the agent, and 
what made training feasible? Explanations of what you couldn't do and why are encouraged, with 
emphasis on the “why”. 
6) Document the performance of the agent by plotting total rewards as a function of training 
episodes. 
7) Make sure to include your code as a separate file. 
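
To make steps 3, 4 and 6 concrete, here is a minimal sketch of tabular Q-learning with a decaying epsilon-greedy policy. It is only an illustration, not the expected solution. ChainEnv is a hypothetical toy environment invented for this example; it copies the return shapes of Gymnasium's reset() and step() so the loop runs without Gymnasium installed. The function name train_q_learning and all hyperparameter values are likewise assumptions, and an Atari game would need function approximation rather than a table.

```python
import random


class ChainEnv:
    """Hypothetical stand-in environment (NOT part of Gymnasium).

    Five states in a row. Action 1 moves right, action 0 moves left.
    Reaching the rightmost state ends the episode with reward +1;
    every other step costs -0.01. Episodes are cut off after 50 steps.
    """

    n_states = 5
    n_actions = 2

    def reset(self, seed=None):
        self.state = 0
        self.steps = 0
        return self.state, {}  # (observation, info), as in Gymnasium

    def step(self, action):
        self.steps += 1
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        terminated = self.state == self.n_states - 1
        truncated = self.steps >= 50
        reward = 1.0 if terminated else -0.01
        return self.state, reward, terminated, truncated, {}


def train_q_learning(env, episodes=300, alpha=0.1, gamma=0.99,
                     eps_start=1.0, eps_end=0.05, eps_decay=0.98, seed=0):
    """Return the learned Q-table and the total reward of each episode."""
    rng = random.Random(seed)
    q = [[0.0] * env.n_actions for _ in range(env.n_states)]
    epsilon = eps_start
    episode_rewards = []
    for _ in range(episodes):
        state, _ = env.reset()
        total, done = 0.0, False
        while not done:
            # Explore with probability epsilon, otherwise exploit the table.
            if rng.random() < epsilon:
                action = rng.randrange(env.n_actions)
            else:
                action = max(range(env.n_actions), key=lambda a: q[state][a])
            next_state, reward, terminated, truncated, _ = env.step(action)
            # Bootstrap from the next state unless the episode truly ended.
            target = reward if terminated else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            total += reward
            done = terminated or truncated
        episode_rewards.append(total)
        # Shift gradually from exploration toward exploitation.
        epsilon = max(eps_end, epsilon * eps_decay)
    return q, episode_rewards


q_table, rewards = train_q_learning(ChainEnv())
```

The rewards list holds one total per training episode, which is the quantity step 6 asks you to plot. The epsilon schedule (start, floor, decay rate) is one way to manage the explore / exploit tradeoff that step 4 asks you to describe.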
 
Suggestions and recommendations: 
1. Picking a more complex environment will merit more grade points. To check complexity, go to 
https://github.com/openai/gym/wiki/Table-of-environments and look at Observation Space and 
Action Space. We recommend choosing an environment that has a discrete action space. We 
want to keep grading criteria (in terms of points) flexible to see what students can actually do, 
but as a broad heuristic, something with the complexity of “LunarLander-v2” would be ok, 
something with the complexity of “BipedalWalker-v2” would be good, and something with the 
complexity of “AirRaid-ram-v0” would be excellent. But don’t necessarily pick those specific 
environments. Pick something that sparks joy, for you personally. It will shine through. 
2. Try implementing an algorithm on your own instead of using Stable-Baselines3 (SB3). If you use SB3, 
explain what you did to optimize the model. Try checking how far your model can go by trying 
more complex environments, and find the breaking point. 
3. You can also use the library NEAT-Python: https://neat-python.readthedocs.io/en/latest/ If you 
decide to use NEAT, experiment with how far NEAT can go and note your observations. 
4. So either a) implement your own algorithm, b) use SB3 (and note what you did to optimize 
the model), or c) use NEAT-Python, find the most complex environment you are able to solve with NEAT, 
and note what leads to better NEAT implementations. 
5. Whichever environment you pick, make sure your RL bot is learning the environment reasonably well (as evinced by the plot of total reward over episodes of training). 
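
Raw per-episode rewards are usually noisy, so a smoothed curve can make the learning trend in suggestion 5 easier to see. The sketch below is one possible approach, not a required format. The helper names moving_average and plot_learning_curve, the window size, and the output file name are all assumptions made for illustration. The plotting function assumes matplotlib is installed and imports it only when called.

```python
def moving_average(values, window):
    """Mean of every run of `window` consecutive values.

    The result has len(values) - window + 1 entries.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        # Slide the window one step: add the new value, drop the oldest.
        running += values[i] - values[i - window]
        out.append(running / window)
    return out


def plot_learning_curve(episode_rewards, window=20, path="learning_curve.png"):
    """Save a plot of raw and smoothed total reward per training episode."""
    import matplotlib.pyplot as plt  # imported lazily; assumed to be installed

    smoothed = moving_average(episode_rewards, window)
    fig, ax = plt.subplots()
    ax.plot(range(len(episode_rewards)), episode_rewards,
            alpha=0.3, label="total reward per episode")
    # The first smoothed point summarizes episodes 0 .. window - 1.
    ax.plot(range(window - 1, len(episode_rewards)), smoothed,
            label=f"moving average ({window} episodes)")
    ax.set_xlabel("Training episode")
    ax.set_ylabel("Total reward")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
```

For example, moving_average([1, 2, 3, 4], 2) returns [1.5, 2.5, 3.5]. Showing the raw curve faintly behind the smoothed one keeps the episode-to-episode variance visible, which is often worth commenting on in the write-up.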

