Preparation:

Install Anaconda, CUDA, etc.
Create a new virtual environment with Python 3.6 (mainly because of PyTorch) and activate it:

conda create --name drlnd python=3.6
conda activate drlnd

Now git clone the Udacity repo and install the dependencies:

git clone https://github.com/udacity/deep-reinforcement-learning.git
cd deep-reinforcement-learning/python

Before installing, at the current moment you need to comment out (or remove) the line "torch==0.4.0" in requirements.txt, otherwise the torch install will fail. After making that modification, it should be good to install the dependencies. Also, don't forget to install Jupyter Lab, PyTorch, and anything else you think is necessary:

pip install .
pip install jupyterlab
conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch
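For reference, the requirements.txt edit mentioned above just means the torch line ends up commented out (or deleted); the other entries stay as they are:

# torch==0.4.0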

Download the Unity Environment (link here), navigate to the p1_navigation folder and unzip the downloaded file (the Banana_*.zip for your OS). Create an IPython kernel for the drlnd environment, and you're ready to go. Launch the notebook and select the right kernel. The file structure should be as shown here.

python -m ipykernel install --user --name drlnd --display-name "drlnd"
jupyter lab

 

Ready to build:

To start: replace the env = UnityEnvironment(file_name="...") part with the path to the downloaded Unity environment, and it should be ready to run the random-actions test to make sure everything is working.

env = UnityEnvironment(file_name="Banana_Windows_x86_64/Banana.exe")
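With the environment loaded, the random-actions test is roughly the loop below (reusing the env from the line above; brain_name picks the default brain, as in the course notebook):

import numpy as np

brain_name = env.brain_names[0]                      # default brain
brain = env.brains[brain_name]
action_size = brain.vector_action_space_size         # 4 discrete actions for Banana

env_info = env.reset(train_mode=False)[brain_name]   # watch it run in real time
state = env_info.vector_observations[0]              # 37-dimensional state
score = 0
while True:
    action = np.random.randint(action_size)          # pick a random action
    env_info = env.step(action)[brain_name]
    state = env_info.vector_observations[0]
    reward = env_info.rewards[0]
    done = env_info.local_done[0]
    score += reward
    if done:
        break
print("Random-agent score: {}".format(score))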

Once everything is up and running, it is time to build our agent and model. A good start is to borrow the code from the course (the one in the Udacity workspace) and modify it to fit the project. Let's use the one from the exercise as an example: once we open the two scripts, we can see the "*** YOUR CODE HERE ***" blocks, which are probably a good place to start trying something.

To save time, let's just use the code from the solution for illustration purposes.

model.py
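A minimal sketch of what model.py looks like: a small fully connected network mapping the 37-dimensional state to one Q-value per action (the two hidden layers of 64 units are my assumption; the solution's sizes may differ).

import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per discrete action."""

    def __init__(self, state_size, action_size, seed, fc1_units=64, fc2_units=64):
        super(QNetwork, self).__init__()
        self.seed = torch.manual_seed(seed)
        self.fc1 = nn.Linear(state_size, fc1_units)
        self.fc2 = nn.Linear(fc1_units, fc2_units)
        self.fc3 = nn.Linear(fc2_units, action_size)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return self.fc3(x)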

agent.py
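And a condensed sketch of agent.py: a DQN agent with experience replay and a soft-updated target network (fixed Q-targets). The hyperparameters are the usual course defaults and should be treated as assumptions rather than the exact submitted values.

import random
from collections import deque, namedtuple

import numpy as np
import torch
import torch.nn.functional as F
import torch.optim as optim

from model import QNetwork

BUFFER_SIZE = int(1e5)   # replay buffer size
BATCH_SIZE = 64          # minibatch size
GAMMA = 0.99             # discount factor
TAU = 1e-3               # soft-update rate for the target network
LR = 5e-4                # learning rate
UPDATE_EVERY = 4         # how often to learn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

class ReplayBuffer:
    """Fixed-size buffer storing experience tuples."""

    def __init__(self, buffer_size, batch_size):
        self.memory = deque(maxlen=buffer_size)
        self.batch_size = batch_size
        self.experience = namedtuple("Experience",
                                     ["state", "action", "reward", "next_state", "done"])

    def add(self, state, action, reward, next_state, done):
        self.memory.append(self.experience(state, action, reward, next_state, done))

    def sample(self):
        batch = random.sample(self.memory, k=self.batch_size)
        states = torch.from_numpy(np.vstack([e.state for e in batch])).float().to(device)
        actions = torch.from_numpy(np.vstack([e.action for e in batch])).long().to(device)
        rewards = torch.from_numpy(np.vstack([e.reward for e in batch])).float().to(device)
        next_states = torch.from_numpy(np.vstack([e.next_state for e in batch])).float().to(device)
        dones = torch.from_numpy(np.vstack([e.done for e in batch]).astype(np.uint8)).float().to(device)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.memory)

class Agent:
    """Interacts with the environment and learns from sampled experience."""

    def __init__(self, state_size, action_size, seed=0):
        self.qnetwork_local = QNetwork(state_size, action_size, seed).to(device)
        self.qnetwork_target = QNetwork(state_size, action_size, seed).to(device)
        self.optimizer = optim.Adam(self.qnetwork_local.parameters(), lr=LR)
        self.memory = ReplayBuffer(BUFFER_SIZE, BATCH_SIZE)
        self.action_size = action_size
        self.t_step = 0

    def act(self, state, eps=0.0):
        """Epsilon-greedy action selection."""
        state = torch.from_numpy(state).float().unsqueeze(0).to(device)
        self.qnetwork_local.eval()
        with torch.no_grad():
            action_values = self.qnetwork_local(state)
        self.qnetwork_local.train()
        if random.random() > eps:
            return int(np.argmax(action_values.cpu().data.numpy()))
        return random.choice(range(self.action_size))

    def step(self, state, action, reward, next_state, done):
        """Store experience and learn every UPDATE_EVERY steps."""
        self.memory.add(state, action, reward, next_state, done)
        self.t_step = (self.t_step + 1) % UPDATE_EVERY
        if self.t_step == 0 and len(self.memory) > BATCH_SIZE:
            self.learn(self.memory.sample(), GAMMA)

    def learn(self, experiences, gamma):
        states, actions, rewards, next_states, dones = experiences
        # Max predicted Q for the next states from the target network (fixed Q-targets)
        q_targets_next = self.qnetwork_target(next_states).detach().max(1)[0].unsqueeze(1)
        q_targets = rewards + gamma * q_targets_next * (1 - dones)
        q_expected = self.qnetwork_local(states).gather(1, actions)
        loss = F.mse_loss(q_expected, q_targets)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        # Soft-update the target network toward the local network
        for t_param, l_param in zip(self.qnetwork_target.parameters(),
                                    self.qnetwork_local.parameters()):
            t_param.data.copy_(TAU * l_param.data + (1.0 - TAU) * t_param.data)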

Build the Deep Q-network (DQN) and do some plots:
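A training loop along the lines below is enough to produce the score plot (it reuses env and brain_name from the setup above). The 37-dimensional state, the 4 actions and the +13 average-score target come from the project description; the episode budget and epsilon schedule are assumptions.

import numpy as np
import matplotlib.pyplot as plt
from collections import deque

from agent import Agent

def dqn(agent, n_episodes=2000, max_t=1000, eps_start=1.0, eps_end=0.01, eps_decay=0.995):
    scores = []                          # score of every episode
    scores_window = deque(maxlen=100)    # last 100 scores
    eps = eps_start
    for i_episode in range(1, n_episodes + 1):
        env_info = env.reset(train_mode=True)[brain_name]
        state = env_info.vector_observations[0]
        score = 0
        for _ in range(max_t):
            action = agent.act(state, eps)
            env_info = env.step(action)[brain_name]
            next_state = env_info.vector_observations[0]
            reward = env_info.rewards[0]
            done = env_info.local_done[0]
            agent.step(state, action, reward, next_state, done)
            state = next_state
            score += reward
            if done:
                break
        scores.append(score)
        scores_window.append(score)
        eps = max(eps_end, eps_decay * eps)   # decay exploration
        if i_episode % 100 == 0:
            print("Episode {}\tAverage score: {:.2f}".format(i_episode, np.mean(scores_window)))
        if np.mean(scores_window) >= 13.0:    # project's "solved" criterion
            print("Solved in {} episodes".format(i_episode))
            break
    return scores

scores = dqn(Agent(state_size=37, action_size=4))

plt.plot(np.arange(len(scores)), scores)
plt.xlabel("Episode #")
plt.ylabel("Score")
plt.show()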

Yep, that's it. Since my calendar is quite full at the moment and the main purpose of taking this course is the certificate, I hope this will be enough to submit. For interest, though, there are other options that could be experimented with, e.g. comparing implementations of a double DQN, a dueling DQN, and/or prioritized experience replay, as well as learning from pixels.