python – Something is wrong with my Keras Q-learning code for OpenAI Gym FrozenLake

Maybe my question will look stupid.

I'm studying the Q-learning algorithm. To understand it better, I'm trying to rewrite the TensorFlow code of this FrozenLake example as Keras code.

My code:

    import gym
    import numpy as np
    import random
    from keras.layers import Dense
    from keras.models import Sequential
    from keras import backend as K

    import matplotlib.pyplot as plt
    %matplotlib inline

    env = gym.make('FrozenLake-v0')

    model = Sequential()
    model.add(Dense(16, activation='relu', kernel_initializer='uniform', input_shape=(16,)))
    model.add(Dense(4, activation='softmax', kernel_initializer='uniform'))

    def custom_loss(yTrue, yPred):
        return K.sum(K.square(yTrue - yPred))

    model.compile(loss=custom_loss, optimizer='sgd')

    # Set learning parameters
    y = .99
    e = 0.1
    # Create lists to contain total rewards and steps per episode
    jList = []
    rList = []

    num_episodes = 2000
    for i in range(num_episodes):
        current_state = env.reset()
        rAll = 0
        d = False
        j = 0
        while j < 99:
            j += 1
            current_state_Q_values = model.predict(np.identity(16)[current_state:current_state+1], batch_size=1)
            action = np.reshape(np.argmax(current_state_Q_values), (1,))

            if np.random.rand(1) < e:
                action[0] = env.action_space.sample()  # random action

            new_state, reward, d, _ = env.step(action[0])

            rAll += reward
            jList.append(j)
            rList.append(rAll)
            new_Qs = model.predict(np.identity(16)[new_state:new_state+1], batch_size=1)
            max_newQ = np.max(new_Qs)

            targetQ = current_state_Q_values
            targetQ[0, action[0]] = reward + y*max_newQ
            model.fit(np.identity(16)[current_state:current_state+1], targetQ, verbose=0, batch_size=1)
            current_state = new_state
            if d == True:
                # Reduce chance of random action as we train the model.
                e = 1./((i/50) + 10)
                break
    print("Percent of successful episodes: " + str(sum(rList)/num_episodes) + "%")

When I run it, it doesn't work well: Percent of successful episodes: 0.052%

plt.plot(rList)

The original TensorFlow code does better: Percent of successful episodes: 0.352%

plt.plot(rList)

What am I doing wrong?

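Solution

Besides setting use_bias=False, as @Maldus mentioned in the comments, another thing you can try is starting from a higher epsilon value (e.g. 0.5 or 0.75). One trick might be to decrease epsilon only when the goal is reached, i.e. don't decrease epsilon at the end of every episode. That way your player can keep exploring the map randomly until it starts to converge on a good route, and only then is it a good idea to reduce the epsilon parameter.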

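I actually implemented a similar model in Keras in a gist, using Convolutional layers instead of Dense layers. I managed to get it working in under 2000 episodes. It might be helpful to others :)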
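
The epsilon suggestion in the answer above (start higher, and lower epsilon only after an episode that reached the goal) could be sketched roughly as follows. This is a minimal illustration, not the answerer's code: the class name `GoalGatedEpsilon`, the helper `epsilon_greedy`, the multiplicative decay, and the default values for `start`, `decay`, and `minimum` are all assumptions made for the example.

```python
import random


class GoalGatedEpsilon:
    """Epsilon schedule that decays only after episodes that reached the goal.

    The starting value, decay factor, and floor are illustrative assumptions,
    not values taken from the answer.
    """

    def __init__(self, start=0.5, decay=0.99, minimum=0.01):
        self.value = start
        self.decay = decay
        self.minimum = minimum

    def end_episode(self, reached_goal):
        # Keep exploring at the current rate until the agent actually succeeds.
        if reached_goal:
            self.value = max(self.minimum, self.value * self.decay)
        return self.value


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

In the question's training loop, this would replace the line `e = 1./((i/50) + 10)` with something like `e = schedule.end_episode(reached_goal=(reward == 1))`, since FrozenLake gives a reward of 1 only on the step that reaches the goal. Whether this helps in practice depends on the rest of the setup, so treat the numbers as a starting point to tune.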


Original URL: http://outofmemory.cn/langs/1196358.html